[Proposed by Gerard Heuvelink]
We make a lot of use of machine learning methods to build models that predict soil classes and soil properties. We use these models almost exclusively to make predictions. Have we forgotten that the purpose of modelling is usually twofold: 1) to improve understanding; and 2) to make predictions? So can we use calibrated machine learning models to help us understand why soil varies the way it does? Can we open the black box? If yes, what will we learn? Will it confirm pedological knowledge or will it reveal new insights?
[From Philippe Lagacherie, INRA, France]
I do not expect to gain new insights into the soil cover by using machine learning methods at the scale at which they are usually applied. Machine learning methods only make predictions, often more accurate than those of other models when dealing with large areas that include many soil systems with many drivers acting differently from one place to another. Yes, we need to open the black box, but more to check that the soil predictions agree with our current pedological knowledge. If they do, that is enough to make me happy. Any so-called new insight is more likely to be the result of overfitting than anything else. If we want to learn something about soil variations, we should work differently: first delineate well-identified soil systems, then collect enough data and use statistical or mechanistic models that are easier to interpret.