The Pedometrics Awards committee for the best paper award (Grunwald, McBratney, Oliver, Rossiter, Yang) received only 12 nominations spread over six journals. These were scored by the committee. As per the published procedure, we present for your enjoyment and assessment the top five papers. Reading these papers will bring you up to date on some of the most exciting developments in pedometrics published in 2017.
The 2017 award will be presented at the 21st World Congress of Soil Science, Rio de Janeiro, 12–17 August 2018 (see information at http://www.21wcss.org/).
Please send in your votes for the best paper 2017 by 01-August-2018.
Rank the papers using the “instant runoff” system: first choice, second choice, and so on, down to the last paper you are willing to vote for, i.e., the last paper that you think would deserve the award.
Send your votes to David: d.g.rossiter(_at_)cornell.edu
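For readers unfamiliar with instant runoff counting, here is a minimal sketch in Python of how ranked, possibly incomplete ballots could be tallied. It is an illustration only and not the committee's actual procedure. The function name `instant_runoff`, the paper labels, and the alphabetical tie-break are all assumptions, since the announcement does not say how ties are resolved.

```python
from collections import Counter


def instant_runoff(ballots):
    """Return the winning label, or None if no ballot names any paper.

    Each ballot is a list of paper labels in order of preference.
    A ballot may be truncated: it only lists papers the voter supports.
    """
    remaining = {label for ballot in ballots for label in ballot}
    while remaining:
        # Count each ballot for its highest-ranked paper still in the race.
        tally = Counter({label: 0 for label in remaining})
        for ballot in ballots:
            top = next((label for label in ballot if label in remaining), None)
            if top is not None:  # exhausted ballots are simply skipped
                tally[top] += 1
        active = sum(tally.values())
        # Assumed tie-break: the alphabetically first label leads.
        leader = max(sorted(remaining), key=lambda label: tally[label])
        if 2 * tally[leader] > active or len(remaining) == 1:
            return leader
        # Assumed tie-break: the alphabetically last label is eliminated.
        loser = min(sorted(remaining, reverse=True), key=lambda label: tally[label])
        remaining.remove(loser)
    return None


# Hypothetical ballots over three papers labelled A, B and C.
ballots = [["A", "B"], ["A", "C"], ["B", "C"], ["B", "A"], ["C", "B"]]
print(instant_runoff(ballots))  # → B
```

In this example no paper has a majority in the first round (A: 2, B: 2, C: 1), so C is eliminated and its single ballot transfers to its second choice, B, which then wins 3 to 2. A ballot that lists only one paper is never transferred, which is why you should rank only the papers you consider deserving.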
The papers and their abstracts are listed here in order of DOI.
- Vaysse, K., & Lagacherie, P. (2017). Using quantile regression forest to estimate uncertainty of digital soil mapping products. Geoderma, 291, 55–64.
Digital Soil Mapping (DSM) products are simplified representations of more complex and partially unknown patterns of soil variations. Therefore, any prediction of a soil property that can be derived from these products has an irreducible uncertainty that needs to be mapped. The objective of this study was to compare the most current DSM method – Regression Kriging (RK) – with a new approach derived from RandomForest – Quantile Regression Forest (QRF) – in regard to their ability of predicting the uncertainties of GlobalSoilMap soil property grids. The comparison was performed for three soil properties, pH, organic carbon and clay content at 5-15 cm depth in a 27,236 km² Mediterranean French region with sparse sets of measured soil profiles (1/13.5 km²) and for a set of environmental covariates characterizing the relief, climate, geology and land use of the region. Apart from classical performance indicators, comparisons involved accuracy plots and the visual examinations of the uncertainty maps provided by the two methods. The results obtained for the three soil properties showed that QRF provided more accurate and more interpretable predicted patterns of uncertainty than RK did, while having similar performances in predicting soil properties. The use of QRF in operational DSM is therefore recommended, especially when spatial sampling of soil observations are too sparse for applying RK.
- Rossiter, D. G., Zeng, R., & Zhang, G.-L. (2017). Accounting for taxonomic distance in accuracy assessment of soil class predictions. Geoderma, 292, 118–127.
Evaluating the accuracy of allocation to classes in monothetic hierarchical soil classification systems, including the World Reference Base for Soil Classification, US Soil Taxonomy, and Chinese Soil Taxonomy, is poorly-served by binomial methods (correct/incorrect allocation per evaluation observation), since some errors are more serious than others in terms of soil properties, map use, pedogenesis, and ease of mapping. Instead, evaluations should account for the taxonomic distance between classes, expressed as class similarities, giving partial credit to some incorrect allocations. These can then be used in weighted accuracy measures, either direct measures of agreement or measures that account for chance agreement, such as the tau index. Similarities can be determined in one of four ways: (1) by the expert opinion of a soil classification specialist; (2) by the distance between classes in a numerical taxonomy assessment; (3) by distance within a taxonomic hierarchy; or (4) by an error loss function. Expert opinion can be from the point of view of the map user, to assess map utility, or map producer, to assess mapping skill. Examples are given of determining similarity between a subset of Chinese Soil Taxonomy classes by expert opinion and by numerical taxonomy from soil spectra, and then using these for weighted accuracy assessment. A method for assessing the accuracy of probabilistic predictions of several classes at a location is also proposed.
- Angelini, M. E., Heuvelink, G. B. M., & Kempen, B. (2017). Multivariate mapping of soil with structural equation modelling. European Journal of Soil Science, 68(5), 575–591.
In a previous study we introduced structural equation modelling (SEM) for digital soil mapping in the Argentine Pampas. An attractive property of SEM is that it incorporates pedological knowledge explicitly through a mathematical implementation of a conceptual model. Many soil processes operate within the soil profile; therefore, SEM might be suitable for simultaneous prediction of soil properties for multiple soil layers. In this way, relations between soil properties in different horizons can be included that might result in more consistent predictions. The objectives of this study were therefore to apply SEM to multi-layer and multivariate soil mapping, and to test SEM functionality for suggestions to improve the modelling. We applied SEM to model and predict the lateral and vertical distribution of the cation exchange capacity (CEC), organic carbon (OC) and clay content of three major soil horizons, A, B and C, for a 23 000-km² region in the Argentine Pampas. We developed a conceptual model based on pedological hypotheses. Next, we derived a mathematical model and calibrated it with environmental covariates and soil data from 320 soil profiles. Cross-validation of predicted soil properties showed that SEM explained only marginally more of the variance than a linear regression model. However, assessment of the covariation showed that SEM reproduces the covariance between variables much more accurately than linear regression. We concluded that SEM can be used to predict several soil properties in multiple layers by considering the interrelations between soil properties and layers.
- Hengl, T., Jesus, J. M. de, Heuvelink, G. B. M., Gonzalez, M. R., Kilibarda, M., Blagotić, A., … Kempen, B. (2017). SoilGrids250m: Global gridded soil information based on machine learning. PLOS ONE, 12(2), e0169748.
This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of 158 remote sensing-based soil covariates (primarily derived from MODIS land products, SRTM DEM derivatives, climatic images and global landform and lithology maps), which were used to fit an ensemble of machine learning methods—random forest and gradient boosting and/or multinomial logistic regression—as implemented in the R packages ranger, xgboost, nnet and caret. The results of 10–fold cross-validation show that the ensemble models explain between 56% (coarse fragments) and 83% (pH) of variation with an overall average of 61%. Improvements in the relative accuracy considering the amount of variation explained, in comparison to the previous version of SoilGrids at 1 km spatial resolution, range from 60 to 230%. Improvements can be attributed to: (1) the use of machine learning instead of linear regression, (2) to considerable investments in preparing finer resolution covariate layers and (3) to insertion of additional soil profiles. Further development of SoilGrids could include refinement of methods to incorporate input uncertainties and derivation of posterior probability distributions (per pixel), and further automation of spatial modeling so that soil maps can be generated for potentially hundreds of soil variables. 
Another area of future research is the development of methods for multiscale merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to 50 m spatial resolution) so that increasingly more accurate, complete and consistent global soil information can be produced. SoilGrids are available under the Open Data Base License.
- Somarathna, P. D. S. N., Minasny, B., & Malone, B. P. (2017). More data or a better model? Figuring out what matters most for the spatial prediction of soil carbon. Soil Science Society of America Journal, 81, 1413–1426.
Modeling techniques used in digital soil carbon mapping encompass a variety of algorithms to address spatial prediction problems such as spatial non-stationarity, nonlinearity and multi-colinearity. A given study site can inherit one or more such spatial prediction problems, necessitating the use of a combination of statistical learning algorithms to improve the accuracy of predictions. In addition, the training sample size may affect the accuracy of the model predictions. The effect of varying sample size on model accuracy has not been widely studied in pedometrics. To help fill this gap, we examined the behavior of multiple linear regression (MLR), geographically weighted regression (GWR), linear mixed models (LMMs), Cubist regression trees, quantile regression forests (QRFs), and extreme learning machine regression (ELMR) under varying sample sizes. The results showed that for the study site in the Hunter Valley, Australia, the accuracy of spatial prediction of soil carbon is more sensitive to training sample size compared to the model type used. The prediction accuracy initially increases exponentially with increasing sample size, eventually reaching a plateau. Different models reach their maximum predictive potential at different sample sizes. Furthermore, the uncertainty of model predictions decreases with increasing training sample sizes.