# Best Paper 2008 Nomination

The six finalists for Best Paper in Pedometrics 2008, sorted by journal title and reference:

**(1) Brus, D.J., Bogaert, P. and Heuvelink, G.B.M., 2008. Bayesian Maximum Entropy prediction of soil categories using a traditional soil map as soft information. European Journal of Soil Science, 59(2): 166-177.**

Abstract: Bayesian Maximum Entropy was used to estimate the probabilities of occurrence of soil categories in the Netherlands, and to simulate realizations from the associated multi-point pdf. Besides the hard observations (H) of the categories at 8369 locations, the soil map of the Netherlands 1:50 000 was used as soft information (S). The category with the maximum estimated probability was used as the predicted category. The quality of the resulting BME(HS)-map was compared with that of the BME(H)-map obtained by using only the hard data in BME-estimation, and with the existing soil map. Validation with a probability sample showed that the use of the soft information in BME-estimation leads to a considerable and significant increase of map purity by 15%. This increase of map purity was due to the high purity of the existing soil map (71.3%). The purity of the BME(HS) was only slightly larger than that of the existing soil map. This was due to the small correlation length of the soil categories. The theoretical purity of the BME-maps overestimated the actual map purity, which can be partly explained by the biased estimates of the one-point bivariate probabilities of hard and soft categories of the same label. Part of the hard data is collected to describe characteristic soil profiles of the map units which explains the bias. Therefore, care must be taken when using the purposively selected data in soil information systems for calibrating the probability model. It is concluded that BME is a valuable method for spatial prediction and simulation of soil categories when the number of categories is rather small (say < 10). For larger numbers of categories, the computational burden becomes prohibitive, and large samples are needed for calibration of the probability model.
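
As a rough illustration of two steps named in this abstract, the sketch below picks the category with the maximum estimated probability at each validation location and then computes map purity as the fraction of locations where the prediction matches the observation. The function names, category labels, and probabilities are hypothetical, and the unweighted proportion assumes equal inclusion probabilities in the validation sample; the paper's probability sample may have required design weights.

```python
def predict_category(probabilities):
    """Return the category label with the largest estimated probability."""
    return max(probabilities, key=probabilities.get)


def map_purity(predicted, observed):
    """Fraction of validation locations where prediction equals observation.

    Assumes every location carries the same weight (an assumption, see text).
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Hypothetical estimated probabilities at four validation locations.
probs_at_sites = [
    {"peat": 0.60, "sand": 0.30, "clay": 0.10},
    {"peat": 0.20, "sand": 0.50, "clay": 0.30},
    {"peat": 0.10, "sand": 0.20, "clay": 0.70},
    {"peat": 0.40, "sand": 0.35, "clay": 0.25},
]
observed = ["peat", "sand", "sand", "peat"]  # hypothetical field observations

predicted = [predict_category(p) for p in probs_at_sites]
purity = map_purity(predicted, observed)
print(predicted, purity)
```

Here three of the four hypothetical predictions agree with the observations, so the purity of this toy map is 0.75.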

**(2) Brus, D.J. and Noij, I.G.A.M., 2008. Designing sampling schemes for effect monitoring of nutrient leaching from agricultural soils. European Journal of Soil Science, 59(2): 292-303.**

Abstract: A general methodology for designing sampling schemes for monitoring is illustrated with a case study aimed at estimating the temporal change of the spatial mean P concentration in the topsoil of an agricultural field after implementation of the remediation measure. A before-after control-impact (BACI) sample-pattern is proposed, with stratified random sampling as a spatial sampling design. The strata are formed as compact blocks of equal area, so that the sample locations cover the field very well. Composite sampling, where the aliquots of a composite come from different strata, is proposed in order to save laboratory costs. The numbers of composites and aliquots per composite are optimized for testing the hypothesis that the mean P concentration didn’t change or has increased. Initially, this is done for a known variogram, temporal correlation, variance of laboratory measurement error, initial mean P concentration, and time needed for fieldwork. The optimal sample size to achieve a power of 0.90 at a 10% decrease of the mean P concentration is six composites of six aliquots each. Next, the effect of uncertainty about these model parameters on the optimal sample size and on the power of the test for a fixed sample size is analyzed. This analysis showed that, to obtain a probability of 95% that the power > 0.90, the sample size must be increased to 7 composites of 10 aliquots each.
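
The link between sample size and power described in this abstract can be sketched with a plain normal approximation. The example below is not the authors' procedure, which also accounts for the variogram, temporal correlation, laboratory error and fieldwork time; it only shows how the power of a one-sided test for a decrease depends on the standard error of the estimated change. The significance level `alpha = 0.05`, the function names, and the numerical inputs are assumptions made for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def power_one_sided(true_decrease, std_error, alpha=0.05):
    """Power of a one-sided z-test of H0: no change or an increase.

    true_decrease is the size of the real decrease (positive number) and
    std_error is the standard error of the estimated change.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(true_decrease / std_error - z_crit)


def required_std_error(true_decrease, power=0.90, alpha=0.05):
    """Largest standard error that still reaches the requested power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    z_power = _STD_NORMAL.inv_cdf(power)
    return true_decrease / (z_crit + z_power)


decrease = 10.0  # hypothetical decrease, in arbitrary concentration units
se_needed = required_std_error(decrease, power=0.90)
achieved = power_one_sided(decrease, se_needed)
print(round(se_needed, 3), round(achieved, 3))
```

In a design study, the standard error would itself be a function of the number of composites and aliquots, so the sample size is increased until the standard error drops below `se_needed`.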

**(3) Awiti, A.O., Walsh, M.G., Shepherd, K.D. and Kinyamario, J., 2008. Soil condition classification using infrared spectroscopy: A proposition for assessment of soil condition along a tropical forest-cropland chronosequence. Geoderma, 143(1-2): 73-84.**

Abstract: Soil fertility depletion in smallholder agricultural systems in sub-Saharan Africa presents a formidable challenge both for food production and environmental sustainability. A critical constraint to managing soils in sub-Saharan Africa is poor targeting of soil management interventions. This is partly due to lack of diagnostic tools for screening soil condition that would lead to a robust and repeatable spatially explicit case definition of poor soil condition. The objectives of this study were to: (i) evaluate the ability of near infrared spectroscopy to detect changes in soil properties across a forest-cropland chronosequence; and (ii) develop a heuristic scheme for the application of infrared spectroscopy as a tool for case definition and diagnostic screening of soil condition for agricultural and environmental management. Soil reflectance was measured for 582 topsoil samples collected from three forest-cropland chronosequence age classes, namely forest, recently converted (RC, 17 years) and historically converted (HC, ca. 70 years). 130 randomly selected samples were used to calibrate soil properties to soil reflectance using partial least-squares regression (PLSR). 64 randomly selected samples were withheld for validation. A proportional odds logistic model was applied to chronosequence age classes and 10 principal components of spectral reflectance to determine three soil condition classes, namely “good”, “average” and “poor”, for 194 samples. Discriminant analysis was applied to classify the remaining 388 “unknown” samples into soil condition classes using the 194 samples as a training set. Validation r² values were: total C, 0.91; total N, 0.90; effective cation exchange capacity (ECEC), 0.90; exchangeable Ca, 0.85; clay content, 0.77; silt content, 0.77; exchangeable Mg, 0.76; soil pH, 0.72; and K, 0.64. A spectral based definition of “good”, “average” and “poor” soil condition classes provided a basis for an explicitly quantitative case definition of poor or degraded soils. Estimates of probabilities of membership of a sample in a spectral soil condition class present an approach for probabilistic risk-based assessments of soil condition over large spatial scales. The study concludes that reflectance spectroscopy is rapid and offers the possibility for major efficiency and cost saving, permitting spectral case definition to define poor or degraded soils, leading to better targeting of management interventions.

**(4) Grinand, C., Arrouays, D., Laroche, B. and Martin, M.P., 2008. Extrapolating regional soil landscapes from an existing soil map: Sampling intensity, validation procedures, and integration of spatial context. Geoderma, 143(1-2): 180-190.**

Abstract: This paper aims to investigate the potential of using soil-landscape patterns extracted from a soil map to predict soil distribution at unvisited locations. Recent machine learning advances used in previous studies showed that the knowledge embedded within soil units delineated by experts can be retrieved and explicitly formulated from environmental data layers. However, the extent to which the models can yield valid predictions has been little studied. Our approach is based on a classification tree analysis which has undergone a recent statistical advance, namely, stochastic gradient boosting. We used an existing soil-landscape map to test our methodology. Explanatory variables included classical terrain factors (elevation, slope, curvature plan and profile, wetness index, etc.), various channels and combinations of channels from LANDSAT ETM imagery, land cover and lithology maps. Overall classification accuracy indexes were calculated under two validation schemes, either taken within the training area or from a separated validation area. We focused our study on the accuracy assessment and testing of two modelling parameters: sampling intensity and spatial context integration. First, we observed strong differences in accuracy between the training area and the extrapolated area. Second, sampling intensity, in proportion to the class extent, did not largely influence the classification accuracy. Spatial context integration by the use of a mean filtering algorithm on explanatory variables increased the Kappa index on the extrapolated area by more than ten points. The best accuracy measurements were obtained for a combination of the raw explanatory dataset with the filtered dataset representing regional trend. However, the predictive capacity of models remained quite low when extrapolated to an independent validation area. Nevertheless, this study offers encouragement for the success of extrapolating soil patterns from existing soil maps to fill the gaps in present soil map coverage and to increase efficiency of ongoing soil survey.
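
The Kappa index used in this abstract as an accuracy measure can be sketched as follows, assuming it refers to Cohen's kappa: observed agreement between predicted and reference classes, corrected for the agreement expected by chance from the class frequencies. The function name and the class labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(predicted, observed):
    """Cohen's kappa: chance-corrected agreement between two labelings."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    # Proportion of locations where the two labelings agree.
    p_observed = sum(p == o for p, o in zip(predicted, observed)) / n
    # Agreement expected by chance from the marginal class frequencies.
    pred_counts = Counter(predicted)
    obs_counts = Counter(observed)
    p_chance = sum(pred_counts[c] * obs_counts[c] for c in obs_counts) / n**2
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical soil-landscape units at six validation locations.
observed = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "A", "B", "C", "C", "C"]
kappa = cohen_kappa(predicted, observed)
print(kappa)
```

For this toy example the observed agreement is 5/6 and the chance agreement is 1/3, which gives a kappa of 0.75.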

**(5) Lark, R., 2008. Some Results on the Spatial Breakdown Point of Robust Point Estimates of the Variogram. Mathematical Geosciences, 40(7): 729-751.**

Abstract: The effect of outliers on estimates of the variogram depends on how they are distributed in space. The ‘spatial breakdown point’ is the largest proportion of observations which can be drawn from some arbitrary contaminating process without destroying a robust variogram estimator, when they are arranged in the most damaging spatial pattern. A numerical method is presented to find the spatial breakdown point for any sample array in two dimensions or more. It is shown by means of some examples that such a numerical approach is needed to determine the spatial breakdown point for two or more dimensions, even on a regular square sample grid, since previous conjectures about the spatial breakdown point in two dimensions do not hold. The ‘average spatial breakdown point’ has been used as a basis for practical guidelines on the intensity of contaminating processes that can be tolerated by robust variogram estimators. It is the largest proportion of contaminating observations in a data set such that the breakdown point of the variance estimator used to obtain point estimates of the variogram is not exceeded by the expected proportion of contaminated pairs of observations over any lag. In this paper the behaviour of the average spatial breakdown point is investigated for cases where the contaminating process is spatially dependent. It is shown that in two dimensions the average spatial breakdown point is 0.25. Finally, the ‘empirical spatial breakdown point’, a tool for the exploratory analysis of spatial data thought to contain outliers, is introduced and demonstrated using data on metal content in the soils of Sheffield, England. The empirical spatial breakdown point of a particular data set can be used to indicate whether the distribution of possible contaminants is likely to undermine a robust variogram estimator.
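
To illustrate why outliers matter for point estimates of the variogram, the sketch below compares the classical method-of-moments (Matheron) estimator with one robust alternative, the Cressie-Hawkins estimator in its commonly cited form, on a regular one-dimensional transect. The abstract does not name a particular robust estimator, so this choice, the function names, and the data are assumptions; the example does not compute a spatial breakdown point.

```python
def lag_differences(values, lag):
    """Differences between all pairs separated by `lag` steps on a transect."""
    return [values[i + lag] - values[i] for i in range(len(values) - lag)]


def matheron(values, lag):
    """Classical method-of-moments estimate of the semivariance."""
    diffs = lag_differences(values, lag)
    return sum(d * d for d in diffs) / (2 * len(diffs))


def cressie_hawkins(values, lag):
    """Cressie-Hawkins estimate, based on square roots of absolute differences."""
    diffs = lag_differences(values, lag)
    n = len(diffs)
    mean_root = sum(abs(d) ** 0.5 for d in diffs) / n
    return mean_root ** 4 / (2 * (0.457 + 0.494 / n))


# Hypothetical transect, and a copy with one contaminated observation.
clean = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 1.2]
contaminated = list(clean)
contaminated[4] = 10.0  # a single outlier

for name, estimator in (("Matheron", matheron), ("Cressie-Hawkins", cressie_hawkins)):
    inflation = estimator(contaminated, 1) / estimator(clean, 1)
    print(name, round(inflation, 1))
```

A single outlier enters two of the nine lag-1 pairs here, and it inflates the classical estimate far more than the robust one.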

**(6) Zimmermann, B., Zehe, E., Hartmann, N.K. and Elsenbeer, H., 2008. Analyzing spatial data: An assessment of assumptions, new methods, and uncertainty using soil hydraulic data. Water Resour. Res., 44: W10418.**

Abstract: Environmental scientists today enjoy an ever-increasing array of geostatistical methods to analyze spatial data. Our objective was to evaluate several of these recent developments in terms of their applicability to real-world data sets of the soil field-saturated hydraulic conductivity (Ks). The intended synthesis comprises exploratory data analyses to check for Gaussian data distribution and stationarity; evaluation of robust variogram estimation requirements; estimation of the covariance parameters by least-squares procedures and (restricted) maximum likelihood; and use of the Matérn correlation function. We furthermore discuss the spatial prediction uncertainty resulting from the different methods. The log-transformed data showed Gaussian uni- and bivariate distributions, and pronounced trends. Robust estimation techniques were not required, and anisotropic variation was not evident. Restricted maximum likelihood estimation versus the method-of-moments variogram of the residuals accounted for considerable differences in covariance parameters, whereas the Matérn and standard models gave very similar results. In the framework of spatial prediction, the parameter differences were mainly reflected in the spatial connectivity of the Ks field. Ignoring the trend component and an arbitrary use of robust estimators would have the most severe consequences in this respect. Our results highlight the superior importance of a thorough exploratory data analysis and proper variogram modeling, and prompt us to encourage restricted maximum likelihood estimation, which is accurate in estimating fixed and random effects.
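
For readers unfamiliar with the Matérn correlation function mentioned in this abstract, the sketch below evaluates it for three smoothness values where it has a simple closed form. It assumes the parameterization with scaled distance sqrt(2ν)·h/ρ, which may differ from the one used in the paper; the function name and parameter names are hypothetical. With ν = 0.5 the function reduces to the exponential model, one of the standard models.

```python
import math


def matern_correlation(h, range_param, nu):
    """Matérn correlation at distance h for nu in {0.5, 1.5, 2.5}.

    Uses the closed forms that hold for half-integer smoothness under the
    sqrt(2 * nu) * h / range_param scaling (an assumed parameterization).
    """
    if h < 0 or range_param <= 0:
        raise ValueError("h must be >= 0 and range_param must be > 0")
    s = math.sqrt(2 * nu) * h / range_param
    if nu == 0.5:
        return math.exp(-s)  # exponential model
    if nu == 1.5:
        return (1 + s) * math.exp(-s)
    if nu == 2.5:
        return (1 + s + s * s / 3) * math.exp(-s)
    raise ValueError("only nu = 0.5, 1.5 or 2.5 is supported in this sketch")


# Correlation at a few distances for each supported smoothness value.
for nu in (0.5, 1.5, 2.5):
    row = [round(matern_correlation(h, 1.0, nu), 3) for h in (0.0, 0.5, 1.0, 2.0)]
    print(nu, row)
```

Larger smoothness values give a correlation that stays closer to 1 at short distances, which corresponds to a smoother spatial process.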

**Read the full report here**