Spatial prediction of soil moisture

Revision as of 15:51, 24 March 2009 by Hengl (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

his exercise is a part of the module Meetstrategieën (introduction to Environmetrics), which is taught at the 3rd year Earth sciences BSc programme (Bachelor Aardwetenschappen). It demonstrates how to generate maps from sampled point data, interpret the results and run an objective comparison between various geostatistical techniques. The specific objective was to produce a map of the soil moisture content for a given study area, and analyze how does the soil moisture changes in relation to land cover, elevation and surface brightness. The field data (88 points) was collected by students (18), that were split into groups. The study area is a regular block of 525x605 meters in the Waterleidingduinen park (see the map of the area). The soil moisture in the area is mainly controlled by the organic matter content in the top-soil. The texture of the soils is relatively homogenous (>60% of sand).

This article reviews some study questions and suggests ways to analyze/visualize this data and make interpretation. We focus on geostatistical mapping and assessment of the accuracy of the interpolation techniques: inverse distance interpolation, ordinary kriging and regression-kriging. To run the analysis, you only need to (1) install R with all necessary packages; (2) obtain this R script, and then run the exercise line-by-line. The sample codes shown below are only used for illustration. To understand the R syntax, consider obtaining some of the R manuals.

Spatial prediction of soil moisture; a study area in the Waterleidingduinen.
  • Icon R.png wduinen.R : R script to compare various spatial predictors and generate BCSP.
  • Icon zip.png : Auxiliary maps used to interpolate soil moisture (elevation, land cover and surface brightness).


Sampling plan and the fieldwork

Students were split in three groups, following the sampling designs shown below (generated in R using the spsample method). Each group received a GPS received with cca. 30 points, and then navigated to the sampling locations. At each location, groups made 5 measurements of soil moisture (SM), by randomly allocating measurements within 5x5 m blocks of land.

Figure: Three sampling designs used to collect measurements of soil moisture (left) random sampling, (center) grid sampling and (right) adaptive sampling.

For measurements of the soil moisture, we used the Theta Probe type ML2X device, which measures the volumetric soil moisture content (the number shown below refer to a uncorrected dimensionless parameter - a ratio m3 m-3). For more information on how to estimate the corrected volumetric soil moisture content, see the Theta Probe ML2X manual.

Figure: Students doing measurements on the field.
Figure: Theta-probe instrument used to make measurements of soil moisture.

We use three auxiliary maps (all at resolution of 5 m) to improve the spatial prediction of soil moisture: (1) dtm.asc a LiDAR-based digital elevation model (see AHN project), (2) geimg.asc the brightness values of a fine-resolution satellite image of the area, and (3) landcov.asc a map showing the land cover classes - forests, grasslands, water bodies, shrubs and roads. These maps can be obtained from the zipped file.

Data import and preliminary analysis

We start by reading the table (wduinen.txt) and converting it to a point map:

> wduinen <- read.delim("wduinen.txt")
> str(wduinen)
'data.frame':   88 obs. of  11 variables:
 $ Code     : Factor w/ 88 levels "C01","C02","C03",..: 1 2 3 4 5 ...
 $ Latitude : num  52.4 52.4 52.4 52.4 52.4 ...
 $ Longitude: num  4.55 4.55 4.55 4.55 4.55 ...
 $ SM1      : num  0.84 0.794 0.854 0.521 0.944 0.888 0.823  ...
 $ SM2      : num  0.968 0.522 0.783 0.703 0.983 ...
 $ SM3      : num  0.898 0.68 0.834 0.688 0.899 0.725 0.687  ...
 $ SM4      : num  0.89 0.721 0.816 0.861 0.855 0.732 0.97  ...
 $ SM5      : num  0.808 0.705 0.965 0.884 0.886 0.752 0.743  ...
 $ SM       : num  0.881 0.684 0.850 0.731 0.913 ...
 $ sd       : num  0.0611 0.1002 0.0692 0.1475 0.0504 ...
 $ Lux      : num  270 805 806 830 220 308 ...

This shows that the table consists of 11 columns and 88 observations. SM1-5 stands for the measurements of soil moisture at the point support, SM is the averaged value (block support), sd is the standard deviation within the block and Lux are the measurements of the illuminance using Lux-meter.

To estimate the global variance of SM1 versus SM we use:

> var(wduinen$SM1); var(wduinen$SM)
[1] 0.03110635
[1] 0.02096423

which shows that the variance at the point support is about 48% higher than for the block support data.

Next, we convert this table to a point map. We also want to work with the local coordinates (NL coordinate system), so we run:

> coordinates(wduinen) <- ~Longitude+Latitude
> proj4string(wduinen) <- CRS("+proj=longlat +ellps=WGS84 +towgs84=0,0,0")
# project to the Dutch coordinate system:
> <- spTransform(wduinen, CRS("+proj=sterea +lat_0=52.15616055555555 
+lon_0=5.38763888888889 +k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel 
+towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812 +units=m 
# plot the points:
> bubble(, "SM", scales=list(draw=T, cex=.8))

Figure: Bubble plot of sampled SM.
Figure: Gridded maps imported using adehabitat package.

We also need to import the gridded maps (using GDAL):

gridmaps <- readGDAL("dtm.asc")
names(gridmaps) <- "dtm"
gridmaps$landcov <- as.factor(readGDAL("landcov.asc")$band1)
gridmaps$geimg <- readGDAL("geimg.asc")$band1
proj4string(gridmaps) <- CRS(proj4string(

We now have all maps necessary to run the geostatistical analysis.

Analysis of spatial auto-correlation (variogram)

Before we can run any geostatistical interpolation, we need to analyze the spatial auto-correlation of the target variable. To fit the variogram for SM variable, we can use the gstat function fit.variogram:

# fit the variogram:
> <- fit.variogram(variogram(SM~1,, vgm(0, "Exp", 100,
+      sill=var($SM)))
> plot(variogram(SM~1,,
> plot(variogram(SM1~1,,
Figure: Variogram fitted for the SM-data sampled at block support.
Figure: The variogram from the left image, and the SM-data sampled at point support.

As you can notice from the plots above, the nugget variation at the point support is significant, while with the block support the nugget variation has been smoothed out. The variable is, in both cases, auto-correlated up to the distance of about 100 m:

[1] 103.4554

Inverse distance interpolation vs Ordinary Kriging

We are first interested to see if ordinary kriging interpolation is significantly more accurate than inverse distance interpolation. We can run both interpolations and then compare the maps visually:

> SM.ok <- krige(SM~1,, gridmaps,   # Ordinary kriging;
> SM.ok$ <- krige(SM~1,, gridmaps)$var1.pred   # ID;
> spplot(SM.ok[c("var1.pred", "")], scales=list(draw=T, cex=.7), 
+      col.regions=bpy.colors(30), sp.layout=list("sp.points",, 
+      pch="+", col="black"))
Figure: Inverse distance interpolation as compared to ordinary kriging of SM.

Visually, there seems to be only little difference between the two maps. To make this comparison statistically sound, we will also run the leave-one-out cross validation using all points:

> <-,
[inverse distance weighted interpolation]
> <-,,
[using ordinary kriging]
> var($residual, na.rm=T); var($residual, na.rm=T)
[1] 0.01759620
[1] 0.01754314

The difference between the RMSE at validation points is very small. Just to be sure that there is no statistical difference, we can run a t-test (assuming a normal distribution of the errors) on the two groups of validation errors:

> t.test($residual,$residual)

        Welch Two Sample t-test

data:$residual and$residual
t = -0.2729, df = 174, p-value = 0.7852
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.04489375  0.03398585
sample estimates:
   mean of x    mean of y
-0.009883239 -0.004429287

The null-hypothesis that the difference between the two samples equals 0 is accepted, hence there is no significant difference between the prediction power of the inverse distance interpolation and ordinary kriging.

Regression analysis and regression-kriging

In the next steps we will try to explain variability of the SM values using the auxiliary maps. Before we can fit a regression model, we need to estimate the values of predictors at sampling locations:

> wduinen.ov <- overlay(gridmaps,
# copy the values:
>$landcov <- wduinen.ov$landcov  
>$dtm <- wduinen.ov$dtm
>$geimg <- wduinen.ov$geimg

and then fit a regression model (we use the step-wise regression fitting to exclude the predictors that are not significant):

> SM.lm <- lm(SM~landcov+dtm+geimg,
> SM.lms <- step(SM.lm)
> summary(SM.lms)

lm(formula = SM ~ dtm + geimg, data =

     Min       1Q   Median       3Q      Max
-0.45283 -0.08265  0.03485  0.08707  0.26354

              Estimate Std. Error t value
(Intercept)  1.080e+00  8.963e-02  12.047
dtm         -1.590e-04  9.248e-05  -1.719
geimg       -2.598e-03  5.651e-04  -4.598
(Intercept)  < 2e-16 ***
dtm           0.0893 .
geimg       1.49e-05 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1294 on 84 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.2207,     Adjusted R-squared: 0.2022
F-statistic:  11.9 on 2 and 84 DF,  p-value: 2.826e-05

This shows that only dtm and geimg are predictors that are significantly correlated with SM (they explain about 20% of total variability in SM). The landcov is practically of no use for the interpolation of SM.

Before we can run regression-kriging, we still need to fit a variogram for the residuals:

# residuals:
> sel <- as.numeric(attr(residuals(SM.lms), "names"))  # mask the NA values!
> <- fit.variogram(variogram(residuals(SM.lms)~1,[sel,]), 
+      vgm(0, "Exp", 100, sill=var($SM)))
> plot(variogram(residuals(SM.lms)~1,[sel,]),

which now has a smaller sill and range parameters that in the case of

We have no estimated all the model parameters (significant predictors, variogram for residuals) and can proceed with making predictions. To predict a value at a single point location we can run:

> new.point <- data.frame(X=98126, Y=485863, SM=NA)
> coordinates(new.point) <-~X+Y
> proj4string(new.point) <- CRS(proj4string(
> new.point.ov <- overlay(gridmaps, new.point)
> new.point@data <- merge(new.point.ov@data, new.point@data)
> krige(SM~geimg+dtm,[sel,], new.point,
[using universal kriging]
      coordinates var1.pred   var1.var
1 (98126, 485863) 0.3238555 0.01148894

The regression-kriging variance is 0.011, which is still lower than the global variance (55% of the global variance in fact), hence the predictions are relatively precise.

We can now generate maps of soil moisture using the same model and compare them with the results of ordinary kriging:

> SM.rk <- krige(SM~geimg+dtm,[sel,], gridmaps,
> SM.rk$SM.ok <- SM.ok$var1.pred
> spplot(SM.rk[c("var1.pred", "SM.ok")], scales=list(draw=T, cex=.7), 
+       col.regions=bpy.colors(30), sp.layout=list("sp.points",,
+       pch="+", col="black"))
Figure: Regression-kriging vs ordinary kriging.

Again, we also want to do a statistical cross-validation and then objectively compare the two techniques:

> <-,[sel,],
[using universal kriging]
> 1-var($residual, na.rm=T)/var($SM)
[1] 0.1631869
> 1-var($residual, na.rm=T)/var($SM)
[1] 0.2958105

which shows that regression-kriging explains about 30% of variation, as compared to ordinary kriging that explains only 16%. This is also visible from the regresion-kriging map that now nicely shows areas of bare soil (open sand dunes), which are commonly areas of very low soil moisture content.

How to improve the maps we make and explain even higher percent of variation? There are two possibilities: (1) to consider using additional predictors (e.g. soil parameters from a soil map), (2) to collect more samples in the critical areas - the areas of high prediction error (in this case these are higher elevations where we had no points).

Figure: Regression-kriging variance.

Optional: you can generate geostatistical simulations (equiprobable realization) using the same RK model:

> SM.rk$sim1 <- krige(SM~geimg+dtm,[sel,], nsim=1, gridmaps,, nmax=30)$sim1
> spplot(SM.rk[c("sim1", "SM.ok")], scales=list(draw=T, cex=.7), 
+     col.regions=bpy.colors(30), sp.layout=list("sp.points",, 
+     pch="+", col="black"))
Figure: Regression-kriging - one simulation.

This shows a more realistic picture of how this variable really varies in the space. Note that, even the simulations look noisier than the original RK prediction map, the variogram of the map shown above is in fact equal to the variogram we fitted. To prove this we can randomly allocate a same number of points and then re-estimate the variogram:

> rsample <- spsample(SM.rk["sim1"], type="random", length($SM))
> rsmaple.ov <- overlay(SM.rk["sim1"], rsample)
> plot(variogram(sim1~1, rsmaple.ov),

As you will see, the simulated values have the same spatial auto-correlation structure. This proves that the simulations are accurately presenting the feature of interest.


Personal tools