AGENDA
ARTICLES
PUBLICATIONS
Your current location:
Tomislav Hengl
SCIENCE.UVA.NL
MAIL: UvA, Nieuwe Achtergracht 166, 1018 WV Amsterdam, NL
TEL: +31-(0)20-5257379
FAX: +31-(0)20-5257431
DEM parameterization
Fuzzy classification in GIS
Grid size calculator
Regression-kriging
Supervised landform classification
Visualization of uncertainty
Writing Research Articles

Pedometrics.org

ONCE AN ITC STUDENT, ALWAYS AN ITC ALUMNI!

CROATIAN ENGLISH

by Tomislav Hengl

Post-doctoral researcher [SCIENCE.UVA.NL]

A PRACTICAL GUIDE TO REGRESSION-KRIGING (RK)

Geostatistical mapping can be defined as analytical production of maps by using field observations, auxiliary information and a computer program that calculates values at locations of interest. Today, increasingly the heart of a mapping project is, in fact, the computer program that implements some (geo)statistical algorithm to a given point data set. Purpose of this guide is to assist you in producing quality maps by using fully-operational tools, without a need for serious additional investments. It will first introduce you the to the basic principles of geostatistical mapping and regression-kriging, as the key prediction technique, then it will guide you through four software packages: ILWIS GIS, R+gstat, SAGA GIS and Google Earth, which will be used to prepare the data, run analysis and make final layouts. These materials have been used for the five-days advanced training course "Hands-on-geostatistics: merging GIS and spatial statistics", that is regularly organized by the author and collaborators. Visit the course website to obtain a copy of the datasets used in this exercise.

REGRESSION-KRIGING USING A SIMPLE EXAMPLE:

Purpose of this exercise is to help you run Regression-kriging with your own data. The dataset and methodology used is explained in detail in Hengl et al. (2004, Geoderma). The following texts gives you practical instructions how to achieve this in different software packages:

In all cases the same excersise is used - interpolation of sampled measurements of soil thickness using one auxiliary predictor (slope map) from the same study area:


100 measurements of the thickness of topsoil horizon, shape file (2 KB)
500x500 pixels slope map of the same area (399 KB)


Figure: 100 measurements of the topsoil thickness and slope map used as auxiliary predictor.

 

GSTAT
GSTAT (courtesy of Edzer J. Pebesma) is probably the best and fastest choice to run RK with your data. You only need to download the exe file from the www.gstat.org website and put it in some directory e.g. c:\gstat\. It is advasible to set an environmental variable under Windows > Control panel > System > Advanced > Environmental variables > Add new variable 'gstat' with value "c:\gstat\" or wherever your exe file is.

INSTRUCTIONS: 1. Copy the point data file (depth_slope.eas), the auxiliary map (slope.asc) and the gstat command file (depth_uk.cmd) to the gstat directory; 2. Start DOS prompt by typing: "cmd" in the start> run > menu on your desktop; 3. Move to the active directory using cd.. e.g. cd c:/gstat; 4. Run the gstat programme together with a specific gstat command file from the DOS prompt:

c:\gstat\gstat depth_uk.cmd

this will do universal kriging using the 'slope.asc' as the predictor (base) map. Note that the commands in gstat are very simple and straight forward. The results will be writen directly to the asc format so you can visualize the results in a GIS package. You can modify the input parameters, variogram model etc by editing the gstat command file. For example, the following script can be used to run conditional simulations with 'slope.asc' as the predictor (base) map, and this script can be used to automatically fit a variogram (you can then view it using wgnuplot). Note that, for automated modelling of variogram, you will need to define an initial variogram then is then fitted against the true values. Edzer Pebesma suggest use of initial variogram with nugget parameter = 0, sill parameter = sampled variance and range = 10% of spatial extent of the data. Make sure you also consult the gstat package - user's manual available at www.gstat.org. The only problem with gstat is that you need to have regression residuals before you run interpolation, so you will anyway need to do some regression analysis before preparing the gstat command file. Note that the simulations with base maps in GSTAT can be quite time-cosuming.

 

Variogram fitted automatically in gstat (variogram_fit.cmd).

 

 

Conditional (gaussian) simulations using slope as base map (depth_suk.cmd).

 

 
Variance of the prediction error. White indicates high prediction error.
 
Predictions based on regression-kriging (depth_uk.cmd).

 

ILWIS
ILWIS is an afordable and complete GIS shareware (about 100€ per copy). Its advantages are that it can run sets of elementary and advanced operations on both raster and vector data and databases. All this can also be done via the command line; syntax can be further used to build up new sub-programmes and automatize data processing. Although it does not have RK algorithms built-in, data can be interpolated and analysed manually.

RKguide_ilwis.zip - complete directory with data, calculations and results (1 MB)

INSTRUCTIONS: Point data needs to be imported to an ILWIS table with XY and DEPTH coordinates. You first have to estimate the values of predictor (slope) at the sampled locations using command:

slope=MapValue(slope, Coordinate)

You can calculate the regression model in ILWIS using a simple linear model DEPTH = b0 + b1 * SLOPE. You will get b0=25.27 and b1=-0.414 with R=-0.499 (Ordinary Least Squares). This means that the thickness of the top-horizon will decrease as the slopes gets steeper. Note that, in ILWIS, you can not derive the Generalized Least Squares (GLS) regression coefficients but only the OLS coefficients, which is statistically suboptimal. In fact, regression modelling is so limited so we suggest to always export the table data to a statistical package like S-plus or R and then run step-wise regression and other statistical algorithms. After you estimate the regression coefficients, you can fit the trend by:

DEPTH_trnd=25.27-0.414*SLOPE

Then estimate the residuals at sampled locations using:

DEPTH_res=DEPTH-MapValue(DEPTH_trnd,Coordinate)

If you further inspect the residuals, you will see that the variogram of residuals with lag spacing of 2 km, can be fitted with a spherical model (C0=34, C1=72, R=8500). The residuals can now be interpolated using ordinary kriging, which gives typical kriging pattern. The fitted trend and residuals can now be added back together using:

DEPTH_RK=DEPTH_trnd+DEPTH_res_OK

which gives final predictions. Note that there are even negative values in the final prediction map in the areas where the slopes are very steep. This problem can be solved by using the logistic transformation; see Hengl et al. (2004, Geoderma).
*Note that, if you want to export/import raster maps from ILWIS to other packages, the best is if you use the Export > Arc/Info ASCII (.ASC) option. However, the version ILWIS 3.2 will make an *.asc header that needs to be manually edited (simply change words "xllcenter" and "yllcenter" to "xllcorner" and "yllcorner"). Otherwise, you will not be able to import such maps to ArcGIS (8 or higher) or Idrisi Kilimanjaro.

R
R is an open-source environment for statistical computing and visualisation. It is based on the S language and is the product of an active movement among statisticians for a powerful, programmable, portable, and open computing environment, applicable to the most complex and sophsticated problems. R for MS Windows (you will need version 2.0 or higher) you can download and install in few minutes. After the installation, you should also install and load packages gstat and sp. In fact, we will use R only as a shell to run gstat, which is the true state-of-the-art geostatistical program. Note that, in gstat, RK is called Universal kriging (UK), which is a synonim. For detail discussion about the terminology see the technical note. The R code listed down-below was contributed by Edzer Pebesma.

RKguide_R.zip - complete directory with input data and R script (400 KB)

INSTRUCTIONS: In R, raster and vector maps can be imported in two ways: a) point maps you can import as text files, sorted using the X,Y,Z structure:

depth <- read.delim("DEPTH.txt")
summary(depth)

b) raster maps you can import from ArcInfo ASCII format by using the sp package:

slope_r <- read.asciigrid("SLOPE.asc", as.image=FALSE, plot.image=TRUE)

This will import the image and then plot it is a raster. You can also plot the xy points using:

xyplot(X ~ Y, data=depth)

Note that R is not a GIS package so that (geo)visualisation options are very limited (although the new sp package show significant number of new GIS capabilities). Before you start using this data for interpolation, you need to convert it to a dataframe.
*Note that you can also edit the existing data arrays in R using the simple data editor. To interpolate sampled variable DEPTH using KED/UK and SLOPE as predictors, you need to run the following steps:

0. Define the dataframes first:

depth <- read.delim("DEPTH.txt")
summary(depth)

library(sp)
slope = read.asciigrid("slope.asc")
class(slope)
image(slope)
points(Y ~ X, data=depth)

class(depth)
coordinates(depth)=~X+Y
# this makes depth a SpatialPointsDataFrame

1. Estimate the residuals and their autocorrelation structure (variogram):

library(gstat)
vgm <- vgm(38, "Sph", 8500, nugget = 34)
# parameters estimated above
vgm_depth_r = fit.variogram(variogram(DEPTH~slope.asc,depth), model=vgm)

library(lattice)
plot(variogram(DEPTH~slope.asc,depth),model=vgm, main = "fitted by gstat")

2. Compute and plot the regression model:

slope.ov = overlay(slope, depth) # create grid-points overlay
summary(slope.ov)
depth$slope.asc =slope.ov$slope.asc # copy the slope values

plot(DEPTH~slope.asc, as.data.frame(depth))
abline(lm(DEPTH~slope.asc, as.data.frame(depth)))
# fit the variable DEPTH using slope.asc

3. Run KED/UK using the "krige" command:

depth_uk <- krige(DEPTH~slope.asc, depth, slope, vgm_depth_r)


trellis.par.set(sp.theme()) #
for blue-pink-yellow color ramp
spplot(depth_uk, "var1.pred", main="Universal kriging predictions DEPTH")

This will plot final predictions with appropriate legend and title. After you run KED/UK in gstat, you can export the results of interpolation and import them back to a GIS package using:

write.table(depth_uk, file="DEPTH_UK.txt")

this makes an array of grid nodes in ascii format with xy coordinate for each grid node. A smarter way to export the results of prediction is to use write.asciigrid command.

IDRISI
Idrisi is one of the GIS packages with largest numbers of raster and vector operations. The price per a single licence is about $600, but you can always order a 7-day evaluation version to test it. In Idrisi, gstat code has been also adjusted and integrated within.

RKguide_Idrisi.zip - complete directory with data, calculations and results (2.9 MB)

INSTRUCTIONS: First you need to import the two maps using:
a) Import > Software-specific-formats > ESRI formats > ARCRASTER (Arcinfo ASCII format to raster) for the SLOPE.asc map
b) Import > Software-specific-formats > ESRI formats > SHAPEIDR (Shape file to Idrisi ) for the DEPTH.shp map
*You can use the croatia.ref georeference system instead of the "plane" reference system (download to C:\IDRISI Kilimanjaro\Georef\ directory).
You should first display the two maps using the display launcher. You can model variogram in Idrisi using the Spatial dependence modeller. After you derived semivariances, save and load them in the variogram model fitting environment, where you can (interactively) estimate the variogram model parameters. After you finished fitting the variogram, you can save the variogram parameters and load them later to do kriging or simulations.

After you fitted the variogram model, you can interpolate the DEPTH map with SLOPE map as auxiliary predictor by: GIS analysis > Surface analysis > Interpolation > Kriging and simulations. You will then get an user-friendly kriging dialog window where you can define: variogram model, variable to predict, auxiliary maps etc. In IDRISI, only raster input files are allowed for input maps. The first input file contains the sample data, the second, third etc input file is a fully sampled surface that has attribute values for all prediction locations. This means that, before runing KED/UK in Idrisi, you need to convert the point map (depth.vct) to a raster image using the RASTERVECTOR operation. Also note that a local neighborhood must be specified for each input file. Unfortunately, Idrisi help warns that the Spatial Dependence Modeler can not calculate generalized least squares residuals, although this was originally implemented in gstat. Also note that Idrisi uses term "residuals" with somewhat different meaning - the system will not derive residuals for you but you have to do it by yourself. The final predictions are quite similar to the one produced in ILWIS. Here the advantage is that you can also derive map of the prediction error, which shows in which areas the model makes the worse estimates.
Use of gstat within a GIS package is possible also within GRASS and PCRaster (for more info check gstat homepage).

ISATIS
Isatis is the most expensive package (over 10K €) used in this comparison, but is definitively one of the most professional geostatistical packages for environmental sciences. Isatis was originally built for Unix, but there are MS Windows and Linux versions also.

INSTRUCTIONS: In Isatis, you also need to import the point and raster maps first: vector maps can be imported as shape files and raster maps as ArcView ASCII grids (importing/exporting options are limited to standard GIS formats). Note that you first need to define the project name and working directory using the data file manager. After you imported the two maps, you can visualize them using the display launcher.
RK in Isatis is achieved using Interpolation > Estimation > External Drift (Co)-kriging. Here a user needs to select the target variable (point map), predictors and the variogram model. Note that KED in Isatis is limited to only one (three when scripts are used) auxiliary raster map (in Isatis called background variable)! Isatis justifies limitation of number of auxiliary predictors by computational efficiency. In any case, a user can first run factor analysis on multiple predictors and then select the most siginificant component, or simply use the regression estimates as the auxiliary map. Isatis gives option to fit the variogram model automatically, with or without nugget effect. User can also edit the Model Parameter File where the characteristics of the model you wish to apply for kriging is stored. The user manual does not give insight into the algorithm used (except for the mathematical proofs), so I could not tell how exactly are the KED weights estimated and is there any regression modelling involved prior to the interpolation of data. You finally get predictions that are more similar to the ordinary kriging.

SAGA
SAGA (System for Automated Geoscientific Analyses) is a full-fledged GIS with support for raster and vector data, which includes a large set of geoscientific algorithms, being especially powerful for the analysis of DEMs. With the release of version 2.0 in 2005, SAGA works under both Windows and Linux operating systems. In addition, SAGA is an open-source package, which makes it especially attrative to users that would like extend or improve the existing modules. To install SAGA, just download the zip file and extract it under your /Program files/ folder. Regression-kriging is available in SAGA in the module "Geostatistics" under name "Universal kriging". The users can select global and local (search radius) version of the Universal kriging. In this case, I do not recommend the use of local Universal kriging because it over-simplifies the technique and can lead to serious artefacts (neither local variograms nor local regression models are estimated). In SAGA, you can also run multiple regression prediction using grids, then derive the residuals and fit the variogram of residuals.

INSTRUCTIONS: 1. Copy the shape file (depth.zip), the auxiliary map (slope.asc) in a working directory. You can display the two maps over each other. 2. Select Modules > Geostatistics > Kriging > Universal kriging (global) and you will get this menu. Note that you can select as many predictors as you wish, as long as they are all in the same grid system; 3. The final results can be visualized in both 2D and 3D spaces.


Figure: Results of interpolation in SAGA GIS.

 

Other useful links to help you run RK/KED/UK with your data:

 
spatial-analyst.net/RKguide visited:  9380   times
Comment
 

 

 

LAST UPDATE: January 31, 2008
 Total unique visitors: 68860   this month: 1126 #1 hengl by Google © Tomislav Hengl, 1995-2008