Grid size calculator
lthough much has been publish on the impact of pixel size on geographic modelling, grid cell size selection is seldom based on the inherent spatial variability of the data. In fact, in most GIS projects, grid resolution is selected without any scientific justification. In ArcGIS package, for example, the default output cell size is suggested by the system using some trivial rule: in the case the point data is being interpolated in spatial analyst, the system will take the shortest side of the study area and divide by 250 to estimate the cell size. Obviously, such pragmatic rules do not have a sound scientific background in cartography nor in geoinformation science.

This motivated me to produce methodological guides to select a suitable grid resolution for output maps and based on the inherent properties of the input data. I tried to relate the choice of grid resolution to the concrete cartographic and statistical concepts, namely scale, processing power, positional accuracy, inspection density, spatial correlation and complexity of terrain. Here you can obtain an R script with processing steps explained in detail. For more details see:
 Hengl T., 2006. Finding the right pixel size. Computers and Geosciences, 32(9): 12831298.
Contents 
Examples:
Selection of the cell size based on the GPS positioning accuracy
In this example 100 positioning fixes were recorded using the singlefix GPS positioning method at the control point with a known location (Xt=6535950, Yt=5066581.48). The fluctuation of the GPS readings can be seen in figure. The errors ranged from 0.7 to 23.9 m, average error was 8.5 m with a standard distribution of 5.2 m. The error vectors seems to follow the lognormal distribution. The theoretical distribution gave the 95% probability radius of 19.1 m, while the experimental distribution shows somewhat higher value (20.4). A suitable grid resolution for this case is 34.4 m. If this grid resolution is selected, most (95%) of GPS fixes will fall within the right pixels. This number for example corresponds to the resolution of the Landsat imagery. More accurate positioning method would be needed to locate points within the finer grid resolutions. If we would like to use a GPS positioning with grid resolutions of about 15 m, then we would need to use GPS positioning with averaging (5 minutes per point). Higher positional accuracy (520 times) can be achieved by using differential correction or WAAS (Wide Area Augmentation System), which can improve accuracy to less than three meters on average. Such accuracy would be compatible with grid resolutions within the range 210 m (SPOT or IKONOS imagery).
Selection of the cell size based on the size of agricultural plots
In this example I used an existing polygon map of agricultural fields (figure). The map consists of 121 polygons in total. The smallest polygon is 0.005 ha, the biggest is 6.903 ha, average size of polygons is 0.824 ha with standard deviation of 1.005 ha. The polygons were then separated into two groups according to the shape complexity index. In this case only 6 polygons classified for narrow polygons. For each of these, an average width has been estimated by taking regular measurements (10 per polygon). I further on derived the 5% inverse cumulative distribution value assuming the lognormal distribution. I got 0.046 ha, which means that the pixel size should be about 20 m. The coarsest legible grid size for this data set (P=50%) would be 70 m (A=0.5 ha). If resolutions coarser than 70 m are used to monitor agriculture for this area, then in more than 50% of the areas there will be less than four pixels per polygon. Note that in this case we are not using the true smallest polygon size but somewhat bigger figure (0.046 ha) because the smallest value (0.005 ha) is not representative. The further inspection of the widths showed that the average width of the narrow polygons is about 16 m, which gives a somewhat more strict grid resolution of about 8 m. However, the narrow polygons occupy only 0.9% percents of the the total study area, so we do not have to be as strict. Finally, I would recommend that the satellite imagery in range from 10 to 70 m can be used to monitor agriculture for this study area.
Selection of the cell size for interpolation purposes
In this example I will use the Wesepe point data previously used in numerous soil mapping applications (De Gruijter et al., 1997). The dataset consist of 552 profile observations where various soil variables have been described. The target variable is the membership value to enk earth soil type. The values range from 0 to 1, with an average of 0.232 and a standard deviation of 0.322. The total size of the area is 12.1 km^2, which gives a sampling density of about 45 observations per km^2, which corresponds to the scale of 1:25K, i.e. grid resolution of 12 m. If we inspect the spreading of the points, we see that the average spacing between the closest point pairs is about 120 m, which is fairly close to the regular point sampling (for this data set  148 m). The cumulative distribution showed that 95% of points are at distances of 5 and more meters. This means that the legible grid resolutions are between 5 and 150 m. Automated fitting of the variogram using a global model further gave a nugget parameter (C0) of 0.042, a sill (C0 + C1) of 0.097 and a range parameter (R) of 175.2 m, which means that the variable is correlated up to the distance of about 525 m (h_R). There are 11807 point pairs within this range, which finally means that the optimal lag/grid size would be about 23 m. The pattern analysis of the point data set further shows that there is clear regularity in the point geometry: most of the distances are grouped at 180 m.
Selection of cell size for geomorphometric analysis
In this case study I will demonstrate how a grid resolution can be selected from a map of contours, i.e. a dataset consisting of lines digitized from a topomap. Contour lines were extracted from the 1:50K topomap, with the contour interval of 10 m and supplementary 5 m contours in areas of low relief. The total area is 13.69 km^2 and elevations range from 80 to 240 m. There were 127.6 km of contour lines in total, which means that the average spacing between the contours is 107 m. The grid resolution should be at least 53.5 meters to present the most of the mapped changes in relief. I then derived the distance from the contours map using the 5 m grid and displayed the histogram of the distances to derive the 5% probability distance. Absolutely shortest distance between the contours is 7 meters, and the 5% probability distance is 12.0 m. Finally, I can conclude that the legible resolution for this data set is within range 12.053.5 m. Finer resolutions than 12 m are unnecessary for the given complexity of terrain. Note that selection of the most suitable grid resolution based on the contour maps is scale dependant. For the contour lines digitized from the 1:5K topo maps, the average spacing between the lines is 26.6 m and the 5% probability distance is 1.6 m. This means that, at 1:5K scale, the recommended resolutions are between 1.6 and 13.3 m.
Other interesting issues:
1. When does interpolation becomes downscaling (disaggregation)?
ANSWER: Interpolation becomes downscaling once the grid resolution in more than 50% of the area is finer than it should be for the given sampling density. For example, in soil mapping, one point sample should cover 160 pixels. If we have 100 samples and the size of the area is 10 km2, then it is valid to map soil variables at resolutions of 25 m (maximum 10 m) or coarser. Note that downscaling is only valid if we have some auxiliary data (e.g. digital elevation model) which is of finer resolution than the effective grid resolution and which is highly correlated with the variable of interest (in statistics  'target variable'). In the case from above, to produce maps of resolution of finer than 10 m and without help from significant auxiliary predictors would mean that we are not respecting the soil survey standards.
2. How to select a grid resolution for DEM if the elevation data is collected densely over the area of interest (e.g. GPS height measurements from precision agriculture)?
ANSWER: There are two possibilities. One is to derive a variogram for sampled elevations, then estimate the distance at which are values still autocorrelated (range) and then estimate the bin size following the formula of Izenman (1991). An alternative is to first derive contour lines from the elevation measurements (e. g. in surfer or using the akima package in R). Note that, before deriving the contours, you will have to fit the surface through the measured elevations (e. g. use the splines with tension). This is a sample script to automatically estimate grid cell size using a given point data set (elevations):
> library(maptools) > elevations < readShapePoints("elevations.shp", proj4string=CRS(as.character(NA\))) > library(akima) > elevations.interp < interp(x=elevations@coords[,1], y=elevations@coords[,2], + z=elevations$VALUE, extrap=T, linear=F) # package akima will automatically estimate the initial grid size for you! > image(elevations.interp, col=bpy. colors(), asp=1) # but this is a rather naive approach  it takes x width and devides by 40! > dem.area < (elevations@bbox[1,2]elevations@bbox[1,1])*(elevations@bbox[2,2]elevations@bbox[2,1]) > bin.VALUE < (max(elevations$VALUE)min(elevations$VALUE)) * (length(elevations$VALUE))^(1/3) # for beginning, we take a cell size that corresponds to the effective scale: > dem.pixelsize < 25 > z.contours < ContourLines2SLDF(contourLines(elevations.interp, + nlevels=(max(elevations$VALUE)min(elevations$VALUE))/bin.VALUE)) > writeOGR(z.contours, "z_contours.shp", "contours", driver="ESRI Shapefile") # estimate the buffer distance between the contour lines > library(RSAGA) # Download SAGA GIS from www.sagagis.org and unzip the binaries to: > rsaga.env(path="C:/Program Files/saga_vc") # first, convert the contour map to a raster map: > rsaga.geoprocessor(lib="grid_gridding", module=3, param=list(GRID="contours_buff.sgrd", + INPUT="z_contours.shp", FIELD=1, LINE_TYPE=0, USER_CELL_SIZE=dem.pixelsize, + USER_X_EXTENT_MIN=elevations@bbox[1,1], USER_X_EXTENT_MAX=elevations@bbox[1,2], + USER_Y_EXTENT_MIN=elevations@bbox[2,1], USER_Y_EXTENT_MAX=elevations@bbox[2,2])) # now extract a buffer distance map (for 'contours') and load it back to R: # the parameters DIST and IVAL are estimated based on the grid properties: > rsaga.geoprocessor(lib="grid_tools", module=10, param=list(SOURCE="contours_buff.sgrd", + DISTANCE="contours_dist.sgrd", ALLOC="contours_alloc.sgrd", BUFFER="contours_bdist.sgrd", + DIST=sqrt(dem.area)/3, IVAL=dem.pixelsize)) > rsaga.sgrd.to.esri(in.sgrds="contours_dist.sgrd", out.grids="contours_dist.asc", + out.path=getwd(), prec=1) > contours.dist < readGDAL("contours_dist.asc") > spplot(contours.dist[1], col.regions=bpy.colors(), scales=list(draw=TRUE\), + sp.layout=list("sp.lines", col="cyan", z.contours)) > hist(contours.dist$band1, col="grey") > contours.pixsize1 < quantile(contours.dist$band1, probs=0.5)
3. What is a statistically sound approach to select a grid cell size based on the accuracy of DEMderivatives?
An iterative DEM cellsize optimization algorithm is implemented in the ANUDEM package, following the method of Hutchinson (1996). By plotting the RMS slope versus the grid spacing index, one can select the grid cell size that shows the maximum information content in the slope (relief) map. The optimal grid cell size is the one where further refinement does not change the slope map.
We can implement a similar principle in R, e.g. derive SLOPE maps using several grid cell sizes and then see if they differ significantly. We can start with generating DEMs from point data (e.g. using splines):
> pixel.range < c(20, 30, 40, 50, 60, 80, 100) # generate DEMs using splines: > for(i in 1:length(pixel.range)){ rsaga.geoprocessor(lib="grid_spline", module=1, param=list(GRID=paste("DEM", + pixel.range[i], ".sgrd", sep=""), SHAPES="elevations.shp", FIELD=1, + RADIUS=sqrt(dem.area)*1000/3, SELECT=1, MAXPOINTS=10, TARGET=0, + USER_CELL_SIZE=pixel.range[i], USER_X_EXTENT_MIN=contours.5@bbox[1,1], + USER_X_EXTENT_MAX=contours.5@bbox[1,2], USER_Y_EXTENT_MIN=contours.5@bbox[2,1], + USER_Y_EXTENT_MAX=contours.5@bbox[2,2])) > }
For each DEM, we derive a SLOPE map:
> for(i in 1:length(pixel.range)){ rsaga.slope(paste("DEM", pixel.range[i], ".sgrd", sep=""), + paste("SLOPE", pixel.range[i], ".sgrd", sep=""), method = "poly2zevenbergen") rsaga.sgrd.to.esri(in.sgrds=paste("SLOPE", pixel.range[i], ".sgrd", sep=""), + out.grids=paste("SLOPE", pixel.range[i], ".asc", sep=""), prec=3, out.path=getwd()) > }
We can now compare the slope maps (whether they differ significantly):
> SLOPE < readGDAL(paste("SLOPE", pixel.range[1], ".asc", sep="")) > cor.SLOPE < rep(NA\, length(pixel.range)) > cor.SLOPE[1] < 1.0 > for(i in 2:length(pixel.range)){ assign(paste("SLOPE", pixel.range[i], sep=""), + readGDAL(paste("SLOPE", pixel.range[i], ".asc", sep=""))) SLOPE.s < as(get(paste("SLOPE", pixel.range[i], sep="")), "SpatialPointsDataFrame") slope.ov < overlay(SLOPE, SLOPE.s) slope.ov$SLOPE.s < SLOPE.s$band1 # logtranform for SLOPE: cor.SLOPE[i] < cor(log1p(slope.ov$SLOPE.s), log1p(slope.ov$band1)) > }
As expected, the difference between the finest resolution SLOPE and coarser resolution SLOPEs will slowly increase (there seems to be no inflection point). From the plot shown below, we can conclude that a resolution of 45 m is still highly similar (Rsquare >95%) as compared to the 20meter SLOPE map. A similar type of analysis can be also found in this article.
Selected references
 Atkinson, P.M. and Aplin, P., 2004. Spatial variation in land cover and choice of spatial resolution for remote sensing. Photogrammetric Engineering and Remote Sensing, 25(18): 36873702.
 Bishop, T.F.A., McBratney, A.B. and Whelan, B.M., 2001. Measuring the quality of digital soil maps using information criteria. Geoderma, 103(12): 95111.
 Dungan, J.L. et al., 2002. A balanced view of scale in spatial statistical analysis. Ecography, 25: 626640.
 Florinsky, I.V. and Kuryakova, G.A., 2000. Determination of grid size for digital terrain modelling in landscape investigations – exemplified by soil moisture distribution at a microscale. International Journal of Geographical Information Science, 14(8): 815–832.
 Heuvelink, G.B.M. and Pebesma, E.J., 1999. Spatial aggregation and soil process modelling. Geoderma, 89(12): 4765.
 Hutchinson, M.F., 1996. A locally adaptive approach to the interpolation of digital elevation models. In: Proceedings of the Third International Conference/Workshop on Integrating GIS and Environmental Modeling. National Center for Geographic Information and Analysis, Santa Barbara, CA, 6 pp.
 McBratney, A.B., 1998. Some considerations on methods for spatially aggregating and disaggregating soil information. Nutrient Cycling in Agroecosystems, 50: 5162.
<Rating> Rate this article: 1 Poor and in error 2 Far from ready 3 Needs improvements 4 Minor corrections 5 Very useful!! </Rating>