This article provides basic instructions on how to install and take the first steps with the R + Open Source Desktop GIS (OSGeo) + Google Earth software bundle, which can be used to analyse various spatio-temporal data (the article was written specifically for MS Windows users). This bundle allows full GIS+statistics integration and can support roughly 80% of the processing/visualization capabilities available in commercial packages such as ArcInfo/Map or Idrisi (read more about the popularity of R). Although R contains an increasing number of packages for processing GIS raster layers, e.g. biOps, rimage and raster (see also the list of r-gis projects under development), the real power lies in combining R with open source GIS such as SAGA GIS and GRASS GIS.
Software such as SAGA GIS allows you to input, edit and visually explore geographical data before and after the actual statistical analysis. Note that there are many open source GIS alternatives and commercial GIS packages that could be linked to R with equal success. For example, GRASS GIS (versions >6.3 are also available for MS Windows machines) is another good candidate for GIS+statistics integration. Within GRASS, you can load R and then use its functionality on the GRASS map layers, or run R on top of GRASS (with the help of the spgrass6 package). Another alternative for GIS+R integration is QGIS, whose main features include a Python console and an elaborate mechanism for adding Python plugins, which is already used by an efficient R plugin (manageR).
To install R under Windows, download and run an installation exe from the R project homepage. This will install R for Windows with a GUI. After you start R, you will first need to set up the working directory and install additional packages. To run geostatistical analysis in R, you will need to add the following R packages: gstat (geostatistics in R), rgdal (GDAL import of GIS layers in R), sp (operations on maps in R), foreign, spatstat (spatial statistics in R) and maptools (see also: a complete list of contributed packages). To install these packages, first start the R GUI, then select Packages > Install package(s) from the main menu. Note that, if you wish to install a package on the fly, you will need to select a suitable CRAN mirror from which R will download and unpack the package. You can also install a number of packages used for spatial analysis by typing e.g.:
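A minimal sketch of such a command, assuming the packages listed above are still offered by the selected CRAN mirror (some of them, such as rgdal and maptools, have since been archived and may have to be installed from the CRAN archive instead):

```r
# Packages named in the text above
pkgs <- c("gstat", "rgdal", "sp", "foreign", "spatstat", "maptools")

# Install them together with the packages they depend on
install.packages(pkgs, dependencies = TRUE)
```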
If you have problems accessing the web, you can also try downloading all packages of interest from CRAN (see e.g.: windows precompiled binaries) and then install them from the local file. After you install a package, you will still need to load it into your workspace (every time you start R) before you can use its functionality. A package is commonly loaded using e.g.:
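For instance, assuming gstat and rgdal were installed as described above:

```r
# Attach the packages to the current R session
library(gstat)
library(rgdal)   # also attaches sp, on which rgdal depends
```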
First steps in R
R is today identified as one of the fastest growing and most comprehensive statistical computing tools/communities. It practically offers statistical analysis and visualisation of unlimited sophistication. A user is not restricted to a small set of procedures or options, and because of the contributed packages, users are not limited to one method of accomplishing a given computation or graphical presentation (Bivand et al., 2008; Rossiter, 2007; Murrell, 2006). R became attractive for the analysis of spatial data mainly due to the recent integration of geostatistical tools (e.g. gstat, geoR) and tools that allow R computations with spatial data layers (e.g. sp/maptools, rgdal, r-gis). A complete listing of all R packages that can be used to analyse spatial data is maintained by Roger Bivand. The 'Spatial' packages can be nicely combined with e.g. the Environmetrics packages. The interactive graphics in R are also increasingly powerful.
Although a few GUIs are being developed to run analysis in R (e.g. Rcmdr), R is (and will stay) command-based software. So the first step you should consider is to get some initial training in R syntax. You may consult some literature first, but also try to follow a 1-2 day training course. The second useful step is to obtain a more user-friendly R script editor such as Tinn-R, RStudio and/or JGR. You can use this Rprofile.site (copy it to your "R/etc/" directory) and this Rconsole setting to produce the same GUI configuration as shown below.
If a single argument in the command is incorrect, inappropriate or mistyped, you will get an error message. If the error message is not helpful, then try receiving more help about some operation by typing e.g.:
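A few standard ways of asking R for help are sketched here; the function lm and the keyword "kriging" are used purely as examples:

```r
help(lm)                 # help page of a function whose name you know
?lm                      # shorthand for the same request
help.search("kriging")   # search the installed help pages by keyword
help.start()             # open the HTML help system in a web browser
```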
You can also browse the HTML help files. You may also wish to register with special interest groups such as R-sig-Geo or similar and subscribe to their mailing lists. Gstat-info is the mailing list of the gstat package and also offers much interesting information.
These are some easy to follow R tutorials:
- Simple R
- Quick R
- Short course in R by Paul Geissler (USGS)
- Introduction to R by D.G. Rossiter
- Introductory course on Spatial Data Analysis with R by R. Bivand
To download the file (DEM25m) used in this exercise, we can do:
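A minimal sketch of the download step. The URL below is a placeholder, and the assumption that the data come as a zip archive is also hypothetical; substitute the actual location of the DEM25m file:

```r
# Placeholder URL (hypothetical): replace with the real location of the file
dem_url <- "http://example.org/DEM25m.zip"

# mode = "wb" keeps binary files intact on MS Windows
download.file(dem_url, destfile = "DEM25m.zip", mode = "wb")

# Extract the archive into the working directory
unzip("DEM25m.zip")
```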
Next, we can import the ArcInfo ASCII map to R using the rgdal package:
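For example, assuming the map was saved as "DEM25m.asc" in the working directory (the file name is an assumption):

```r
library(rgdal)

# Read the ArcInfo ASCII grid; the result is a SpatialGridDataFrame
# whose values are stored in a column called "band1"
dem <- readGDAL("DEM25m.asc")

summary(dem)   # grid topology, coordinate system and value range
spplot(dem)    # quick look at the map
```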
After we attach the correct coordinate system string to this map, we can export it to e.g. ILWIS GIS by using:
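A sketch of these two steps, assuming the GDAL build behind rgdal includes the ILWIS driver. The projection string is a placeholder only and must be replaced by the true coordinate system of the data; the file names are likewise assumptions:

```r
library(rgdal)

dem <- readGDAL("DEM25m.asc")

# Placeholder projection (hypothetical): use the real one for your data
dem_crs <- "+proj=utm +zone=33 +datum=WGS84"
proj4string(dem) <- CRS(dem_crs)

# Write the elevation band as an ILWIS raster map
writeGDAL(dem["band1"], "DEM25m.mpr", drivername = "ILWIS")
```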
A serious limitation of R, where GIS data are concerned, is memory handling. You will soon discover that it is almost impossible to visualize and process raster maps larger than 1 to 5 million pixels (see also the raster project). In addition, R still has limited capabilities for visual data exploration, and almost no capabilities for data input/editing. You can overcome these problems by combining R with open source GIS such as SAGA, GRASS and/or ILWIS.
SAGA (System for Automated Geoscientific Analyses) is a full-fledged GIS. Many of its features relate to geomorphometry, which makes it an ideal tool for operational work, but also for GIS training purposes. It is an open-source GIS with support for raster and vector data. It includes a large set of geoscientific algorithms, and is especially powerful for the analysis of DEMs.
SAGA has been under development since 2001 at the University of Göttingen, Germany (the SAGA development team, including the key developer Olaf Conrad, has since moved to the University of Hamburg), with the aim of simplifying the implementation of new algorithms for spatial data analysis within a framework that immediately allows their operational application. SAGA therefore targets not only the pure user but also the developer of geoscientific methods. SAGA has its roots in DiGeM, a small program specially designed for the extraction of hydrological land-surface parameters. In 2004 most of SAGA's source code was published under an Open Source Software (OSS) license. To learn more about SAGA, try obtaining some of the following references:
- Böhner, J., McCloy, K.R., Strobl, J. (Eds.), 2006. SAGA — Analysis and Modelling Applications. Göttinger Geographische Abhandlungen, Heft 115. Verlag Erich Goltze GmbH, Göttingen, 117 pp.
- Brenning, A. 2008. Statistical geocomputing combining R and SAGA: The example of landslide susceptibility analysis with generalized additive models. In: J. Böhner, T. Blaschke & L. Montanarella (eds.), SAGA - Seconds Out (Hamburger Beiträge zur Physischen Geographie und Landschaftsökologie, 19), 23-32.
- Conrad, O. 2007. SAGA - Entwurf, Funktionsumfang und Anwendung eines Systems für Automatisierte Geowissenschaftliche Analysen. electronic doctoral dissertation, University of Göttingen.
- Olaya, V., Conrad, O., 2008. Geomorphometry in SAGA. In: Hengl, T. and Reuter, H.I. (Eds), Geomorphometry: Concepts, Software, Applications. Developments in Soil Science, vol. 33, Elsevier, pp. 293-308.
SAGA can be obtained from a SourceForge repository; the most recent version is available for both Windows and Linux operating systems. The fastest way to install it is to obtain the compiled binaries (e.g. 'saga_2.*.*_*.zip') and unzip them into "/library/RSAGA/saga_vc/" or "C:/saga_vc", because these are the default locations where RSAGA looks for the binary files. Here you can obtain the list of all SAGA GIS (2.1.2) modules available via the SAGA command line. For a complete description and usage of function arguments see the text file or refer to the SAGA help documentation.
First steps in SAGA
First, you need to load the RSAGA library by:
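That is, assuming the RSAGA package has already been installed from CRAN:

```r
# Attach RSAGA, the R interface to the SAGA command line
library(RSAGA)
```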
Note that loading the library is not enough: you will not be able to run any SAGA operations unless the SAGA binaries and libraries are also installed on the same machine (SAGA is an external GIS package and NOT a built-in R package). You can at any time set the location of your SAGA environment by typing:
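A sketch using the rsaga.env function, assuming SAGA was unzipped to "C:/saga_vc" as suggested above. Depending on the RSAGA version, the returned settings may have to be passed on explicitly through the env argument of the other RSAGA functions:

```r
library(RSAGA)

# Point RSAGA to the folder that contains saga_cmd.exe
myenv <- rsaga.env(path = "C:/saga_vc")

# Inspect the settings RSAGA will use (path, module folder, version, ...)
myenv
```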
The best way to browse all available modules is to open SAGA and look at the left menu tab, which shows all module libraries. To find out from R what a certain module library in SAGA offers, you can execute:
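For example, for the grid_gridding library (assuming RSAGA can find a working SAGA installation):

```r
library(RSAGA)

# List the modules contained in the grid_gridding library
rsaga.get.modules("grid_gridding")
```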
which will show this:
To get further usage of the 3rd function of the grid_gridding module, you can execute:
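A sketch of that call; module numbering differs between SAGA versions, so the module behind number 3 depends on your installation:

```r
library(RSAGA)

# Print the command-line usage of module 3 in the grid_gridding library
rsaga.get.usage("grid_gridding", 3)
```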
which then gives the complete list of parameters (obligatory and optional) that need to be set before this command can be executed:
Another place to look for libraries and parameters is the SAGA homepage or this list of SAGA modules. SAGA also provides an API, so you can access SAGA grids and libraries from e.g. Python.
Once you have learned about a specific function, you can run it via the rsaga.geoprocessor. For example, to extract a stream network from a DEM, we can run:
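The sketch below illustrates the idea. The module number, the parameter names and the threshold value are assumptions based on the ta_channels library of SAGA 2.x and should be checked with rsaga.get.usage for the installed version; the file names are hypothetical:

```r
library(RSAGA)

# Convert the ArcInfo ASCII grid to the SAGA grid format first
rsaga.esri.to.sgrd(in.grids = "DEM25m.asc", out.sgrds = "DEM25m.sgrd",
                   in.path = getwd())

# Channel network and drainage basins (assumed to be module 5 of ta_channels)
rsaga.geoprocessor(lib = "ta_channels", module = 5,
  param = list(DEM = "DEM25m.sgrd",          # input elevation grid
               DIRECTION = "channels.sgrd",  # output: flow direction of channels
               CONNECTION = "route.sgrd",    # output: flow connectivity
               SEGMENTS = "channels.shp",    # output: channel lines
               BASINS = "basins.shp",        # output: drainage basin polygons
               THRESHOLD = 5))               # assumed Strahler order threshold
```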
which will produce a screen as shown on the right (above). Note that the rsaga.geoprocessor only serves to pass the parameters to the SAGA command line (saga_cmd.exe). If you pass several grids to a SAGA process, they must be written as a single character string.
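A minimal sketch of building such a string in base R; the three file names are only placeholders:

```r
# Individual grid files (placeholder names)
dsm_files <- c("DSM1.sgrd", "DSM2.sgrd", "DSM3.sgrd")

# Collapse them into one string separated by semicolons
grids <- paste(dsm_files, collapse = ";")
grids
```

The resulting string can then be supplied inside the parameter list, e.g. as param = list(GRIDS = grids, ...).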
The rsaga.geoprocessor expects grids separated by ";", e.g. GRIDS="DSM1.sgrd;DSM2.sgrd;DSM3.sgrd". Sometimes you might receive an error message or the computation will fail, which possibly indicates a bug in SAGA GIS. You should report such problems directly via the SAGA user's forum (and not via the R-sig-geo mailing list). Read more on how to extract land-surface parameters and automate analysis of DEMs using SAGA.
Installing GRASS (MS Windows)
GRASS GIS, now one of the eight initial Software Projects of the Open Source Geospatial Foundation, is probably the best-known open source GIS software and does not need to be specially introduced. Its functionality and usage are described in detail in Neteler et al. (2008); a wiki-based tutorial is available at http://grass.osgeo.org/wiki/. GRASS itself relies on a collection of environment variables (they vary from version to version), which are populated when GRASS is started; that is, a GRASS shell is started within the running command shell. GRASS is a much larger project than SAGA in terms of the number of developers/institutions involved, although their functionality for DEM analysis is similar. The latest version of GRASS (6.3) contains over 350 routines.
Although originally a Linux-based project, the most recent version of GRASS (6.3; development version) is now also available for MS Windows machines. The coming version 6.4 is expected to be even more stable under Windows. To install GRASS, simply obtain the installation exe and install GRASS to some default location, e.g. "C:/GRASS/".
Controlling GRASS from R
The GRASS+R integration requires the spgrass6 package. This needs to be installed with all necessary dependencies:
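For example, assuming spgrass6 is still offered by your CRAN mirror (it has since been succeeded by rgrass7 and rgrass, so an archived copy may be needed):

```r
# Install spgrass6 together with the packages it depends on
install.packages("spgrass6", dependencies = TRUE)
```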
Next, you need to set the location of your GRASS installation:
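A sketch using initGRASS from spgrass6, assuming GRASS was installed to "C:/GRASS" as suggested above:

```r
library(spgrass6)

# Start a GRASS session from within R:
#   gisBase  - folder where GRASS is installed (assumed path)
#   home     - folder in which the GRASS settings file is written
#   override - replace an already running session, if any
initGRASS(gisBase = "C:/GRASS", home = tempdir(), override = TRUE)
```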
This will automatically generate a temporary GRASS gisdbase (and create a temporary file). Note that this database contains no maps. These can be imported using:
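For example, the DEM could be imported with the r.in.gdal module. This sketch assumes that initGRASS has already been called in the current R session and that the file is named "DEM25m.asc"; the g.region parameter name follows GRASS 6:

```r
library(spgrass6)

# Import the ASCII grid into the GRASS database;
# flag "o" overrides the projection check
execGRASS("r.in.gdal", flags = "o",
          parameters = list(input = "DEM25m.asc", output = "DEM"))

# Set the computational region to match the imported map
execGRASS("g.region", parameters = list(rast = "DEM"))
```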
To get a description of a GRASS function and its parameters, you can use e.g.:
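A sketch with parseGRASS, again assuming a GRASS session has been started with initGRASS; r.in.gdal is used only as an example:

```r
library(spgrass6)

# Show the description, flags and parameters of a GRASS module
parseGRASS("r.in.gdal")
```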
Once you have imported a map to GRASS format, you can also open and visualize this DEM using the GRASS GUI. In the next step, we can extract the drainage network:
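One possible way to do this is with the r.watershed module, sketched here under the assumptions that a GRASS session is running, that the DEM was imported as "DEM", and that a threshold of 50 cells (a hypothetical value) is suitable for the data:

```r
library(spgrass6)

# Derive a stream raster from the DEM;
# threshold is the minimum size of an exterior watershed basin, in cells
execGRASS("r.watershed",
          parameters = list(elevation = "DEM",
                            threshold = as.integer(50),
                            stream = "stream"))
```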
Finally, we can import the GRASS maps to R using the spgrass6 package:
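A sketch with readRAST6, assuming a running GRASS session and raster maps called "DEM" and "stream" (names carried over from the assumptions above):

```r
library(spgrass6)

# Read two GRASS raster maps into one SpatialGridDataFrame
maps <- readRAST6(c("DEM", "stream"))

summary(maps)
spplot(maps["DEM"])   # plot the elevation layer
```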
FWTools (created and maintained by Frank Warmerdam, director and active contributor to OSGeo) is a stand-alone collection of open source tools for processing spatial data. It includes OpenEV, GDAL, MapServer, PROJ.4 and OGDI, i.e. it contains a number of (highly efficient!) utilities that are excellent for processing large datasets. Or, to quote the author: "FWTools is intended to give folks a chance to use the latest and greatest functionality."
Linux and Windows versions of FWTools can be obtained from the maptools.org website. After you install the software, you can test it by starting OpenEV and opening and overlaying some GIS layers. FWTools does not offer much of a GUI; we are mainly interested in using the GDAL utilities. The complete description of the package functionality is available via the GDAL Wiki. There is also the FWTools mailing list.
Running FWTools from R
There is still no package to control FWTools from R, but we can simply send command lines using the system command. Before we can use FWTools from R, we need to locate it on our PC by using e.g.:
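A minimal sketch of locating the gdalwarp utility. It first checks the system PATH and then a typical Windows installation folder; the folder pattern is an assumption and may differ on your machine:

```r
# Look for gdalwarp on the system PATH first
gdalwarp <- unname(Sys.which("gdalwarp"))

# Otherwise try a typical FWTools folder under Program Files (assumed location)
if (!nzchar(gdalwarp)) {
  fw_dirs <- Sys.glob("C:/Program Files*/FWTools*")
  if (length(fw_dirs) > 0) {
    gdalwarp <- file.path(fw_dirs[1], "bin", "gdalwarp.exe")
  }
}

gdalwarp   # an empty string means the utility was not found
```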
Now we can download some GIS data from the web:
We can reproject/resample the map to our local coordinate system using the gdalwarp functionality (this combines several processing steps in one function):
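A sketch of such a call, sent to the command line with system. The input and output file names are hypothetical, and EPSG:28992 (the Dutch RD New grid) is assumed to be the local coordinate system meant here:

```r
# Path to gdalwarp (empty if it is not on the PATH; set it by hand if needed)
gdalwarp <- unname(Sys.which("gdalwarp"))

in_file  <- "landcover_modis.tif"   # hypothetical input file
out_file <- "landcover_nl.tif"      # hypothetical output file

# Quote arguments in the style of the current operating system
q <- function(x) {
  shQuote(x, type = if (.Platform$OS.type == "windows") "cmd" else "sh")
}

# Reproject to EPSG:28992 and resample to 500 m with nearest neighbour
cmd <- paste(q(gdalwarp),
             "-t_srs", q("EPSG:28992"),
             "-tr 500 500",
             "-r near",
             q(in_file), q(out_file))

# Run the command only if both the utility and the input file are present
if (nzchar(gdalwarp) && file.exists(in_file)) system(cmd)
```

Nearest-neighbour resampling is chosen because a land cover map holds class codes, which should not be averaged.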
In this case we have produced a MODIS-based land cover map for the whole of the Netherlands at a resolution of 500 m (in the local coordinate system).
In summary, GDAL utilities and similar functions available via FWTools are extremely compact and efficient, and well suited to working with large datasets. Commands such as gdalwarp allow you to combine several GIS operations in one line and hence easily transfer GIS data from one format or grid system to another. The only thing that might slow you down is the time needed to get the commands right, which can take more than a few iterations.