GML in R
In this post I explain how to load .gml geographic data in R. Tutorials for spatial analysis in R mainly use ‘ESRI Shapefiles’ (for example, Introduction to visualising spatial data in R by Robin Lovelace, James Cheshire and others). ‘ESRI Shapefiles’ are easy to load in R and are perfect for a first approach. However, GML files are better suited to long-term archiving, so they matter when you want to work directly with archived data. This post aims to help with importing GML in R, as the process is not well documented.
What is .gml?
- an open format
- GML utilises XML to express geographical features (works well with VCS, like Git)
- it is recommended for (long-term) archiving
What you need
- GDAL: the library for reading and writing raster and vector geospatial data formats
- rgdal: R bindings to GDAL
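Before going further, it can help to confirm that the local GDAL build actually includes the GML driver. The snippet below is a small sketch that assumes the rgdal package is installed (it does nothing otherwise); the variable names are mine.

```r
# Assumption: rgdal is installed. If it is not, the block is skipped.
if (requireNamespace("rgdal", quietly = TRUE)) {
  # ogrDrivers() returns a data frame describing the vector drivers GDAL knows
  drivers <- rgdal::ogrDrivers()
  # TRUE when the GML driver is part of the local GDAL build
  gml_available <- "GML" %in% drivers$name
  print(gml_available)
}
```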
Get some data (for Linux)
I use the dataset from Palmisano, A.:
- (2012) Diachronic and spatial distribution of Khabur ware in the early second millennium BC: http://dx.doi.org/10.5334/data.1334754978
This is how to get the data from the command line on Linux. On other operating systems, download the data and convert the shapefile to .gml.
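If you prefer to do the shapefile-to-GML conversion from R rather than with a command-line tool, something along these lines should work. It is only a sketch: the folder, the shapefile name and the output file name are hypothetical placeholders, and it assumes rgdal is installed with a GDAL build that can write GML.

```r
# Hypothetical names: adjust them to wherever you unpacked the download.
shp_dir   <- "../media"                      # folder containing the shapefile
shp_layer <- "sites"                         # shapefile name without ".shp"
gml_path  <- "../media/sites-converted.gml"  # GML file to create

shp_file <- file.path(shp_dir, paste0(shp_layer, ".shp"))

# Run only if rgdal is available, the input exists and the output does not yet
if (requireNamespace("rgdal", quietly = TRUE) &&
    file.exists(shp_file) && !file.exists(gml_path)) {
  # Read the shapefile, then write the same object back out with the GML driver
  sites <- rgdal::readOGR(dsn = shp_dir, layer = shp_layer)
  rgdal::writeOGR(sites, dsn = gml_path, layer = shp_layer, driver = "GML")
}
```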
Get info about your .gml
The first thing to do before trying to import a .gml file is to get information about its layers. For this example, the geographic file is “2015-10-20--Vector-points.gml”, stored in the directory ../media. On the command line, run GDAL's ogrinfo tool on the file,
or, from the R command line, call ogrListLayers() on it.
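From R, the layer listing can be sketched as follows, assuming rgdal is installed and the file sits at the path used in this post. ogrInfo() gives more detail on a single layer; here I assume the layer is called "sites".

```r
gml_path <- "../media/2015-10-20--Vector-points.gml"

# Skipped when rgdal or the file is missing
if (requireNamespace("rgdal", quietly = TRUE) && file.exists(gml_path)) {
  # Names of the layers found in the data source
  print(rgdal::ogrListLayers(gml_path))
  # Driver, feature count and fields of one layer
  print(rgdal::ogrInfo(dsn = gml_path, layer = "sites"))
}
```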
This output indicates that there is one layer, named “sites”, in the file 2015-10-20--Vector-points.gml.
Import GML in R with readOGR()
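The function to import spatial data in R is readOGR(), from the rgdal package. Its two main arguments are “dsn” and “layer”.
The tricky part is that the meaning of these arguments changes with the driver: reading an ESRI Shapefile is different from reading GML or GPX. For a GML file: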
- “dsn” is the path to the file
- “layer” is the name of the layer
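Put together for the example file, the call might look like this. It is a sketch assuming rgdal is installed; the path and the layer name "sites" are the ones used in this post.

```r
gml_path <- "../media/2015-10-20--Vector-points.gml"

# Skipped when rgdal or the file is missing
if (requireNamespace("rgdal", quietly = TRUE) && file.exists(gml_path)) {
  # For GML, dsn is the full path to the file and layer is the layer inside it
  sites <- rgdal::readOGR(dsn = gml_path, layer = "sites")
  print(summary(sites))
}
```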
For the sake of comparison, here is what changes when reading an ESRI Shapefile “Vector-points.shp” with readOGR():
- “dsn” is the path to the directory containing the file (without the file name)
- “layer” is the name of the shapefile (without the .shp extension)
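The shapefile variant can be sketched the same way, assuming rgdal is installed and that “Vector-points.shp” lives in ../media (the directory is my assumption, taken from the GML example).

```r
shp_dir   <- "../media"       # assumed directory, as in the GML example
shp_layer <- "Vector-points"  # file name without the .shp extension

# Skipped when rgdal or the shapefile is missing
if (requireNamespace("rgdal", quietly = TRUE) &&
    file.exists(file.path(shp_dir, paste0(shp_layer, ".shp")))) {
  # For a shapefile, dsn is the directory and layer is the base file name
  points_shp <- rgdal::readOGR(dsn = shp_dir, layer = shp_layer)
  print(summary(points_shp))
}
```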
Things I found confusing
The output of ogrListLayers() depends on the data source name (dsn), whose interpretation varies by driver. That means it reports just one of the two even if you have an ESRI Shapefile and a GML file in the same directory. If you call this function on the directory shown above, with dsn = "../media", it lists only the ‘ESRI Shapefile’.
At first, with this output, I thought that I had a problem with my driver and that my GML wasn’t recognised … but changing the data source name to dsn = "../media/2015-10-20--Vector-points.gml" shows that there is a GML file too. dsn = "../media/2015-10-20--sites.shp" works too.
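To see the difference side by side, one can loop over the three data source names mentioned above. This is a sketch assuming rgdal is installed; any path that does not exist on your machine is simply skipped.

```r
# The three data source names discussed above
candidates <- c(
  "../media",
  "../media/2015-10-20--Vector-points.gml",
  "../media/2015-10-20--sites.shp"
)

if (requireNamespace("rgdal", quietly = TRUE)) {
  for (dsn in candidates) {
    if (!file.exists(dsn)) next  # also TRUE for an existing directory
    cat("dsn =", dsn, "\n")
    # The printed result includes the driver GDAL picked for this dsn
    print(rgdal::ogrListLayers(dsn))
  }
}
```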
Now you can read any .gml file you want.