

Today I came across a problem that I have already encountered on several occasions, so I decided to write a post about how to solve it.

In many cases, archaeologists, myself included (the following example comes from my PhD), digitise their data by hand. Often this means drawing polygons in a GIS: for example, redrawing old plans of a building (or its foundations) as polygons. In several GIS packages (not all), it is easy to create an “invalid topology”, i.e. a polygon that intersects itself, like in this figure:

An invalid topology

This is an example from my dissertation: the outline of the remains of a building (Building 79, Middle Bronze Age), and as you can guess, it is not what I wanted. There is a topological error at the upper corner. Yes, the small thing in the red circle, where the lines cross each other. That is not how the outline of a wall should look, now or 4000 years ago. It wasn’t a big problem when I created the shapefile (a long time ago), because the software didn’t complain, but I am discovering it now that I want to do some spatial analysis, and I got this error message:

TopologyException: Input geom 0 is invalid: 
Self-intersection at or near point 
3051.9670446535761 3987.14902338094 at 
3051.9670446535761 3987.14902338094
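To make the message a bit less cryptic, here is a minimal sketch, in Python rather than R, of what a self-intersection test looks for: two edges of the same ring that are not neighbours and still cross each other. `find_self_intersection` and `_crossing_point` are hypothetical helpers written for this example, not part of GEOS, and the brute-force comparison of every pair of edges is much simpler than what GEOS actually does (it ignores holes, touching vertices and floating-point robustness).

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Cross product of (b - a) and (c - a): >0 left turn, <0 right turn, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _crossing_point(p1: Point, p2: Point, q1: Point, q2: Point) -> Optional[Point]:
    """Return where segments p1-p2 and q1-q2 properly cross, or None.

    Touching endpoints and collinear overlaps are deliberately ignored here.
    """
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    if d1 * d2 >= 0:
        return None  # p1 and p2 lie on the same side of line q (or on it)
    if _orient(p1, p2, q1) * _orient(p1, p2, q2) >= 0:
        return None  # q1 and q2 lie on the same side of line p (or on it)
    t = d1 / (d1 - d2)  # fraction along p1-p2 where the side test changes sign
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def find_self_intersection(ring: List[Point]) -> Optional[Point]:
    """Return the first crossing between two non-adjacent edges of a ring, or None."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring if the last vertex is missing
    edges = list(zip(ring[:-1], ring[1:]))
    n = len(edges)
    for i in range(n):
        for j in range(i + 2, n):  # skip edge i+1, which shares a vertex with edge i
            if i == 0 and j == n - 1:
                continue  # first and last edges meet at the closing vertex
            hit = _crossing_point(*edges[i], *edges[j])
            if hit is not None:
                return hit
    return None


if __name__ == "__main__":
    # A made-up "bow tie": the outline folds over itself in the middle.
    bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
    hit = find_self_intersection(bowtie)
    if hit is not None:
        print(f"Self-intersection at or near point {hit[0]} {hit[1]}")
```

On the made-up bow tie this prints `Self-intersection at or near point 1.0 1.0`, the same kind of location the GEOS message reports, which is what makes the faulty corner easy to find on the map.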

The coordinates point to the red circle on the figure, so the error is easy to find. Of course, there are tools, plugins and settings to prevent this. I wish it weren’t so easy to draw something like that, but I have been confronted with it multiple times, and not only in my own data. If you draw polygons, be aware of this, because it will cause problems when you want to analyse your data. And, as the GEOS webpage puts it:

Cleaning is fundamentally a difficult problem, because things can be dirty in so many ways

At least I stumbled upon a “fundamentally difficult problem” today… great! And a perfect excuse for when I don’t find time to clean the kitchen ¯\_(ツ)_/¯ But I didn’t find a straightforward answer on how to avoid it. Do you have any tips on how to automate this task, or do you have a validity check built into your workflow? In the future, I will probably enforce something like this in R whenever I load a vector layer.


Or the same with sf.


If a shapefile is invalid, the script will throw an error, and I can then look at the problem or consciously agree to work with it. This avoids starting a script and then running into this “fundamental” problem, again. My advice: if you are digitising, enforce a topological checker. Always, without (topological) exception.
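As a sketch of such a gate, assuming the layer is stored as GeoJSON rather than as a shapefile (so that only the Python standard library is needed), the function below refuses to hand back a layer with unclosed, too-short or self-crossing polygon rings unless `allow_invalid=True` is passed, in which case it only warns. `check_feature_collection` and `ring_problems` are hypothetical names; in R, `sf::st_is_valid()` is the real tool for this job, and this naive check catches far fewer problems than GEOS does.

```python
import json
import warnings
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    # Cross product of (b - a) and (c - a); its sign tells on which side of a-b the point c lies.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    # Proper crossings only: touching vertices and collinear overlaps are not reported.
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def ring_problems(ring: Sequence[Sequence[float]]) -> List[str]:
    """Return the few problems this sketch can detect in one polygon ring."""
    pts = [(float(c[0]), float(c[1])) for c in ring]  # drop any Z value
    if len(pts) < 4:
        return ["ring has fewer than 4 positions"]
    problems = []
    if pts[0] != pts[-1]:
        problems.append("ring is not closed")
        pts.append(pts[0])
    edges = list(zip(pts[:-1], pts[1:]))
    n = len(edges)
    for i in range(n):
        for j in range(i + 2, n):  # edge i+1 shares a vertex with edge i, so skip it
            if i == 0 and j == n - 1:
                continue  # first and last edges meet at the closing vertex
            if _segments_cross(*edges[i], *edges[j]):
                problems.append(f"edges {i} and {j} cross")
    return problems


def check_feature_collection(text: str, allow_invalid: bool = False) -> dict:
    """Parse a GeoJSON FeatureCollection; raise on invalid Polygon rings unless allowed."""
    data = json.loads(text)
    report: Dict[str, List[str]] = {}
    for k, feature in enumerate(data["features"]):
        geom = feature.get("geometry") or {}
        if geom.get("type") != "Polygon":
            continue  # this sketch only inspects plain Polygons
        fid = str(feature.get("id", k))
        for r, ring in enumerate(geom["coordinates"]):
            for problem in ring_problems(ring):
                report.setdefault(fid, []).append(f"ring {r}: {problem}")
    if report and not allow_invalid:
        raise ValueError(f"invalid geometries, fix them or pass allow_invalid=True: {report}")
    if report:
        warnings.warn(f"continuing with invalid geometries on purpose: {report}")
    return data
```

Failing loudly by default and requiring an explicit flag to continue follows the workflow described above: an invalid layer stops the script before any analysis runs, and working with it anyway becomes a deliberate choice rather than a surprise halfway through.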