Visualizing and analyzing geographic data
This post is a follow-up to my post on JavaScript mapping. In the early 2000s I worked in ecology as a GIS modeler and (very briefly) in the field. Back then ArcView dominated GIS, especially in government agencies. In 2019 ArcView is called ArcGIS and still looks to be dominant, though alternatives like the open-source QGIS and the commercial eSpatial have matured. You can also now do many GIS tasks, like geocoding and spatial querying, without a full-blown GIS. You just need R and its friendly neighborhood packages.
Scale matters
Before you install every R geographic package on CRAN or GitHub, it’s important to know the physical scale of the task you’re performing. Mapping at street level is different from mapping countries or entire regions. For one, you don’t need to worry about projections in the former, but they are critically important in the latter. Scale also determines how much tooling you need: a static map of the continental US only needs a single function call. Similarly, if you want to highlight a few cities on that map, adding markers directly from their latitude and longitude will work, without the overkill of a full-blown geocoding workflow.
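To make the point about scale concrete, here is a minimal sketch in plain Python (no packages, since the idea is language-independent). It compares a great-circle distance with a "flat map" distance that treats latitude and longitude as planar coordinates. The function names, the coordinates, and the spherical Earth radius are my own assumptions for illustration, not part of any package mentioned in this post.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flat_km(lat1, lon1, lat2, lon2):
    """Distance on a flat plane (equirectangular approximation)."""
    mean_phi = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_phi)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


def relative_error(p, q):
    """Relative error of the flat distance against the great-circle one."""
    true = haversine_km(*p, *q)
    return abs(flat_km(*p, *q) - true) / true


# Illustrative coordinates (approximate): two points in Seattle, then Seattle to Miami.
street_error = relative_error((47.6062, -122.3321), (47.6152, -122.3200))
continental_error = relative_error((47.6062, -122.3321), (25.7617, -80.1918))

print(f"street level: {street_error:.2e}")
print(f"continental:  {continental_error:.2e}")
```

At street level the flat approximation is essentially indistinguishable from the great-circle answer, while across a continent the discrepancy becomes visible. That is the practical reason projections can be ignored for small areas but not for countries or regions.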
The R ecosystem for working with geographic data
For obtaining data, the rnaturalearth package draws from the excellent database of the same name; similarly, osmdata pulls from OpenStreetMap.
For data structures, one well-known package is sf, which allows you to store Simple Features. Geographic data are for the most part tabular, so you can do all the wrangling using tidyverse tools if you wish. To calculate spatial autocorrelation (e.g. Moran’s I), you can use the spdep package.
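If Moran’s I is unfamiliar, the statistic itself is short enough to write out by hand. The sketch below is plain Python, not the spdep implementation: it computes only the raw statistic, with no significance test, and the function name, the toy values, and the hand-built adjacency matrix are all assumptions for illustration.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix.

    weights[i][j] > 0 means locations i and j are neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighboring pairs.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * cross / variance_sum


# Hypothetical example: four cells in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

gradient = morans_i([1, 2, 3, 4], chain)     # similar values sit together
alternating = morans_i([1, 4, 1, 4], chain)  # neighbors always differ

print(gradient, alternating)
```

A smooth gradient gives a positive value (neighbors resemble each other), while the alternating pattern gives a negative one. For real analyses you would still want a package like spdep, which also handles neighbor construction and inference.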
For handling projections, the low-level GDAL library provides bindings for both R (in the form of rgdal) and Python.
To detect clusters of objects in geographic data, a task that is particularly useful in spatial epidemiology, SpatialEpi implements the classic Besag-Newell algorithm, and scanstatistics implements different scan statistics, such as Kulldorff’s.
For plotting maps, ggplot2 conveniently implements geom_sf for static maps, and coord_sf for projection, as described in this great writeup on r-spatial.org.
Geocoding requires a bit more work. ggmap supports Google Maps and OpenStreetMap; I prefer the latter since it doesn’t require an API key. If you want to geocode IP addresses, there is an appropriately named r_IPgeocode package for that, of course.
If instead you want to create hexbin maps (e.g. of election results), the rgeos package can calculate the centroid of each bin, which can then be fortified to a geom_polygon, as demonstrated here.
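For the curious, the centroid calculation behind a hexbin map is not magic. The following is a rough sketch in plain Python of an area-weighted polygon centroid using the shoelace formula. It is not how rgeos does it internally; the helper names and the sample hexagon are hypothetical, and it treats coordinates as planar, which is only reasonable for small bins.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * area2), cy / (3 * area2)


def hexagon(center_x, center_y, radius):
    """Vertices of a regular hexagon, e.g. one bin of a hexbin map."""
    return [
        (center_x + radius * math.cos(math.radians(60 * k)),
         center_y + radius * math.sin(math.radians(60 * k)))
        for k in range(6)
    ]


# Hypothetical bin roughly in the middle of the continental US.
bin_centroid = polygon_centroid(hexagon(-98.0, 39.0, 1.0))
print(bin_centroid)
```

For a regular hexagon the centroid is simply its center, which makes this an easy sanity check. The same function works for irregular bins, where the answer is less obvious.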
It’s good to be polyglot: geographic data in Python
If you’re working in Python, there is GeoViews for handling large-scale maps, part of the HoloViz* collection of libraries. For visualization, you can use the old-school matplotlib, the new-school seaborn, or a Python version of ggplot. If you need interactivity, the web-oriented Bokeh works well.
*Somewhat confusingly, HoloViz also provides hvPlot, which allows general-purpose plotting that partially overlaps with seaborn/matplotlib/ggplot.