Visualizing and analyzing geographic data

This post is written as a follow-up to my post on JavaScript mapping. In the early 2000s I worked in ecology as a GIS modeler and (very briefly) in the field. Back then ArcView dominated GIS, especially in government agencies. In 2019 ArcView is called ArcGIS and still looks to be dominant, though alternatives like the open-source QGIS and the commercial eSpatial have matured. You can also now do many GIS tasks, like geocoding and spatial querying, without a full-blown GIS. You just need R and its friendly neighborhood packages.

Scale matters

Before you install every R geographic package on CRAN or GitHub, it’s important to know the physical scale of the task you’re performing. Mapping at street level is different from mapping countries or entire regions. For one, you can mostly ignore projections in the former, but they are critically important in the latter. Scale also determines how much machinery you need: a static map of the continental US needs only a single function call. Similarly, if you want to highlight a few cities on that map, simply adding markers directly at their latitude and longitude will work, without the overkill of a full-blown geocoding workflow.
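To make the scale argument concrete, here is a small Python sketch. The coordinates are approximate and chosen only for illustration. It compares the great-circle (haversine) distance on a spherical Earth with a flat, equirectangular approximation. That approximation treats longitude and latitude as planar coordinates, with longitude scaled by the cosine of the mean latitude. Over a few kilometers of New York City the two agree closely. Between New York and Tokyo the flat approximation is badly off, which is the kind of distortion a proper projection has to manage.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical Earth is itself an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flat_km(lat1, lon1, lat2, lon2):
    """Distance if the Earth is treated as locally flat (equirectangular approximation)."""
    dlon = (lon2 - lon1 + 180) % 360 - 180  # wrap the longitude difference to [-180, 180)
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = math.radians(dlon) * math.cos(mean_lat)  # east-west, shrunk toward the poles
    y = math.radians(lat2 - lat1)                # north-south
    return EARTH_RADIUS_KM * math.hypot(x, y)


def relative_error(p, q):
    """Relative error of the flat approximation against the great-circle distance."""
    exact = haversine_km(*p, *q)
    return abs(flat_km(*p, *q) - exact) / exact


# Approximate (lat, lon) coordinates, for illustration only.
nyc = (40.7128, -74.0060)
nearby = (40.7081, -73.9571)  # roughly 4 km east of the point above
tokyo = (35.6762, 139.6503)

for name, q in [("street scale", nearby), ("New York -> Tokyo", tokyo)]:
    print(f"{name}: {haversine_km(*nyc, *q):.1f} km, "
          f"flat-Earth error {relative_error(nyc, q):.2%}")
```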

The R ecosystem for working with geographic data

For obtaining data, the rnaturalearth package draws from the excellent Natural Earth database, and osmdata similarly pulls from OpenStreetMap.

For data structures, one well-known package is sf, which stores geometries as Simple Features. Geographic data are for the most part tabular (sf objects are data frames with a geometry column), so you can do all the wrangling using tidyverse tools if you wish. To calculate spatial autocorrelation (e.g., Moran’s I), you can use the spdep package.
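For intuition about what spdep computes, here is a minimal Python sketch of global Moran’s I:

I = (n / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2

In this formula, w_ij are spatial weights and W is their sum. The toy example is hypothetical: four regions in a row with binary weights (1 for immediate neighbors, 0 otherwise). Positive values indicate that similar values sit next to each other. Negative values indicate a checkerboard-like pattern. With no spatial pattern, the expected value is -1/(n - 1).

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w[i][j]."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over every pair of regions.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return n * num / (w_sum * den)


def chain_weights(n):
    """Binary contiguity weights for n regions in a row: each touches its neighbors."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(4)
print(morans_i([1, 0, 1, 0], w))  # alternating pattern, prints -1.0
print(morans_i([1, 2, 3, 4], w))  # smooth gradient, prints 0.3333333333333333
```

With n = 4 the no-pattern expectation is -1/3, so the gradient’s 1/3 indicates positive autocorrelation. A real analysis also needs a significance test, for example by permutation, which spdep provides.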

For handling projections, the low-level GDAL library has bindings for both R (in the form of rgdal) and Python.
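As a small illustration of what a projection does, here is a hand-rolled Python sketch of the spherical Web Mercator projection (EPSG:3857, the one used by most web map tiles). It maps longitude and latitude in degrees to x/y in meters. It is meant for intuition only. In practice, GDAL and its companion PROJ library handle datums, many coordinate reference systems, and edge cases that this sketch ignores.

```python
import math

R = 6378137.0  # WGS84 semi-major axis in meters, used as the sphere radius by Web Mercator
MAX_LAT = 85.05112878  # latitude at which Web Mercator's world map becomes a square


def to_web_mercator(lon, lat):
    """Forward spherical Mercator: degrees -> meters (EPSG:3857)."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # the poles would map to infinity
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def from_web_mercator(x, y):
    """Inverse spherical Mercator: meters -> degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def scale_factor(lat):
    """How much Mercator stretches lengths at a given latitude (1 at the equator)."""
    return 1 / math.cos(math.radians(lat))


print(to_web_mercator(180.0, 0.0))
print(scale_factor(60.0))  # lengths at 60 degrees latitude appear about twice as long
```

The growing scale factor is why Greenland looks as large as Africa on web maps. It is a small-scale (large-area) problem that street-level maps can largely ignore.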

To detect clusters of objects in geographic data, which is particularly useful in spatial epidemiology, SpatialEpi implements the classic Besag-Newell algorithm, and scanstatistics implements several scan statistics, such as Kulldorff’s.
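To show the idea behind a spatial scan statistic, here is a minimal Python sketch of Kulldorff’s circular scan under a Poisson model. It is neither the SpatialEpi nor the scanstatistics implementation, and it does not cover Besag-Newell.

Each region is reduced to a centroid with a case count and a population. Circles centered on each centroid grow outward up to a cap of 50% of the total population, a common default that is assumed here. Each circle is scored by the log-likelihood ratio:

LLR = c ln(c/E) + (C - c) ln((C - c)/(C - E)) when c > E, and 0 otherwise

Here c and E are the observed and expected cases inside the circle, and C is the total number of cases. The sketch only finds the most likely cluster. Judging whether that cluster is significant requires Monte Carlo replication, which the packages handle. The five-region data set is hypothetical.

```python
import math


def poisson_llr(c, e, total_cases):
    """Kulldorff's Poisson log-likelihood ratio for one window (high-rate clusters only)."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    rest_c, rest_e = total_cases - c, total_cases - e
    if rest_c > 0:
        llr += rest_c * math.log(rest_c / rest_e)
    return llr


def circular_scan(coords, cases, population, max_pop_fraction=0.5):
    """Return (window indices, LLR) of the most likely cluster over circular windows."""
    n = len(coords)
    total_cases = sum(cases)
    total_pop = sum(population)
    # Expected cases under a constant rate, scaled so they sum to the observed total.
    expected = [total_cases * p / total_pop for p in population]
    best_window, best_llr = [], 0.0
    for center in coords:
        dists = [math.dist(center, p) for p in coords]
        order = sorted(range(n), key=dists.__getitem__)
        c = e = pop = 0.0
        window = []
        for k, j in enumerate(order):
            c += cases[j]
            e += expected[j]
            pop += population[j]
            window.append(j)
            if pop > max_pop_fraction * total_pop:
                break  # the circle now holds too much of the population
            if k + 1 < n and dists[order[k + 1]] == dists[j]:
                continue  # a true circle must also include the equally distant region
            llr = poisson_llr(c, e, total_cases)
            if llr > best_llr:
                best_window, best_llr = sorted(window), llr
    return best_window, best_llr


# Five hypothetical regions along a line, 100 people each.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
cases = [10, 12, 1, 1, 1]
population = [100] * 5
window, llr = circular_scan(coords, cases, population)
print(window, round(llr, 3))
```

Regions at the same distance from a center are added together, so every scored window is a genuine circle rather than an arbitrary set of nearest neighbors.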

For plotting maps, ggplot2 conveniently implements geom_sf for static maps, and coord_sf for projection, as described in this great writeup on r-spatial.org.

Geocoding requires a bit more work. ggmap supports Google Maps and OpenStreetMap; I prefer the latter since it doesn’t require an API key. If you want to geocode IP addresses, there is an appropriately named r_IPgeocode package for that, of course.
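Geocoding against OpenStreetMap data usually means querying a Nominatim server. As a minimal Python sketch (not ggmap’s code), here is how such a request can be built with the standard library against the public Nominatim search endpoint. The user-agent string is a placeholder. Nominatim’s usage policy asks for an identifying user agent and forbids heavy use, so batch jobs should throttle their requests.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"
USER_AGENT = "my-mapping-blog-example/0.1"  # placeholder: identify your own application


def build_geocode_request(address, user_agent=USER_AGENT):
    """Build (but do not send) a Nominatim search request for one address."""
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    return urllib.request.Request(f"{NOMINATIM_SEARCH}?{query}",
                                  headers={"User-Agent": user_agent})


def geocode(address):
    """Send the request and return (lat, lon) for the best match, or None."""
    with urllib.request.urlopen(build_geocode_request(address), timeout=10) as response:
        results = json.load(response)
    if not results:
        return None
    # Nominatim returns coordinates as strings.
    return float(results[0]["lat"]), float(results[0]["lon"])
```

Calling `geocode("Portland, Oregon")` sends a real network request. Keep it out of tight loops, since the public server allows at most about one request per second.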

If instead you want to create hexbin maps (e.g., of election results), the rgeos package can calculate the centroid of each bin, and the bins can then be fortified into a data frame for geom_polygon, as demonstrated here.
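For reference, the centroid of a simple polygon such as a hexbin cell can be computed from its vertices with the shoelace formula. The Python sketch below is a generic version of that calculation, not rgeos’s code. The `hexagon` helper and its parameters are illustrative.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    vertices: list of (x, y) in order, either clockwise or counter-clockwise.
    """
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Cx = sum / (6A) and A = area2 / 2, so divide by 3 * area2 (signs cancel for clockwise input).
    return cx / (3 * area2), cy / (3 * area2)


def hexagon(center_x, center_y, radius):
    """Vertices of a pointy-topped regular hexagon."""
    return [
        (center_x + radius * math.cos(math.radians(60 * k + 30)),
         center_y + radius * math.sin(math.radians(60 * k + 30)))
        for k in range(6)
    ]


print(polygon_centroid([(0, 0), (1, 0), (1, 1), (0, 1)]))  # prints (0.5, 0.5)
print(polygon_centroid(hexagon(2.0, 3.0, 1.0)))            # approximately (2.0, 3.0)
```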

It’s good to be polyglot: geographic data in Python

If you’re working in Python, GeoViews, part of the HoloViz* collection of libraries, handles large-scale maps. For visualization, you can use old-school matplotlib, new-school seaborn, or a Python version of ggplot. If you need interactivity, the web-oriented Bokeh works well.

*Somewhat confusingly, HoloViz also provides hvPlot, which allows general-purpose plotting that partially overlaps with seaborn/matplotlib/ggplot.

Written on December 3, 2019