Kernel density estimation in R spatial

Here’s one way to do kernel density estimation in R spatial

Packages

This requires a surprising number of moving parts (at least the way I did it):

library(sf)
library(tmap)
library(spatstat)
library(maptools)
library(raster)
library(dplyr)

Data

The data are some point data (Airbnb listings from here) and some polygon data (NZ census Statistical Area 2 data).

Load the data

polys <- st_read("sa2.gpkg")
pts <- st_read("abb.gpkg")

And have a look

tm_shape(polys) +
  tm_polygons() + 
  tm_shape(pts) + 
  tm_dots()

spatstat for density estimation

The best way I know to do density estimation in the R ecosystem is using the spatstat library’s specialisation of base R’s density function. That means converting the point data to a spatstat planar point pattern (ppp) object, which involves a couple of steps.

pts.ppp <- pts$geom %>% 
  as("Spatial") %>% # we need to convert to sp as a bridge to spatstat
  as.ppp()          # and this is what we need maptools for...

A point pattern also needs a ‘window’, which we’ll make from the polygons.

pts.ppp$window <- polys %>%
  st_union() %>%       # combine all the polygons into a single shape
  as("Spatial") %>%    # convert to sp
  as.owin()            # convert to spatstat owin - again maptools...

Now the kernel density

We need some bounding box info to manage the density estimation resolution

bb <- st_bbox(polys)
cellsize <- 100
height <- (bb$ymax - bb$ymin) / cellsize
width <- (bb$xmax - bb$xmin) / cellsize

Now we specify the size of the raster we want with dimyx (note the order, y then x) using height and width.

We can convert this directly to a raster, but have to supply a CRS which we pull from the original points input dataset. At the time of writing (August 2021) you’ll get a complaint about the New Zealand Geodetic Datum 2000 because recent changes in how projections and datums are handled are still working themselves out.

kde <- density(pts.ppp, sigma = 500, dimyx = c(height, width)) %>%
  raster(crs = st_crs(pts)$wkt) # a ppp has no CRS information

Let’s see what we got

We can map this using tmap.

tm_shape(kde) +
  tm_raster(palette =  "Reds")

A fallback sanity check

To give us an alternative view of the data, let’s just count points in polygons

polys$n <- polys %>%
  st_contains(pts) %>%
  lengths()

And map the result

tm_shape(polys) +
  tm_polygons(col = "n", palette = "Reds", title = "Points in polygons")

Aggregate the density surface pixels to polygons

This isn’t at all necessary, but is also useful to know. This is also a relatively slow operation. Note that we add together the density estimates in the pixels contained by each polygon.

summed_densities <- raster::extract(kde, polys, fun = sum)

Append this to the polygons and rescale so the result is an estimate of the original count. We multiply by cellsize^2 because each cell contains an estimate of the per sq metre (in this case, but per sq distance unit in general) density, so multiplying by the area of the cells gives an estimated count.

polys$estimated_count = summed_densities[, 1] * cellsize ^ 2

And now we can make another map

tm_shape(polys) + 
  tm_polygons(col = "estimated_count", palette = "Reds", title = "500m KDE estimate")

Clearly, this approach doesn’t work very well with areas of dramatically different sizes which accounts for why the rural but extensive area to the west is badly overestimated.

Next