Kernel density estimation in R spatial
Here’s one way to do kernel density estimation in R spatial
Packages
This requires a surprising number of moving parts (at least the way I did it):
library(sf)
library(tmap)
library(spatstat)
library(maptools)
library(raster)
library(dplyr)
Data
The data are some point data (Airbnb listings from here) and some polygon data (NZ census Statistical Area 2 data).
Load the data
polys <- st_read("sa2.gpkg")
pts <- st_read("abb.gpkg")
And have a look
tm_shape(polys) +
tm_polygons() +
tm_shape(pts) +
tm_dots()
spatstat
for density estimation
The best way I know to do density estimation in the R ecosystem is using the spatstat
library’s specialisation of base R’s density
function. That means converting the point data to a spatstat
planar point pattern (ppp
) object, which involves a couple of steps.
pts.ppp <- pts$geom %>%
as("Spatial") %>% # we need to convert to sp as a bridge to spatstat
as.ppp() # and this is what we need maptools for...
A point pattern also needs a ‘window’, which we’ll make from the polygons.
pts.ppp$window <- polys %>%
st_union() %>% # combine all the polygons into a single shape
as("Spatial") %>% # convert to sp
as.owin() # convert to spatstat owin - again maptools...
Now the kernel density
We need some bounding box info to manage the density estimation resolution
bb <- st_bbox(polys)
cellsize <- 100
height <- (bb$ymax - bb$ymin) / cellsize
width <- (bb$xmax - bb$xmin) / cellsize
Now we specify the size of the raster we want with dimyx
(note the order, y then x) using height
and width
.
We can convert this directly to a raster, but have to supply a CRS which we pull from the original points input dataset. At the time of writing (August 2021) you’ll get a complaint about the New Zealand Geodetic Datum 2000 because recent changes in how projections and datums are handled are still working themselves out.
kde <- density(pts.ppp, sigma = 500, dimyx = c(height, width)) %>%
raster(crs = st_crs(pts)$wkt) # a ppp has no CRS information
Let’s see what we got
We can map this using tmap
.
tm_shape(kde) +
tm_raster(palette = "Reds")
A fallback sanity check
To give us an alternative view of the data, let’s just count points in polygons
polys$n <- polys %>%
st_contains(pts) %>%
lengths()
And map the result
tm_shape(polys) +
tm_polygons(col = "n", palette = "Reds", title = "Points in polygons")
Aggregate the density surface pixels to polygons
This isn’t at all necessary, but is also useful to know. This is also a relatively slow operation. Note that we add together the density estimates in the pixels contained by each polygon.
summed_densities <- raster::extract(kde, polys, fun = sum)
Append this to the polygons and rescale so the result is an estimate of the original count. We multiply by cellsize^2
because each cell contains an estimate of the per sq metre (in this case, but per sq distance unit in general) density, so multiplying by the area of the cells gives an estimated count.
polys$estimated_count = summed_densities[, 1] * cellsize ^ 2
And now we can make another map
tm_shape(polys) +
tm_polygons(col = "estimated_count", palette = "Reds", title = "500m KDE estimate")
Clearly, this approach doesn’t work very well with areas of dramatically different sizes which accounts for why the rural but extensive area to the west is badly overestimated.