wk version 0.5.0!

Jul 12, 2021 14 min read

A new version of wk is fresh on CRAN! Version 0.5 introduces some new features to the framework, incorporates most of the functionality that was previously in the wkutils package, and fixes a number of bugs that popped up in the development of s2 and geos. To showcase some of the new features I’ll use the Vermont counties data set from the VT Open Geodata Portal.

library(wk)
library(sf)
vt <- read_sf("VT_Data_-_County_Boundaries.geojson")["CNTYNAME"]
vt

## Simple feature collection with 14 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
## Geodetic CRS:  WGS 84
## # A tibble: 14 × 2
##    CNTYNAME                                                             geometry
##    <chr>                                                      <MULTIPOLYGON [°]>
##  1 ORLEANS    (((-71.92088 45.00785, -71.91398 45.00764, -71.91122 45.00768, -7…
##  2 GRAND ISLE (((-73.23344 44.66446, -73.23334 44.66267, -73.23317 44.66085, -7…
##  3 CHITTENDEN (((-73.10283 44.67616, -73.03808 44.65116, -73.03784 44.65129, -7…
##  4 WINDSOR    (((-72.77108 43.93907, -72.77515 43.93367, -72.76891 43.93163, -7…
##  5 WINDHAM    (((-72.70141 43.22634, -72.68544 43.22259, -72.67632 43.22267, -7…
##  6 BENNINGTON (((-73.0961 43.30715, -73.09257 43.30694, -73.08995 43.30696, -73…
##  7 FRANKLIN   (((-72.99655 45.0148, -72.99599 45.01478, -72.99326 45.01487, -72…
##  8 ESSEX      (((-71.46539 45.01323, -71.46541 45.01305, -71.46547 45.01288, -7…
##  9 LAMOILLE   (((-72.60514 44.79044, -72.5979 44.78706, -72.5944 44.7903, -72.5…
## 10 CALEDONIA  (((-71.85297 44.70108, -71.85878 44.69545, -71.86541 44.68916, -7…
## 11 ORANGE     (((-72.31223 44.18541, -72.3052 44.1831, -72.29759 44.18364, -72.…
## 12 WASHINGTON (((-72.22314 44.42381, -72.22322 44.42351, -72.22343 44.4232, -72…
## 13 RUTLAND    (((-72.85266 43.8336, -72.83834 43.82926, -72.83067 43.83913, -72…
## 14 ADDISON    (((-72.97588 44.29505, -72.9708 44.26335, -72.96568 44.23274, -72…

Breaking down features and building them back up again

The biggest feature in the new release is the ability to break down features to simpler components (at the simplest, a bunch of coordinates) and build them back up again into geometry that you can pass elsewhere. Some of this functionality previously lived in wkutils but it turns out this is really important: pretty much all geometry in base R is done with big long x and y vectors (e.g., xy.coords()). To make geometry that was previously locked away in WKB, WKT, or sf objects accessible, there needed to be a way out.

The first level of a geometry you might want to break down are collections: MULTIPOLYGON, MULTILINESTRING, MULTIPOINT, and GEOMETRYCOLLECTION. These types are useful when you have multiple things that represent one element in a vector, like multiple bits of land representing a county. In our case the features were saved as MULTIPOLYGON but there aren’t actually any features with more than one. To simplify it, we can use wk_flatten():

(vt_poly <- wk_flatten(vt))

## Simple feature collection with 15 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
## Geodetic CRS:  WGS 84
## # A tibble: 15 × 2
##    CNTYNAME                                                             geometry
##  * <chr>                                                           <POLYGON [°]>
##  1 ORLEANS    ((-71.92088 45.00785, -71.91398 45.00764, -71.91122 45.00768, -71…
##  2 GRAND ISLE ((-73.23344 44.66446, -73.23334 44.66267, -73.23317 44.66085, -73…
##  3 CHITTENDEN ((-73.10283 44.67616, -73.03808 44.65116, -73.03784 44.65129, -73…
##  4 WINDSOR    ((-72.77108 43.93907, -72.77515 43.93367, -72.76891 43.93163, -72…
##  5 WINDHAM    ((-72.70141 43.22634, -72.68544 43.22259, -72.67632 43.22267, -72…
##  6 BENNINGTON ((-73.0961 43.30715, -73.09257 43.30694, -73.08995 43.30696, -73.…
##  7 FRANKLIN   ((-72.99655 45.0148, -72.99599 45.01478, -72.99326 45.01487, -72.…
##  8 ESSEX      ((-71.46539 45.01323, -71.46541 45.01305, -71.46547 45.01288, -71…
##  9 LAMOILLE   ((-72.60514 44.79044, -72.5979 44.78706, -72.5944 44.7903, -72.59…
## 10 CALEDONIA  ((-71.85297 44.70108, -71.85878 44.69545, -71.86541 44.68916, -71…
## 11 CALEDONIA  ((-72.03989 44.15707, -72.03982 44.15704, -72.03971 44.15706, -72…
## 12 ORANGE     ((-72.31223 44.18541, -72.3052 44.1831, -72.29759 44.18364, -72.2…
## 13 WASHINGTON ((-72.22314 44.42381, -72.22322 44.42351, -72.22343 44.4232, -72.…
## 14 RUTLAND    ((-72.85266 43.8336, -72.83834 43.82926, -72.83067 43.83913, -72.…
## 15 ADDISON    ((-72.97588 44.29505, -72.9708 44.26335, -72.96568 44.23274, -72.…

By default wk_flatten() peels off one layer of collections, repeating rows where a feature has more than one element. This is a little like sf::st_cast() but is defined in terms of what you have rather than what you want. For advanced users, the flattening functionality is also implemented as a wk “filter”, which means it can do most of what it does without allocating any extra memory and can stream input and output very efficiently.

I’m demonstrating all of this with sf because it’s the most common use-case, but it works with any data frame/tibble with exactly one column implementing the wk_handle() generic (including an st_sfc()!). In fact, pretty much anything you do with wk also “just works” with data frames.

vt_tbl <- tibble::tibble(vt$CNTYNAME, geom = as_wkb(vt))
wk_flatten(vt_tbl)

## # A tibble: 15 × 2
##    `vt$CNTYNAME` geom                                                           
##    <chr>         <wk_wkb>                                                       
##  1 ORLEANS       <POLYGON ((-71.92088 45.00785, -71.91398 45.00764, -71.91122 4…
##  2 GRAND ISLE    <POLYGON ((-73.23344 44.66446, -73.23334 44.66267, -73.23317 4…
##  3 CHITTENDEN    <POLYGON ((-73.10283 44.67616, -73.03808 44.65116, -73.03784 4…
##  4 WINDSOR       <POLYGON ((-72.77108 43.93907, -72.77515 43.93367, -72.76891 4…
##  5 WINDHAM       <POLYGON ((-72.70141 43.22634, -72.68544 43.22259, -72.67632 4…
##  6 BENNINGTON    <POLYGON ((-73.0961 43.30715, -73.09257 43.30694, -73.08995 43…
##  7 FRANKLIN      <POLYGON ((-72.99655 45.0148, -72.99599 45.01478, -72.99326 45…
##  8 ESSEX         <POLYGON ((-71.46539 45.01323, -71.46541 45.01305, -71.46547 4…
##  9 LAMOILLE      <POLYGON ((-72.60514 44.79044, -72.5979 44.78706, -72.5944 44.…
## 10 CALEDONIA     <POLYGON ((-71.85297 44.70108, -71.85878 44.69545, -71.86541 4…
## 11 CALEDONIA     <POLYGON ((-72.03989 44.15707, -72.03982 44.15704, -72.03971 4…
## 12 ORANGE        <POLYGON ((-72.31223 44.18541, -72.3052 44.1831, -72.29759 44.…
## 13 WASHINGTON    <POLYGON ((-72.22314 44.42381, -72.22322 44.42351, -72.22343 4…
## 14 RUTLAND       <POLYGON ((-72.85266 43.8336, -72.83834 43.82926, -72.83067 43…
## 15 ADDISON       <POLYGON ((-72.97588 44.29505, -72.9708 44.26335, -72.96568 44…

The next level is vertices: one point feature for each vertex in the input.

(vt_vertices <- wk_vertices(vt))

## Simple feature collection with 18348 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
## Geodetic CRS:  WGS 84
## # A tibble: 18,348 × 2
##    CNTYNAME             geometry
##  * <chr>             <POINT [°]>
##  1 ORLEANS  (-71.92088 45.00785)
##  2 ORLEANS  (-71.91398 45.00764)
##  3 ORLEANS  (-71.91122 45.00768)
##  4 ORLEANS  (-71.90486 45.00779)
##  5 ORLEANS  (-71.89931 45.00797)
##  6 ORLEANS   (-71.9006 45.00164)
##  7 ORLEANS   (-71.9023 44.99305)
##  8 ORLEANS  (-71.90409 44.98383)
##  9 ORLEANS  (-71.90639 44.97203)
## 10 ORLEANS     (-71.908 44.9641)
## # ℹ 18,338 more rows

Vertices are useful if you need to input something that requires points but you were handed a boundary or polygon layer (I use this kind of thing to set depth values at 0 along coastlines when calculating bathymetry, for example). This is good for GIS function stuff, but if you’re doing any custom geometry processing or just need coordinates for ggplot2 or some base R function, you’ll need straight up x and y vectors. If so, the new wk_coords() function is just for you!

(vt_coords <- tibble::as_tibble(wk_coords(vt)))

## # A tibble: 18,348 × 5
##    feature_id part_id ring_id     x     y
##         <int>   <int>   <int> <dbl> <dbl>
##  1          1       2       1 -71.9  45.0
##  2          1       2       1 -71.9  45.0
##  3          1       2       1 -71.9  45.0
##  4          1       2       1 -71.9  45.0
##  5          1       2       1 -71.9  45.0
##  6          1       2       1 -71.9  45.0
##  7          1       2       1 -71.9  45.0
##  8          1       2       1 -71.9  45.0
##  9          1       2       1 -71.9  45.0
## 10          1       2       1 -71.9  45.0
## # ℹ 18,338 more rows

And, of course, sometimes you get handed a bunch of coordinates but you really want an sf object. For this case there is wk_linestring(), wk_polygon(), and wk_collection(). For example, to reconstruct the original sf object starting with coordinates we could do:

vt_poly_reconstructed <- wk_polygon(
  xy(vt_coords$x, vt_coords$y, crs = "WGS84"),
  feature_id = vt_coords$feature_id,
  ring_id = vt_coords$ring_id
)

vt_multipoly_reconstructed <- wk_collection(
  vt_poly_reconstructed,
  geometry_type = wk_geometry_type("multipolygon"),
  feature_id = seq_along(vt_poly_reconstructed)
)

st_as_sfc(vt_multipoly_reconstructed)

## Geometry set for 14 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
## Geodetic CRS:  WGS 84
## First 5 geometries:

## MULTIPOLYGON (((-71.92088 45.00785, -71.91398 4...

## MULTIPOLYGON (((-73.23344 44.66446, -73.23334 4...

## MULTIPOLYGON (((-73.10283 44.67616, -73.03808 4...

## MULTIPOLYGON (((-72.77108 43.93907, -72.77515 4...

## MULTIPOLYGON (((-72.70141 43.22634, -72.68544 4...

dplyr integration

There’s nothing new about dplyr’s ability to work with vectors in wk 0.5, but the new functions are well-suited to dplyr’s group_by() and summarise() as well as the newish auto-unpacking feature in mutate(), summarise(), and reframe(). For example, to keep all the attributes when calling wk_coords() you can do:

library(dplyr)

vt %>% 
  as_tibble() %>% 
  group_by(CNTYNAME) %>% 
  reframe(wk_coords(geometry))

## # A tibble: 18,348 × 6
##    CNTYNAME feature_id part_id ring_id     x     y
##    <chr>         <int>   <int>   <int> <dbl> <dbl>
##  1 ADDISON           1       2       1 -73.0  44.3
##  2 ADDISON           1       2       1 -73.0  44.3
##  3 ADDISON           1       2       1 -73.0  44.2
##  4 ADDISON           1       2       1 -73.0  44.2
##  5 ADDISON           1       2       1 -73.0  44.2
##  6 ADDISON           1       2       1 -73.0  44.2
##  7 ADDISON           1       2       1 -73.0  44.2
##  8 ADDISON           1       2       1 -73.0  44.2
##  9 ADDISON           1       2       1 -73.0  44.2
## 10 ADDISON           1       2       1 -73.0  44.2
## # ℹ 18,338 more rows

To build back up a geometry you can also use group_by() and summarise() in their more traditional roles (returning one row per group). This is a tiny bit slower but so much easier to read.

vt_coords %>% 
  group_by(feature_id) %>% 
  summarise(geom = wk_polygon(xy(x, y, crs = "WGS84")))

## # A tibble: 14 × 2
##    feature_id geom                                                              
##         <int> <wk_wkb>                                                          
##  1          1 <POLYGON ((-71.92088 45.00785, -71.91398 45.00764, -71.91122 45.0…
##  2          2 <POLYGON ((-73.23344 44.66446, -73.23334 44.66267, -73.23317 44.6…
##  3          3 <POLYGON ((-73.10283 44.67616, -73.03808 44.65116, -73.03784 44.6…
##  4          4 <POLYGON ((-72.77108 43.93907, -72.77515 43.93367, -72.76891 43.9…
##  5          5 <POLYGON ((-72.70141 43.22634, -72.68544 43.22259, -72.67632 43.2…
##  6          6 <POLYGON ((-73.0961 43.30715, -73.09257 43.30694, -73.08995 43.30…
##  7          7 <POLYGON ((-72.99655 45.0148, -72.99599 45.01478, -72.99326 45.01…
##  8          8 <POLYGON ((-71.46539 45.01323, -71.46541 45.01305, -71.46547 45.0…
##  9          9 <POLYGON ((-72.60514 44.79044, -72.5979 44.78706, -72.5944 44.790…
## 10         10 <POLYGON ((-71.85297 44.70108, -71.85878 44.69545, -71.86541 44.6…
## 11         11 <POLYGON ((-72.31223 44.18541, -72.3052 44.1831, -72.29759 44.183…
## 12         12 <POLYGON ((-72.22314 44.42381, -72.22322 44.42351, -72.22343 44.4…
## 13         13 <POLYGON ((-72.85266 43.8336, -72.83834 43.82926, -72.83067 43.83…
## 14         14 <POLYGON ((-72.97588 44.29505, -72.9708 44.26335, -72.96568 44.23…

The new coordinate functions make dplyr more accessible for doing raw coordinate processing if you’re into that kind of thing. If you’ll recall the short formula for the signed area of a polygon, you can check the winding direction of your polygons without leaving dplyr (the polygons here are wound correctly, hence the positive areas).

vt %>% 
  as_tibble() %>% 
  group_by(CNTYNAME) %>% 
  reframe(wk_coords(geometry)) %>% 
  group_by(CNTYNAME) %>%
  summarise(
    signed_area = 0.5 * sum(
      (lag(x, 1) - first(x)) * (y - lag(y, 2)),
      na.rm = TRUE
    )
  )

## # A tibble: 14 × 2
##    CNTYNAME   signed_area
##    <chr>            <dbl>
##  1 ADDISON        -0.235 
##  2 BENNINGTON     -0.194 
##  3 CALEDONIA      -0.142 
##  4 CHITTENDEN     -0.182 
##  5 ESSEX          -0.198 
##  6 FRANKLIN       -0.203 
##  7 GRAND ISLE     -0.0570
##  8 LAMOILLE       -0.137 
##  9 ORANGE         -0.201 
## 10 ORLEANS        -0.213 
## 11 RUTLAND        -0.273 
## 12 WASHINGTON     -0.203 
## 13 WINDHAM        -0.228 
## 14 WINDSOR        -0.282

First-class coordinate transforms

Some of the coolest new stuff in GIS is actually in JavaScript, like the really awesome tools for animated projections in D3. The tools in D3 work on arbitrary coordinate transforms (read: PROJ), but the tools that work on the transforms don’t have to know anything about datums or projections…they just care about a coordinate moving from one place to the next. This kind of thing is really common in GIS and visualization and deserved first-class support in wk. For those paying close attention to s2 development, I also need it to make geometry represented in Cartesian space valid on the sphere given an arbitrary projection (currently only implemented for plate carree).

Starting small, though, wk just provides the framework so that other packages can do cool stuff that I don’t need to maintain (maybe). I did put in enough transforms to make sure every thing was tested, which includes an affine transform and transforms that set/drop coordinate values (most usefully, Z and M).

wk_transform(vt, wk_affine_rescale(vt, rct(0, 0, 1, 1)))

## Simple feature collection with 14 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 36.20733 ymin: -24.0665 xmax: 37.20733 ymax: -23.0665
## CRS:           NA
## # A tibble: 14 × 2
##    CNTYNAME                                                             geometry
##  * <chr>                                                          <MULTIPOLYGON>
##  1 ORLEANS    (((36.97642 -23.07035, 36.97991 -23.07045, 36.98131 -23.07043, 36…
##  2 GRAND ISLE (((36.31099 -23.22033, 36.31104 -23.22111, 36.31112 -23.2219, 36.…
##  3 CHITTENDEN (((36.37721 -23.21522, 36.41003 -23.22614, 36.41015 -23.22608, 36…
##  4 WINDSOR    (((36.54539 -23.53713, 36.54333 -23.53949, 36.54649 -23.54038, 36…
##  5 WINDHAM    (((36.58071 -23.84841, 36.58881 -23.85005, 36.59343 -23.85001, 36…
##  6 BENNINGTON (((36.38062 -23.81311, 36.38241 -23.81321, 36.38374 -23.8132, 36.…
##  7 FRANKLIN   (((36.43108 -23.06732, 36.43137 -23.06733, 36.43275 -23.06729, 36…
##  8 ESSEX      (((37.20733 -23.068, 37.20732 -23.06808, 37.20729 -23.06816, 37.2…
##  9 LAMOILLE   (((36.62951 -23.16531, 36.63319 -23.16678, 36.63496 -23.16537, 36…
## 10 CALEDONIA  (((37.01084 -23.20433, 37.00789 -23.20679, 37.00453 -23.20954, 37…
## 11 ORANGE     (((36.77801 -23.42955, 36.78158 -23.43056, 36.78544 -23.43032, 36…
## 12 WASHINGTON (((36.82318 -23.32543, 36.82314 -23.32556, 36.82303 -23.32569, 36…
## 13 RUTLAND    (((36.50403 -23.5832, 36.51129 -23.58509, 36.51518 -23.58078, 36.…
## 14 ADDISON    (((36.44156 -23.38166, 36.44414 -23.39551, 36.44674 -23.40888, 36…

wk_set_z(vt, 0)

## Simple feature collection with 14 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XYZ
## Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
## z_range:       zmin: 0 zmax: 0
## Geodetic CRS:  WGS 84
## # A tibble: 14 × 2
##    CNTYNAME                                                             geometry
##  * <chr>                                                      <MULTIPOLYGON [°]>
##  1 ORLEANS    Z (((-71.92088 45.00785 0, -71.91398 45.00764 0, -71.91122 45.007…
##  2 GRAND ISLE Z (((-73.23344 44.66446 0, -73.23334 44.66267 0, -73.23317 44.660…
##  3 CHITTENDEN Z (((-73.10283 44.67616 0, -73.03808 44.65116 0, -73.03784 44.651…
##  4 WINDSOR    Z (((-72.77108 43.93907 0, -72.77515 43.93367 0, -72.76891 43.931…
##  5 WINDHAM    Z (((-72.70141 43.22634 0, -72.68544 43.22259 0, -72.67632 43.222…
##  6 BENNINGTON Z (((-73.0961 43.30715 0, -73.09257 43.30694 0, -73.08995 43.3069…
##  7 FRANKLIN   Z (((-72.99655 45.0148 0, -72.99599 45.01478 0, -72.99326 45.0148…
##  8 ESSEX      Z (((-71.46539 45.01323 0, -71.46541 45.01305 0, -71.46547 45.012…
##  9 LAMOILLE   Z (((-72.60514 44.79044 0, -72.5979 44.78706 0, -72.5944 44.7903 …
## 10 CALEDONIA  Z (((-71.85297 44.70108 0, -71.85878 44.69545 0, -71.86541 44.689…
## 11 ORANGE     Z (((-72.31223 44.18541 0, -72.3052 44.1831 0, -72.29759 44.18364…
## 12 WASHINGTON Z (((-72.22314 44.42381 0, -72.22322 44.42351 0, -72.22343 44.423…
## 13 RUTLAND    Z (((-72.85266 43.8336 0, -72.83834 43.82926 0, -72.83067 43.8391…
## 14 ADDISON    Z (((-72.97588 44.29505 0, -72.9708 44.26335 0, -72.96568 44.2327…

The point of this all, though, is that extension packages can make their own transforms. There’s no C++ wrapper for this yet but the C is relatively minimal.

#include "cpp11.hpp"
using namespace cpp11;
#include "wk-v1.h"
#include "wk-v1-impl.c"

typedef struct {
  double dx;
  double dy;
} bump_trans_t;

int bump_trans_trans(R_xlen_t feature_id, const double* xyzm_in, double* xyzm_out, void* trans_data) {
  bump_trans_t* data = (bump_trans_t*) trans_data;
  xyzm_out[0] = xyzm_in[0] + data->dx;
  xyzm_out[1] = xyzm_in[1] + data->dy;
  return WK_CONTINUE;
}

void bump_trans_finalizer(void* trans_data) {
  delete (bump_trans_t*) trans_data;
}

[[cpp11::linking_to(wk)]]
[[cpp11::register]]
sexp bump_trans_new(double dx, double dy) {
  wk_trans_t* trans = wk_trans_create();
  trans->trans = &bump_trans_trans;
  trans->finalizer = &bump_trans_finalizer;
  trans->trans_data = new bump_trans_t {dx, dy};
  return wk_trans_create_xptr(trans, R_NilValue, R_NilValue);
}

You do need a tiny bit of R infrastructure to make this work smoothly:

bump_trans <- function(dx = 0, dy = 0) {
  new_wk_trans(bump_trans_new(dx, dy), "bump_trans")
}

Then your transform is accessible for anything that needs it! Most algorithms will probably also need an implementation of wk_trans_inverse() to return the inverse operation.

wk_transform(wkt("POINT (0 0)"), bump_trans(12, 13))

## <wk_wkt[1]>
## [1] POINT (12 13)

Plotting

As a side effect of some of this, plotting now “just works” without the wkutils package for the built-in vector types. It isn’t particularly fast or glamourous, but it’s good enough for a basic “get this geometry on my screen”. Previously this was done using a confusing circular dependency on wkutils. It’s used for plot() for wkb() and wkt(), but also works for arbitrary data types with a wk_handle() method via wk_plot().

wk_plot(vt)
wk_plot(vt_vertices, add = T)

What’s next?

So far wk development has focused on getting all the nuts and bolts in place so that extensions can do really awesome stuff with few dependencies. Now is that time! I’m hoping to get transforms working for PROJ so that s2 can do a better job creating valid spherical geometry from projected coordinates. Finally, I’m hoping to get some file readers working so that the lazy nature of the handler/filter system can really shine.