wk version 0.5.0!

wk version 0.5.0!

A new version of wk is fresh on CRAN! Version 0.5 introduces some new features to the framework, incorporates most of the functionality that was previously in the wkutils package, and fixes a number of bugs that popped up in the development of s2 and geos. To showcase some of the new features I’ll use the Vermont counties data set from the VT Open Geodata Portal.

library(wk)
library(sf)
vt <- read_sf("VT_Data_-_County_Boundaries.geojson")["CNTYNAME"]
vt
#> Simple feature collection with 14 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
#> Geodetic CRS:  WGS 84
#> # A tibble: 14 x 2
#>    CNTYNAME                                                             geometry
#>    <chr>                                                      <MULTIPOLYGON [°]>
#>  1 ORLEANS    (((-71.92088 45.00785, -71.92933 45.00812, -71.94575 45.00838, -7…
#>  2 GRAND ISLE (((-73.23344 44.66446, -73.23341 44.66632, -73.23355 44.66813, -7…
#>  3 CHITTENDEN (((-73.10283 44.67616, -73.16068 44.69752, -73.16189 44.69799, -7…
#>  4 WINDSOR    (((-72.77108 43.93907, -72.78996 43.94532, -72.78522 43.9522, -72…
#>  5 WINDHAM    (((-72.70141 43.22634, -72.73021 43.2333, -72.75071 43.23825, -72…
#>  6 BENNINGTON (((-73.0961 43.30715, -73.1146 43.30827, -73.12266 43.30869, -73.…
#>  7 FRANKLIN   (((-72.99655 45.0148, -73.00945 45.01503, -73.01624 45.01516, -73…
#>  8 ESSEX      (((-71.46539 45.01323, -71.4654 45.01338, -71.46541 45.01358, -71…
#>  9 LAMOILLE   (((-72.60514 44.79044, -72.60572 44.79073, -72.60685 44.79128, -7…
#> 10 CALEDONIA  (((-71.85297 44.70108, -71.84619 44.70755, -71.84129 44.7123, -71…
#> 11 ORANGE     (((-72.31223 44.18541, -72.3206 44.18818, -72.32826 44.1907, -72.…
#> 12 WASHINGTON (((-72.22314 44.42381, -72.22334 44.42408, -72.2235 44.42432, -72…
#> 13 RUTLAND    (((-72.85266 43.8336, -72.86138 43.83624, -72.86845 43.83816, -72…
#> 14 ADDISON    (((-72.97588 44.29505, -72.997 44.29923, -73.00585 44.30098, -73.…

Breaking down features and building them back up again

The biggest feature in the new release is the ability to break down features to simpler components (at the simplest, a bunch of coordinates) and build them back up again into geometry that you can pass elsewhere. Some of this functionality previously lived in wkutils but it turns out this is really important: pretty much all geometry in base R is done with big long x and y vectors (e.g., xy.coords()). To make geometry that was previously locked away in WKB, WKT, or sf objects accessible, there needed to be a way out.

The first level of a geometry you might want to break down are collections: MULTIPOLYGON, MULTILINESTRING, MULTIPOINT, and GEOMETRYCOLLECTION. These types are useful when you have multiple things that represent one element in a vector, like multiple bits of land representing a county. In our case the features were saved as MULTIPOLYGON but there aren’t actually any features with more than one. To simplify it, we can use wk_flatten():

(vt_poly <- wk_flatten(vt))
#> Simple feature collection with 15 features and 1 field
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
#> Geodetic CRS:  WGS 84
#> # A tibble: 15 x 2
#>    CNTYNAME                                                             geometry
#>  * <chr>                                                           <POLYGON [°]>
#>  1 ORLEANS    ((-71.92088 45.00785, -71.92933 45.00812, -71.94575 45.00838, -71…
#>  2 GRAND ISLE ((-73.23344 44.66446, -73.23341 44.66632, -73.23355 44.66813, -73…
#>  3 CHITTENDEN ((-73.10283 44.67616, -73.16068 44.69752, -73.16189 44.69799, -73…
#>  4 WINDSOR    ((-72.77108 43.93907, -72.78996 43.94532, -72.78522 43.9522, -72.…
#>  5 WINDHAM    ((-72.70141 43.22634, -72.73021 43.2333, -72.75071 43.23825, -72.…
#>  6 BENNINGTON ((-73.0961 43.30715, -73.1146 43.30827, -73.12266 43.30869, -73.1…
#>  7 FRANKLIN   ((-72.99655 45.0148, -73.00945 45.01503, -73.01624 45.01516, -73.…
#>  8 ESSEX      ((-71.46539 45.01323, -71.4654 45.01338, -71.46541 45.01358, -71.…
#>  9 LAMOILLE   ((-72.60514 44.79044, -72.60572 44.79073, -72.60685 44.79128, -72…
#> 10 CALEDONIA  ((-71.85297 44.70108, -71.84619 44.70755, -71.84129 44.7123, -71.…
#> 11 CALEDONIA  ((-72.03989 44.15707, -72.03991 44.15719, -72.04024 44.15707, -72…
#> 12 ORANGE     ((-72.31223 44.18541, -72.3206 44.18818, -72.32826 44.1907, -72.3…
#> 13 WASHINGTON ((-72.22314 44.42381, -72.22334 44.42408, -72.2235 44.42432, -72.…
#> 14 RUTLAND    ((-72.85266 43.8336, -72.86138 43.83624, -72.86845 43.83816, -72.…
#> 15 ADDISON    ((-72.97588 44.29505, -72.997 44.29923, -73.00585 44.30098, -73.0…

By default wk_flatten() peels off one layer of collections, repeating rows where a feature has more than one element. This is a little like sf::st_cast() but is defined in terms of what you have rather than what you want. For advanced users, the flattening functionality is also implemented as a wk “filter”, which means it can do most of what it does without allocating any extra memory and can stream input and output very efficiently.

I’m demonstrating all of this with sf because it’s the most common use-case, but it works with any data frame/tibble with exactly one column implementing the wk_handle() generic (including an st_sfc()!). In fact, pretty much anything you do with wk also “just works” with data frames.

vt_tbl <- tibble::tibble(vt$CNTYNAME, geom = as_wkb(vt))
wk_flatten(vt_tbl)
#> # A tibble: 15 x 2
#>    `vt$CNTYNAME` geom                                                           
#>    <chr>         <wk_wkb>                                                       
#>  1 ORLEANS       <POLYGON ((-71.9209 45.0079, -71.9293 45.0081, -71.9458 45.008…
#>  2 GRAND ISLE    <POLYGON ((-73.2334 44.6645, -73.2334 44.6663, -73.2336 44.668…
#>  3 CHITTENDEN    <POLYGON ((-73.1028 44.6762, -73.1607 44.6975, -73.1619 44.698…
#>  4 WINDSOR       <POLYGON ((-72.7711 43.9391, -72.79 43.9453, -72.7852 43.9522.…
#>  5 WINDHAM       <POLYGON ((-72.7014 43.2263, -72.7302 43.2333, -72.7507 43.238…
#>  6 BENNINGTON    <POLYGON ((-73.0961 43.3072, -73.1146 43.3083, -73.1227 43.308…
#>  7 FRANKLIN      <POLYGON ((-72.9966 45.0148, -73.0094 45.015, -73.0162 45.0152…
#>  8 ESSEX         <POLYGON ((-71.4654 45.0132, -71.4654 45.0134, -71.4654 45.013…
#>  9 LAMOILLE      <POLYGON ((-72.6051 44.7904, -72.6057 44.7907, -72.6069 44.791…
#> 10 CALEDONIA     <POLYGON ((-71.853 44.7011, -71.8462 44.7076, -71.8413 44.7123…
#> 11 CALEDONIA     <POLYGON ((-72.0399 44.1571, -72.0399 44.1572, -72.0402 44.157…
#> 12 ORANGE        <POLYGON ((-72.3122 44.1854, -72.3206 44.1882, -72.3283 44.190…
#> 13 WASHINGTON    <POLYGON ((-72.2231 44.4238, -72.2233 44.4241, -72.2235 44.424…
#> 14 RUTLAND       <POLYGON ((-72.8527 43.8336, -72.8614 43.8362, -72.8685 43.838…
#> 15 ADDISON       <POLYGON ((-72.9759 44.295, -72.997 44.2992, -73.0058 44.301..…

The next level is vertices: one point feature for each vertex in the input.

(vt_vertices <- wk_vertices(vt))
#> Simple feature collection with 18342 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
#> Geodetic CRS:  WGS 84
#> # A tibble: 18,342 x 2
#>    CNTYNAME             geometry
#>  * <chr>             <POINT [°]>
#>  1 ORLEANS  (-71.92088 45.00785)
#>  2 ORLEANS  (-71.92933 45.00812)
#>  3 ORLEANS  (-71.94575 45.00838)
#>  4 ORLEANS   (-71.95514 45.0082)
#>  5 ORLEANS  (-71.96699 45.00797)
#>  6 ORLEANS  (-72.00706 45.00717)
#>  7 ORLEANS  (-72.00985 45.00712)
#>  8 ORLEANS  (-72.01698 45.00697)
#>  9 ORLEANS  (-72.02311 45.00681)
#> 10 ORLEANS  (-72.02493 45.00677)
#> # … with 18,332 more rows

Vertices are useful if you need to input something that requires points but you were handed a boundary or polygon layer (I use this kind of thing to set depth values at 0 along coastlines when calculating bathymetry, for example). This is good for GIS function stuff, but if you’re doing any custom geometry processing or just need coordinates for ggplot2 or some base R function, you’ll need straight up x and y vectors. If so, the new wk_coords() function is just for you!

(vt_coords <- tibble::as_tibble(wk_coords(vt)))
#> # A tibble: 18,342 x 5
#>    feature_id part_id ring_id     x     y
#>         <int>   <int>   <int> <dbl> <dbl>
#>  1          1       2       1 -71.9  45.0
#>  2          1       2       1 -71.9  45.0
#>  3          1       2       1 -71.9  45.0
#>  4          1       2       1 -72.0  45.0
#>  5          1       2       1 -72.0  45.0
#>  6          1       2       1 -72.0  45.0
#>  7          1       2       1 -72.0  45.0
#>  8          1       2       1 -72.0  45.0
#>  9          1       2       1 -72.0  45.0
#> 10          1       2       1 -72.0  45.0
#> # … with 18,332 more rows

And, of course, sometimes you get handed a bunch of coordinates but you really want an sf object. For this case there is wk_linestring(), wk_polygon(), and wk_collection(). For example, to reconstruct the original sf object starting with coordinates we could do:

vt_poly_reconstructed <- wk_polygon(
  xy(vt_coords$x, vt_coords$y, crs = "WGS84"),
  feature_id = vt_coords$feature_id,
  ring_id = vt_coords$ring_id
)

vt_multipoly_reconstructed <- wk_collection(
  vt_poly_reconstructed,
  geometry_type = wk_geometry_type("multipolygon"),
  feature_id = seq_along(vt_poly_reconstructed)
)

st_as_sfc(vt_multipoly_reconstructed)
#> Geometry set for 14 features 
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
#> Geodetic CRS:  WGS 84
#> First 5 geometries:
#> MULTIPOLYGON (((-71.92088 45.00785, -71.92933 4...
#> MULTIPOLYGON (((-73.23344 44.66446, -73.23341 4...
#> MULTIPOLYGON (((-73.10283 44.67616, -73.16068 4...
#> MULTIPOLYGON (((-72.77108 43.93907, -72.78996 4...
#> MULTIPOLYGON (((-72.70141 43.22634, -72.73021 4...

dplyr integration

There’s nothing new about dplyr’s ability to work with vectors in wk 0.5, but the new functions are well-suited to dplyr’s group_by() and summarise() as well as the newish auto-unpacking feature in mutate() and summarise(). For example, to keep all the attributes when calling wk_coords() you can do:

library(dplyr)

vt %>% 
  as_tibble() %>% 
  group_by(CNTYNAME) %>% 
  summarise(wk_coords(geometry))
#> `summarise()` has grouped output by 'CNTYNAME'. You can override using the `.groups` argument.
#> # A tibble: 18,342 x 6
#> # Groups:   CNTYNAME [14]
#>    CNTYNAME feature_id part_id ring_id     x     y
#>    <chr>         <int>   <int>   <int> <dbl> <dbl>
#>  1 ADDISON           1       2       1 -73.0  44.3
#>  2 ADDISON           1       2       1 -73.0  44.3
#>  3 ADDISON           1       2       1 -73.0  44.3
#>  4 ADDISON           1       2       1 -73.0  44.3
#>  5 ADDISON           1       2       1 -73.0  44.3
#>  6 ADDISON           1       2       1 -73.0  44.3
#>  7 ADDISON           1       2       1 -73.0  44.3
#>  8 ADDISON           1       2       1 -73.0  44.3
#>  9 ADDISON           1       2       1 -73.0  44.3
#> 10 ADDISON           1       2       1 -73.0  44.3
#> # … with 18,332 more rows

To build back up a geometry you can also use group_by() and summarise() in their more traditional roles (returning one row per group). This is a tiny bit slower but so much easier to read.

vt_coords %>% 
  group_by(feature_id) %>% 
  summarise(geom = wk_polygon(xy(x, y, crs = "WGS84")))
#> # A tibble: 14 x 2
#>    feature_id geom                                                              
#>         <int> <wk_wkb>                                                          
#>  1          1 <POLYGON ((-71.9209 45.0079, -71.9293 45.0081, -71.9458 45.0084..…
#>  2          2 <POLYGON ((-73.2334 44.6645, -73.2334 44.6663, -73.2336 44.6681..…
#>  3          3 <POLYGON ((-73.1028 44.6762, -73.1607 44.6975, -73.1619 44.698...>
#>  4          4 <POLYGON ((-72.7711 43.9391, -72.79 43.9453, -72.7852 43.9522...> 
#>  5          5 <POLYGON ((-72.7014 43.2263, -72.7302 43.2333, -72.7507 43.2382..…
#>  6          6 <POLYGON ((-73.0961 43.3072, -73.1146 43.3083, -73.1227 43.3087..…
#>  7          7 <POLYGON ((-72.9966 45.0148, -73.0094 45.015, -73.0162 45.0152...>
#>  8          8 <POLYGON ((-71.4654 45.0132, -71.4654 45.0134, -71.4654 45.0136..…
#>  9          9 <POLYGON ((-72.6051 44.7904, -72.6057 44.7907, -72.6069 44.7913..…
#> 10         10 <POLYGON ((-71.853 44.7011, -71.8462 44.7076, -71.8413 44.7123...>
#> 11         11 <POLYGON ((-72.3122 44.1854, -72.3206 44.1882, -72.3283 44.1907..…
#> 12         12 <POLYGON ((-72.2231 44.4238, -72.2233 44.4241, -72.2235 44.4243..…
#> 13         13 <POLYGON ((-72.8527 43.8336, -72.8614 43.8362, -72.8685 43.8382..…
#> 14         14 <POLYGON ((-72.9759 44.295, -72.997 44.2992, -73.0058 44.301...>

The new coordinate functions make dplyr more accessible for doing raw coordinate processing if you’re into that kind of thing. If you’ll recall the short formula for the signed area of a polygon, you can check the winding direction of your polygons without leaving dplyr (the polygons here are wound correctly, hence the positive areas).

vt %>% 
  as_tibble() %>% 
  group_by(CNTYNAME) %>% 
  summarise(wk_coords(geometry), .groups = "keep") %>% 
  summarise(
    signed_area = 0.5 * sum(
      (lag(x, 1) - first(x)) * (y - lag(y, 2)),
      na.rm = TRUE
    )
  )
#> # A tibble: 14 x 2
#>    CNTYNAME   signed_area
#>    <chr>            <dbl>
#>  1 ADDISON         0.235 
#>  2 BENNINGTON      0.194 
#>  3 CALEDONIA       0.243 
#>  4 CHITTENDEN      0.182 
#>  5 ESSEX           0.198 
#>  6 FRANKLIN        0.203 
#>  7 GRAND ISLE      0.0570
#>  8 LAMOILLE        0.137 
#>  9 ORANGE          0.201 
#> 10 ORLEANS         0.213 
#> 11 RUTLAND         0.273 
#> 12 WASHINGTON      0.203 
#> 13 WINDHAM         0.228 
#> 14 WINDSOR         0.282

First-class coordinate transforms

Some of the coolest new stuff in GIS is actually in JavaScript, like the really awesome tools for animated projections in D3. The tools in D3 work on arbitrary coordinate transforms (read: PROJ), but the tools that work on the transforms don’t have to know anything about datums or projections…they just care about a coordinate moving from one place to the next. This kind of thing is really common in GIS and visualization and deserved first-class support in wk. For those paying close attention to s2 development, I also need it to make geometry represented in Cartesian space valid on the sphere given an arbitrary projection (currently only implemented for plate carree).

Starting small, though, wk just provides the framework so that other packages can do cool stuff that I don’t need to maintain (maybe). I did put in enough transforms to make sure every thing was tested, which includes an affine transform and transforms that set/drop coordinate values (most usefully, Z and M).

wk_transform(vt, wk_affine_rescale(vt, rct(0, 0, 1, 1)))
#> Simple feature collection with 14 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 0 ymin: 0 xmax: 1 ymax: 1
#> CRS:           NA
#> # A tibble: 14 x 2
#>    CNTYNAME                                                             geometry
#>  * <chr>                                                          <MULTIPOLYGON>
#>  1 ORLEANS    (((0.7690821 0.9961481, 0.7647967 0.9962621, 0.7564697 0.9963774,…
#>  2 GRAND ISLE (((0.1036546 0.8461755, 0.1036727 0.846988, 0.103599 0.847776, 0.…
#>  3 CHITTENDEN (((0.1698733 0.8512869, 0.1405409 0.8606129, 0.1399308 0.8608186,…
#>  4 WINDSOR    (((0.338057 0.5293715, 0.3284853 0.5321012, 0.330888 0.5351043, 0…
#>  5 WINDHAM    (((0.3733756 0.2180931, 0.3587752 0.2211338, 0.3483849 0.2232952,…
#>  6 BENNINGTON (((0.1732828 0.2533885, 0.1639036 0.2538759, 0.1598167 0.254059, …
#>  7 FRANKLIN   (((0.2237507 0.9991795, 0.2172121 0.9992838, 0.2137688 0.9993384,…
#>  8 ESSEX      (((1 0.9984975, 0.9999926 0.998561, 0.9999861 0.9986474, 0.999955…
#>  9 LAMOILLE   (((0.4221812 0.9011965, 0.4218882 0.9013238, 0.4213146 0.9015626,…
#> 10 CALEDONIA  (((0.8035071 0.8621686, 0.8069441 0.8649961, 0.809428 0.867068, 0…
#> 11 ORANGE     (((0.5706808 0.6369557, 0.5664359 0.6381655, 0.5625502 0.6392656,…
#> 12 WASHINGTON (((0.6158429 0.741074, 0.6157443 0.7411901, 0.6156621 0.741298, 0…
#> 13 RUTLAND    (((0.2967002 0.4833058, 0.2922765 0.484459, 0.2886918 0.4852998, …
#> 14 ADDISON    (((0.2342294 0.6848386, 0.2235227 0.686667, 0.2190376 0.687431, 0…
wk_set_z(vt, 0)
#> Simple feature collection with 14 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XYZ
#> Bounding box:  xmin: -73.4379 ymin: 42.72697 xmax: -71.46539 ymax: 45.01667
#> z_range:       zmin: 0 zmax: 0
#> Geodetic CRS:  WGS 84
#> # A tibble: 14 x 2
#>    CNTYNAME                                                             geometry
#>  * <chr>                                                      <MULTIPOLYGON [°]>
#>  1 ORLEANS    Z (((-71.92088 45.00785 0, -71.92933 45.00812 0, -71.94575 45.008…
#>  2 GRAND ISLE Z (((-73.23344 44.66446 0, -73.23341 44.66632 0, -73.23355 44.668…
#>  3 CHITTENDEN Z (((-73.10283 44.67616 0, -73.16068 44.69752 0, -73.16189 44.697…
#>  4 WINDSOR    Z (((-72.77108 43.93907 0, -72.78996 43.94532 0, -72.78522 43.952…
#>  5 WINDHAM    Z (((-72.70141 43.22634 0, -72.73021 43.2333 0, -72.75071 43.2382…
#>  6 BENNINGTON Z (((-73.0961 43.30715 0, -73.1146 43.30827 0, -73.12266 43.30869…
#>  7 FRANKLIN   Z (((-72.99655 45.0148 0, -73.00945 45.01503 0, -73.01624 45.0151…
#>  8 ESSEX      Z (((-71.46539 45.01323 0, -71.4654 45.01338 0, -71.46541 45.0135…
#>  9 LAMOILLE   Z (((-72.60514 44.79044 0, -72.60572 44.79073 0, -72.60685 44.791…
#> 10 CALEDONIA  Z (((-71.85297 44.70108 0, -71.84619 44.70755 0, -71.84129 44.712…
#> 11 ORANGE     Z (((-72.31223 44.18541 0, -72.3206 44.18818 0, -72.32826 44.1907…
#> 12 WASHINGTON Z (((-72.22314 44.42381 0, -72.22334 44.42408 0, -72.2235 44.4243…
#> 13 RUTLAND    Z (((-72.85266 43.8336 0, -72.86138 43.83624 0, -72.86845 43.8381…
#> 14 ADDISON    Z (((-72.97588 44.29505 0, -72.997 44.29923 0, -73.00585 44.30098…

The point of this all, though, is that extension packages can make their own transforms. There’s no C++ wrapper for this yet but the C is relatively minimal.

#include "cpp11.hpp"
using namespace cpp11;
#include "wk-v1.h"
#include "wk-v1-impl.c"

typedef struct {
  double dx;
  double dy;
} bump_trans_t;

int bump_trans_trans(R_xlen_t feature_id, const double* xyzm_in, double* xyzm_out, void* trans_data) {
  bump_trans_t* data = (bump_trans_t*) trans_data;
  xyzm_out[0] = xyzm_in[0] + data->dx;
  xyzm_out[1] = xyzm_in[1] + data->dy;
  return WK_CONTINUE;
}

void bump_trans_finalizer(void* trans_data) {
  delete (bump_trans_t*) trans_data;
}

[[cpp11::linking_to(wk)]]
[[cpp11::register]]
sexp bump_trans_new(double dx, double dy) {
  wk_trans_t* trans = wk_trans_create();
  trans->trans = &bump_trans_trans;
  trans->finalizer = &bump_trans_finalizer;
  trans->trans_data = new bump_trans_t {dx, dy};
  return wk_trans_create_xptr(trans, R_NilValue, R_NilValue);
}

You do need a tiny bit of R infrastructure to make this work smoothly:

bump_trans <- function(dx = 0, dy = 0) {
  new_wk_trans(bump_trans_new(dx, dy), "bump_trans")
}

Then your transform is accessible for anything that needs it! Most algorithms will probably also need an implementation of wk_trans_inverse() to return the inverse operation.

wk_transform(wkt("POINT (0 0)"), bump_trans(12, 13))
#> <wk_wkt[1]>
#> [1] POINT (12 13)

Plotting

As a side effect of some of this, plotting now “just works” without the wkutils package for the built-in vector types. It isn’t particularly fast or glamourous, but it’s good enough for a basic “get this geometry on my screen”. Previously this was done using a confusing circular dependency on wkutils. It’s used for plot() for wkb() and wkt(), but also works for arbitrary data types with a wk_handle() method via wk_plot().

wk_plot(vt)
wk_plot(vt_vertices, add = T)

What’s next?

So far wk development has focused on getting all the nuts and bolts in place so that extensions can do really awesome stuff with few dependencies. Now is that time! I’m hoping to get transforms working for PROJ so that s2 can do a better job creating valid spherical geometry from projected coordinates. Finally, I’m hoping to get some file readers working so that the lazy nature of the handler/filter system can really shine.

Avatar
Dewey Dunnington
Geoscientist, Programmer, Educator