Inspired by @gshotwell, I decided to have a look into bulk translating a ton of functions to SQL. The dplyr system to translate R code to SQL is really cool, but I’ve had some trouble in the past using it to write backend-agnostic code because of slightly different implementations of functions in different database backends.
Is there a reference document somewhere of which dplyr commands work on various database backends? #rstats — Gordon Shotwell (@gshotwell) April 9, 2019 I should also mention that Bob Rudis posted a solution to this as well that includes more backends (this post only considers the backends included directly in dbplyr).
I’m an avid Twitter follower of Simon Kuestenmacher (@simongerman600), who is a prolific tweeter of maps (all sorts). The other day I saw this tweet, which links to a reddit thread that used the PRISM dataset to make an animated map of precipitation in the US. A few weeks ago I had a colleague email me asking for the Canadian climate normals raw data (which can be found here), and having made an animated map of Earth’s paleogeography, I decided to give it a go for Canada.
The recent grounding of almost all Boeing 737 MAX-series aircraft in the world is, according to a recent CBC commentator, unprecedented. I’m not an aircraft expert (or even a hobbyist), but I do love data and mining publicly-available datasets. Inspired by the nycflights13 R package (a dataset of all the flights in and out of New York City in 2013) and the FlightRadar24 blog post regarding Lion Air flight JT610, I thought I would see what information is accessible to the public about flights that used the 737 MAX-series aircraft.
A side project of mine recently has been to play with PHREEQC, a powerful geochemical modelling platform put out by the USGS. In order to make the R package for phreeqc more accessible, I’ve started to wrap a few common uses of PHREEQC in a new R package, tidyphreeqc. In particular, I’m interested in using PHREEQC to take a look at the classic Pourbaix diagram, which is almost always drawn for a pure solution at a particular concentration of the target element, at 25°C.
This post covers creating stratigraphic diagrams using ggplot2, highlighting the helpers contained within the tidypaleo package, which I’ve been using for the past few months to create diagrams. I chose the ggplot2 framework because it is quite flexible and can be used to create almost any time-stratigraphic diagram except those that involve multiple axes (we can have a fight about whether or not those are appropriate anyway, but if you absolutely need to create them I suggest you look elsewhere).
It is an exciting time for the integration of limnological and paleolimnological datasets. The National (US) Water Quality Monitoring Council Water Quality Portal has just made decades of state and federal water quality measurements available, the Pages2k project has collected hundreds of temperature proxy records for the last 2000 (ish) years, and the Neotoma database provides access to a large number of paleoecological datasets. For a final project in a course last fall, I chose to analyze the Circumpolar Diatom Database (CDD), which is a collection of water chemistry and diatom assemblage data hosted by the Aquatic Paleoecology Laboratory at ULaval.
There is a very old issue in ggplot2 about the ability to modify particular scales when using facet_wrap() or facet_grid(). I often have this problem when using lots of facets, as sometimes the labels overlap with each other on some of the scales. Without a way to set the breaks on one particular scale, it’s hard to fix this without exporting an SVG and modifying the result (it’s usually possible to fix it by specifying an overall set of breaks, or by rotating the x labels using theme(axis.
There is a lot of talk about the ggplot2 package and the pipe. Should the pipe be used with ggplot2? Some approaches, like the ggpipe package, replace many ggplot2 functions, adding the plot as the first argument so they can be used with the pipe. This ignores the fact that ggplot2 functions construct objects that can (and should) be re-used. One approach is to verbify these noun functions so that they both create the object and update the plot, and recently I wrote an experimental R package that implements this in just under 50 lines of code.
If you have ever worked with dates and times over a wide geographical area, you will know that timezone math is tedious but important. In the forthcoming update of the rclimateca package, the list of climate locations provided in the package will contain the UTC offsets for what Environment Canada calls “local standard time”. Because the UTC offset for local standard time at each location is not provided with the list of locations from Environment Canada, I had to use the latitude and longitude of each site to obtain this information.
It seems that the tools for writing papers in R/RStudio keep getting better and better, to the point where it is rare that I have something I need to do to write a paper that happens outside of RStudio. One of these things is abbreviating journal names, because for whatever reason the checkbox that does this within Zotero’s BibTeX export doesn’t work particularly well. My way around this in the past was to wait until the article was about to be submitted, and figure everything out in Microsoft Word at the very end.