R data structures for Chemical Modeling

In the past few months I’ve done some work on PHREEQC modeling in R, as well as a whole lot of XRF data work that required converting what seemed like an ungodly number of molecular concentrations (e.g. Al2O3) into elemental concentrations (Al). Both of these highlighted a need for chemical data structures in R such that user input to easyphreeqc can be properly validated and chemical calculations can be made reproducible easily. The result is chemr, which will hopefully integrate soon with other packages of the chemical persuasion.

Installation

You can install chemr from github with:

# install.packages("devtools")
devtools::install_github("paleolimbot/chemr")

If you can load the package, everything worked!

library(chemr)

The periodic tibble

The periodic tibble is Wikipedia’s version of the periodic table in data.frame (tibble) form. You can access these data in a few ways:

is_element("Hg")
#> [1] TRUE
elmass("Hg")
#>       Hg 
#> 200.5923
elz("Hg")
#> Hg 
#> 80
elsymbol(80)
#> [1] "Hg"

You can also access the entire periodic tibble by typing data(pt).

data(pt)
pt
#> # A tibble: 118 x 7
#>        z symbol      name group period      mass   valence
#>    <int>  <chr>     <chr> <int>  <int>     <dbl>    <list>
#>  1     1      H  Hydrogen     1      1  1.008000 <int [3]>
#>  2     2     He    Helium    18      1  4.002602 <int [1]>
#>  3     3     Li   Lithium     1      2  6.940000 <int [2]>
#>  4     4     Be Beryllium     2      2  9.012183 <int [3]>
#>  5     5      B     Boron    13      2 10.810000 <int [6]>
#>  6     6      C    Carbon    14      2 12.011000 <int [9]>
#>  7     7      N  Nitrogen    15      2 14.007000 <int [9]>
#>  8     8      O    Oxygen    16      2 15.999000 <int [5]>
#>  9     9      F  Fluorine    17      2 18.998403 <int [2]>
#> 10    10     Ne      Neon    18      2 20.179760 <int [1]>
#> # ... with 108 more rows

Molecules

Molecules are a collection of counted elements (or sub-molecules) with a charge. While it’s possible to create a molecule by “hand”, it’s much easier to use the character representation of a molecule, which is usually what you get when copy/pasting from a source.

mol("H2O")
#> <mol>
#> [1] H2O

And like everything else in R, mol objects are vectorized, so you can serialize an entire column of molecule formulas.

as_mol(c("H2O", "H+", "Fe(OH)3", "Ca+2"))
#> <mol>
#> [1] H2O     H+      Fe(OH)3 Ca+2

You can access the mass, charge, and elemental composition of a molecule using mass(), charge(), and as.data.frame() or as.matrix()

m <- as_mol(c("H2O", "H+", "Fe(OH)3", "Ca+2"))
mass(m)
#> [1]  18.0150   1.0080 106.8662  40.0784
charge(m)
#> [1] 0 1 0 2
as.data.frame(m)
#>       mol mol_text     mass charge H O Fe Ca
#> 1    2, 1      H2O  18.0150      0 2 1  0  0
#> 2       1       H+   1.0080      1 1 0  0  0
#> 3 1, 1, 1  Fe(OH)3 106.8662      0 3 3  1  0
#> 4       1     Ca+2  40.0784      2 0 0  0  1

Reactions

Reactions are essentially a molecule vector with coefficients (positive for the left side, negative for the right side). Similar to molecules, it’s easiest to use the serialized form (conveniently, what is generally copied/pasted):

as_reaction("2H2 + O2 = 2H2O")
#> <reaction> 2H2 + O2 = 2H2O

The is_balanced() and balance() functions will happily balance these for you, provided you have the correct number of species defined.

balance("H2 + O2 = H2O")
#> <reaction> 2H2 + O2 = 2H2O

You can access various components of a reaction in the same way as for molecules:

r <- as_reaction("2H2 + O2 = 2H2O")
lhs(r)
#> <reaction> 2H2 + O2 =
rhs(r)
#> <reaction> 2H2O =
mass(r) # mass balance of the reaction
#> [1] 0
charge(r) # charge balance of the reaction
#> [1] 0
as.data.frame(r)
#>    mol mol_text charge   mass coefficient H O
#> 1    2       H2      0  2.016           2 2 0
#> 2    2       O2      0 31.998           1 0 2
#> 3 2, 1      H2O      0 18.015          -2 2 1
as.matrix(r)
#>      H  O
#> H2   4  0
#> O2   0  2
#> H2O -4 -2

Molecule and Reaction arithmetic

Various arithmetic operators are available for molecule and reaction objects, such as +, * and ==.

m <- mol(~Fe2O3, ~H2O, ~NH3, ~`H+`)
m + as_mol("OH-")
#> <mol>
#> [1] Fe2O3OH- H2OOH-   NH3OH-   HOH
m * 2
#> <mol>
#> [1] Fe4O6 H4O2  N2H6  H2+2
m == as_mol(~H2O)
#> [1] FALSE  TRUE FALSE FALSE

Reactions have similar arithmetic, with coefficients to various molecules being added together.

r1 <- as_reaction("2H2 + O2 = 2H2O")
r1 + as_reaction("H2O = H2O")
#> <reaction> 2H2 + O2 + H2O = 2H2O + H2O

By default the reaction isn’t simplified, but can be using simplify() and remove_zero_counts().

simplify(r1 + as_reaction("H2O = H2O"))
#> <reaction> 2H2 + O2 = 2H2O
simplify(r1 - as_reaction("2H+ + 2OH- = 2H2O"))
#> <reaction> 2H2 + O2 + 0H2O = 2H+ + 2OH-
remove_zero_counts(simplify(r1 - as_reaction("2H+ + 2OH- = 2H2O")))
#> <reaction> 2H2 + O2 = 2H+ + 2OH-

The Wish List

There are lots of things missing from this package that should exist, including the include various parameters to molecules and equations such as Δ**H or aliases (e.g., CaSO4 as “gypsum”). Additionally, there is currently no way to indicate hydration in the same way as PHREEQC (e.g., CaSO4:2H2O). Currently this is possible only as CaSO4(H2O)2. Feel free to contribute to development or submit feature requests on GitHub.

Software Engineer & Geoscientist