Flexible, platform-independent, open-source data structures and conversion mechanisms for paleolimnological data


Paleolimnological data is by nature complex and multidimensional, which makes the storage, visualization, and manipulation of such data inherently challenging. Dedicated programs have attempted to ameliorate these problems, but their storage formats can be inflexible and/or proprietary, limiting the exchange of data and the replicability of visualizations and statistical analyses. This poster conceptualizes values as having qualifiers (e.g. core identifier, depth below surface, and measured parameter) and tags (e.g. amount of error, number of replicates, written notes pertaining to the value), each of which is represented differently depending on the data structure used. Of the many possible data structures, this poster identifies four as optimal for storing paleolimnological data: a raw values list, a summarized values list, a raw values matrix, and a summarized values matrix. The former two are long-format structures that store each qualifier category as a column (e.g. core identifier, depth, and parameter) with one row per measured value. The latter two are wide-format structures that store the values for each measured parameter in separate columns. Summarized variants store one value per unique depth, whereas raw variants can store a value for each replicate. Long data structures store tags more easily than wide data structures; however, wide data structures are used more often as input for plotting routines and statistical analyses optimized for paleolimnological data. Conversion between these four structures is easily accomplished using both interactive (e.g. spreadsheet software) and programmatic (e.g. R and Python) mechanisms. As more advanced statistical treatment of data becomes common in paleolimnological studies, storing data in its most flexible form may be advantageous to enhance the replicability of visualizations and statistical analyses. Readily available conversion mechanisms ensure that storing data in its raw form is not a barrier to fast and effective visualization and analysis.
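
As an illustration of the programmatic conversions described above, the following is a minimal sketch in Python using pandas. It is not the poster's own implementation; the column names (core, depth_cm, param, value, sd, n) and the sample values are hypothetical and chosen only to show how a raw values list might be summarized and then reshaped into a summarized values matrix and back.

```python
import pandas as pd

# Hypothetical raw values list (long format): one row per measured value,
# with qualifiers (core, depth_cm, param) stored as columns. Replicates
# appear as repeated qualifier combinations.
raw_list = pd.DataFrame({
    "core": ["C1"] * 6,
    "depth_cm": [0.5, 0.5, 0.5, 0.5, 1.5, 1.5],
    "param": ["LOI", "LOI", "Pb", "Pb", "LOI", "Pb"],
    "value": [10.0, 12.0, 30.0, 34.0, 9.0, 28.0],
})

# Summarized values list: one row per unique core/depth/parameter.
# The tags (standard deviation, number of replicates) fit naturally
# as extra columns in the long format.
summ_list = (
    raw_list.groupby(["core", "depth_cm", "param"], as_index=False)
    .agg(value=("value", "mean"), sd=("value", "std"), n=("value", "count"))
)

# Summarized values matrix (wide format): one column per measured parameter.
# Only the values are carried over; tags would need their own columns.
summ_matrix = summ_list.pivot(
    index=["core", "depth_cm"], columns="param", values="value"
).reset_index()

# Converting back from the matrix to a list recovers the values but not the tags.
back_to_list = summ_matrix.melt(
    id_vars=["core", "depth_cm"], var_name="param", value_name="value"
)

print(summ_list)
print(summ_matrix)
```

The sketch reflects the trade-off noted in the abstract: the list keeps tags alongside each value, while the matrix is the shape most plotting and statistical routines expect.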

Ontario-Québec Paleolimnology Symposium