News, Thematic Standardization

Harmonizing ecological community survey data for reuse: an update

The idea of harmonizing data is not new, and for some research domains has been successful. Our body of long-term observations of organisms in ecological communities is growing, and many datasets have been used already in synthesis and meta analyses – but only after considerable effort to bring them into alignment.  A goal of EDI has been to develop recommendations for data harmonization, and to convert “raw data” in specific domains into a common data model to prepare them for analysis and accelerate synthesis or meta analyses.

Temporal, spatial and taxonomic coverage of datasets available in the ecocomDP model. Data source: Black, EDI; Gray, NEON. A) Temporal coverage (years), B) Temporal evenness (years), C) Spatial extent, D) group. An asterisk indicates that two groups (Tick, Mosquito) are specifically targeted by NEON. When these taxa occur in EDI datasets, they are plotted here with Arthropods

EDI recently finalized its data model for ecological community surveys, called “ecocomDP”, which is described in a recent open-access paper. EDI harmonization uses the workflow approach supported by EDI’s PASTA platform to reformat data without altering the original. An R package is available from CRAN  to assist with reformatting original tables and work with ecocomDP data. Development of both the model and the R package was collaborative, involving NEON and LTER scientists and data managers. Another result of that collaboration is that the NEON Network now exposes their community surveys in the ecocomDP model, via the R package, The figure shows temporal, spatial and taxonomic coverage of datasets available in the ecocomDP model from EDI and NEON.

EDI is currently focused on two fronts: to expand the corpus of harmonized intermediates (currently numbering 70 datasets), and to establish a workflow to contribute these observations to GBIF via an additional conversion from the ecocomDP model to the Darwin Core Archive. The similarity of these to EDI’s PASTA data packages makes this process relatively straightforward, once data are harmonized.

References

O’Brien, M, C.A.Smith, E.R. Sokol, C. Gries, N. Lany, S. Record and M.C.N.Castorani. 2021. ecocomDP: A flexible data design pattern for ecological community survey data. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2021.101374

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s