News

Updates to user-contributed journal citation interface on the EDI Data Portal

The Environmental Data Initiative (EDI) has updated the user-contributed journal citation interface on its Data Portal to capture more granular information about the type of citation being submitted. The new Relation Type form field lets you describe how the journal manuscript mentions the data package, using one of three relationship types: “IsCitedBy” – this data package is formally cited in the manuscript, “IsDescribedBy” – this data package is explicitly described within the manuscript, or “IsReferencedBy” – this data package is implicitly described within the manuscript. This information is conveyed to DataCite through an update of the Digital Object Identifier (DOI) metadata, which gives the data package greater exposure through DataCite’s event data and through Crossref, an official DOI registration agency of the International DOI Foundation that serves academic journals. The EDI Data Portal allows any user with an EDI-provisioned account to add a journal citation to any data package, regardless of who owns the data package. This greatly increases the related information available about the data package, a win-win for the entire community!
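To make the three relation types concrete, here is a minimal Python sketch of what a related-identifier entry in DOI metadata might look like. The field names `relatedIdentifier`, `relatedIdentifierType`, and `relationType` follow the DataCite metadata schema. The helper function, the validation step, and the placeholder DOI are assumptions made for illustration; how the Data Portal builds its metadata update is not described here.

```python
import json

# The three relation types offered by the Relation Type form field.
RELATION_TYPES = {
    "IsCitedBy": "data package is formally cited in the manuscript",
    "IsDescribedBy": "data package is explicitly described within the manuscript",
    "IsReferencedBy": "data package is implicitly described within the manuscript",
}


def related_identifier(article_doi: str, relation_type: str) -> dict:
    """Build one related-identifier entry (hypothetical helper, not EDI's code)."""
    if relation_type not in RELATION_TYPES:
        raise ValueError(f"unsupported relation type: {relation_type!r}")
    return {
        "relatedIdentifier": article_doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


# "10.1234/example.5678" is a placeholder, not a real article DOI.
entry = related_identifier("10.1234/example.5678", "IsCitedBy")
print(json.dumps(entry, indent=2))
```

Rejecting anything outside the three supported values keeps the sketch aligned with the form field, which offers only those choices.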

EDI repository, News, Technical

Normalization of Creator Names in EDI’s Data Portal

The Advanced Search feature of EDI’s Data Portal lets you select a dataset Creator name from a drop-down list of all dataset creators in our repository. The search then displays all the datasets that list that name among their creators.

Unfortunately, many creators’ names occur in multiple variations in different datasets and, in the past, each variation appeared separately in the drop-down list. Selecting a variant would return only the datasets for which the creator’s name was spelled as in that particular variant. To give a hypothetical example, the names James T Kirk, James T. Kirk, Jim Kirk, J T Kirk, and J Kirk may all refer to the same person. In such a case, we’d like to display a single, canonical entry (James T Kirk, in this example) and have the search function return the union of all datasets for which that creator’s name appears in one of its variations.

We have recently created web services that implement this kind of name normalization, and the EDI Data Portal now uses the normalized names.

For example, the following names all refer to the same person: “McKnight, Diane M”, “McKnight, Diane”, “Mcknight, Diane”, and “Mcnight, Diane”. The drop-down list now shows just “McKnight, Diane M”, and selecting it finds the datasets corresponding to any of the four variants. Accented names are also handled, and they now sort in the expected order in the drop-down list. For example, “González, Maria J” corresponds to “González, María J”, “Gonzalez, Maria”, and “Gonzalez, Maria J”. Names with misspelled surnames are now handled correctly as well. For example, “Sokol, Eric R” is the same person as “Sokal, Eric”.
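The behavior described above can be sketched in a few lines of Python. This is only an illustration, assuming a hand-curated table of variants. The table, the function names, and the dataset identifiers are all hypothetical, and EDI’s actual web services may work quite differently.

```python
import unicodedata


def fold(name: str) -> str:
    """Strip accents, periods, and case so trivial spelling differences compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.replace(".", "").casefold().split())


# Hypothetical curated table: canonical name -> known variants.
# Misspellings such as "Mcnight" cannot be derived by folding alone,
# so this sketch assumes they are listed explicitly.
CANONICAL = {
    "McKnight, Diane M": ["McKnight, Diane", "Mcknight, Diane", "Mcnight, Diane"],
    "González, Maria J": ["González, María J", "Gonzalez, Maria", "Gonzalez, Maria J"],
    "Sokol, Eric R": ["Sokal, Eric"],
}

# Map every folded spelling (canonical or variant) to its canonical name.
LOOKUP = {
    fold(spelling): canonical
    for canonical, variants in CANONICAL.items()
    for spelling in [canonical, *variants]
}


def normalize(name: str) -> str:
    """Return the canonical name, or the input unchanged if it is unknown."""
    return LOOKUP.get(fold(name), name)


def datasets_for(canonical: str, datasets: dict) -> list:
    """Union of datasets whose recorded creators normalize to the canonical name."""
    return sorted(
        dataset_id
        for dataset_id, creators in datasets.items()
        if any(normalize(creator) == canonical for creator in creators)
    )


# Hypothetical dataset identifiers with creator names as originally recorded.
example_datasets = {
    "ds-1": ["McKnight, Diane"],
    "ds-2": ["Mcnight, Diane", "Sokal, Eric"],
    "ds-3": ["Sokol, Eric R"],
}
matches = datasets_for("McKnight, Diane M", example_datasets)
```

Sorting the drop-down entries with `sorted(CANONICAL, key=fold)` would place accented names alongside their unaccented neighbors, which is one way to get the ordering described above.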

You get the idea. If you’re interested in some of the technical details, read on.

Continue reading “Normalization of Creator Names in EDI’s Data Portal”
News, Thematic Standardization

Harmonizing ecological community survey data for reuse: an update

The idea of harmonizing data is not new, and in some research domains it has been successful. Our body of long-term observations of organisms in ecological communities is growing, and many datasets have already been used in synthesis and meta-analyses, but only after considerable effort to bring them into alignment. A goal of EDI has been to develop recommendations for data harmonization, and to convert “raw data” in specific domains into a common data model to prepare them for analysis and accelerate synthesis or meta-analyses.

Temporal, spatial and taxonomic coverage of datasets available in the ecocomDP model. Data source: Black, EDI; Gray, NEON. A) Temporal coverage (years), B) Temporal evenness (years), C) Spatial extent, D) Taxonomic group. An asterisk indicates that two groups (Tick, Mosquito) are specifically targeted by NEON. When these taxa occur in EDI datasets, they are plotted here with Arthropods.

EDI recently finalized its data model for ecological community surveys, called “ecocomDP”, which is described in a recent open-access paper. EDI harmonization uses the workflow approach supported by EDI’s PASTA platform to reformat data without altering the original. An R package is available from CRAN to assist with reformatting original tables and working with ecocomDP data. Development of both the model and the R package was collaborative, involving NEON and LTER scientists and data managers. Another result of that collaboration is that the NEON Network now exposes its community surveys in the ecocomDP model via the R package. The figure shows temporal, spatial and taxonomic coverage of datasets available in the ecocomDP model from EDI and NEON.

Continue reading “Harmonizing ecological community survey data for reuse: an update”
News

Integrating Long-Tail Data: How Far Are We?

EDI’s Kristin Vanderbilt and Corinna Gries co-edited a Special Issue of Ecological Informatics, “Integrating Long-Tail Data: How Far Are We?”, that explores how far the informatics community has come in reducing the time researchers must spend integrating small, heterogeneous datasets before analyzing them.

©Elsevier
Continue reading “Integrating Long-Tail Data: How Far Are We?”