Cite: A lightweight citation service for data packages in the EDI data repository

The EDI technical team has released Cite, a lightweight web service that generates citations for data packages archived in the EDI data repository. Cite is simple to use: it requires only the EDI data package identifier appended to the end of the Cite URL: “”. For example, the URL “”, when entered into a web browser's address bar, returns the following ESIP-stylized citation:

Armitage, A.R., C.A. Weaver, J.S. Kominoski, and S.C. Pennings. 2020. 
Hurricane Harvey: Coastal wetland plant responses and recovery in Texas: 2014-2019 ver 1.
Environmental Data Initiative.
<a href=""></a>.

Citations are generated from information found in the data package’s EML document, including the title and creator elements, and from resource information in the EDI data repository, namely the repository name, archive date, data package revision value, and digital object identifier (DOI). This information is then stylized (think “layout”) according to recommended best practices published by community organizations, such as ESIP.

Citations are formatted (think “presentation”) based on the MIME type set in the Accept header field of the HTTP request. In the above example, the citation is formatted as HTML, as shown by the anchor tags and HREF surrounding the DOI URL string, because the web browser automatically requests that responses be returned as “text/html”. Cite also supports additional citation styles (Dryad, BibTeX, and raw JSON) and output MIME types (text/plain and application/json). The design of the Cite service framework allows new styles and formats to be added with relative ease.

The motivation for Cite is to provide a consistent and simple interface for generating citations from data packages found in the EDI data repository. Cite is now being used in the EDI Data Portal to display citations on the data package metadata summary pages (also known as “landing pages”). Details of the Cite web service can be found at
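To make the request pattern described above concrete, here is a minimal Python sketch of how a client might ask Cite for a citation in a particular output format. It only builds the request and does not send it. Several names are assumptions made for illustration and are not taken from the Cite documentation: the base URL `https://cite.example.org/cite` is a placeholder for the real Cite URL, `edi.1.1` is a made-up package identifier, and the `style` query parameter is a guess at how a non-default citation style might be selected.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Placeholder, NOT the real service address: substitute the actual Cite URL.
CITE_BASE_URL = "https://cite.example.org/cite"

# Output formats named in the text above.
SUPPORTED_MIME_TYPES = ("text/html", "text/plain", "application/json")


def build_cite_request(package_id: str,
                       mime_type: str = "text/plain",
                       style: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request for one data package citation."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    # The package identifier is appended to the end of the Cite URL.
    url = f"{CITE_BASE_URL}/{quote(package_id, safe='')}"
    if style is not None:
        # Assumption: the citation style is chosen with a "style" query parameter.
        url += "?" + urlencode({"style": style})
    # The Accept header selects the presentation format of the response.
    return Request(url, headers={"Accept": mime_type})


# "edi.1.1" is a hypothetical identifier used only for this example.
request = build_cite_request("edi.1.1", mime_type="text/plain", style="ESIP")
print(request.full_url)
print(request.get_header("Accept"))
```

With the real Cite URL substituted for the placeholder, passing the returned object to `urllib.request.urlopen` and decoding the response body would yield the citation text. Keeping the style in the URL and the format in the Accept header mirrors the separation between “layout” and “presentation” described above.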


Google Scholar highlights EDI data packages as first-order citations in user profiles and in scholarly articles

Data are increasingly citable as first-order objects, and data archived in the EDI repository are no exception. One indication is that data package publications are indexed in personal Google Scholar user profiles alongside other scholarly articles, for example in the profile of Paul Hanson (Research Professor at the Center for Limnology, University of Wisconsin-Madison).

The number of data packages cited in scholarly articles is also increasing. The figure below shows the annual number of EDI data package citations in scholarly articles over the past seven years, as derived from Google Scholar.

Annual number of EDI data package citations in scholarly articles

For a data publication to be discoverable by search engines, including Google Scholar and Google Dataset Search, the data package needs to be “indexed”. A while ago EDI implemented and metadata (often called Search Engine Optimization) to support search engine discovery and indexing of data packages archived in the EDI repository. Sitemaps metadata serves as a table of contents for high-value information found on websites so that search engines can more easily discover relevant web pages to index. For EDI, the sitemaps metadata points to the most recent data package versions, accessible through the EDI Data Portal, and is refreshed hourly.
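As a rough illustration of the “table of contents” role that sitemaps play, the Python sketch below reads a small sitemap document and lists the pages it advertises, much as a search engine crawler would. The XML namespace is the one defined by the sitemaps.org protocol. The sample document, its URLs, and its dates are invented for this example and do not reflect EDI’s actual sitemap.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap fragment: URLs and dates are made up for illustration.
EXAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://portal.example.org/package/edi.1.2</loc>
    <lastmod>2020-01-15</lastmod>
  </url>
  <url>
    <loc>https://portal.example.org/package/edi.2.1</loc>
    <lastmod>2020-02-03</lastmod>
  </url>
</urlset>
"""


def list_sitemap_entries(sitemap_xml: str) -> List[Tuple[str, str]]:
    """Return (page URL, last-modified date) pairs from a sitemap document."""
    root = ET.fromstring(sitemap_xml.strip())
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in list_sitemap_entries(EXAMPLE_SITEMAP):
    print(lastmod, loc)
```

Each `<url>` entry tells a crawler which landing page to visit and when it last changed, which is why listing only the most recent data package versions keeps older revisions from competing in search results.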


EDI plans for EML 2.2

A new version of the Ecological Metadata Language (EML 2.2) was released recently with several significant additions. EDI is working with our community to explore the potential benefits of the new EML features, to determine how they work best for the data we handle, and to outline their incorporation into EDI systems. Feedback from a webinar indicated that several new EML features were high priority, and we have already adapted our data package views to display new content in a basic manner for project funding sources, taxonomic identifiers, and semantic annotations at the dataset and measurement levels. We are also migrating our harmonization format for community survey data (ecocomDP) to include annotation, and we anticipate adapting the ecocomDP creation code to include identifiers that we expect to appear in Level 0 (raw) data. We have begun revising our Best Practices for EML Metadata material, anticipating new recommendations for EML 2.2, by first migrating the current versions to a more dynamic system using GitHub Pages.