The Environmental Data Initiative (EDI) helps researchers from field stations, individual laboratories, and research projects of all sizes archive and publish their environmental data. EDI’s very successful Summer Fellowship Program for Data Management Training is one component of our Outreach and Training program. For the third consecutive year, EDI is reviewing applications from undergraduate and graduate students who wish to become EDI summer fellows. This year we are seeking nine fellows to be trained in the data publishing process and to support nine research sites in their efforts to manage their data. EDI’s aim is to ensure that these young professionals learn state-of-the-art data stewardship practices.
The EDI technical team has released Cite, a lightweight web service that generates citations for data packages archived in the EDI data repository. Cite is simple to use and requires only that the EDI data package identifier be appended to the end of the Cite URL: “https://cite.edirepository.org/cite/”. For example, the URL “https://cite.edirepository.org/cite/edi.460.1”, when entered into a web browser’s address bar, returns the following ESIP-stylized citation:
Armitage, A.R., C.A. Weaver, J.S. Kominoski, and S.C. Pennings. 2020.
Hurricane Harvey: Coastal wetland plant responses and recovery in Texas: 2014-2019 ver 1.
Environmental Data Initiative.
Citations are generated from information found in the data package’s EML document, including the title and creator elements, and from resource information in the EDI data repository, namely the repository name, archive date, data package revision value, and digital object identifier (DOI). This information is then stylized (think “layout”) according to recommended best practices published by community organizations, such as ESIP. Citations are formatted (think “presentation”) based on the MIME type set in the Accept header field of the HTTP request. In the above example, the citation is formatted as HTML, as shown by the anchor tag and href attribute surrounding the DOI URL string, since the web browser automatically requests that responses be returned as “text/html”. Cite also supports additional citation styles (Dryad, BibTeX, and raw JSON) and output MIME types (text/plain and application/json). The design of the Cite service framework allows new styles and formats to be added with relative ease. The motivation for Cite is to provide a consistent and simple interface for generating citations from data packages found in the EDI data repository. Cite is now being used in the EDI Data Portal to display citations on the data package metadata summary pages (also known as “landing pages”). Details of the Cite web service can be found at https://github.com/PASTAplus/cite.
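As a rough illustration of the request pattern described above, the following Python sketch appends a data package identifier to the Cite URL and sets the Accept header to choose the output format. The function names and the identifier pattern (scope.identifier.revision, as in “edi.460.1”) are assumptions made for this example and are not taken from the Cite code base; how a non-default citation style is selected is not covered here, so consult the Cite repository for that.

```python
import re
import urllib.request

# Base URL of the Cite service, as given in the post.
CITE_BASE_URL = "https://cite.edirepository.org/cite/"

# Assumed shape of a package identifier: scope.identifier.revision (e.g. edi.460.1).
PACKAGE_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]*\.\d+\.\d+$")


def build_cite_request(package_id, mime_type="text/plain"):
    """Build (but do not send) a request for the citation of one data package."""
    if not PACKAGE_ID_PATTERN.match(package_id):
        raise ValueError(f"Unexpected package identifier: {package_id!r}")
    # The Accept header tells the service which format to return.
    return urllib.request.Request(
        CITE_BASE_URL + package_id,
        headers={"Accept": mime_type},
    )


def fetch_citation(package_id, mime_type="text/plain"):
    """Send the request and return the response body as text (needs network access)."""
    request = build_cite_request(package_id, mime_type)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_citation("edi.460.1")` would ask for a plain-text citation, while passing `"application/json"` as the second argument would ask for JSON instead.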
Data, including data archived in the EDI repository, are increasingly treated as citable first-order objects. One indication is that data package publications are indexed in personal Google Scholar user profiles alongside other scholarly articles, for example in the profile of Paul Hanson (Research Professor at the Center for Limnology, University of Wisconsin-Madison).
The number of data packages cited in scholarly articles is also increasing. The figure below shows the annual number of EDI data package citations in scholarly articles over the past seven years, as derived from Google Scholar.
For a data publication to be discoverable by search engines, including Google Scholar and Google Dataset Search, the data package must be “indexed”. Some time ago, EDI implemented sitemaps.org and schema.org metadata (a practice often grouped under Search Engine Optimization) to support search engine discovery and indexing of data packages archived in the EDI repository. Sitemaps metadata serves as a table of contents for high-value information found on websites so that search engines can more easily discover relevant web pages to index. For EDI, the sitemaps metadata points to the most recent data package versions, accessible through the EDI Data Portal, and is refreshed hourly.
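To make the idea of a sitemap that lists only the most recent data package versions more concrete, here is a minimal Python sketch. It is not EDI’s implementation: the landing-page URL template uses a placeholder host, the function names are hypothetical, and package identifiers are assumed to follow the scope.identifier.revision form. Only the XML namespace comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical landing-page URL template; the real Data Portal URLs differ.
LANDING_PAGE = "https://example.org/package?id={package_id}"


def latest_revisions(package_ids):
    """Keep only the highest revision of each scope.identifier pair."""
    latest = {}
    for package_id in package_ids:
        scope, identifier, revision = package_id.split(".")
        key = (scope, int(identifier))
        latest[key] = max(latest.get(key, 0), int(revision))
    return [f"{scope}.{ident}.{rev}" for (scope, ident), rev in sorted(latest.items())]


def build_sitemap(package_ids):
    """Return sitemap XML with one <url> entry per most recent package version."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for package_id in latest_revisions(package_ids):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = LANDING_PAGE.format(package_id=package_id)
    return ET.tostring(urlset, encoding="unicode")
```

Given the identifiers `["edi.460.1", "edi.460.2", "edi.5.1"]`, the sketch lists only `edi.5.1` and `edi.460.2`, so a search engine crawling the sitemap is pointed at current versions rather than superseded ones.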