The EDI data portal has added a new informational check to the suite of data congruence checks it runs on every submitted data package. Because the check is informational only and the community found it valuable, we released it outside the regular twice-yearly schedule, on July 11, 2018. For detailed information, contact firstname.lastname@example.org.
PASTA’s date and time format quality check stepped outside the box this past month, using Python to parse preferred date and time formats and to generate the regular expression strings used to validate date and time data. The check originally relied on Java 8’s new date and time library to interpret and validate data documented with the Ecological Metadata Language (EML) date and time schema, but inconsistencies in how Java 8 handles the ISO 8601 standard called for a different approach. Rather than relying solely on the Java date and time library, EDI software developers used Python’s Parsimonious package to parse the preferred date and time format strings and translate them into regular expressions that Java 8 can use. Java then applies these Python-generated “regexes” to validate date and time data uploaded to the EDI Data Repository. The result is a simple, unconventional solution for handling one of the most common data types EDI sees.
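The core idea, turning a format string into a validating regular expression, can be sketched in a few lines. This is a minimal illustration, not EDI's code: it swaps Parsimonious for a hand-written tokenizer built on Python's standard `re` module, and the token table (`YYYY`, `MM`, `DD`, `hh`, `mm`, `ss`) and the function name `format_to_regex` are assumptions chosen for the example.

```python
import re

# Hypothetical token table mapping EML-style format tokens to regex fragments.
# Matching is case-sensitive: "MM" is month, "mm" is minute.
TOKENS = [
    ("YYYY", r"\d{4}"),
    ("MM", r"(0[1-9]|1[0-2])"),
    ("DD", r"(0[1-9]|[12]\d|3[01])"),
    ("hh", r"([01]\d|2[0-3])"),
    ("mm", r"[0-5]\d"),
    ("ss", r"[0-5]\d"),
]


def format_to_regex(fmt: str) -> str:
    """Translate a format string such as 'YYYY-MM-DD' into an anchored regex string."""
    parts = []
    i = 0
    while i < len(fmt):
        for token, fragment in TOKENS:
            if fmt.startswith(token, i):
                parts.append(fragment)
                i += len(token)
                break
        else:
            # Any character that starts no token is a literal separator; escape it.
            parts.append(re.escape(fmt[i]))
            i += 1
    return "^" + "".join(parts) + "$"
```

For example, `re.match(format_to_regex("YYYY-MM-DD"), "2018-07-11")` succeeds while `"2018-13-01"` is rejected. The generated strings use only basic constructs (character classes, groups, alternation, anchors) that Java's `java.util.regex.Pattern` also understands. Note that a pattern like this checks the shape of a value only; it would still accept an impossible date such as `2018-02-30`.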
A new feature in the EDI Repository associates data packages with the journal articles that cite them. For a given data package (specified by its package identifier), a logged-in user of the EDI Data Portal can use a web form to enter information about a journal article that cites the data package or its data. Journal articles that used the data but did not cite the data package DOI may also be included. Only the article’s Digital Object Identifier (DOI) or URL is required, though the user may optionally add the article title and the journal title. This information is stored permanently in the EDI Repository (unless the same logged-in user later deletes it) and is incorporated into the DOI metadata for the data package that is sent to DataCite. It is also displayed on the data package’s summary page in the EDI Data Portal; for example, https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-sbc&identifier=74 has been used by three papers.
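The rule that only a DOI or a URL is required, plus the hand-off to DataCite, can be sketched as a small record type. Everything here is hypothetical: the `JournalCitation` class, its field names, and the placeholder DOI are illustrative, and expressing the link as a DataCite `relatedIdentifier` with `relationType="IsCitedBy"` is an assumption about how such a citation could appear in DataCite metadata, not a description of EDI's implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalCitation:
    """Hypothetical record for one journal article that cites a data package."""
    package_id: str                      # illustrative, e.g. "knb-lter-sbc.74"
    article_doi: Optional[str] = None
    article_url: Optional[str] = None
    article_title: Optional[str] = None  # optional
    journal_title: Optional[str] = None  # optional

    def __post_init__(self) -> None:
        # Only the DOI or the URL of the article is required.
        if not (self.article_doi or self.article_url):
            raise ValueError("either article_doi or article_url is required")

    def to_related_identifier(self) -> ET.Element:
        """Render the citation as a DataCite-style relatedIdentifier element (assumed mapping)."""
        elem = ET.Element("relatedIdentifier")
        if self.article_doi:
            # Prefer the DOI when both identifiers are present.
            elem.set("relatedIdentifierType", "DOI")
            elem.text = self.article_doi
        else:
            elem.set("relatedIdentifierType", "URL")
            elem.text = self.article_url
        elem.set("relationType", "IsCitedBy")
        return elem
```

Usage: `JournalCitation("knb-lter-sbc.74", article_doi="10.1234/example").to_related_identifier()` yields an element whose text is the placeholder DOI, while constructing a record with neither a DOI nor a URL raises `ValueError`, mirroring the form's single required field.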
In 2016, EDI reconstituted a working group to define, prioritize, review, and test EML Congruence Checks (ECC) and to serve as the information conduit for the larger community. A report was given to the LTER IMC at its annual meeting in Bloomington, Indiana, in the summer of 2017. All ECC material is stored on GitHub: https://github.com/EDIorg/ecc. This update describes our experiences with checks introduced or considered in 2017.
Many of the data packages in the EDI Data Repository contain metadata listing the data sources from which the data package is derived. These sources may be other data packages in the EDI Data Repository or links to data sets outside the repository. The EDI Data Portal displays these provenance relationships on the data package summary page. A good example is LAGOS-NE-LIMNO v1.087.1: A module for LAGOS-NE, a multi-scaled geospatial and temporal database of lake ecological context and water quality for thousands of U.S. Lakes: 1925-2013. When you view this data package in the data portal, scroll down to the Provenance section to see the list of ninety data packages it is derived from! Each of the ninety source data packages can be viewed by clicking the links provided. The repository tracks these provenance relationships in both directions, so when you view any of the source data packages (for example, Acadia National Park, U.S. National Park Service using Lakes and Stream Monitoring Protocol for National Parks in the Northeast Temperate Network, Version 1.1 (2006-2011)), you will see links back to every derived data package that uses it as a data source!
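Tracking provenance in both directions amounts to keeping a forward index (derived package to its sources) and a reverse index (source to the packages derived from it), updated together. A minimal sketch, assuming plain string package identifiers; the `ProvenanceIndex` class and the identifiers in the usage note are hypothetical and do not reflect how the EDI repository actually stores provenance.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class ProvenanceIndex:
    """Hypothetical two-way index of data package provenance relationships."""

    def __init__(self) -> None:
        self.sources_of: DefaultDict[str, Set[str]] = defaultdict(set)    # derived -> sources
        self.derived_from: DefaultDict[str, Set[str]] = defaultdict(set)  # source -> derived

    def add(self, derived: str, source: str) -> None:
        # Record the edge in both directions so either package can look up the other.
        self.sources_of[derived].add(source)
        self.derived_from[source].add(derived)

    def sources(self, package: str) -> List[str]:
        """Packages that `package` was derived from (the Provenance section)."""
        return sorted(self.sources_of.get(package, set()))

    def derived(self, package: str) -> List[str]:
        """Packages that list `package` as one of their data sources (the back-links)."""
        return sorted(self.derived_from.get(package, set()))
```

Usage: after `idx.add("derived.1", "source.a")` and `idx.add("derived.1", "source.b")`, `idx.sources("derived.1")` returns both sources and `idx.derived("source.a")` returns `["derived.1"]`. Writing both indexes in the same `add` call keeps them consistent, so the summary page for either package can be rendered with a single lookup.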