The LTER Information Management Committee (IMC) has created a significant knowledge base and made it available through its website, http://im.lternet.edu. That website is being reorganized, and many of its general data management resources are being migrated to other locations, most notably to EDI (https://environmentaldatainitiative.org/resources/). One resource that is used both within and beyond the LTER is the EML Best Practices recommendations; Version 1 was released in 2005, and a much-expanded Version 2 in 2011. Version 3 was released in 2017 by EDI with a simple goal: to generalize the document for a broader community while still acknowledging the LTER IMC as its originator. The Version 3 draft was circulated on the LTER’s best practices mailing list, and comments were incorporated.
There have been discussions about (and attempts at) making this document’s content dynamic, e.g., as individual pages in HTML. However, technical constraints make examples and similar material problematic to display, and citation is simpler if the document is kept whole. There are other online best practice resources (e.g., DataONE and the EML project itself), and EDI is working with those groups on methods for integrating this document’s recommendations. So Version 3 will continue to be distributed as a PDF. The document itself and its examples are housed in one of EDI’s GitHub repositories, https://github.com/EDIorg/dm-best-practices. For any future versions, a new working group should be assembled. In fact, a release of EML 2.2 is in development (see https://github.com/NCEAS/eml/), so an EML Best Practices Version 4 should probably be considered some time after that release. Below is a summary of the changes in Version 3.
- The most significant change in V3 was to remove its specificity to a single community. To acknowledge the LTER IMC as the originator, a section in the introduction describes the document’s history, including contributors’ names (based on working group participant lists). Throughout, the language was adjusted to use “EML preparers” instead of “LTER sites”. The examples have not changed (except as noted below), and all still refer to data from the “Fictitious LTER Site, or FLS”.
- Some V2 recommendations that are useful to a broad community also have special context. For example, use of a particular XML element or attribute may be requested by a community (e.g., LTER), or even required by a repository (e.g., EDI’s). These recommendations remain, because they highlight important aspects of what has been learned in practice. But instead of integrating those with the general text, V3 contains them in separate paragraphs as “Context notes”.
- V2 included a major section about handling of EML by specific applications, e.g., LTERMapS, Metacat and PASTA. This entire section has been removed, so that V3 is specific to EML content only. These parts of V2 should be reviewed and housed with the appropriate projects, and cross-linked with other resources. V2 also included some general information about EML design, e.g., a description of the attribute-unit model and its use of XML types. These are planned to become stand-alone resources (e.g., web pages). Keeping V3 specific to EML content makes its overall management easier, and creation of other resources can be more flexible.
- There are a few changes to the content recommendations themselves that reflect the way EML is used now, which has evolved somewhat since 2011. These were specifically highlighted for review by the LTER EML-BP group:
  - Addresses: V2 said that addresses for individuals should be filled out. But since external identifiers are now available for people (ORCIDs), V3 says that addresses can be omitted if an ORCID is present.
  - spatialSamplingUnits: V2 recommended that this element (under the methods tree) be used for individual sampling sites, and that geographicCoverage at the dataset level be used for general coverage only. V3 says that individual sites can be placed at the dataset level, and the recommendation for spatialSamplingUnits is now a “context note” for LTER, as this was requested for datasets to be ingested by LTERMapS. Listing sites at the dataset level means they are more likely to be indexed by aggregators for map displays.
  - pubDate: As it is the only date field at the resource level, pubDate is used in the construction of package citations. For these to be meaningful, V3 states more strongly that the pubDate should reflect the data’s “recentness”. Some preparers (LTER sites) use this field to hold the date when the dataset first became public. This usage has highlighted EML’s lack of metadata-management fields for time-series data, and suggestions were made for EML 2.2 to alleviate that.
  - otherEntity: Increasingly, we have seen processing code included as an otherEntity, so this use has been added to the list of “typical uses” in the Entity section.
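As a rough illustration of two of the content changes above (addresses versus ORCIDs, and sites listed at the dataset level), the following Python sketch inspects a small EML fragment. The fragment, the function names, and the test used to recognize an ORCID (a `userId` element whose `directory` attribute mentions `orcid.org`) are assumptions made for this example; they are not taken from the Best Practices document or its examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical EML fragment, made up for this sketch (not schema-validated).
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
    packageId="knb-lter-fls.1.1" system="FLS">
  <dataset>
    <title>Example dataset</title>
    <creator>
      <individualName><givenName>Ann</givenName><surName>Example</surName></individualName>
      <userId directory="https://orcid.org">https://orcid.org/0000-0000-0000-0000</userId>
    </creator>
    <creator>
      <individualName><surName>Sample</surName></individualName>
    </creator>
    <pubDate>2017</pubDate>
    <coverage>
      <geographicCoverage><geographicDescription>Site A</geographicDescription></geographicCoverage>
      <geographicCoverage><geographicDescription>Site B</geographicDescription></geographicCoverage>
    </coverage>
  </dataset>
</eml:eml>"""


def has_orcid(party):
    """Return True if the party has a userId whose directory mentions orcid.org.

    Assumption: ORCIDs are recorded in userId with an ORCID directory attribute.
    """
    for user_id in party.findall("userId"):
        if "orcid.org" in user_id.get("directory", "").lower():
            return True
    return False


def creators_missing_contact(dataset):
    """List surnames of individual creators with neither an ORCID nor an address."""
    missing = []
    for creator in dataset.findall("creator"):
        if creator.find("individualName") is None:
            continue  # organizations and positions are outside this check
        if has_orcid(creator) or creator.find("address") is not None:
            continue
        missing.append(creator.findtext("individualName/surName", default="(unnamed)"))
    return missing


def dataset_level_sites(dataset):
    """Return the geographicDescription of each dataset-level geographicCoverage."""
    return [
        gc.findtext("geographicDescription", default="").strip()
        for gc in dataset.findall("coverage/geographicCoverage")
    ]


root = ET.fromstring(SAMPLE_EML)
dataset = root.find("dataset")  # EML child elements are not namespace-qualified
print(creators_missing_contact(dataset))
print(dataset_level_sites(dataset))
```

A check like this is only a convenience for preparers reviewing their own metadata; whether a missing address matters still depends on the community or repository context described in the “Context notes”.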
The PDF can be found on the EDI website, and should be cited as:
Best Practices for Dataset Metadata in Ecological Metadata Language (EML Best Practices V3). 2017. Environmental Data Initiative.