EDI hosted four workshops at the ESIP Summer Meeting, July 25-28, 2017, in Bloomington, IN:
1. “Create EML Using R and Share on GitHub”
In this session, we demonstrated how to generate EML with the EML R package (https://cran.r-project.org/web/packages/EML/index.html). EDI data managers have used this package extensively and have developed a set of wrapper functions that make it more user-friendly (https://github.com/EDIorg/EMLassemblyline). Participants learned how to use GitHub from within RStudio and how to share their R scripts on GitHub. Participants were encouraged to follow along on their own computers; this required an RStudio installation and a basic level of R experience.
Please fill out our post-workshop survey here.
2. “Developing an Information Management Code Repository”
We presented a proposed Information Management Code Repository for general data manipulation routines, along with best practices. The code repository will include routines for data formatting, quality control, automated ingestion, and metadata generation, and for implementing best practices in data archiving. Topics addressed included the development of a controlled vocabulary to organize the code and aid code discovery; code organization within GitHub; repository governance; and a metadata format for code.
3. “Make a local research site data catalog using Application Programming Interfaces”
The Environmental Data Initiative (EDI) data repository is a platform that allows ecological researchers to archive data. Although the repository provides search, download, and other cataloging functions that facilitate data discovery and access, research groups are often required to maintain a local catalog featuring those same data on a project-specific website. This need has traditionally been met by running two parallel systems: (1) submitting the data to the EDI repository, and (2) maintaining a local copy of the data catalog. This approach is inefficient and invites inconsistencies between the two systems. Most repositories, as well as DataONE, provide APIs for accessing data; in this breakout session, we discussed and demonstrated how data within the EDI repository can be accessed using the PASTA+ API. The API can be used to harvest data associated with a particular research group, project, or station, which can then be branded and styled for display on a project website. With this approach, a research group can generate a local catalog of project data by capitalizing on EDI data repository functionality and avoid the overhead of maintaining two separate data catalogs.
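The harvesting idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the code shown in the session. It assumes that the PASTA+ search service lives at `https://pasta.lternet.edu/package/search/eml`, that it accepts Solr-style parameters (`q`, `fq`, `fl`, `rows`), and that it returns an XML `resultset` of `document` elements. The scope `knb-lter-xyz`, the sample response, and the CSS class name are hypothetical; check the current PASTA+ API documentation before relying on any of them.

```python
import html
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint of the PASTA+ search service (verify against the API docs).
PASTA_SEARCH_URL = "https://pasta.lternet.edu/package/search/eml"

# Hypothetical response, shaped the way this sketch assumes PASTA+ answers.
SAMPLE_RESPONSE = """
<resultset numFound="2" start="0" rows="100">
  <document>
    <packageid>knb-lter-xyz.1.1</packageid>
    <title>Soil moisture &amp; temperature</title>
  </document>
  <document>
    <packageid>knb-lter-xyz.2.3</packageid>
    <title>Stream chemistry</title>
  </document>
</resultset>
"""


def build_search_url(scope, fields=("packageid", "title"), rows=100):
    """Build a query URL that asks for every package in one scope."""
    params = {
        "defType": "edismax",
        "q": "*",                 # match everything ...
        "fq": f"scope:{scope}",   # ... then filter to one research group
        "fl": ",".join(fields),   # fields to return for each package
        "rows": rows,
    }
    return f"{PASTA_SEARCH_URL}?{urlencode(params)}"


def parse_result_set(xml_text):
    """Turn the XML result set into a list of {field: value} dicts."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in doc}
        for doc in root.iter("document")
    ]


def render_catalog_html(entries):
    """Render the harvested entries as an HTML list a site can style."""
    items = [
        f"<li>{html.escape(e.get('title', ''))} "
        f"({html.escape(e.get('packageid', ''))})</li>"
        for e in entries
    ]
    return '<ul class="site-catalog">\n' + "\n".join(items) + "\n</ul>"


def fetch_catalog(scope):
    """Query the live service (needs network access) and parse the result."""
    with urllib.request.urlopen(build_search_url(scope)) as response:
        return parse_result_set(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demo: render the hypothetical sample instead of calling the API.
    print(render_catalog_html(parse_result_set(SAMPLE_RESPONSE)))
```

Because the catalog is rebuilt from the repository on each run, the project website never holds a second copy of the metadata that could drift out of date.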
4. “Annotating datasets with measurement classes from the Ecosystem Ontology (ECSO)”
A new measurement ontology, ECSO, is now available, as is a system for annotating datasets in DataONE. Because ontologies include parent-child structure and synonyms and enable formal logic, data discovery can be streamlined when dataset metadata incorporates annotations from ontologies. The ECSO project currently focuses on measurements related to carbon cycling, such as fluxes (e.g., ecosystem exchange, NPP, respiration) and concentrations (of pigments and carbon compounds), with additions planned. During this workshop, we explored the ECSO ontology and demonstrated the annotation system. Participants were then invited to annotate existing EML-described datasets with carbon cycling measurements using the DataONE annotation interface.
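To illustrate why parent-child structure helps discovery, the sketch below expands a search term to all of its descendants before matching dataset annotations. It is a toy model only: the term labels, the hierarchy, and the dataset identifiers are hypothetical and are not actual ECSO classes or DataONE records.

```python
# Hypothetical parent -> children fragment of a measurement hierarchy.
CHILDREN = {
    "carbon flux": [
        "net ecosystem exchange",
        "net primary production",
        "respiration",
    ],
    "respiration": ["soil respiration"],
}

# Hypothetical datasets, each annotated with one measurement term.
DATASETS = {
    "ds-1": "soil respiration",
    "ds-2": "net primary production",
    "ds-3": "chlorophyll concentration",
}


def descendants(term):
    """Collect every term below `term` in the hierarchy."""
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        for child in CHILDREN.get(current, []):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found


def search(term):
    """Return datasets annotated with `term` or any of its descendants."""
    wanted = {term, *descendants(term)}
    return sorted(ds for ds, label in DATASETS.items() if label in wanted)


if __name__ == "__main__":
    print(search("carbon flux"))
```

A search for the broad term "carbon flux" finds `ds-1` and `ds-2` even though neither dataset carries that exact label, which plain keyword matching would miss.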
To help participants prepare for EDI’s hands-on workshops at the ESIP & ESA 2017 summer meetings, short instructions will be available on our EDI website after 20 June 2017 on the following topics:
- Getting ready: Step-by-step instructions on installing RStudio and R packages and setting up your RStudio work environment.
- Introduction to basic concepts of R used in the workshops.
- Instructions on setting up a GitHub account and repository.
Please contact us after 20 June 2017 if you need further help. EDI team members will also offer some help on setting up your RStudio environment at the time of the workshops.