A Brief History and Infrastructure Overview of the Environmental Data Initiative
The Environmental Data Initiative (EDI) began in the summer of 2016 as a collaboration between two projects funded by the US National Science Foundation (NSF): NIMO, awarded to the University of Wisconsin (UW), and PASTA+, awarded to the University of New Mexico (UNM). Together, the two projects are known as EDI. Both groups originate from the Long Term Ecological Research (LTER) Network and consist of highly motivated and experienced data practitioners, software developers, and research scientists. In addition to the LTER Network, EDI now supports a broad community of environmental and ecological scientists funded through the Long Term Research in Environmental Biology (LTREB), the Organization of Biological Field Stations (OBFS), and the Macrosystem Biology (MSB) programs at NSF. The goal of the LTER-focused NIMO (National Information Management Office) project was to expand and enhance informatics support in the LTER program, while the goal of PASTA+ (Provenance Aware Synthesis Tracking Architecture – Plus) was to provide an open-access data repository, built on the PASTA software stack, for communities other than LTER. To be more inclusive of all the communities served, both goals are now part of EDI’s vision. As such, EDI combines informatics expertise with a production-level data repository (Figure 1) for use by all four communities (and others). EDI also works closely with the LTER National Communications Office (NCO) and DataONE to promote best practices in data management and stewardship, and supports two separate DataONE member nodes, one for LTER data and the other for all non-LTER data (the EDI Member Node).
Figure 1: Components of the EDI infrastructure.
EDI Data Infrastructure
LTER information managers and software developers began developing the Provenance Aware Synthesis Tracking Architecture (PASTA) software in 2009, with the goal of having it serve as the data repository of the LTER Network Information System. A full production system was delivered to the LTER Network in January 2013 and quickly acquired a majority of LTER’s data products (> 5,900 as of January 2017). PASTA’s design follows a service-oriented architecture that provides scalable data-repository functionality through a REST-based application programming interface (API), with primary operations to create, read, update, and delete (often termed CRUD) data packages in the repository. In addition, the PASTA development team delivered a browser-based web application for LTER, called the Data Portal, that gives users a human-accessible interface to PASTA. This was followed by an LTER Member Node (MN) in the DataONE federation, which exposes LTER data packages through DataONE’s search and catalog service.
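To make the CRUD pattern concrete, the following minimal sketch shows how the four operations conventionally map onto HTTP methods in a REST-style repository API. It is an illustration only, not the PASTA API itself: the base URL, the path layout, the PackageId type (which assumes a scope, identifier, and revision naming scheme), and the build_request helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical base URL, for illustration only; not a real repository endpoint.
BASE_URL = "https://repository.example.org/package"

# Conventional REST mapping of CRUD operations onto HTTP methods.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


@dataclass(frozen=True)
class PackageId:
    """Assumed package naming scheme: scope, identifier, revision."""
    scope: str
    identifier: int
    revision: int

    def __str__(self) -> str:
        return f"{self.scope}.{self.identifier}.{self.revision}"


def build_request(operation: str, package: PackageId) -> Tuple[str, str]:
    """Return the (HTTP method, URL) pair for a CRUD operation on a package."""
    method = CRUD_TO_HTTP[operation]  # raises KeyError for unknown operations
    if operation == "create":
        # New packages are sent to the collection URL (assumed layout).
        url = BASE_URL
    elif operation == "read":
        # Reads address one specific revision of a package.
        url = f"{BASE_URL}/{package.scope}/{package.identifier}/{package.revision}"
    else:
        # Updates and deletes address the package without a revision.
        url = f"{BASE_URL}/{package.scope}/{package.identifier}"
    return method, url
```

Under these assumptions, calling build_request("read", PackageId("example-scope", 1, 2)) yields a GET request against a URL ending in /example-scope/1/2. A client such as a data portal would issue requests of this kind on the user's behalf.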
By design, PASTA was LTER-centric. With the advent of EDI, aspects of the PASTA software that were specific to LTER practices were either generalized for broader use or removed completely, resulting in a revised software stack called PASTA+ (https://github.com/PASTAplus), which provides the underlying services for the EDI data repository. In simple terms, the EDI data repository is a “re-branding” of the LTER Network Information System data repository: it includes the full archive of LTER data packages and uses the revised PASTA+ software stack. Because PASTA+ is backwards compatible with the previous PASTA API, the LTER Data Portal interacts seamlessly with the PASTA+ API. To promote broader inclusivity, EDI software developers released a generalized version of the LTER Data Portal in late 2016, the EDI Data Portal, which also interacts directly with the PASTA+ API. The EDI Data Portal can be used in lieu of the LTER Data Portal to access both LTER and non-LTER data packages. In March 2017, EDI released a new DataONE member node that exposes non-LTER data packages to the DataONE federation. Collectively, the EDI infrastructure comprises the EDI data repository (which uses the PASTA+ software stack), the EDI Data Portal, the EDI DataONE Member Node, the LTER Data Portal, and the LTER DataONE Member Node, in addition to a suite of software tools for information and data management (https://github.com/EDIorg). Planned additions to the EDI infrastructure, including PASTA+, are an extended user identification system that allows authentication via protocols such as OpenID Connect/OAuth 2.0 through providers like Google, ORCID, and GitHub, along with improved metadata creation and management tools.