Data Package Design for Meteorological and Hydrological Data

A first step toward harmonizing meteorological and hydrological data in the EDI data repository was the workshop “Next generation climate/hydrological data products”, held March 12-14, 2019, at the University of New Mexico in Albuquerque, NM.

The workshop was jointly organized by EDI, LTER and the Forest Service with the goal of developing a strategy for harmonizing the weather, climate and hydrological data currently held in the EDI data repository and in ClimDB/HydroDB (a centralized server that provides open access to long-term meteorological and streamflow records from a collection of research sites).

Participants of the workshop on meteorological and hydrological data harmonization at UNM in Albuquerque (March 12-14, 2019)

The need arose because the ClimDB/HydroDB software is aging and has become too difficult to maintain, and because the available data are limited in parameters and time resolution. The group decided to archive all data currently residing in ClimDB/HydroDB in the EDI data repository, to convert (harmonize) the data into a common data model, and to continue the ClimDB/HydroDB functionality into the future with other software products that support visualization, filtering and analysis of the data packages.

The harmonization will follow a framework developed by EDI that is already applied successfully to designing data packages for community survey data: ecocomDP. Figure 1 shows a schematic of the concept. Archived raw data (level 0 – L0) are converted to a common harmonized data model (level 1 – L1). The L1 data allow for straightforward data discovery and conversion into derived data products (level 2 – L2) in support of synthesis and other cross-site studies.

Figure 1: General data package harmonization workflow
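
The L0 to L1 step can be illustrated with a minimal sketch. It assumes a wide raw table (one column per variable) that is reshaped into a long table with one row per observation. The column names, the variable mapping and the site identifier are hypothetical, and the output layout is only a simplified stand-in for a harmonized data model, not the ODM schema itself.

```python
import csv
import io

# Hypothetical L0 raw file: one row per timestamp, one column per variable.
L0_CSV = """timestamp,air_temp_c,precip_mm
2019-03-12T00:00,4.1,0.0
2019-03-12T01:00,3.8,0.2
"""

# Assumed mapping from site-specific column names to harmonized
# (variable, unit) pairs; these names are illustrative, not ODM terms.
VARIABLE_MAP = {
    "air_temp_c": ("air_temperature", "degC"),
    "precip_mm": ("precipitation", "mm"),
}


def l0_to_l1(l0_text, site_id):
    """Reshape a wide L0 table into long L1 rows (one row per observation)."""
    rows = []
    for record in csv.DictReader(io.StringIO(l0_text)):
        for column, (variable, unit) in VARIABLE_MAP.items():
            raw = record.get(column)
            if raw is None or raw.strip() == "":
                continue  # skip missing values instead of inventing them
            rows.append({
                "site_id": site_id,
                "datetime": record["timestamp"],
                "variable": variable,
                "value": float(raw),
                "unit": unit,
            })
    return rows


l1_rows = l0_to_l1(L0_CSV, site_id="SITE_A")
for row in l1_rows:
    print(row)
```

Because every site ends up with the same long layout, downstream L2 products can be built without site-specific parsing.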

A number of data models commonly used in the research community for harmonizing meteorological and hydrological data were reviewed and discussed. The group suggested evaluating the ODM data model for time series data as the L1 data model. The ODM was developed and is widely used by the Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI). CUAHSI offers a data platform with a workspace that provides tools for visualization and analysis and might provide some of the ClimDB/HydroDB plotting functionality. The group's intent is to develop other software products for visualization through online hackathons.

Figure 2: Harmonization of meteorological and hydrological raw data using the CUAHSI/ODM data model. This allows using tools available through the CUAHSI workspace.

A first draft of a workflow was designed for converting all ClimDB/HydroDB products (L0), as well as meteorological and hydrological data in the EDI repository (raw L0), and archiving them in the EDI data repository as L1 data packages. If the ODM data model is adopted, the data packages will also be available in CUAHSI, hopefully with functionality comparable to ClimDB/HydroDB (see Figure 2 for the conceptual workflow).

Wade Sheldon demonstrated how the GCE Data Toolbox can be applied for the conversion of L0 data packages to the L1 data model. Margaret O’Brien led the discussion on semantic mappings between important terms in the different vocabularies used for archiving meteorological and hydrological data (ClimDB/HydroDB, LTER, AMS, ENVO, CF, EnvThes, ODM, NCEI), in order to pick a candidate vocabulary for the L1 data model (initially the CUAHSI ODM vocabulary). Vocabularies are important for defining suitable keywords at the data package level, thereby enhancing data discoverability in the EDI repository and via Google’s data search.
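
A semantic mapping of this kind can be sketched as a simple crosswalk table. The example below is only an illustration: the site terms and the target terms are placeholders, not verified entries of the ODM or any other vocabulary named above, and terms without a match are reported for manual review rather than guessed.

```python
# Hypothetical crosswalk from site-specific parameter names to a candidate
# controlled vocabulary. The target terms are placeholders, not verified
# entries of the ODM (or any other) vocabulary.
CROSSWALK = {
    "airtemp": "Temperature",
    "air temperature": "Temperature",
    "precip": "Precipitation",
    "discharge": "Discharge",
    "streamflow": "Discharge",
}


def normalize(term):
    """Lower-case a term and collapse underscores and extra whitespace."""
    return " ".join(term.replace("_", " ").lower().split())


def map_terms(site_terms):
    """Return (mapped, unmapped): a dict of matches and a list of leftovers."""
    mapped, unmapped = {}, []
    for term in site_terms:
        target = CROSSWALK.get(normalize(term))
        if target is None:
            unmapped.append(term)  # flag for manual review; do not guess
        else:
            mapped[term] = target
    return mapped, unmapped


mapped, unmapped = map_terms(["Air_Temperature", "PRECIP", "Streamflow", "PAR"])
print(mapped)
print(unmapped)
```

Keeping the unmapped terms in a separate list makes the gaps in the crosswalk visible, which is the part that needs discussion among information managers.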

The results of the workshop were presented at an LTER IM Water Cooler on April 9, 2019.

Next steps

  • Explore the ODM data model and CUAHSI functionality regarding their suitability for LTER and EDI meteorological and hydrological data products.
    • Contact CUAHSI regarding services & recommendations for our data products.
    • Plan webinar for introducing example data in CUAHSI and the functionality of the CUAHSI workspace.
    • Organize ESIP workshop with members of CUAHSI and Information Managers.
    • Provide examples of converting site L0 raw data and ClimDB/HydroDB data to the CUAHSI/ODM standard (L1).
    • Develop workflow and best practices for data conversion from L0 to L1 using the GCE toolbox and R/Python.
  • Establish workflow and best practices documentation in EDI’s GitHub space for:
    • Available tools for converting data from the raw (L0) to the ODM (L1) data model.
    • Examples of how to access/extract data in CUAHSI via R or an API, for one parameter across all LTER sites (maintaining ClimDB/HydroDB functionality).
    • Vocabulary: mapping between LTER and EML site parameter names and ODM controlled vocabulary.
  • Brainstorm possible L2 products (data products for education and outreach, NCO synthesis working groups, time averages and spatial aggregates, input for great “L50” products like waterviz.org or smartforests.org).
  • Report activities to LTER Science Council in May 2019
  • Discuss at 2019 IMC meeting in Tacoma
    • Determine timeline of ClimDB/HydroDB retirement and archiving in EDI.
    • Set deadline for LTER sites’ updates to ClimDB/HydroDB.
  • Online hackathon with LTER Information Managers to fill out ODM tables
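
As an illustration of the time averages mentioned among the possible L2 products, the following minimal sketch computes daily means for one parameter across several sites from L1 rows in long format. The row layout, site identifiers and values are hypothetical sample data, and a mean is only appropriate for some variables (precipitation, for example, would normally be summed).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical L1 rows in long format (one row per observation).
L1_ROWS = [
    {"site_id": "SITE_A", "datetime": "2019-03-12T00:00",
     "variable": "air_temperature", "value": 4.0},
    {"site_id": "SITE_A", "datetime": "2019-03-12T12:00",
     "variable": "air_temperature", "value": 10.0},
    {"site_id": "SITE_A", "datetime": "2019-03-13T00:00",
     "variable": "air_temperature", "value": 6.0},
    {"site_id": "SITE_B", "datetime": "2019-03-12T00:00",
     "variable": "air_temperature", "value": -2.0},
    {"site_id": "SITE_B", "datetime": "2019-03-12T00:00",
     "variable": "precipitation", "value": 1.2},
]


def daily_means(rows, variable):
    """Average one variable per (site, calendar day) across all sites."""
    groups = defaultdict(list)
    for row in rows:
        if row["variable"] != variable:
            continue
        day = row["datetime"][:10]  # ISO 8601 date part, YYYY-MM-DD
        groups[(row["site_id"], day)].append(row["value"])
    return {key: mean(values) for key, values in sorted(groups.items())}


l2_daily = daily_means(L1_ROWS, "air_temperature")
for (site, day), value in l2_daily.items():
    print(site, day, value)
```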