Data Package Design for meteorological and hydrological data

Past activities on harmonizing meteorological and hydrological data in the EDI repository

Detailed information on activities

A first step toward harmonizing meteorological and hydrological data in the EDI data repository was the workshop “Next generation climate/hydrological data products” (March 12-14, 2019) at the University of New Mexico in Albuquerque, NM. The workshop was jointly organized by EDI, LTER and the Forest Service. Its goal was to develop a strategy for harmonizing the weather, climate and hydrological data currently held in the EDI data repository and in ClimDB/HydroDB (a centralized server that provides open access to long-term meteorological and streamflow records from a collection of research sites).

Participants of the workshop on meteorological and hydrological data harmonization at UNM in Albuquerque (March 12-14, 2019)

The need for this effort arose because the ClimDB/HydroDB software is aging and has become too difficult to maintain. In addition, the available data are limited in terms of parameters and time resolution. The group decided to archive all data currently residing in ClimDB/HydroDB in the EDI data repository, to convert (harmonize) the data into a common data model, and to carry the ClimDB/HydroDB functionality into the future with other software products that support visualization, filtering and analysis of the data packages.

The effort will use a data harmonization framework developed by EDI that is already applied successfully to designing data packages for community survey data: ecocomDP. Figure 1 shows a schematic of the concept. Archived raw data (level 0 – L0) are converted to a common harmonized data model (level 1 – L1). The L1 data allow straightforward data discovery and conversion into derived data products (level 2 – L2) in support of synthesis and other cross-site studies.

Figure 1: General data package harmonization workflow
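The L0 to L1 to L2 flow can be illustrated with a minimal Python sketch. It is not the ecocomDP or workshop implementation: the site name, column names, variable names, units and the long "one observation per record" layout are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical L0 table: one row per timestamp, one column per variable.
L0_ROWS = [
    {"site": "SITE_A", "datetime": "2019-03-12T00:00", "air_temp_c": 4.0, "precip_mm": 0.0},
    {"site": "SITE_A", "datetime": "2019-03-12T12:00", "air_temp_c": 10.0, "precip_mm": 1.5},
    {"site": "SITE_A", "datetime": "2019-03-13T00:00", "air_temp_c": 6.0, "precip_mm": None},
]

# Hypothetical mapping from site-specific column names to (variable, unit).
COLUMN_MAP = {
    "air_temp_c": ("air_temperature", "degC"),
    "precip_mm": ("precipitation", "mm"),
}


def l0_to_l1(rows, column_map):
    """Reshape wide L0 rows into long L1 records, one observation each."""
    records = []
    for row in rows:
        for column, (variable, unit) in column_map.items():
            value = row.get(column)
            if value is None:  # missing observations are skipped
                continue
            records.append({
                "site": row["site"],
                "datetime": row["datetime"],
                "variable": variable,
                "value": value,
                "unit": unit,
            })
    return records


def l1_to_l2_daily_mean(records, variable):
    """Derive an example L2 product: daily mean of one variable per site."""
    groups = defaultdict(list)
    for rec in records:
        if rec["variable"] == variable:
            day = rec["datetime"][:10]  # ISO date part, YYYY-MM-DD
            groups[(rec["site"], day)].append(rec["value"])
    return {key: mean(values) for key, values in sorted(groups.items())}


l1_records = l0_to_l1(L0_ROWS, COLUMN_MAP)
l2_daily_temp = l1_to_l2_daily_mean(l1_records, "air_temperature")
```

Because every L1 record carries its own variable name and unit, the same derivation function can in principle run over records from any site once they are in the common layout, which is the point of harmonizing before building derived products.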

A number of data models commonly used in the research community for harmonizing meteorological and hydrological data were reviewed and discussed. The group suggested evaluating the Observations Data Model (ODM) for time series data as the L1 data model. The ODM was developed and is widely used by the Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI). CUAHSI offers a data platform with a workspace that provides tools for visualization and analysis and might provide some of the ClimDB/HydroDB plotting functionality. The group's intent is to develop other software products for visualization through online hackathons.

Figure 2: Harmonization of meteorological and hydrological raw data using the CUAHSI/ODM data model. This allows the use of tools available through the CUAHSI workspace.

A first draft of a workflow was designed for converting all ClimDB/HydroDB products (L0), as well as meteorological and hydrological data in the EDI repository (raw L0), and archiving them in the EDI data repository as L1 data packages. If the ODM data model is adopted, the data packages will also be available in CUAHSI, hopefully with functionality comparable to ClimDB/HydroDB (see Figure 2 for the conceptual workflow).

Wade Sheldon demonstrated how the GCE Data Toolbox can be applied to convert L0 data packages to the L1 data model. Margaret O’Brien led the discussion on semantic mappings between important terms in the different vocabularies used for archiving meteorological and hydrological data (ClimDB/HydroDB, LTER, AMS, ENVO, CF, EnvThes, ODM, NCEI), in order to pick a candidate vocabulary for the L1 data model (initially the CUAHSI ODM vocabulary). Vocabularies are important for defining suitable keywords at the data package level and thereby enhancing data discoverability in the EDI repository and via Google’s data search.
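A semantic mapping of this kind can be thought of as a crosswalk table from source terms to one candidate term. The Python sketch below shows the idea only; the vocabulary labels (`vocab_a`, `vocab_b`) and every term in it are hypothetical placeholders, not verified entries from any of the vocabularies named above.

```python
# Hypothetical crosswalk: candidate L1 term -> source vocabulary -> source term.
# All terms are illustrative placeholders, not real vocabulary entries.
CROSSWALK = {
    "air_temperature": {"vocab_a": "AIRTEMP", "vocab_b": "Air temperature"},
    "precipitation": {"vocab_a": "PRECIP", "vocab_b": "Precipitation amount"},
    "discharge": {"vocab_a": "DISCH", "vocab_b": "Stream discharge"},
}


def build_reverse_index(crosswalk):
    """Index (vocabulary, lower-cased source term) -> candidate L1 term."""
    index = {}
    for candidate, sources in crosswalk.items():
        for vocabulary, term in sources.items():
            index[(vocabulary, term.lower())] = candidate
    return index


def to_candidate_term(term, vocabulary, index):
    """Return the candidate term for a source term, or None if unmapped."""
    return index.get((vocabulary, term.strip().lower()))


REVERSE_INDEX = build_reverse_index(CROSSWALK)
mapped = to_candidate_term("airtemp", "vocab_a", REVERSE_INDEX)
unmapped = to_candidate_term("soil moisture", "vocab_b", REVERSE_INDEX)
```

Returning `None` for unmapped terms, rather than guessing, keeps gaps in the crosswalk visible so they can be resolved by people who know the vocabularies.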

Past activities

For more information visit the LTER GitHub repository on the topic.