Progress on converting community survey data packages into the ecocomDP data model

EDI supports data synthesis and cross-site research projects by harmonizing data packages that contain the same attributes but use different raw data formats and vocabularies. The packages are reformatted to a common data model. For more information, see EDI’s website. Figure 1 shows the general workflow for harmonizing data packages. Archived raw data (level 0 – L0) are converted to a common harmonized data model (level 1 – L1). The L1 data allow straightforward data discovery and conversion into derived data products (level 2 – L2).

Figure 1: General workflow for data package harmonization and use of harmonized data packages in Level 2 data products.
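
To make the L0 to L1 to L2 flow concrete, here is a minimal Python sketch. It is not EDI's conversion code: the two raw tables, the package identifiers, the simplified set of L1 columns, and all function names are assumptions made up for illustration. It only shows the general idea of reshaping differently structured raw tables into one long-format layout, from which a derived product can then be computed with a single routine.

```python
import csv
import io

# Two hypothetical L0 (raw) tables that hold the same kind of data
# (taxon counts per site and date) in different layouts and vocabularies.
RAW_WIDE = """site,date,Quercus alba,Acer rubrum
A,2019-06-01,4,7
B,2019-06-01,0,2
"""

RAW_LONG = """plot_name,sample_date,species,n_individuals
P1,2020-07-15,Pinus strobus,3
P2,2020-07-15,Acer rubrum,5
"""


def harmonize_wide(text, package_id):
    """Reshape a wide table (one column per taxon) into long L1 rows."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        for column, cell in rec.items():
            if column in ("site", "date"):
                continue  # identifier columns, not taxa
            rows.append({
                "package_id": package_id,
                "location_id": rec["site"],
                "datetime": rec["date"],
                "taxon_id": column,
                "variable_name": "count",
                "value": float(cell),
            })
    return rows


def harmonize_long(text, package_id):
    """Rename the columns of an already long table to the L1 vocabulary."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "package_id": package_id,
            "location_id": rec["plot_name"],
            "datetime": rec["sample_date"],
            "taxon_id": rec["species"],
            "variable_name": "count",
            "value": float(rec["n_individuals"]),
        })
    return rows


def taxa_per_package(l1_rows):
    """Toy L2 product: number of distinct taxa with value > 0 per package."""
    taxa = {}
    for row in l1_rows:
        if row["value"] > 0:
            taxa.setdefault(row["package_id"], set()).add(row["taxon_id"])
    return {pkg: len(names) for pkg, names in taxa.items()}


# L0 -> L1: both packages end up in the same layout.
l1 = harmonize_wide(RAW_WIDE, "pkg.1") + harmonize_long(RAW_LONG, "pkg.2")

# L1 -> L2: one routine now works for every harmonized package.
richness = taxa_per_package(l1)
print(richness)
```

Once every package shares the L1 layout, the L2 step no longer needs to know how each raw table was originally organized.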


This data harmonization framework, developed by EDI, is currently being applied to convert community survey data packages into the ecocomDP data model. To date, EDI has harmonized approximately 70 data packages. Summary metrics for those packages are given in the following table:

Packages’ summary characteristics     Mean        Min   Max         Median
Temporal coverage                     20          3     17          16
Temporal evenness (interval SD)       1.3         0     10.8
Geographic coverage (km², > 0)        1.9 × 10⁶   1.4   1.3 × 10⁸   158.6
Taxonomic coverage (without OTUs*)    142         1     1752        48
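
As a rough illustration of how per-package metrics like these could be derived and then summarized, the sketch below uses only Python's standard library. The sampling years are made-up values, and the metric definitions are assumptions: temporal coverage is taken as the inclusive span of sampling years, and temporal evenness as the population standard deviation of the intervals between consecutive sampling years. The source does not spell out the exact definitions EDI used.

```python
import statistics

# Hypothetical sampling years for three packages (made-up values).
sampling_years = {
    "pkg.1": [2000, 2001, 2002, 2003],
    "pkg.2": [1990, 1995, 1996, 2010],
    "pkg.3": [2015, 2017, 2019],
}


def temporal_coverage(years):
    """Inclusive span of years from first to last sampling (assumed definition)."""
    return max(years) - min(years) + 1


def interval_sd(years):
    """Population SD of gaps between consecutive sampling years (assumed definition)."""
    ordered = sorted(set(years))
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return statistics.pstdev(intervals) if intervals else 0.0


def summarize(values):
    """Mean, min, max and median across packages, as in the table columns."""
    values = list(values)
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


coverage = {pkg: temporal_coverage(y) for pkg, y in sampling_years.items()}
evenness = {pkg: interval_sd(y) for pkg, y in sampling_years.items()}

coverage_summary = summarize(coverage.values())
evenness_summary = summarize(evenness.values())
print(coverage_summary)
print(evenness_summary)
```

An interval SD of 0 means perfectly regular sampling; larger values indicate irregular gaps between sampling events.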


In addition to making the data packages easier to analyze in a common design pattern (or data model), harmonization makes them easy to discover in the EDI repository as well as through Google’s data search. The raw data packages would not have been easy to query because they use different vocabularies and keywords. Figure 2 shows the search results returned by the EDI repository software after querying the keyword “ecocomDP”.

Figure 2: Partial list of data packages returned when querying the EDI data repository with the keyword “ecocomDP”.

