EDI is supporting data synthesis and cross-site research projects by harmonizing data packages that contain the same attributes but use different raw data formats and vocabularies. The packages are reformatted into a common data model. For more information, see EDI’s website. Figure 1 shows the general workflow for harmonizing data packages: archived raw data (level 0, L0) are converted to a common harmonized data model (level 1, L1). The L1 data allow straightforward data discovery and conversion into derived data products (level 2, L2).
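To make the L0 → L1 → L2 idea concrete, here is a minimal, hypothetical sketch in plain Python. Two raw (L0) tables record the same kind of observation under different column names. A per-package mapping renames and type-normalizes them into one common record layout (L1), and a toy aggregation then produces a derived product (L2). The `Observation` fields, the `COLUMN_MAPS` mappings, and the `to_l1` and `taxon_totals` helpers are all illustrative assumptions. They are loosely inspired by the idea of a shared observation table and do not reproduce the actual ecocomDP schema or EDI's conversion code.

```python
from collections import defaultdict
from dataclasses import dataclass


# Simplified, hypothetical L1 record; the real ecocomDP model has more tables and fields.
@dataclass(frozen=True)
class Observation:
    package_id: str
    location_id: str
    datetime: str
    taxon_name: str
    variable_name: str
    value: float
    unit: str


# Hypothetical per-package mappings from raw (L0) column names to common (L1) fields.
COLUMN_MAPS = {
    "pkg_a": {"site": "location_id", "date": "datetime",
              "species": "taxon_name", "count": "value"},
    "pkg_b": {"plot_code": "location_id", "sample_date": "datetime",
              "taxon": "taxon_name", "n_indiv": "value"},
}


def to_l1(package_id, rows, variable_name="abundance", unit="count"):
    """Rename package-specific columns and normalize types to build L1 records."""
    mapping = COLUMN_MAPS[package_id]
    records = []
    for row in rows:
        fields = {common: row[raw] for raw, common in mapping.items()}
        records.append(Observation(
            package_id=package_id,
            location_id=str(fields["location_id"]),
            datetime=str(fields["datetime"]),
            taxon_name=str(fields["taxon_name"]).strip(),  # trim stray whitespace
            variable_name=variable_name,
            value=float(fields["value"]),  # raw counts may arrive as text or numbers
            unit=unit,
        ))
    return records


def taxon_totals(records):
    """A toy L2 product: total abundance per taxon across all harmonized packages."""
    totals = defaultdict(float)
    for r in records:
        if r.variable_name == "abundance":
            totals[r.taxon_name] += r.value
    return dict(totals)


# Two L0 tables describing the same kind of data with different column names.
l0_a = [{"site": "S1", "date": "2015-06-01", "species": "Daphnia pulex", "count": "12"}]
l0_b = [{"plot_code": "P7", "sample_date": "2016-07-15", "taxon": "Daphnia pulex ", "n_indiv": 3}]

l1 = to_l1("pkg_a", l0_a) + to_l1("pkg_b", l0_b)
print(taxon_totals(l1))  # {'Daphnia pulex': 15.0}
```

The design point is that the package-specific knowledge lives only in the mapping. Once the records share one layout, the L2 step does not need to know which raw package a row came from.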
This data harmonization framework, developed by EDI, is currently being applied successfully to convert community survey data packages into the ecocomDP data model. To date, EDI has harmonized approximately 70 data packages. Summary metrics for these packages are given in the following table:
| Packages’ summary characteristics | Mean | Min | Max | Median |
|---|---|---|---|---|
| Temporal evenness (interval SD) | 1.3 | 0 | 10.8 | – |
| Geographic coverage (km², > 0) | 1.9 × 10⁶ | 1.4 | 1.3 × 10⁸ | 158.6 |
| Taxonomic coverage (without OTUs*) | 142 | 1 | 1752 | 48 |

*OTUs: operational taxonomic units.
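The table does not spell out how "temporal evenness (interval SD)" is computed. The following sketch assumes one plausible reading: the standard deviation of the gaps between consecutive sampling years within a package. Under that reading, 0 means perfectly regular sampling, which matches the table's minimum. The sketch also shows how the per-package values could be summarized into the Mean, Min, Max, and Median columns. The function names, the use of years as the unit, the choice of the population SD, and the sample data are all assumptions for illustration, not EDI's published method or data.

```python
import statistics


def temporal_evenness(sampling_years):
    """Population SD of the gaps between consecutive distinct sampling years.

    Assumed definition: 0 means perfectly regular sampling; larger values mean
    more irregular gaps. Returns None when there are fewer than two years.
    """
    years = sorted(set(sampling_years))
    if len(years) < 2:
        return None
    intervals = [later - earlier for earlier, later in zip(years, years[1:])]
    return statistics.pstdev(intervals)


def summarize(values):
    """Mean, min, max and median of a per-package metric, skipping missing values."""
    vals = [v for v in values if v is not None]
    return {
        "mean": statistics.fmean(vals),
        "min": min(vals),
        "max": max(vals),
        "median": statistics.median(vals),
    }


# Hypothetical sampling years for three packages (not real EDI data).
packages = {
    "annual": list(range(2000, 2011)),          # gaps all 1 -> SD 0.0
    "irregular": [2000, 2002, 2006],            # gaps 2, 4  -> SD 1.0
    "every_5_years": [1990, 1995, 2000, 2005],  # gaps all 5 -> SD 0.0
}
evenness = {name: temporal_evenness(years) for name, years in packages.items()}
print(evenness)
print(summarize(evenness.values()))
```

The source does not state whether a population or sample SD is used, or whether intervals are measured in years or in finer units. Swapping `statistics.pstdev` for `statistics.stdev` would change the values for short series.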
Beyond making analysis easier by providing a common design pattern (or data model), harmonization makes the data packages easy to discover in the EDI repository as well as through Google Dataset Search. The raw data packages could not have been queried this easily because they use different vocabularies and keywords. Figure 2 shows the search results returned by the EDI repository software for the keyword “ecocomDP”.