Ongoing data – part 1

Introduction
Highlighting the “ongoing” nature of a data package
Data packaging arrangements
Adding new data

Introduction

Remember that all data packages have static (not dynamic) data entities, so for an ongoing time-series each package is simply a “snapshot” at a point in time. As with any data package that has revisions, each time-series addition will be a new revision. Keeping the scope and docid components of the identifier fixed, and changing only the revision, gives continuity in managing the package across updates.
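
The identifier handling described above can be sketched in a few lines of Python. This is only an illustration: the function name `next_revision` is hypothetical, and it assumes identifiers follow the scope.docid.revision pattern seen in the examples on this page (e.g. knb-lter-bnz.398.19).

```python
def next_revision(package_id: str) -> str:
    """Return the identifier of the next revision, keeping scope and docid fixed."""
    # Split from the right so any dots or hyphens inside the scope are left alone.
    scope, docid, revision = package_id.rsplit(".", 2)
    return f"{scope}.{docid}.{int(revision) + 1}"


print(next_revision("knb-lter-bnz.398.19"))  # knb-lter-bnz.398.20
```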

Highlighting the “ongoing” nature of a data package

  • Coverage dates: Metadata should reflect the coverage of the data snapshot. DO NOT try to squeeze a word like “ongoing” into a dateTime field to reflect the nature of data collection. Instead, create a descriptive title, as in “Time-series of daily air temperature at site X, ongoing since 1992”.
  • Publication Date: This is used in constructing a citation, so update it in each new revision to reflect the date of that revision.
  • Maintenance: EML has a maintenance element where you can describe the update plan and record the intended update frequency.
  • Specialized interfaces: Often instrument data will have a specialized web interface for query and download of near-real-time data. If so, this link belongs in the (EML field). TO DO: CONFIRM!
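
As a rough illustration of the first three points, the sketch below builds the relevant part of a dataset record with Python's standard library. The element names (title, pubDate, coverage/temporalCoverage/rangeOfDates, maintenance/maintenanceUpdateFrequency) reflect our reading of the EML schema and should be checked against the EML version you use. The function name and all dates are hypothetical, and the fragment is not a complete, valid EML document (creator, contact and other required elements are omitted).

```python
import xml.etree.ElementTree as ET


def snapshot_metadata(title, begin, end, pub_date, frequency):
    """Build a partial <dataset> element describing one snapshot of an ongoing series."""
    dataset = ET.Element("dataset")
    # The "ongoing" nature goes in the title, never in a date field.
    ET.SubElement(dataset, "title").text = title
    # pubDate is refreshed with every revision because it feeds the citation.
    ET.SubElement(dataset, "pubDate").text = pub_date
    # Coverage describes the snapshot itself: real begin and end dates only.
    coverage = ET.SubElement(dataset, "coverage")
    dates = ET.SubElement(ET.SubElement(coverage, "temporalCoverage"), "rangeOfDates")
    for tag, value in (("beginDate", begin), ("endDate", end)):
        ET.SubElement(ET.SubElement(dates, tag), "calendarDate").text = value
    # Maintenance records how often new data are expected.
    maintenance = ET.SubElement(dataset, "maintenance")
    description = ET.SubElement(maintenance, "description")
    ET.SubElement(description, "para").text = "Data collection is ongoing; new rows are added with each revision."
    ET.SubElement(maintenance, "maintenanceUpdateFrequency").text = frequency
    return dataset


record = snapshot_metadata(
    "Time-series of daily air temperature at site X, ongoing since 1992",
    "1992-01-01", "2023-12-31",   # hypothetical coverage of this snapshot
    "2024-02-01",                 # hypothetical revision date
    "annually",
)
print(ET.tostring(record, encoding="unicode"))
```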


Data packaging arrangements

  • Continuous: All observations are grouped into a single unit (table), with plans to add data by revising a single entity and updating metadata.
  • Non-continuous: A new package is created for each logical unit (e.g., a summer sampling season), regardless of similarities or differences in methods.
  • Hybrid: A new entity is created for each logical unit (e.g., year) but the entity is added to an existing package with shared resource-level metadata.

There are advantages and disadvantages to each approach:

Continuous (one data entity)
  • Pros: User will be able to find and download all data at one time.
  • Cons: More work for the creators if there are changes, as data are ‘pre-integrated’ by them.
  • Examples: knb-lter-mcr.7, knb-lter-bnz.212

Non-continuous (new package for each data addition)
  • Pros: Metadata can be very specific, which can simplify data description where changes between collection events are significant. The lack of integration by the submitter may reduce the amount of work.
  • Cons: User must find, download and integrate many data packages to create a time series.
  • Examples: PISCO instrument data (see DataONE.org)

Hybrid (multiple data entities in one package)
  • Pros: Single set of resource-level metadata for all entities. Users can find all data together.
  • Cons: User must integrate many data entities to create a time series. To avoid re-uploading all previous data entities along with the new one, submitters must use the option “to skip upload if PASTA has a matching entity” (requires a checksum).
  • Examples: knb-lter-sbc.54, knb-lter-bnz.398.19
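
The hybrid approach depends on a checksum for each unchanged entity. A minimal sketch of computing one in Python follows. The function name `file_checksum` is hypothetical, and the choice of MD5 as the default is an assumption; use whichever algorithm your metadata and repository actually expect.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a data entity file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large entities need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest depends only on the file's bytes, an entity that has not changed since the previous revision yields the same value, which is what allows its upload to be skipped.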


Adding new data

Plan to add rows, not columns. In general, arrange your data so that each update adds new rows, not new columns. Adding new columns is technically a ‘redesign’ of the data package (see below). If you’ve planned your package carefully, you can replace the entity with one containing the old and new rows, and make only a handful of changes to the metadata. Also see here.
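
One way to enforce the rows-only rule when preparing an update is to refuse to combine files whose columns differ. The sketch below assumes the entity is a CSV file with a single header row; the function name `append_rows` and the file arguments are hypothetical.

```python
import csv


def append_rows(existing_path, new_path, combined_path):
    """Write old + new rows to combined_path; return the number of rows added."""
    with open(existing_path, newline="") as old_file, open(new_path, newline="") as new_file:
        old_rows = list(csv.reader(old_file))
        new_rows = list(csv.reader(new_file))
    if not old_rows or not new_rows:
        raise ValueError("both files need at least a header row")
    # Identical headers mean the attribute metadata can stay as it is.
    if old_rows[0] != new_rows[0]:
        raise ValueError("columns differ: this would be a redesign, not a data addition")
    with open(combined_path, "w", newline="") as out_file:
        # Keep one header, then all old rows followed by the new ones.
        csv.writer(out_file).writerows(old_rows + new_rows[1:])
    return len(new_rows) - 1
```

After a successful run, the metadata changes are typically limited to items such as the temporal coverage end date, the number of records, and the publication date.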
