Ongoing data – Part 2

Introduction
Issues
“Capping” off
Indexing

Introduction

Remember that all data packages have static data entities, so for an ongoing time series each entity is simply a “snapshot”. The primary strategy is to ‘add rows, not columns’, which keeps updates of the package simple (i.e., very few changes to the package metadata). But things happen: collection protocols change, measurements are added. Below are guidelines for common occurrences with ongoing data collections.

Issues

They made some big changes to the [methods|table|measurements]! What do I do?

Issue: Supposedly, this is a new version of one of our time series data products, but it doesn’t look anything like what I already have.

  • Solution: Confirm that you have the right data entity. Send the lab a link to the dataset this is supposed to update. Ask why the change was made, and take steps to stabilize the formats of data entities that are intended for sharing.
  • Example: in 2018, SBC completely changed the modeling algorithm for predicting kelp net primary production to include previously unresolved sources of biomass loss, so we redesigned the data package for those data. The redesigned data package was so different that we did not want to use the same identifier. Instead, we capped off the old dataset and started a new one (see below, “Capping off”). We also requested that the old dataset be removed from the index (see below, “Indexing”).
    • Capped off data package: knb-lter-sbc.21.18
    • replaced by: knb-lter-sbc.112
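
The package identifiers in these examples follow a scope.identifier.revision pattern, with the revision sometimes left off. As a minimal sketch, assuming that pattern holds for the IDs you work with, the following Python splits an ID into its parts and checks whether two IDs belong to the same series. The names `PackageId`, `parse_package_id`, and `same_series` are hypothetical helpers, not part of any EDI tool.

```python
from typing import NamedTuple, Optional


class PackageId(NamedTuple):
    scope: str
    identifier: int
    revision: Optional[int]  # None when the ID does not name a revision


def parse_package_id(text: str) -> PackageId:
    """Split 'scope.identifier[.revision]' into its parts."""
    parts = text.strip().split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected package ID: {text!r}")
    scope, identifier = parts[0], int(parts[1])
    revision = int(parts[2]) if len(parts) == 3 else None
    return PackageId(scope, identifier, revision)


def same_series(a: str, b: str) -> bool:
    """True when both IDs share scope and identifier (revisions may differ)."""
    pa, pb = parse_package_id(a), parse_package_id(b)
    return (pa.scope, pa.identifier) == (pb.scope, pb.identifier)


capped = parse_package_id("knb-lter-sbc.21.18")
replacement = parse_package_id("knb-lter-sbc.112")
```

Here `same_series("knb-lter-sbc.21.18", "knb-lter-sbc.112")` is false, which matches the decision to start a new dataset rather than add a revision to the old one.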

Issue: They moved the instruments to a new location that ostensibly represents the same region.

  • Solution: Ask why the instruments were moved, and find out whether the data should be considered continuous. In the example below, we started a new dataset to make it clear that any integration was up to the user.
  • Examples:
    • pre-move: knb-lter-sbc.2001
    • post-move: knb-lter-sbc.2002 (most recent revisions)

Issue: They added a new measurement to the suite.

  • Solution: Add a new column for the new measurement(s). It’s up to you where to put it. If your system (and the lab’s) makes it easy to keep similar measurements together in a table, do so, because users will appreciate it. If not, put it at the end.
  • Examples:
    • 23 columns: knb-lter-sbc.50.6
    • 24 columns: knb-lter-sbc.50.7
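
Before publishing a revision, it can help to compare the column headers of the previous snapshot and the new one, so that an added measurement (or an unexpected removal or reordering) is caught early. The following is a minimal sketch in Python using only the standard library. The column names and the helper names `read_header` and `compare_headers` are hypothetical and are not taken from the SBC tables.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def compare_headers(old: list[str], new: list[str]) -> dict:
    """Report columns added, removed, or reordered between two snapshots."""
    added = [c for c in new if c not in old]
    removed = [c for c in old if c not in new]
    # Compare the relative order of the columns both snapshots share.
    shared_in_old_order = [c for c in old if c in new]
    shared_in_new_order = [c for c in new if c in old]
    return {
        "added": added,
        "removed": removed,
        "reordered": shared_in_old_order != shared_in_new_order,
    }


# Hypothetical snapshots: the newer one adds a salinity column at the end.
old_csv = "date,site,temp_c\n2017-01-01,A,14.2\n"
new_csv = "date,site,temp_c,salinity\n2018-01-01,A,14.5,33.1\n"

report = compare_headers(read_header(old_csv), read_header(new_csv))
```

A report with only `added` entries corresponds to the case described here. Entries in `removed`, or `reordered` set to true, suggest asking the lab what changed before updating the package.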


“Capping” off

Ongoing collections eventually end! Projects finish, the data have told you all they can, etc. You may not know until several years after the most recent update that there will be no more new data. If you have a categorized inventory, a good label for these is “Completed time-series”.

Guidelines:

If you followed the practices above to describe a dataset as “ongoing”, you should update it one more time and make it clear that no more data are expected.

  1. Edit the title.
  2. The temporalCoverage tags will already cover the snapshot, so there should be nothing to do (obviously, if there is one more data addition, make coverage match it).
  3. pubDate: It may make sense to leave the pubDate alone (at the date of the last data addition), rather than set it to the date of this revision. That keeps the metadata coherent with the data.
  4. If you used maintenance tags, enter info stating that no more updates are expected.
  5. Check the methods. If you have detailed text descriptions, you may want to make these past tense.
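
The title and maintenance steps above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is an illustration under stated assumptions, not a definitive procedure: the fragment is a simplified, namespace-free stand-in for an EML `dataset` element, the title wording and note text are invented, `cap_off` is a hypothetical helper, and the use of `notPlanned` as the update frequency assumes your EML version accepts that value.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical metadata fragment (real EML has namespaces and
# many more elements, and its description may hold structured text).
EML_FRAGMENT = """
<dataset>
  <title>Hypothetical kelp time series, ongoing since 2000</title>
  <pubDate>2018-06-01</pubDate>
  <maintenance>
    <description>Data are updated approximately annually.</description>
    <maintenanceUpdateFrequency>annually</maintenanceUpdateFrequency>
  </maintenance>
</dataset>
"""


def cap_off(dataset: ET.Element, new_title: str, note: str) -> None:
    """Edit the title and maintenance info in place; pubDate is left alone."""
    dataset.find("title").text = new_title
    maintenance = dataset.find("maintenance")
    if maintenance is None:
        return  # maintenance tags were not used, so there is nothing to edit
    description = maintenance.find("description")
    if description is None:
        # Put the description first so it precedes the update frequency.
        description = ET.Element("description")
        maintenance.insert(0, description)
    description.text = note
    frequency = maintenance.find("maintenanceUpdateFrequency")
    if frequency is not None:
        frequency.text = "notPlanned"  # assumed value for "no more updates"


dataset = ET.fromstring(EML_FRAGMENT.strip())
cap_off(
    dataset,
    new_title="Hypothetical kelp time series, 2000 to 2018 (completed)",
    note="This time series is complete. No further updates are expected.",
)
capped_xml = ET.tostring(dataset, encoding="unicode")
```

The sketch deliberately does not touch `pubDate`, in line with the suggestion to leave it at the date of the last data addition. Temporal coverage and methods text still need a manual check.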


Indexing

All data packages have their metadata automatically indexed by PASTA for discovery. If there is a reason to remove a dataset from the index, contact support@edirepository.org. A data package that has been removed from the index:

  • Still has a DOI, so it can still be cited and its landing page can still be displayed.
  • Will not show up in searches.


Attribution

This material was adapted by EDI from:
