Five phases of data publishing (2)

 

Phase 2: Format and QC tabular data

Although we accept most file formats, many ecological and environmental data are arranged in tables. We recommend that you use comma- or tab-delimited ASCII text files for tabular data.
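
As a minimal sketch of how you might check a comma-delimited file for characters outside the ASCII range before submitting it, the Python example below scans every cell of a small table. The function name `non_ascii_cells`, the column names, and the sample values are hypothetical and serve only as an illustration.

```python
import csv
import io


def non_ascii_cells(handle):
    """Yield (row number, column name, value) for cells with non-ASCII characters."""
    reader = csv.DictReader(handle)
    # Row 1 is the header line, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if not value.isascii():
                yield row_number, column, value


# Hypothetical sample table containing a degree sign and smart quotes.
sample = io.StringIO(
    "site,temperature,note\n"
    "A,21.5,clear\n"
    "B,19.0,20\u00b0 slope\n"
    "C,18.2,\u201cshaded\u201d\n"
)

problems = list(non_ascii_cells(sample))
for row_number, column, value in problems:
    print(f"row {row_number}, column {column!r}: {value!r}")
```

For a real file, you could pass an open file handle instead of the in-memory sample, for example `open("mytable.csv", newline="", encoding="utf-8")`, where the file name is a placeholder.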

Below are basic rules for preparing tabular data for archiving, followed by a list of resources, including publications on preparing data tables for archiving:

  1. “Consistent data organization”
    • For multi-year observations, we strongly encourage you to compile your tabular data into a single file.
    • You may be planning to submit your data to us in several tables (e.g., organized by year). If so, each table must have the same structure; that is, the attributes must have the same order and identical names in all the tables so we can write code to process your data into a single file.
    • Use a “Long” rather than “Wide” format for multi-year data.
    • Run quality control checks on your data to ensure they are ready for publication.
    • Keep track of these steps so others know what has been done to these data.
  2. “Consistent formatting”
  3. Be careful of character formatting (e.g. superscript) or symbols (e.g. degree, accent marks, smart quotes) within the data table. Even in fields typed as “character” these may produce unintelligible characters during conversion, or if emailed.
  4. Specify (in the metadata) the code you use for missing values in your tables. We recommend that missing fields (values) in data are NOT left blank. Software interprets fields with a missing value code before ingesting the data table. Multiple missing value codes are allowed in one column. You will need to specify a definition for each missing value code, e.g.,
    • “NA” = not collected
    • “trace” = trace amount (e.g., instead of “< .02” for a nitrate value)
    • “-99999” = not available (some researchers prefer to keep their missing values of the same type as the data)
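
To illustrate how declared missing value codes might be handled when a table is read, here is a minimal Python sketch. It assumes a hypothetical column `nitrate_mg_per_l` and uses the three example codes above; the dictionary `MISSING_CODES` and the function `parse_nitrate` are illustrative names, not part of any specific ingestion software.

```python
import csv
import io

# Example missing value codes and their definitions, as they might be
# declared in the metadata for one column.
MISSING_CODES = {
    "NA": "not collected",
    "trace": "trace amount",
    "-99999": "not available",
}


def parse_nitrate(raw):
    """Return (value, missing_code); exactly one of the two is None."""
    text = raw.strip()
    if text == "":
        # A blank field is ambiguous, so flag it instead of guessing.
        raise ValueError("blank field: use an explicit missing value code")
    if text in MISSING_CODES:
        return None, text
    return float(text), None


# Hypothetical sample table with several missing value codes in one column.
sample = io.StringIO(
    "site,date,nitrate_mg_per_l\n"
    "A,2020-06-01,0.15\n"
    "A,2020-06-02,trace\n"
    "B,2020-06-01,NA\n"
    "B,2020-06-02,-99999\n"
)

parsed = [parse_nitrate(row["nitrate_mg_per_l"]) for row in csv.DictReader(sample)]
values = [value for value, code in parsed if code is None]
codes = [code for value, code in parsed if code is not None]
```

Checking for the codes before converting to a number matters here: a code such as “-99999” would otherwise be read as a valid measurement.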

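The first rule above, on consistent data organization, can also be sketched in code. The Python example below assumes hypothetical yearly tables with the columns `site`, `year`, and `count`. It combines them into a single table only if the column names and their order match, and it shows one way to reshape a “Wide” table (one column per year) into a “Long” one (one row per observation). The function names `combine_tables` and `wide_to_long` are illustrative, not an existing tool.

```python
import csv
import io


def combine_tables(tables):
    """Concatenate CSV tables, insisting on identical column names and order."""
    header, combined = None, []
    for name, handle in tables:
        reader = csv.reader(handle)
        this_header = next(reader)
        if header is None:
            header = this_header
        elif this_header != header:
            raise ValueError(f"{name}: columns {this_header} differ from {header}")
        combined.extend(reader)
    return header, combined


def wide_to_long(header, rows, id_columns, variable_name, value_name):
    """Turn one column per year into one row per (id, year) observation."""
    id_idx = [header.index(column) for column in id_columns]
    value_idx = [i for i in range(len(header)) if i not in id_idx]
    long_rows = []
    for row in rows:
        for i in value_idx:
            long_rows.append([row[j] for j in id_idx] + [header[i], row[i]])
    return list(id_columns) + [variable_name, value_name], long_rows


# Hypothetical yearly tables that share the same structure.
y2019 = io.StringIO("site,year,count\nA,2019,12\nB,2019,7\n")
y2020 = io.StringIO("site,year,count\nA,2020,15\nB,2020,9\n")
header, rows = combine_tables([("2019.csv", y2019), ("2020.csv", y2020)])

# The same observations in a hypothetical wide layout, reshaped to long.
wide_header = ["site", "2019", "2020"]
wide_rows = [["A", "12", "15"], ["B", "7", "9"]]
long_header, long_rows = wide_to_long(wide_header, wide_rows, ["site"], "year", "count")
```

Both routes end with the same four observations under the same three columns, which is the single long table recommended for multi-year data.
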
Resources

  • Publications on preparing data tables for archiving:
    • Cook, R.B., et al., 2001. Best practices for preparing ecological data sets to share and archive. Bulletin of the Ecological Society of America, 82(2), pp.138-141. JSTOR, www.jstor.org/stable/20168543. Accessed 2 June 2021.
    • Campbell, J.L., Rustad, L.E., Porter, J.H., Taylor, J.R., Dereszynski, E.W., Shanley, J.B., Gries, C., Henshaw, D.L., Martin, M.E., Sheldon, W.M. and Boose, E.R., 2013. Quantity is nothing without quality: Automated QA/QC for streaming environmental sensor data. BioScience, 63(7), pp.574-585, DOI: 10.1525/bio.2013.63.7.10.
    • Broman, K.W. and Woo, K.H., 2018. Data organization in spreadsheets. The American Statistician, 72(1), pp.2-10, DOI: 10.1080/00031305.2017.1375989.
  • Ecology Workshop lessons by the Carpentries (teaches data cleaning, management, analysis, and visualization of tabular data).
  • R and Python lessons by the Carpentries (teaches introduction to programming in R and Python).
  • Video on “Creating clean data for archiving”