Keyword Best Practices

August 31, 2018

The goal of adding keywords to a metadata document is to ensure that researchers who want to use your data will be able to locate it reliably and efficiently. Adding keywords from a controlled vocabulary means that data can be linked to other similar datasets, greatly adding to its scientific value. Before you begin keywording datasets, you should familiarize yourself with the existing keywords and taxonomies (word trees). They can be viewed at the LTER Controlled Vocabulary. There are additional tools and resources available for the LTER Controlled Vocabulary.

Here are some best practices for keywording your metadata documents:

  • Use the most specific possible keywords. When searching or browsing, the “parents” or higher-level terms for each keyword are implied, so choosing the most specific “child” term combines the highest level of discoverability with the maximum level of discrimination. For example, rather than choosing “transects,” choose the more specific child term “vegetation transects.”
  • Be willing to make reasonable compromises. By its nature, keywording requires compromise. Datasets vary widely, but if that uniqueness is fully expressed in the keywords, then searching becomes virtually impossible. Therefore you may need to make reasonable compromises in order to be able to use keywords from the controlled vocabulary. For example, you may have conducted a study on the population ecology of rodents, only to find that the controlled vocabulary lists “small mammals” but not “rodents.” Rather than simply adding “rodents” as an uncontrolled keyword, use the next best term (“small mammals”) instead. If you want, you can also add “rodents” as an uncontrolled keyword, but be sure to add the nearest keyword from the list as well, because uncontrolled keywords don’t show up in browse-type searches.
  • Provide keywords from as many of the different taxonomies (top-level groupings) as possible. Ideally there should be at least one keyword from each of the different taxonomies in the controlled vocabulary. However, some taxonomies may simply not be applicable to a specific dataset, and these may be skipped. If a user is browsing down through a taxonomy to locate data, and the dataset has no keyword from that taxonomy, the dataset will not be discoverable through browsing. Therefore, using a broad selection of keywords is a good idea.
  • When using keywords not in the Controlled Vocabulary, put them in the proper form. If you really need to use a keyword that is not already part of the controlled vocabulary, put it in the proper form. The standard NISO Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies) has recommendations on the form of keywords. For example, nouns are preferred, and they should be plural if they are something that is counted, but singular if they are something to which the question “how much” might reasonably be applied. See section 6 of NISO Z39.19 for details.
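
The first and third practices above can be illustrated with a minimal Python sketch. It assumes a toy vocabulary stored as a child-to-parent map; the entries, the taxonomy names ("methods", "organisms"), and the function names are all hypothetical and do not reproduce the actual LTER Controlled Vocabulary or any of its tools. The sketch shows how parent terms are implied by a child term, how redundant parents can be dropped, and how to spot taxonomies that have no keyword yet.

```python
# Toy vocabulary: child term -> parent term (None marks a taxonomy root).
# Hypothetical entries for illustration only, NOT the real LTER vocabulary.
PARENT = {
    "methods": None,
    "transects": "methods",
    "vegetation transects": "transects",
    "organisms": None,
    "mammals": "organisms",
    "small mammals": "mammals",
}


def ancestors(term):
    """Return the implied higher-level terms, nearest parent first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def most_specific(keywords):
    """Drop any keyword that is already implied by a more specific one."""
    implied = {a for k in keywords for a in ancestors(k)}
    return [k for k in keywords if k not in implied]


def missing_taxonomies(keywords):
    """List top-level groupings that have no keyword assigned."""
    roots = {t for t, p in PARENT.items() if p is None}
    # A root term has no ancestors, so it counts as its own taxonomy.
    covered = {(ancestors(k) or [k])[-1] for k in keywords}
    return sorted(roots - covered)
```

In this sketch, calling `most_specific(["transects", "vegetation transects"])` keeps only "vegetation transects", because "transects" is already implied by it. Calling `missing_taxonomies(["vegetation transects"])` reports "organisms" as a grouping that still has no keyword, which is a prompt to check whether that taxonomy applies to the dataset.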

Keywords that indicate what a dataset contains are likely to be more useful than keywords that indicate scientific topics to which data might be applicable. Put another way, keywords that aid in data discovery are most frequently about what the data “contains” (e.g., air temperature) rather than what the data is “about” (e.g., climate). Although “about” (i.e., subject) keywords can be useful, they are difficult to consistently assign because to some degree they are in the eye of the beholder, and any given data might be usefully applied to many scientific topics, including those not yet identified at the time of collection.