News, Resources, Technical

ezEML Templates

Research sites (e.g., LTER sites) and teams of researchers who are using ezEML to capture metadata may find that certain content is used repeatedly across a number of documents. Examples of such repeated content can include Creators, Contacts, Keywords, Intellectual Rights, Geographic Coverage, Project, etc. Users can avoid the tedious task of re-entering this information for each new dataset by creating and publishing one or more “templates” that are prepopulated with this standard content. Since templates exist outside of any individual user’s ezEML account, they are accessible to everyone. Everyone who uses a template will get the current version, which helps alleviate problems arising from different versions residing in different users’ accounts.

Feedback has been positive from researchers who have used ezEML templates when creating metadata for datasets associated with LTER and NEON sites. You can browse example templates by logging in to ezEML and selecting “New from Template…”. Creating a template is just like creating any other ezEML document. Once a template has been populated, send it to support@edirepository.org with a short explanation, and it will be added to the EDI template library.

News, Resources, Technical

A quick overview of EDI’s Data Explorer (DeX)

The EDI software team is excited to announce DeX, a tool for exploring and subsetting tabular data, which is now in beta testing on the EDI staging Data Portal (https://portal-s.edirepository.org/nis). DeX provides three views into tabular data found in the EDI Data Repository: 1) a statistical profiler that analyzes a data table and displays detailed information about each attribute; 2) a filtering and subsetting application that lets you download the subsetted data along with a new EML metadata document describing the subset; and 3) an easy-to-use scatter and line plotting application that gives you a visual glimpse of data trends. DeX is currently available on both our development and staging Data Portals and works with CSV-based data tables (support for a wider set of tabular formats is coming soon). To see DeX in action, find a data package in the staging Data Portal that contains a CSV data file and click the “Data Explorer – experimental” link at the end of the data entity’s record information.
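The profiler and subsetting views can be pictured with a small sketch. The Python below is not DeX’s code (its internals aren’t described here); it is a minimal standard-library illustration, assuming a CSV with a header row, of what a per-attribute profile and a filter-based subset involve. The function names, the sample table, and the filter are hypothetical, and unlike DeX this sketch does not generate an EML document describing the subset.

```python
import csv
import io
import statistics
from typing import Callable


def profile(csv_text: str) -> dict[str, dict]:
    """Summarize each column: present/missing counts, plus numeric stats
    when every present value parses as a number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames or []:
        # Empty strings (and short rows, which DictReader pads with None) count as missing.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        summary = {"count": len(values), "missing": len(rows) - len(values)}
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            # At least one value is non-numeric: treat the column as categorical.
            summary["distinct"] = len(set(values))
        else:
            if numbers:
                summary.update(min=min(numbers), max=max(numbers), mean=statistics.fmean(numbers))
        report[column] = summary
    return report


def subset(csv_text: str, column: str, keep: Callable[[str], bool]) -> str:
    """Return a new CSV (same header) holding only rows whose `column` value satisfies `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(row for row in reader if keep(row[column]))
    return out.getvalue()


# Hypothetical table with one missing temperature value.
table = "site,temp_c\nA,1.5\nB,\nA,3.5\n"
print(profile(table)["temp_c"])  # {'count': 2, 'missing': 1, 'min': 1.5, 'max': 3.5, 'mean': 2.5}
print(subset(table, "site", lambda v: v == "A"), end="")
```

Treating a column as numeric only when every value parses is a deliberately simple rule; a real profiler would likely also consult the attribute definitions in the EML metadata.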

News, Resources, Technical

EDIutils R package update

EDIutils is a client for the Environmental Data Initiative repository REST API and includes functions to search and access existing data, evaluate and upload new data, and assist with related data management tasks (https://github.com/EDIorg/EDIutils).
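For readers curious about what such a client wraps, here is a rough Python sketch of the kind of HTTP requests involved. It is not EDIutils (an R package) and does not mirror its function names. The host names for the three environments and the endpoint paths are assumptions based on our reading of the PASTA+ REST API documentation and should be checked there; the package ID is hypothetical, and authenticated operations such as evaluation and upload are left out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URLs for the three repository environments; verify against the API docs.
BASE_URLS = {
    "production": "https://pasta.lternet.edu",
    "staging": "https://pasta-s.lternet.edu",
    "development": "https://pasta-d.lternet.edu",
}


def search_url(params: dict[str, str], env: str = "production") -> str:
    """Search EML metadata (assumed /package/search/eml path, Solr-style query parameters)."""
    return f"{BASE_URLS[env]}/package/search/eml?{urlencode(params)}"


def revisions_url(scope: str, identifier: int, env: str = "production") -> str:
    """List the revisions of a data package (assumed /package/eml/{scope}/{identifier} path)."""
    return f"{BASE_URLS[env]}/package/eml/{scope}/{identifier}"


def metadata_url(package_id: str, env: str = "production") -> str:
    """EML metadata for a package ID of the form 'scope.identifier.revision' (assumed path)."""
    scope, identifier, revision = package_id.split(".")
    return f"{BASE_URLS[env]}/package/metadata/eml/{scope}/{identifier}/{revision}"


def fetch(url: str, timeout: float = 30.0) -> str:
    """GET a public (unauthenticated) resource and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# "edi.100.1" is a hypothetical package ID used only to show the URL shape.
print(metadata_url("edi.100.1"))  # https://pasta.lternet.edu/package/metadata/eml/edi/100/1
```

Calling `fetch(metadata_url(...))` would perform the network request; building URLs separately keeps the sketch testable offline.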

The package has undergone a major refactor for submission to rOpenSci and CRAN. This new and improved version (0.0.0.9000) covers the full data repository REST API, handles authentication more securely, better matches API call and result syntax, improves documentation, and opens the door for development of wrapper functions to support common data management tasks.

In the course of this refactor, function names and call patterns have changed, and several functions supporting other EDI R packages have been removed, so this release is not backward compatible with the previous major release (version 1.6.1). The previous version will remain available on the “deprecated” branch until 2022-06-01. Install it with:

remotes::install_github("EDIorg/EDIutils", ref = "deprecated")

EDI repository, News, Technical

Normalization of Creator Names in EDI’s Data Portal

The Advanced Search feature of EDI’s Data Portal lets you select a dataset Creator name from a drop-down list of all dataset creators in our repository. The search then displays every dataset that lists that name among its creators.

Unfortunately, many creators’ names occur in multiple variations in different datasets and, in the past, each variation appeared separately in the drop-down list. Selecting a variant would return only the datasets for which the creator’s name was spelled as in that particular variant. To give a hypothetical example, the names James T Kirk, James T. Kirk, Jim Kirk, J T Kirk, and J Kirk may all refer to the same person. In such a case, we’d like to display a single, canonical entry (James T Kirk, in this example) and have the search function return the union of all datasets for which that creator’s name appears in one of its variations.
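One way to picture the grouping is a key built from a folded surname and first initial, with the most complete spelling chosen as the canonical entry and the datasets of all variants unioned. This Python sketch is an illustration only, not the algorithm behind EDI’s web services; `group_key`, `normalize_creators`, and the dataset IDs are hypothetical. Keying on the first initial is deliberately crude: it merges “Jim Kirk” with “James T Kirk”, but it would also merge a hypothetical “Jane Kirk”, so a real service needs more evidence than this.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Strip accents and lowercase, e.g. 'Kírk' -> 'kirk'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def clean(name: str) -> str:
    """Drop periods and collapse whitespace: 'James T. Kirk' -> 'James T Kirk'."""
    return " ".join(name.replace(".", " ").split())


def group_key(name: str) -> tuple[str, str]:
    """Hypothetical grouping key: (folded surname, folded first initial).

    Assumes 'Given [Middle] Surname' order, as in the Kirk example.
    """
    tokens = clean(name).split()
    return fold(tokens[-1]), fold(tokens[0])[0]


def normalize_creators(creators_by_dataset: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each canonical creator name to the union of datasets listing any of its variants."""
    variants = defaultdict(set)
    datasets = defaultdict(set)
    for dataset_id, creators in creators_by_dataset.items():
        for name in creators:
            key = group_key(name)
            variants[key].add(clean(name))
            datasets[key].add(dataset_id)
    result = {}
    for key, names in variants.items():
        # Most letters wins as the canonical spelling; ties break alphabetically.
        canonical = max(names, key=lambda n: (sum(ch.isalpha() for ch in n), n))
        result[canonical] = datasets[key]
    return result


creators = {
    "pkg.1": ["James T Kirk"],
    "pkg.2": ["James T. Kirk", "Spock"],
    "pkg.3": ["Jim Kirk"],
    "pkg.4": ["J T Kirk"],
    "pkg.5": ["J Kirk"],
}
print(sorted(normalize_creators(creators)["James T Kirk"]))
# ['pkg.1', 'pkg.2', 'pkg.3', 'pkg.4', 'pkg.5']
```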

We have recently created web services that implement this kind of name normalization, and the EDI Data Portal now uses the normalized names.

For example, “McKnight, Diane M”, “McKnight, Diane”, “Mcknight, Diane”, and “Mcnight, Diane” all refer to the same person. The drop-down list now shows just “McKnight, Diane M”, and selecting it finds the datasets corresponding to any of the four variants. Accented names are also handled and now sort in the expected order in the drop-down list: “González, Maria J” corresponds to “González, María J”, “Gonzalez, Maria”, and “Gonzalez, Maria J”. Names with misspelled surnames are handled correctly too; for example, “Sokol, Eric R” is recognized as the same person as “Sokal, Eric”.
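Accent handling and misspelled surnames can be sketched in the same spirit. The Python below folds accents with Unicode NFKD decomposition for sorting, and treats two “Surname, Given” names as the same person when their given-name initials agree and their folded surnames are similar according to `difflib.SequenceMatcher`. This is an assumed heuristic, not EDI’s method; the 0.75 threshold is arbitrary and would also match pairs such as “Smith” and “Smyth”, so in practice fuzzy matches like these would likely need review.

```python
import unicodedata
from difflib import SequenceMatcher


def fold(text: str) -> str:
    """Remove accents and case, so 'González, María' -> 'gonzalez, maria'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def split_name(name: str) -> tuple[str, str]:
    """Split 'Surname, Given Middle' into folded (surname, given) parts."""
    surname, _, given = name.partition(",")
    return fold(surname.strip()), fold(given.strip())


def same_person(a: str, b: str, threshold: float = 0.75) -> bool:
    """Heuristic match: same given-name initial and near-identical surnames.

    The 0.75 similarity threshold is an arbitrary illustrative choice.
    """
    surname_a, given_a = split_name(a)
    surname_b, given_b = split_name(b)
    if not given_a or not given_b or given_a[0] != given_b[0]:
        return False
    return SequenceMatcher(None, surname_a, surname_b).ratio() >= threshold


# Accent-insensitive sorting: use the folded form as the sort key.
names = ["Baker, Bo", "Álvarez, Ana"]
print(sorted(names))            # ['Baker, Bo', 'Álvarez, Ana'] (code-point order)
print(sorted(names, key=fold))  # ['Álvarez, Ana', 'Baker, Bo']
print(same_person("Sokol, Eric R", "Sokal, Eric"))         # True
print(same_person("McKnight, Diane M", "Mcnight, Diane"))  # True
```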

You get the idea. If you’re interested in some of the technical details, read on.

Technical

Updating schema.org metadata for data packages in the EDI Data Portal to provide rich semantic information that can be utilized by search engines and Google Scholar

The EDI technical team is now updating the schema.org metadata that accompanies every data package landing page on the EDI Data Portal to follow new recommendations from the ESIP Science-on-Schema.org (SOSO) project (https://github.com/ESIPFed/science-on-schema.org). EDI first released schema.org metadata for each data package in Fall 2018. Each dataset’s schema.org metadata is encoded as a JSON-LD data structure embedded in a <script type="application/ld+json"> element on the data package’s metadata landing page. Together with the sitemaps.org metadata, which acts as an SEO table of contents for the portal, the schema.org metadata provides rich semantic information about each data package that search engines (e.g., Google, Microsoft, Yandex, and even domain-specific tools such as EarthCube’s Gleaner and the DataONE schema.org indexers) and associated applications can use. For example, data packages archived in the EDI Data Repository are discoverable through Google’s Dataset Search interface (https://bit.ly/3nDhT8j) because of the detailed information provided to Google’s search engine indexer via the schema.org metadata.
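As a rough illustration of this kind of embedding, the Python below builds a minimal schema.org `Dataset` record and wraps it in a JSON-LD script element. Only a handful of core properties are shown; the SOSO guidelines recommend considerably more, and this is not the markup EDI generates. The function name and all values are hypothetical placeholders.

```python
import json


def dataset_jsonld(name: str, description: str, url: str,
                   creators: list[str], keywords: list[str]) -> str:
    """Build a minimal schema.org Dataset and wrap it in a JSON-LD script element."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "keywords": keywords,
    }
    payload = json.dumps(doc, indent=2, ensure_ascii=False)
    # "<" can only occur inside JSON strings, so escaping "</" as "<\/" keeps the
    # JSON valid while preventing a value like "</script>" from closing the tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


html = dataset_jsonld(
    name="Example lake temperature data",
    description="Hypothetical dataset used only to illustrate the markup.",
    url="https://example.org/dataset/1",
    creators=["Kirk, James T"],
    keywords=["lakes", "temperature"],
)
print(html)
```

The escaping step matters because an HTML parser ends a script element at the first “</script” it encounters, regardless of JSON string quoting.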
