Project Name:

Federated Data Usage Platform

Contractor: Mathematica, Inc.

Lessons Learned

The following are the lessons learned this quarter and the proposed actions to address them.

  • Data usage statistics help to understand data that are available and used by a wide variety of data users.
  • Initial stakeholder engagement shows:
    • Agencies and data users want to measure the impact of data usage as much as the count of uses.
    • Federal agencies can leverage a Data Usage Platform (DUP) to analyze performance of their publicly shared data against other similar agencies.
    • Non-statistical federal agencies are interested in using a DUP to measure the impact of their publicly and non-publicly available data in published research and through media mentions.
    • Academic researchers can use the DUP to identify other research of interest and to find the datasets used in those citations.
  • Tracking federal data through research publications alone is challenging because many data assets lack standardized citation parameters, are shared on multiple platforms, or are mentioned only in gray literature.

The following is the summary of lessons learned for this quarter.

  • Perspectives of federal IT staff and post-secondary students are important to inform the eventual implementation of the DUP and its intended use.
  • Non-statistical federal agencies are interested in using a DUP to measure the impact of their publicly and non-publicly available data in published research and through media mentions.
  • Outreach supported the use case of academic researchers using the DUP to identify other research of interest and to find the datasets used in those citations.
  • Tracking federal data through research publications alone is challenging because many data assets lack standardized citation parameters, are shared on multiple platforms, or are mentioned only in gray literature.
  • Agencies and data users want to measure the impact of data usage as much as the count of uses.
  • ORCID is primarily a persistent identifier (PID) for disambiguating researchers; it is not a Digital Object Identifier (DOI). The Dept. of Energy (DOE) Office of Scientific and Technical Information (OSTI) launched the US Government ORCID Consortium in April 2020, bringing together US government and DOE-affiliated organizations looking to use, adopt, and integrate with ORCID. National labs make up most of the consortium membership because it is primarily targeted at research. More information is available at https://www.osti.gov/pids/orcid-services/us-gov-orcid-consortium
    1. With respect to the DUP, ORCID can be used as part of the metadata schema when publishing research data, or to help with screening when researchers apply through the Standard Application Process (SAP) Portal. Uniquely identifying datasets requires a separate identifier, such as a DOI.
    2. “DOIs are a foundational requirement to unambiguously identify and access resources.” Persistent identifiers are at the top of the principles for Findable and Accessible resources, and they are also key to tracking data usage:
      1. (Meta)data are assigned a globally unique and persistent identifier.
      2. (Meta)data are retrievable by their identifier using a standardized communication protocol.
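As an illustration of the point above about pairing a researcher identifier with a dataset identifier in a metadata schema, the following is a minimal sketch only. The `DatasetRecord` class, its field names, and the DOI pattern are assumptions made for this example and are not part of the DUP or SAP design. The ORCID check digit uses the ISO 7064 MOD 11-2 calculation that ORCID documents for its iDs; the DOI check is a loose syntax test and does not confirm that the DOI is registered.

```python
import re
from dataclasses import dataclass, field

# 16 characters in four hyphenated groups; the last may be "X".
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")
# Assumption: a loose syntactic check only, not a registry lookup.
DOI_PATTERN = re.compile(r"^10\.[0-9]{4,9}/\S+$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


@dataclass
class DatasetRecord:
    """Hypothetical metadata record: a DOI for the dataset, ORCID iDs for people."""
    doi: str
    title: str
    creator_orcids: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List identifier problems; an empty list means none were found."""
        issues = []
        if not DOI_PATTERN.match(self.doi):
            issues.append(f"malformed DOI: {self.doi}")
        for orcid in self.creator_orcids:
            if not is_valid_orcid(orcid):
                issues.append(f"invalid ORCID iD: {orcid}")
        return issues


# The DOI below is a placeholder; the ORCID iD is ORCID's published sample iD.
record = DatasetRecord(
    doi="10.1234/example-dataset",
    title="Example federal dataset",
    creator_orcids=["0000-0002-1825-0097"],
)
print(record.problems())
```

Keeping the two identifiers in separate fields reflects the distinction drawn above: the ORCID iD identifies the researcher, while the DOI identifies the dataset whose usage is being tracked.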

Disclaimer: America’s DataHub Consortium (ADC), a public-private partnership, implements research opportunities that support the strategic objectives of the National Center for Science and Engineering Statistics (NCSES) within the U.S. National Science Foundation (NSF). These results document research funded through ADC and are being shared to inform interested parties of ongoing activities and to encourage further discussion. Any opinions, findings, conclusions, or recommendations expressed above do not necessarily reflect the views of NCSES or NSF. Please send questions to ncsesweb@nsf.gov.