Project Name: Foreign Born Scientists and Engineers in the Workforce

Contractor: NORC at the University of Chicago

Lessons Learned

NORC has compiled two lessons learned for the period January – March 2024:

  1. Securely accessing and linking federal data becomes more complex when one of the parties is unfamiliar with CIPSEA requirements or lacks the infrastructure to accommodate them.
  2. When one of the parties is unfamiliar with CIPSEA requirements or lacks the infrastructure to accommodate them, securely accessing and linking federal data also adds complexity to the data sharing process.

NORC has compiled the following lessons learned for the period April – June 2024:

  1. NCSES has created a repeatable process for accessing restricted data and securing approval to share those data with non-CIPSEA third parties. Features of this process include standardized application and licensing forms. NCSES also has knowledgeable staff available to assist data users in accessing those data.
  2. Negotiations were delayed while the parties determined whether data acquisition security provisions were met, which led to discussions between project staff and IT security staff at both NORC and NCSES. If organizations made the relevant terms of their security plans accessible to data partners during the discovery phase, this might eliminate or reduce IT security questions later in the agreement negotiation phase.
  3. Implementing the acquisition process with a third-party, non-CIPSEA organization offered an opportunity to explore tiered access tools to encrypt or mask sensitive data. Privacy-preserving record linkage (PPRL) can aid the linkage process when sharing direct identifiers proves prohibitive. This project explored using PPRL to link the sources but ultimately decided on another approach, given the limited variables available for linkage and some data quality concerns. This work supports the need to assess data quality and the availability of linkage variables before choosing a linkage method.
  4. NORC implemented an approach to link NCSES data with data from Wellspring. Industry standards are lacking for validating the quality of a match against the original data after linkage. The federal statistical community could develop and approve a set of standards for conducting and evaluating the quality of linkages, depending on the linkage approach and the identifying variables used.
  • For-profit organizations often collect rich data but do not frequently support research, and they tend to prioritize protecting their intellectual property over supporting research activities. In addition, these organizations are not accustomed to documenting data in the ways researchers expect, creating additional complications.
  • For-profit data organizations may not understand the data security and privacy requirements that accompany working with restricted-access federal data, even if they have previously worked with federal clients. Offering clear guidance on security and privacy requirements at the outset of a collaboration is important.
  • Non-traditional data can provide a crucial source of information that can and should be used to supplement traditional survey data on the science and engineering enterprise. To that end, natural language processing (NLP) methods may support record linkages with these non-traditional data sources.
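
As an illustration of the privacy-preserving record linkage (PPRL) idea mentioned in the lessons above, the sketch below shows one commonly described technique: encoding a name into a keyed Bloom filter so that two parties can compare encodings instead of direct identifiers. This is not the approach the project evaluated or adopted. The function names, the shared key, the filter size, and the sample names are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a normalized string into padded character bigrams."""
    text = f"_{value.strip().lower()}_"
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bloom_encode(value: str, secret_key: bytes,
                 size: int = 256, num_hashes: int = 4) -> set[int]:
    """Return the Bloom filter bit positions set for `value`.

    Each bigram is hashed `num_hashes` times with a keyed HMAC, so a
    party without `secret_key` cannot easily rebuild the encoding.
    The size and hash count are illustrative assumptions, not tuned values.
    """
    positions = set()
    for gram in bigrams(value):
        for seed in range(num_hashes):
            message = f"{seed}:{gram}".encode("utf-8")
            digest = hmac.new(secret_key, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two sets of bit positions (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    key = b"hypothetical-shared-key"  # assumption: agreed on by both parties
    left = bloom_encode("Katherine", key)
    right = bloom_encode("Catherine", key)
    print(f"Dice similarity: {dice_similarity(left, right):.2f}")
```

Because the comparison relies entirely on the encoded fields, missing or inconsistent linkage variables reduce match quality directly, which is consistent with the lesson that data quality and variable availability should be assessed before a linkage method is chosen.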

Disclaimer: America’s DataHub Consortium (ADC), a public-private partnership, implements research opportunities that support the strategic objectives of the National Center for Science and Engineering Statistics (NCSES) within the U.S. National Science Foundation (NSF). These results document research funded through ADC and are being shared to inform interested parties of ongoing activities and to encourage further discussion. Any opinions, findings, conclusions, or recommendations expressed above do not necessarily reflect the views of NCSES or NSF. Please send questions to ncsesweb@nsf.gov.