Graduate Research Fellowship Program Pilot Project
The National Science Foundation relied on outdated methods to track research fellowship outcomes.
The National Science Foundation’s (NSF) Graduate Research Fellowship Program (GRFP) helps ensure the quality, vitality, and diversity of the scientific and engineering workforce of the United States. It seeks to broaden the participation of underrepresented groups in science and engineering, including women, minorities, persons with disabilities, and veterans.
For decades, the NSF relied on traditional survey methodologies to track the career outcomes of GRFP awardees. While these surveys obtained valuable data on academic experiences and careers, the NSF sought more cost-efficient and accurate methodologies that would take advantage of new data sources and inform future evaluation approaches.
NORC developed innovative software that scraped data from online sources.
NORC had been working with the NSF and already had a database of more than 28,000 STEM graduates. From this database, we selected a subset of graduates and gathered their career information, including academic achievements, employment, publications, patents, and grants. NORC developed and refined software to scrape this information from online sources, incorporating an Application Programming Interface (API) that allows software applications to communicate with each other. We evaluated the new software on several metrics, including return rate, precision, and accuracy. We also compared the scraped information against a dataset gathered by hand and found that the results were very similar.
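To illustrate how scraped records might be scored against a hand-gathered reference set, here is a minimal Python sketch. The metric definitions are assumptions made for illustration, since the pilot's exact formulas are not described here: return rate is taken as the share of people for whom the scraper returned any value, precision as the share of returned values that agree with the hand-gathered value, and accuracy as the share of all people whose scraped value is correct. The function name `evaluate` and the person IDs are hypothetical.

```python
from typing import Dict, Optional


def evaluate(scraped: Dict[str, Optional[str]],
             reference: Dict[str, str]) -> Dict[str, float]:
    """Score scraped values against a hand-gathered reference, keyed by person ID.

    Assumed definitions (not taken from the pilot itself):
      return_rate = people with any scraped value / all people in the reference
      precision   = scraped values that match the reference / scraped values returned
      accuracy    = scraped values that match the reference / all people in the reference
    """
    total = len(reference)
    # People for whom the scraper returned something at all.
    returned = [pid for pid in reference if scraped.get(pid) is not None]
    # Of those, the ones whose scraped value agrees with the hand-gathered value.
    correct = [pid for pid in returned if scraped[pid] == reference[pid]]
    return {
        "return_rate": len(returned) / total if total else 0.0,
        "precision": len(correct) / len(returned) if returned else 0.0,
        "accuracy": len(correct) / total if total else 0.0,
    }


# Hypothetical example: current employer for four graduates.
reference = {"p1": "Univ A", "p2": "Lab B", "p3": "Firm C", "p4": "Univ D"}
scraped = {"p1": "Univ A", "p2": "Lab B", "p3": "Firm X", "p4": None}
metrics = evaluate(scraped, reference)
```

In this made-up example the scraper returns values for three of the four people and two of those are right, so the return rate is 0.75, precision is 2/3, and accuracy is 0.5. Reporting the three numbers together separates "found nothing" from "found the wrong thing."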
NORC reduced data collection time and improved accuracy.
The software NORC developed reduces data collection time by months and produces accurate results. The pilot also demonstrated that a data collection strategy employing APIs and public databases could produce valid data more efficiently. NORC also showed that assessing scholarly output using automated techniques is feasible and reliable.
By bypassing traditional survey methodologies, this new data collection effort reduces the nonresponse bias that arises when some people do not respond to a survey. Using machine learning techniques, the software more accurately determines which authors and publications correspond to a specific individual and which do not.
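The matching problem described above can be sketched in a few lines of Python. This is not NORC's method: it swaps the machine learning model for a simple hand-weighted score, only to show the shape of the decision (does this publication record belong to this person?). The field names, weights, and 0.8 threshold are all hypothetical; a trained classifier would learn such weights from labeled examples.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(person: dict, record: dict) -> float:
    """Combine simple signals into one score (weights are illustrative assumptions)."""
    name = name_similarity(person["name"], record["author"])
    # 1.0 if the person's institution appears in the record's affiliation string.
    same_institution = float(
        person["institution"].lower() in record["affiliation"].lower()
    )
    # 1.0 if the person's field of study matches the record's subject.
    same_field = float(person["field"].lower() == record["subject"].lower())
    return 0.6 * name + 0.25 * same_institution + 0.15 * same_field


def is_match(person: dict, record: dict, threshold: float = 0.8) -> bool:
    """Decide whether the record likely belongs to the person."""
    return match_score(person, record) >= threshold


# Hypothetical person and two candidate publication records.
person = {"name": "Jane Q. Doe", "institution": "Example University",
          "field": "Chemistry"}
likely = {"author": "Jane Q Doe",
          "affiliation": "Department of Chemistry, Example University",
          "subject": "chemistry"}
unlikely = {"author": "John Doe", "affiliation": "Other Institute",
            "subject": "economics"}
```

Here `is_match(person, likely)` is true and `is_match(person, unlikely)` is false. Name similarity alone would be a weak signal for common surnames, which is why the sketch also checks affiliation and field.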
The GRFP pilot project provides a roadmap for other avenues of study, including:
- Sequencing of data collection methods to improve accuracy and efficiency
- Exploring more machine learning techniques to validate records
- Assessing intellectual productivity through advanced models
- Using similar approaches to evaluate other NSF initiatives