Data Linkage

Survey data are invaluable for measuring the incidence of social and economic factors and for understanding the relationships among them. Nevertheless, several significant limitations restrict the use of these data for analyzing policy-related questions. Among these are:

  • The cost of fielding a survey
  • Nonresponse rates (which have increased substantially over the last several decades)
  • Measurement error – respondents are not always able to accurately answer the questions posed to them
  • Estimability issues associated with a fixed sample size – it may not be possible to make valid statistical estimates for small areas
  • Lack of necessary detail – it is difficult for surveys to collect detailed data such as claims histories

To a large degree, these issues can be addressed by introducing administrative record data into the analysis set. Linking data from multiple sources, however, requires specialized expertise of the kind available at NORC.


NORC has the experience to locate and access data from multiple sources in order to support robust statistical analyses:

  • In-house surveys
  • Public use survey files
    • NORC has the knowledge to identify appropriate public-use data files and to understand the specific strengths and shortcomings of each, including the specification of sampling and measurement error.
  • Micro-data records
    • Administrative – Federal and state government
  • Proprietary data
    • Credit bureau
    • Data broker
  • Health Records
    • Enrollment records
    • Claims records
  • Registers of business establishments, providers, and organizations
  • Coding lists – such as ZIP Code to county or NAICS to SIC

NORC has the expertise and advanced tools to develop functional analysis files that integrate the data from these multiple sources:

  • Database Design – effectively structuring data coming from multiple sources so that it can be readily used to flexibly answer multiple research questions
  • Record Linkage (also referred to as probabilistic record linkage or entity resolution) – a technique for joining records that putatively represent the same entity (a person, organization, address, or something else), even when identifying fields are incomplete, erroneous, or recorded inconsistently. It enhances the value of survey and administrative data files, and of the analyses they support, by integrating data elements that are available only from separate sources. With declining survey response rates, its importance for data and policy analysis is increasing.
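
The record linkage technique described above can be illustrated with a minimal sketch. This is not NORC's method: it assumes a simple Fellegi-Sunter style scoring scheme, and the field names, the m and u probabilities, the similarity threshold, the cutoff, and the sample records are all hypothetical values chosen for illustration. Each field that agrees adds evidence for a match, each field that disagrees subtracts evidence, and a missing field contributes nothing.

```python
import math
from difflib import SequenceMatcher

# Hypothetical parameters (assumed for illustration, not estimated from data):
#   m = P(field agrees | the two records are the same entity)
#   u = P(field agrees | the two records are different entities)
#   fuzzy = whether near-identical strings count as agreement
FIELD_PARAMS = {
    "first_name": {"m": 0.90, "u": 0.05, "fuzzy": True},
    "last_name": {"m": 0.95, "u": 0.02, "fuzzy": True},
    "zip": {"m": 0.85, "u": 0.10, "fuzzy": False},
}


def agrees(a, b, fuzzy, threshold=0.85):
    """Return True/False for agreement, or None when either value is missing."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return None
    if fuzzy:
        # Tolerate small spelling variations such as "Katherine" / "Kathrine".
        return SequenceMatcher(None, a, b).ratio() >= threshold
    return a == b


def match_weight(rec_a, rec_b):
    """Sum the log2 likelihood ratios over all fields that can be compared."""
    total = 0.0
    for field, p in FIELD_PARAMS.items():
        result = agrees(rec_a.get(field, ""), rec_b.get(field, ""), p["fuzzy"])
        if result is None:
            continue  # missing value: no evidence either way
        if result:
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


def link(file_a, file_b, cutoff=5.0):
    """For each record in file_a, keep its best candidate in file_b if the weight clears the cutoff."""
    links = []
    for a in file_a:
        best = max(file_b, key=lambda b: match_weight(a, b))
        weight = match_weight(a, best)
        if weight >= cutoff:
            links.append((a["id"], best["id"], round(weight, 2)))
    return links


# Hypothetical sample records.
survey = [
    {"id": "S1", "first_name": "Katherine", "last_name": "Johnson", "zip": "60637"},
    {"id": "S2", "first_name": "Luis", "last_name": "Ortega", "zip": "60603"},
]
admin = [
    {"id": "A1", "first_name": "Kathrine", "last_name": "Johnson", "zip": "60637"},
    {"id": "A2", "first_name": "Maria", "last_name": "Lopez", "zip": ""},
]

print(link(survey, admin))
```

In this sketch, S1 links to A1 despite the misspelled first name, while S2 is left unlinked because no candidate clears the cutoff. A production system would typically estimate the m and u probabilities from the data, use blocking to avoid comparing every pair of records, and send borderline weights to clerical review.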

Statistical Matching: Direct record linkage is often not feasible, either because identifying elements (such as names and addresses) are lacking or because only a small number of individuals (or other entities) appear in more than one file. In these cases, it is often useful to perform statistical linkages such that the records brought together, while not the same person (or entity), are similar enough to use as an analysis