Many surveys depend on the household as the primary unit for sample selection. However, sample frames for households are not generally available and can be expensive to create. The household addresses available from the United States Postal Service (USPS) do not yet provide a representative listing of U.S. households because certain types of households are under-represented. To provide a sample frame for use by many surveys, NORC at the University of Chicago constructed a sample frame of households that is representative of over 99% of U.S. households.
The 2010 National Frame used a two-stage probability sample design to select a representative sample of households in the United States. At the first stage, the sampling unit is a National Frame Area (NFA), defined using 2010 Census areas, where each NFA contains a population of at least 10,000. Census areas with a population of less than 10,000 were combined with the closest neighboring non-certainty county or statistical area to define the NFAs. The largest areas, those with a population of at least 1,543,728 (0.5 percent of the 2010 Census U.S. population), were selected with certainty; these areas have a high population density and are dominated by tracts with street-style addresses. They contain 56% of the population within 8% of the geographic area of the U.S. The remaining areas were stratified into those where street-style addresses predominate and those that are less likely to have street-style addresses. The latter stratum (‘rural’ areas) comprises 81% of the geographic area and contains 14% of the population.
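The certainty cutoff quoted above can be checked with a short calculation. The sketch below is illustrative only: the text does not state the population total, so the figure used is the published 2010 Census resident population, and the function names are hypothetical.

```python
# 2010 Census resident population of the United States.
# Assumption: this total is not given in the text; it is the published count.
US_POPULATION_2010 = 308_745_538

# Share of the U.S. population at or above which an area is a certainty selection.
CERTAINTY_SHARE = 0.005  # 0.5 percent, as stated in the text


def certainty_threshold(total_population: int, share: float) -> int:
    """Population cutoff for certainty selection, rounded to a whole person."""
    return round(total_population * share)


def is_certainty_area(area_population: int, threshold: int) -> bool:
    """An area is selected with certainty when it meets or exceeds the cutoff."""
    return area_population >= threshold


threshold = certainty_threshold(US_POPULATION_2010, CERTAINTY_SHARE)
print(threshold)
```

Under these assumptions the computed cutoff agrees with the 1,543,728 figure given in the text.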
The rationale for this stratification was that areas indicated by the Census to have street-style addresses were likely to be well covered by the USPS Delivery Sequence File (DSF). The 2010 National Frame is representative of over 99%, rather than all, of U.S. households because extremely isolated areas were excluded from the population. Specifically, areas of 5,000 square miles or more with fewer than 0.180 housing units per square mile were deemed cost-prohibitive to list. This resulted in the removal of the most remote and sparsely populated areas in Alaska, representing 0.03 percent of the U.S. population (12.9 percent of Alaska’s population) while eliminating 13.7 percent of the United States by area (84.8 percent of Alaska’s area). Within the selected National Frame Areas, the second-stage sampling unit is a segment, defined in terms of either Census tracts or block groups, containing at least 300 housing units according to the 2010 Census. A probability sample of 1,514 segments was selected.
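The exclusion rule for isolated areas combines two conditions, a minimum land area and a maximum housing density. A minimal sketch of that rule follows; the function and parameter names are hypothetical, and only the two numeric cutoffs come from the text.

```python
# Cutoffs stated in the text.
MIN_AREA_SQ_MILES = 5_000          # rule applies to areas of this size or larger
MAX_UNITS_PER_SQ_MILE = 0.180      # excluded when density is below this value


def is_excluded(area_sq_miles: float, housing_units: int) -> bool:
    """Return True when an area is both very large and very sparsely housed."""
    if area_sq_miles < MIN_AREA_SQ_MILES:
        return False  # small areas are never excluded by this rule
    density = housing_units / area_sq_miles
    return density < MAX_UNITS_PER_SQ_MILE


# Hypothetical areas, for illustration only.
print(is_excluded(10_000, 1_000))   # large, 0.1 units per square mile
print(is_excluded(10_000, 5_000))   # large, 0.5 units per square mile
print(is_excluded(4_000, 10))       # sparse but under the area cutoff
```

Both conditions must hold, so a sparse area under 5,000 square miles stays in the population.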
In both stages of sampling, the sample was selected using stratified sampling procedures with probability proportional to size. The primary stratification of Census areas was by urban density. Additional sample control was provided by implicit stratification, i.e., systematic sampling over a file sorted by important characteristics such as geography and median income level. For each of the 1,514 selected segments, the ratio of the DSF address count to the Census count of occupied housing units was used to determine whether the DSF address data were adequate for the segment or whether the segment would instead be listed by field personnel. As a result, 123 of the 1,514 selected segments were listed by field personnel.
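The two mechanisms in this paragraph can be sketched in a few lines. The code below is an illustration under stated assumptions, not the procedure NORC actually ran: the record fields (`region`, `median_income`, `housing_units`), the toy frame, and the `min_ratio` cutoff are all hypothetical, since the text gives neither the sort keys in detail nor the adequacy threshold. It also assumes that a low DSF-to-Census ratio is what signals inadequate coverage.

```python
import random


def pps_systematic(sizes, n, rng):
    """Select n distinct indices with probability proportional to size,
    using a random start and a fixed skip interval over cumulative sizes."""
    total = sum(sizes)
    interval = total / n
    if max(sizes) >= interval:
        # Such a unit would be a certainty selection and should be set aside first.
        raise ValueError("a unit is at least as large as the sampling interval")
    start = rng.random() * interval
    selected = []
    upper = 0.0   # cumulative size through unit i
    i = -1
    for k in range(n):
        target = start + k * interval
        # Advance to the unit whose cumulative range contains the target.
        while upper <= target and i < len(sizes) - 1:
            i += 1
            upper += sizes[i]
        selected.append(i)
    return selected


def needs_field_listing(dsf_count, census_occupied_units, min_ratio):
    """Flag a segment for field listing when DSF coverage falls below min_ratio.
    min_ratio is a hypothetical parameter; the text does not state the cutoff."""
    return dsf_count / census_occupied_units < min_ratio


# Toy frame (made-up values). Sorting before sampling is the implicit stratification.
frame = [
    {"id": "A", "region": 2, "median_income": 61_000, "housing_units": 450},
    {"id": "B", "region": 1, "median_income": 38_000, "housing_units": 320},
    {"id": "C", "region": 1, "median_income": 72_000, "housing_units": 510},
    {"id": "D", "region": 3, "median_income": 45_000, "housing_units": 300},
    {"id": "E", "region": 2, "median_income": 40_000, "housing_units": 600},
    {"id": "F", "region": 3, "median_income": 83_000, "housing_units": 380},
]
frame.sort(key=lambda u: (u["region"], u["median_income"]))
sizes = [u["housing_units"] for u in frame]
picks = pps_systematic(sizes, n=3, rng=random.Random(2010))
print([frame[i]["id"] for i in picks])
```

Because the file is sorted before the systematic pass, the selections are spread across the sort keys, which is the extra sample control the paragraph describes.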
The 2010 National Sample Frame project resulted in a large representative sample of U.S. households that can be used for many different surveys. The National Sample Frame contains almost 3 million households, including over 80,000 rural households not available from the DSF but identified by direct listing by field staff. The sample includes households in 46 states and the District of Columbia. Sample weights are included, constructed as the inverse probability of selection, so that the sample frame can be used to draw representative samples of U.S. households.
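As a rough illustration of inverse-probability weighting in a two-stage design, the sketch below multiplies the stage-one and stage-two selection probabilities and inverts the product. It assumes that every household in a selected segment enters the frame, so a household's selection probability equals that of its segment; the function names and the example numbers are hypothetical.

```python
def pps_inclusion_probability(size: float, total_size: float, n_selected: int) -> float:
    """Inclusion probability of one unit under PPS selection of n_selected units,
    capped at 1.0 for certainty selections."""
    return min(1.0, n_selected * size / total_size)


def base_weight(p_area: float, p_segment_given_area: float) -> float:
    """Inverse of the overall selection probability across the two stages."""
    return 1.0 / (p_area * p_segment_given_area)


# Hypothetical example: a certainty area (probability 1.0) in which a segment
# of 500 housing units is one of 10 drawn from 100,000 housing units.
p_segment = pps_inclusion_probability(500, 100_000, 10)
weight = base_weight(1.0, p_segment)
print(p_segment, weight)
```

In this made-up case each household on the frame would stand in for about 20 households in the population.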