19 February 2009
The Annual Social and Economic Supplement (ASEC) to the Current Population Survey (CPS) is among the most widely used and influential data sets in the social sciences and in policymaking. For example, the much-cited figure of 45 million uninsured is a CPS estimate; Title I education funding is allocated using the CPS; and state outlays for the State Children's Health Insurance Program are also determined using the survey.
From the perspective of the social scientist, the CPS is a key research tool because of its large sample size (roughly 60,000 households) and because it is typically released publicly about 5-6 months after the survey is initially fielded. However, one major drawback is that, unlike other major national surveys (the SIPP, the MEPS, and the NHIS, to name a few), the public release of the CPS data does not include the stratum and primary sampling unit variables needed to compute correct standard errors under the complex survey design. Instead, the CPS releases a series of adjustment factors for specific population subgroups (e.g. by race, income group, state, etc.) that can be applied to uncertainty estimates. This approach breaks down in the case of regression: which adjustment factors does one use if the regression contains a rich array of covariates? As a result, much research using the CPS (which appears quite often in economics and health services research journals) proceeds either under the assumption of simple random sampling or using robust standard errors. These studies therefore likely understate their uncertainty, casting some doubt on the conclusions of this work.
So what is the applied researcher to do? One simple method of approximation (suggested to me once by Alan Zaslavsky) is to exploit the fact that the CPS uses monthly rotation groups that effectively replicate the CPS survey design. That is, one could produce separate estimates for each monthly rotation group and combine these estimates to come up with an estimate of the uncertainty from the survey design.
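To make the idea concrete, here is a minimal Python sketch of a random-groups style calculation for a weighted mean. It is only an illustration under stated assumptions, not a worked-out recipe from the suggestion above: the record layout (rotation group, value, weight) and the function names are hypothetical, the combined estimate is taken to be the simple average of the per-group estimates, and the rotation groups are treated as approximately independent replicates of the design.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean(pairs):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(w for _, w in pairs)
    return sum(y * w for y, w in pairs) / total_weight


def rotation_group_se(records):
    """Estimate a weighted mean and its standard error from replicate groups.

    records: iterable of (rotation_group, value, weight) tuples.
    This layout is hypothetical; map your own extract onto it.
    Assumes each rotation group is an approximately independent
    replicate of the full survey design.
    """
    by_group = defaultdict(list)
    for group, y, w in records:
        by_group[group].append((y, w))

    # One estimate per rotation group.
    estimates = [weighted_mean(pairs) for pairs in by_group.values()]
    g = len(estimates)
    if g < 2:
        raise ValueError("need at least two rotation groups")

    # Combined estimate: simple average of the group estimates.
    center = sum(estimates) / g
    # Variance of that average, from the spread across groups.
    variance = sum((e - center) ** 2 for e in estimates) / (g * (g - 1))
    return center, sqrt(variance)
```

With only a handful of rotation groups, the resulting variance estimate rests on few degrees of freedom, so it is best read as a rough check on the simple-random-sampling standard error rather than a precise replacement for it.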
An alternative method (described in Davern et al., Inquiry 43(3), 2006) is to construct synthetic stratum and primary sampling unit (PSU) variables using available information in the survey (e.g. metropolitan statistical area, state, and household identifiers). In that article, the authors compared standard errors from this synthetic method with those computed from the internal Census files (which do have the complex survey design variables) and reported the ratio of the synthetic-method standard error to the internal-file standard error. In general, the ratios were on the order of 0.75 to 0.85. The synthetic method therefore still understates uncertainty, but it brings the estimates closer to the internal ones than the ratios of about 0.5-0.6 the authors found under the assumption of simple random sampling (i.e. making no adjustment for survey design) and using robust standard errors.
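Once synthetic stratum and PSU identifiers exist, they can be fed to any design-based variance routine. The Python sketch below shows one standard option, the Taylor-linearization standard error for a weighted mean with PSUs treated as sampled with replacement within strata. It is an assumption-laden illustration: how the synthetic identifiers are built is left to the article cited above, the record layout (stratum, psu, value, weight) and function name are hypothetical, and strata with a single PSU are simply skipped here, which contributes no variance for them.

```python
from collections import defaultdict
from math import sqrt


def linearized_mean_se(records):
    """Weighted mean and its linearized, design-based standard error.

    records: list of (stratum, psu, value, weight) tuples, where stratum
    and psu are synthetic identifiers built by the analyst (hypothetical
    layout). PSUs are treated as sampled with replacement within strata.
    """
    total_weight = sum(w for _, _, _, w in records)
    ybar = sum(w * y for _, _, y, w in records) / total_weight

    # Sum the linearized residuals w * (y - ybar) / W within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, y, w in records:
        psu_totals[stratum][psu] += w * (y - ybar) / total_weight

    variance = 0.0
    for totals in psu_totals.values():
        n = len(totals)
        if n < 2:
            # Assumption: single-PSU strata are skipped (no contribution).
            continue
        zbar = sum(totals.values()) / n
        variance += n / (n - 1) * sum((z - zbar) ** 2 for z in totals.values())
    return ybar, sqrt(variance)
```

When every PSU holds a single equally weighted observation in one stratum, this reduces to the familiar simple-random-sampling standard error; when observations within a PSU resemble each other, the estimate grows, which is the effect that ignoring the design misses.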