22 March 2006
Propensity Score Matching (PSM) has become an increasingly popular method to estimate treatment effects in observational studies. Most papers that use PSM also provide standard errors for their treatment effect estimates. I always wonder where these standard errors actually come from. To my knowledge there still exists no method to calculate valid standard errors for PSM. What do you all think about this topic?
The issue is this: Getting standard errors for PSM works out nicely when the true propensity score is known. Alberto and Guido have developed a formula that provides principled standard errors when matching is done with covariates or the true propensity score. You can read about it here. This formula is used by their nnmatch matching software in Stata and Jasjeet Sekhon’s matching package in R.
Yet, in observational settings we do not know the true propensity score so we first have to estimate it. Usually people regress the treatment indicator on a couple of covariates using a probit or logit link function. The predicted probabilities from this model are then extracted and taken as the estimated propensity score to be matched on in the second step (some people also match on the linear predictor, which is desirable because it does not tend to cluster so much around 0 and 1).
Unfortunately, the abovementioned formula does not work in the case of matching on the estimated propensity score, because the estimation uncertainty created in the first step is not accounted for. Thus, the confidence bounds on the treatment effect estimates in the second step will most likely not have the correct coverage.
This issue is not easily resolved. Why not just bootstrap the whole two-step procedure? Well, there is evidence to suggest that the bootstrap is likely to fail in the case of PSM. In the closely related problem of deriving standard errors for conventional nearest neighbor matching Guido and Alberto show in a recent paper, that even in the simple case of matching on a single continuous covariate (when the estimator is root-N consistent and asymptotically normally distributed with zero asymptotic bias) the bootstrap does not provide standard errors with correct coverage. This is due to the extreme non-smoothness of nearest neighbor matching which leads the bootstrap variance to diverge from the actual variance.
In the case of PSM the same problem is likely to occur unless estimating the propensity score in the first step makes the matching estimator smooth enough for the bootstrap to work. But this is an open question. At least to my knowledge there exists no Monte Carlo evidence or theoretical justification for why the bootstrap should work here. I would be interested to hear opinions on this issue. It’s a critical question because the bootstrap for PSM is often done in practice, various matching codes (for example pscore or psmatch2 in Stata) do offer bootstrapped standard errors options for matching on the estimated propensity score.
Today at noon, the Applied Statistics Workshop will present a talk by Jeff Gill of the Department of Political Science at the University of California at Davis. Professor Gill received his Ph.D from American University and served on the faculty at Cal Poly and the University of Florida before moving to Davis in 2004. His research focuses on the application of Bayesian methods and statistical computing to substantive questions in political science. He is the organizer for this year's Summer Methods Meeting sponsored by the Society for Political Methodology, which will be held at Davis in July. He will be a visiting professor in the Harvard Government Department during the 2006-2007 academic year.
Professor Gill will present a talk entitled "Elicited Priors for Bayesian Model Specifications in Political Science Research." This talk is based on joint work with Lee Walker, who is currently a visiting scholar at IQSS. The presentation will be at noon on Wednesday, March 22 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:
We explain how to use elicited priors in Bayesian political science research. These are a form of prior information produced by previous knowledge from structured interviews with subjective area experts who
have little or no concern for the statistical aspects of the project. The purpose is to introduce qualitative and area-specific information into an empirical model in a systematic and organized manner in order to produce parsimonious yet realistic implications. Currently, there is no work in political science that articulates elicited priors in a Bayesian specification. We demonstrate the value of the approach by applying elicited priors to a problem in judicial comparative politics using data and elicitations we collected in Nicaragua.