Guest Post by Patrick Lam on "Estimating Individual Causal Effects"

April 16, 2013

Last week, I gave the applied statistics talk at IQSS on some of my research on estimating individual causal effects. Since there was some interest from folks who could not attend, I thought I would give a brief overview of my argument and research.

In the majority of empirical research, the quantity of interest is likely to be some type of average treatment effect, estimated either through a regression model or some other clever research design. For example, we often run a regression of an outcome Y on some treatment W and covariates X and interpret the coefficient on W as the "effect" of W on Y, given assumptions of ignorability of treatment assignment and no interference across units. While the average treatment effect, or ATE (and its fancier cousins ATT, ATC, CATE, LATE, etc.), is the easiest causal quantity to estimate, I argue that an ATE is not a very useful or interpretable quantity. Define an individual causal effect (ICE) as Y_i(1) - Y_i(0) for any individual i, where Y_i(1) and Y_i(0) are that individual's potential outcomes with and without treatment. An ATE is simply the average of all the individual causal effects in the data or in some larger population: E[Y(1) - Y(0)]. An ATE is not the effect for any specific individual or group of individuals. It is not even the effect for the average individual. However, we often implicitly treat the ATE as THE EFFECT for any individual, which is only valid if we make the usually unreasonable assumption of constant treatment effects. In short, the ATE is a one-number summary that may not describe any individual of interest.
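For readers who prefer notation, the definitions above can be written out as follows. The symbols \tau_i, \tau, and N are labels introduced here for convenience and are not taken from the talk; the finite-sample average is one of the two readings of the ATE mentioned above.

```latex
% Individual causal effect for unit i
\tau_i = Y_i(1) - Y_i(0)

% Average treatment effect: the mean of the individual effects
\mathrm{ATE} = E[Y(1) - Y(0)] = \frac{1}{N} \sum_{i=1}^{N} \tau_i

% Constant treatment effects: the only case in which the ATE
% is also every individual's effect
\tau_i = \tau \ \text{for all } i \quad \Longrightarrow \quad \mathrm{ATE} = \tau
```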

To see this in a simple example, suppose we have a female birth control pill that in reality prevents pregnancy for every woman who takes it. Now suppose that we didn't know that, but we wanted to test how effective the pill was. So we randomly assign the pill within a sample split evenly between men and women. Our results would suggest that the pill was effective in preventing pregnancy approximately 50% of the time. We would then conclude based on the data that the pill is only effective half the time and thus is basically useless as a contraceptive. However, it is obvious that the 50% result is derived from a 100% success rate for women and a 0% success rate for men. The 50% result is not the success rate for any individual, and estimating only the ATE masks important treatment effect heterogeneity.
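The pill example can be simulated in a few lines. This is only a toy sketch, not anything from the talk: the outcome coding (1 = no pregnancy), the sample sizes, and the assumption that every woman conceives without the pill are simplifications made here so that the numbers come out cleanly.

```python
import random
from statistics import mean

random.seed(0)

def make_person(sex):
    """Return both potential outcomes for one person (1 = no pregnancy)."""
    if sex == "F":
        # Illustrative assumption: without the pill every woman conceives.
        return {"sex": sex, "y0": 0, "y1": 1}
    # Men never become pregnant, with or without the pill.
    return {"sex": sex, "y0": 1, "y1": 1}

people = [make_person("F") for _ in range(500)] + \
         [make_person("M") for _ in range(500)]

# True individual causal effects: known here only because we simulated them.
ices = [p["y1"] - p["y0"] for p in people]
true_ate = mean(ices)

# Randomized experiment: only one potential outcome is observed per person.
for p in people:
    p["w"] = random.random() < 0.5
    p["y_obs"] = p["y1"] if p["w"] else p["y0"]

treated = [p["y_obs"] for p in people if p["w"]]
control = [p["y_obs"] for p in people if not p["w"]]
ate_hat = mean(treated) - mean(control)

print("true ATE:", true_ate)
print("estimated ATE:", round(ate_hat, 3))
print("distinct ICEs:", sorted(set(ices)))
```

The estimated ATE lands near 0.5, yet every simulated ICE is either 0 or 1, so the average matches nobody's effect.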

One way to account for this heterogeneity is by estimating the conditional average treatment effect (CATE). In this example, we would condition on gender and estimate an average treatment effect for men and one for women. This requires leveraging additional information and defining a variable to condition upon. This is a top-down approach in which we subset the data in some way and then estimate an ATE within each subset. Of course the example here is trivial, but in most empirical research, it may not be obvious which variables to condition on. Furthermore, the CATE still assumes a constant treatment effect for all individuals within the same covariate stratum.
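A minimal sketch of this top-down approach, using made-up data for the pill example: the helper name `cate_by_group`, the field names, and the eight rows are all hypothetical, and the outcome is coded 1 = no pregnancy.

```python
from collections import defaultdict
from statistics import mean

def cate_by_group(rows, group_key):
    """Treated-minus-control difference in mean outcomes within each stratum."""
    strata = defaultdict(lambda: {True: [], False: []})
    for r in rows:
        strata[r[group_key]][r["w"]].append(r["y"])
    # Skip strata that lack either a treated or a control observation.
    return {
        g: mean(arms[True]) - mean(arms[False])
        for g, arms in strata.items()
        if arms[True] and arms[False]
    }

# Hypothetical observed data: sex, treatment indicator w, outcome y.
rows = [
    {"sex": "F", "w": True, "y": 1}, {"sex": "F", "w": True, "y": 1},
    {"sex": "F", "w": False, "y": 0}, {"sex": "F", "w": False, "y": 0},
    {"sex": "M", "w": True, "y": 1}, {"sex": "M", "w": True, "y": 1},
    {"sex": "M", "w": False, "y": 1}, {"sex": "M", "w": False, "y": 1},
]

cates = cate_by_group(rows, "sex")
print(cates)
```

Note that the function needs to be told which variable to stratify on, and it returns a single number per stratum, which is exactly the within-stratum constant-effect limitation described above.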

I argue for a different bottom-up approach in which we try to estimate each of the individual causal effects directly. The benefits of directly estimating the ICEs are that

1) they directly estimate the actual quantities of interest, such as an effect for a certain individual or group of individuals.
2) they allow for discovery of treatment effect heterogeneity through graphical and exploratory approaches.
3) they bridge the gap between quantitative and qualitative research by allowing for small-n estimands in a large-n framework.
4) any other causal quantity such as any ATE can be calculated directly from the ICEs, so estimating ICEs is a more flexible approach.

Of course, the main problem with estimating ICEs is that they are not identified in the data: we never observe both potential outcomes for the same individual, so the data strictly speaking give no information about the likelihood of any particular value for any ICE.

To estimate the ICEs, I introduce a broad framework that leverages the usual causal inference assumptions of treatment assignment ignorability and SUTVA and uses existing matching methods coupled with a Bayesian framework to give hints and uncertainty intervals for the ICEs. The Bayesian approach allows for prior qualitative information to be incorporated and also sidesteps the identification issue by defining a posterior over the ICEs. None of the methods used are new, and many date back several decades. But I argue that we can put these existing methods together in a novel way to estimate quantities which are much more important and relevant to researchers.

The basic idea of the estimation process is to impute the missing potential outcome for each individual. Once the outcomes are imputed, the ICEs can be calculated in a straightforward manner. The matching algorithms define pools of observations that we can use to help with the imputation, and the Bayesian framework gives us uncertainty for the imputations that incorporates both uncertainty from the matching algorithms and the usual estimation uncertainty. The idea of Bayesian imputation of missing potential outcomes dates back at least to Rubin (1978), and Don has actually told me a few times that the imputation idea in general dates back much further than that, at least to Neyman. The matching idea and the algorithms used also date back at least to Don's work in the 1970s.
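To make the impute-then-difference idea concrete, here is a deliberately stripped-down sketch. It is not the model from the talk: it swaps in plain nearest-neighbor matching on a single covariate and produces point imputations only, with none of the Bayesian uncertainty described above. The function name `impute_ices`, the field names, and the toy data are all hypothetical.

```python
from statistics import mean

def impute_ices(units, k=1):
    """Impute each unit's missing potential outcome from its k nearest
    neighbors (on covariate x) in the opposite treatment arm, then
    return the implied individual causal effects y1 - y0."""
    ices = []
    for u in units:
        # Donor pool: units that received the other treatment condition.
        donors = [v for v in units if v["w"] != u["w"]]
        nearest = sorted(donors, key=lambda v: abs(v["x"] - u["x"]))[:k]
        y_missing = mean([v["y"] for v in nearest])
        y1 = u["y"] if u["w"] == 1 else y_missing
        y0 = u["y"] if u["w"] == 0 else y_missing
        ices.append(y1 - y0)
    return ices

# Toy data built so the treatment adds 2 when x <= 4 and nothing otherwise.
units = [
    {"x": 1, "w": 1, "y": 12}, {"x": 2, "w": 0, "y": 10},
    {"x": 3, "w": 1, "y": 12}, {"x": 4, "w": 0, "y": 10},
    {"x": 6, "w": 1, "y": 10}, {"x": 7, "w": 0, "y": 10},
    {"x": 8, "w": 1, "y": 10}, {"x": 9, "w": 0, "y": 10},
]

ices = impute_ices(units, k=1)
ate_from_ices = mean(ices)  # any average effect follows from the ICEs
print(ices, ate_from_ices)
```

On this toy data the imputed ICEs separate into two groups, while their average is a single number that hides the split. In a real analysis the matching would use more covariates and the imputations would carry uncertainty.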

In my talk, I introduced a (hopefully coherent) framework that laid out the assumptions and a model to estimate the ICEs. I also conducted many simulations to test the ability of the model to recover ICEs and also tested several matching specifications. The results suggest that the model actually does a fairly good job of recovering the ICEs, although the uncertainty intervals can be quite wide. Nevertheless, they give us hints about plausible ranges of values for the ICEs, and aggregating the ICEs to estimate average effects produces nearly identical results to traditional methods. One noteworthy conclusion from the simulations is that regression imputation, in which we impute with the predicted values from a regular linear regression, generally produces good average results but very poor calibration for individual results. Therefore, one takeaway is that we can use ICEs to estimate both individual and average estimands, whereas ATE-based methods can estimate only average estimands with any accuracy, and attempts to get at individual-level estimates through ATEs are likely to be incorrect. The last part of my talk uses an existing example from economics and politics on monitoring corruption to demonstrate the flexibility of the approach. I adapt ICE estimation to both binary and continuous treatments and to one-stage and two-stage IV-type approaches.

For more information, copies of the presentation slides, and a rough draft of a paper describing the general model and framework, please see http://www.patricklam.org/research.html.

Posted by Konstantin Kashin at April 15, 2013 10:54 PM