7 November 2009

Just in time for "Superfreakonomics"

A friend recently pointed me to a 2007 New Republic article in which the author, Noam Scheiber, argues that the "Freakonomics" phenomenon is lamentable because it represents a trend toward research in which clever identification strategies are prized over attempts to answer what Scheiber calls "truly deep questions." Although two years have passed and a second Levitt and Dubner book has since been published, the article caught my attention because I have been considering a related issue of late. We are all well aware of how difficult it is to make causal inferences in the social sciences, so it is not surprising that researchers are drawn to settings in which some source of exogenous variation allows for identification of the influence of a specific causal factor. In fact, progress on those "truly deep questions" depends in part on this type of work. However, a focus on clean identification has some potentially negative implications. Scheiber names one: answering questions of peripheral interest. A second, which is of greater concern to me, is concentrating on population subgroups that may or may not be of scientific interest in and of themselves and that, in either case, cannot provide direct insights into broader population dynamics.

Thanks to Imbens and Angrist, we know that even when it is not possible to identify the population average effect of a "treatment" (i.e., causal factor of interest) on a given outcome, it is often possible to identify a "local average treatment effect," that is, the average effect of a treatment for the subpopulation whose treatment status is affected by changes in the exogenous regressor. This subpopulation is composed of so-called "compliers," who will take the treatment when assigned to take it and will not take it when not assigned. Sometimes this subpopulation is of scientific or policy interest (for example, we may be interested in knowing the effect of additional schooling on earnings for those students who might drop out of high school but for compulsory education laws). Often, it is not. In contrast, the broader population and the portion of the population that receives treatment are almost always of interest. These groups are certainly policy-relevant (it would be misleading to project the effect of a drug on public health based only on the drug's effect among those who were induced to take the drug) and they are needed to generate "stylized facts" that help us organize our understanding of the social world. (Also, these groups can be observed, whereas individual compliers generally cannot be identified in the data.)
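
For readers who prefer to see the definition written out, here is a sketch for the simplest case of a binary instrument. The notation is an assumption introduced here for illustration: Z is the exogenous instrument, D(z) is the treatment a person would take when the instrument equals z, and Y(d) is the outcome under treatment status d. Under the usual conditions (independence of the instrument, exclusion, monotonicity, and a nonzero first stage), the local average treatment effect equals the familiar Wald ratio:

```latex
\tau_{\mathrm{LATE}}
  = E\bigl[\,Y_i(1) - Y_i(0) \mid D_i(1) > D_i(0)\,\bigr]
  = \frac{E[\,Y_i \mid Z_i = 1\,] - E[\,Y_i \mid Z_i = 0\,]}
         {E[\,D_i \mid Z_i = 1\,] - E[\,D_i \mid Z_i = 0\,]}
```

The conditioning event D(1) > D(0) is what defines a complier. It involves both potential treatment statuses, only one of which is ever observed for a given person, which is why compliers cannot be picked out individually.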

Unfortunately, when treatment effects are heterogeneous, the identified local average effect does not provide direct information about the wider population. This is problematic because treatment effects are likely to be heterogeneous in social science applications. In fact, this heterogeneity is one of the reasons why identifying causal effects is so difficult (when individuals select into a treatment status based in part on their anticipated treatment effects, endogeneity problems follow).
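
A small simulation can make the gap concrete. The sketch below is purely illustrative: the three compliance types, their population shares, and their treatment effects are made-up numbers, chosen so that always-takers (who select into treatment regardless of the instrument) have the largest gains. The function name `simulate_late_vs_ate` is hypothetical as well. The Wald ratio should land near the compliers' effect of 1.0, while the population average effect is 1.2.

```python
import random


def simulate_late_vs_ate(n=200_000, seed=0):
    """Compare the Wald (IV) estimate with the population average effect.

    All shares and effect sizes are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    types = ["complier", "always_taker", "never_taker"]
    shares = [0.3, 0.3, 0.4]
    # Heterogeneous effects: those who always select in gain the most.
    effects = {"complier": 1.0, "always_taker": 3.0, "never_taker": 0.0}

    sum_y = {0: 0.0, 1: 0.0}   # running sum of outcomes by instrument value
    sum_d = {0: 0.0, 1: 0.0}   # running sum of treatment take-up by instrument value
    count = {0: 0, 1: 0}
    total_effect = 0.0

    for _ in range(n):
        t = rng.choices(types, weights=shares)[0]
        z = rng.randint(0, 1)  # randomly assigned binary instrument
        # Always-takers take treatment regardless; compliers only when z == 1.
        d = 1 if t == "always_taker" or (t == "complier" and z == 1) else 0
        y = rng.gauss(0.0, 1.0) + effects[t] * d
        sum_y[z] += y
        sum_d[z] += d
        count[z] += 1
        total_effect += effects[t]

    # Wald ratio: difference in mean outcomes over difference in take-up.
    reduced_form = sum_y[1] / count[1] - sum_y[0] / count[0]
    first_stage = sum_d[1] / count[1] - sum_d[0] / count[0]
    return {
        "wald": reduced_form / first_stage,
        "ate": total_effect / n,
        "first_stage": first_stage,
    }


if __name__ == "__main__":
    result = simulate_late_vs_ate()
    print(f"Wald estimate (complier effect): {result['wald']:.3f}")
    print(f"Population average effect:       {result['ate']:.3f}")
```

The instrument here is perfectly valid, and the estimate is a perfectly good answer to the complier question. It is simply not an answer to the population question, and nothing in the observed data reveals how far apart the two are.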

A number of demographers have discussed the problem of extrapolating local average treatment effect estimates to the broader population. Greg Duncan, in his presidential address to the Population Association of America, stated that although causal inference is "often facilitated by eschewing full population representation in favor of an examination of an exceedingly small but strategically selected portion of a general population with the 'right kind' of variation in the key independent variable of interest.... a population-based understanding of causal effects should be our principal goal." Robert Moffitt writes that although "some type of implicit weighting is needed" to help us understand how to trade off internal and external validity, "this problem has not really been addressed in the applied research community." Some researchers have suggested using bounds for average treatment effects that are not point-identified (for example, Manski). Of course, the usefulness of bounding techniques depends on the tightness of the bounds, which in turn depends on what assumptions we are willing to impose, and it is exactly scholars' discomfort with prevailing assumptions (e.g., lack of correlation between the error and the treatment indicator) that drove the current focus on non-representative population subgroups. It seems to me that there is still work to be done to connect subpopulation causal estimates to broader population trends. I would be interested to hear of work in this area that you think is promising.
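
To see why the tightness of bounds depends so heavily on assumptions, consider a minimal sketch of worst-case bounds on the population average effect, in the spirit of the approach associated with Manski. It assumes only that the outcome lies in a known range; the function name `worst_case_ate_bounds` and the example numbers are hypothetical. Each unobserved counterfactual mean is replaced by the lowest or highest value the outcome could take.

```python
def worst_case_ate_bounds(mean_y_treated, mean_y_control, p_treated, y_min, y_max):
    """Worst-case bounds on the average treatment effect.

    Assumes only that outcomes lie in [y_min, y_max]. Inputs are the observed
    mean outcome among the treated, the observed mean among the untreated,
    and the share of the population that is treated.
    """
    if not 0.0 <= p_treated <= 1.0:
        raise ValueError("p_treated must be between 0 and 1")
    if y_min > y_max:
        raise ValueError("y_min must not exceed y_max")
    p = p_treated

    # E[Y(1)]: observed for the treated, unknown (but bounded) for the untreated.
    ey1_low = mean_y_treated * p + y_min * (1 - p)
    ey1_high = mean_y_treated * p + y_max * (1 - p)

    # E[Y(0)]: observed for the untreated, unknown (but bounded) for the treated.
    ey0_low = mean_y_control * (1 - p) + y_min * p
    ey0_high = mean_y_control * (1 - p) + y_max * p

    return ey1_low - ey0_high, ey1_high - ey0_low


if __name__ == "__main__":
    # Hypothetical binary outcome: 70% success among treated, 40% among untreated.
    lower, upper = worst_case_ate_bounds(0.7, 0.4, 0.5, 0.0, 1.0)
    print(f"ATE lies in [{lower:.2f}, {upper:.2f}]")
```

Without further assumptions the width of this interval always equals y_max - y_min, so the bounds always include zero. Narrowing them requires adding assumptions, which is exactly the trade-off described above.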

Posted by Deirdre Bloome at 8:02 PM