
### Authors' Committee

#### Chair:

Matt Blackwell (Gov)

#### Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

### Blogroll

SMR Blog
Cognitive Daily
Complexity & Social Networks
Developing Intelligence
EconLog
The Education Wonks
Empirical Legal Studies
Free Exchange
Freakonomics
Health Care Economist
Junk Charts
Language Log
Law & Econ Prof Blog
Machine Learning (Theory)
Marginal Revolution
Mixing Memory
Mystery Pollster
New Economist
Political Arithmetik
Political Science Methods
Pure Pedantry
Science & Law Blog
Simon Jackman
Social Science++
Statistical modeling, causal inference, and social science


19 December 2006

### Winter Break

It's that time of year again; there is snow on the ground, a fire in the hearth, classes at Harvard end today, and things at IQSS are settling down for a brief winter's nap (at least that is the way I imagine it - the fluorescent lights in my office ensure that seasons have no meaning), so posts to the blog will be irregular for the next couple of weeks. We'll be back to our regularly scheduled programming in early January, but in the meantime, Happy New Year!

Posted by Mike Kellermann at 3:59 PM

15 December 2006

### Causal inference, moral hazard, and the challenges of social science

News that two clinical trials in Africa have been halted because the preliminary results were so strong that it was considered unethical to continue them has received major play in the media (New York Times, Guardian, Washington Post). The reason: the experimental treatment was male circumcision and the outcome of interest was the risk of female-to-male transmission of HIV. This is a topic that has been discussed previously in the Applied Statistics workshop (see posts here and here). The two studies suggest that circumcision reduces the probability of transmission by about 50%, which is similar to the result of an earlier randomized trial in South Africa (and, it should be noted, the estimated effect is also consistent with the results from a number of observational studies; see this NIAID FAQ for more details on the studies). In short, the evidence now seems overwhelming that, from a biomedical perspective, circumcision is effective at reducing transmission.

Is the same true from a policy perspective? In other words, would a policy promoting circumcision reduce the number of new HIV cases? The answer to that question is much less obvious, the concern being that the men who were circumcised would engage in riskier behavior given their newfound knowledge. This is a classic moral hazard problem; the people implementing the policy cannot control the actions taken by the treated individuals. Indeed, the researchers behind the study were falling all over themselves to emphasize the need for continued prevention measures. Despite this, it seems likely to me that one of the effects of the study (as opposed to the effect of the treatment) is going to be an increase in HIV transmission, at least at the margin, among the male subpopulation that is already circumcised.

This study thus highlights a couple of issues that face us as social scientists. First, the scientific quantity of interest (does circumcision reduce the risk of HIV transmission) need not be, and often isn't, the policy quantity of interest (will circumcision reduce the number of new HIV cases). Second, unlike our colleagues in the natural sciences, we do have to worry that the behavior of our subjects (broadly defined) will be influenced by the results of our research. A biologist doesn't have to worry that the dolphins she is studying are reading Marine Mammal Science (although to the extent that the modal number of times that a political science article is cited is still zero, we may not have to worry about our subjects reading the results of our research either!). From my perspective, the possibility of feedback - that behavior will change in response to research, in ways that could either reinforce or mitigate the conclusions that we draw - is one of the key characteristics that distinguish the social sciences from the natural sciences, a distinction that seems underappreciated and that makes our jobs as researchers substantially harder.
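To make the first point concrete, here is a back-of-the-envelope sketch. All numbers are hypothetical and are not taken from the trials. It assumes independent exposures with a fixed per-partner transmission probability, halves that probability for circumcised men (the roughly 50% biomedical effect), and then asks how cumulative risk changes if those men also take on extra partners.

```python
def cumulative_risk(per_partner_prob: float, partners: int) -> float:
    """Probability of at least one transmission across independent exposures."""
    return 1.0 - (1.0 - per_partner_prob) ** partners

# Hypothetical inputs, for illustration only.
BASE_PROB = 0.02      # per-partner transmission probability without circumcision
EFFICACY = 0.5        # biomedical effect: circumcision halves the per-partner risk
BASE_PARTNERS = 4     # partners before any behavioral response

def policy_effect(extra_partners: int) -> float:
    """Relative change in cumulative risk if circumcised men add partners."""
    before = cumulative_risk(BASE_PROB, BASE_PARTNERS)
    after = cumulative_risk(BASE_PROB * (1 - EFFICACY), BASE_PARTNERS + extra_partners)
    return after / before - 1.0

for extra in (0, 2, 4, 6):
    print(f"{extra} extra partners: change in cumulative risk = {policy_effect(extra):+.1%}")
```

With small per-partner risks, cumulative risk is roughly the number of partners times the per-partner probability, so in this toy setup doubling the number of partners almost exactly cancels the biomedical effect, and a larger behavioral response reverses it.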

Posted by Mike Kellermann at 10:21 AM

14 December 2006

### Misunderstandings among Experimentalists and Observationalists

We had several discussions a while ago on this blog about balance test fallacies, and an early version of a paper on the subject that Kosuke Imai, Liz Stuart and I wrote. Kosuke, Liz, and I also had a number of interesting discussions with people in several other fields about this topic, and we've found much confusion about the benefits of the key portions of the major research designs. Observationalists seem to have experiment-envy, which is in at least some cases unwarranted, and experimentalists have their own related issues too. To sort these issues out (largely, or at least at first, for ourselves), we have now written a new paper that tries to clarify these issues and also incorporates the points from the previous paper (material from the previous paper is the last few pages of this one). We'd be very grateful for any comments anyone might have.

"Misunderstandings among Experimentalists and Observationalists: Balance Test Fallacies in Causal Inference" by Kosuke Imai, Gary King, and Elizabeth Stuart.

Abstract

We attempt to clarify, and show how to avoid, several fallacies of causal inference in experimental and observational studies. These fallacies concern hypothesis tests for covariate balance between the treated and control groups, and the consequences of using randomization, blocking before randomization, and matching after treatment assignment to achieve balance. Applied researchers in a wide range of scientific disciplines seem to fall prey to one or more of these fallacies. To clarify these points, we derive a new three-part decomposition of the potential estimation errors in making causal inferences. We then show how this decomposition can help scholars from different experimental and observational research traditions better understand each other's inferential problems and attempted solutions. We illustrate with a discussion of the misleading conclusions researchers produce when using hypothesis tests to check for balance in experiments and observational studies.
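The abstract does not spell out the mechanism, but one commonly cited reason balance tests mislead is that a test statistic depends on sample size as well as on imbalance. The sketch below is an illustration under that assumption, not code from the paper: it holds the treated-control difference in means of a hypothetical covariate fixed at 1 and only changes n by replicating the data. To stay within the standard library it uses a normal approximation to the t distribution.

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means of a and b."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

def approx_p_value(t):
    """Two-sided p-value, using a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical covariate values; the treated-control mean difference is 1 by construction.
base_treated = [1.0, 2.0, 3.0, 4.0, 5.0]
base_control = [0.0, 1.0, 2.0, 3.0, 4.0]

results = []
for k in (1, 4, 16):  # replicate the data k times: same imbalance, larger n
    treated, control = base_treated * k, base_control * k
    diff = mean(treated) - mean(control)
    p = approx_p_value(welch_t(treated, control))
    results.append((len(treated) + len(control), diff, p))
    print(f"n = {len(treated) + len(control):3d}  mean difference = {diff:.2f}  p = {p:.4f}")
```

The imbalance never changes, yet the test moves from "balanced" to "imbalanced" as n grows. Run in reverse, the same logic means that discarding units during matching raises p-values mechanically, so a passed balance test can reflect lost power rather than improved balance.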

Posted by Gary King at 9:26 AM

13 December 2006

### Applied Statistics – Harrington

This week the Applied Statistics Workshop will present a talk by David Harrington, Professor of Biostatistics at Harvard’s School of Public Health, and in the Department of Biostatistical Science at the Dana Farber Cancer Institute.

Professor Harrington received his Ph.D. from the University of Maryland and taught at the University of Virginia before coming to Harvard. He has served as Principal Investigator on numerous NIH and NSF grants researching topics including Nonparametric Tests for Censored Cancer Data, and Statistical Problems for Markov Branching Processes. His research has appeared in Journal of the American Statistical Association, Biostatistics, Genetic Epidemiology, Journal of Clinical Oncology, and Biometrics among many others.

Professor Harrington is involved in two different lines of research. The first is research in statistical methods for clinical trials and prospective cohort studies in which the time to an event is a primary outcome. He has worked on efficient nonparametric tests and regression methods for right-censored data, sequential designs for clinical trials, and nonparametric methods for estimating nonlinear covariate effects on survival. Recently, he and co-workers in the Department of Biostatistics have been studying methods for analyzing survival data when some covariates have missing observations. Missing data are common in both prospective and retrospective cohort studies, and simply ignoring cases with missing observations can lead to substantial biases in inference.

Dr. Harrington's second line of research, on which he will be presenting, is collaborative research in cancer. He is the principal investigator of the Statistical Coordinating Center for the Cancer Care Outcomes Research and Surveillance (CanCORS) Consortium. This NCI-funded study is a network of sites around the country that are conducting a population-based study of access to and outcomes from cancer care, with special focus on ethnic subgroups and subgroups defined by age.

Professor Harrington will present a talk entitled "Statistical Issues in the Cancer Care Outcomes Research and Surveillance Consortium (CanCORS)." The presentation will be at noon on Wednesday, December 13 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 9:23 AM

12 December 2006

### Better Way To Make Cumulative Comparisons With Small Samples?

On July 15, 1971, the research vessel Lev Berg set sail from Aralsk (Kazakhstan) to survey the Aral Sea, then the fourth-largest lake in the world. The Soviet Union had been steadily draining the Aral for agricultural purposes since the 1950s, and the Lev Berg was to measure the ecological damage. The trip included passing by Vozrozhdeniye Island, on the south side of the sea.

(Image Source: "The 1971 Smallpox Epidemic in Aralsk, Kazakhstan, and the Soviet Biological Warfare Program." Center for Nonproliferation Studies Occasional Paper No. 9, Jonathan B. Tucker and Raymond A. Zilinskas.)

Vozrozhdeniye was an ideal site for the main Soviet bioweapons field testing because it was remote, easily secured as an island, and had reliable winds from north to south, allowing "safe" testing, with housing on the north end. The site was active from 1936 until 1990, when Yeltsin publicly denounced the program and had it shut down. This is despite the Soviet Union having signed the 1972 Biological and Toxin Weapons Convention outlawing such research. Shortly after the Lev Berg returned to Aralsk, there was an unusual outbreak of smallpox there, starting with a young researcher who had been on board. The following is the best epidemiological data available:

Comparison Case: in 1972 a Muslim man from Kosovo went on a pilgrimage to Mecca, returning through Baghdad, where he was infected with smallpox. This was the first reported smallpox case in Kosovo since 1930, and it apparently went undiagnosed for six weeks, producing 175 cases and 35 deaths. It makes a good comparison, since vaccination rates and socioeconomic conditions were similar.

Kaplan-Meier graph with time-to-event = onset of illness:

(Image Source: Ibid.)
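For readers less familiar with the curves in such a graph: the Kaplan-Meier (product-limit) estimator multiplies, at each observed event time, the fraction of subjects still at risk who do not experience the event. A minimal pure-Python sketch follows; the input in the usage line is hypothetical and is not the Aralsk or Kosovo data.

```python
from collections import Counter
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t), returned at each distinct event time.

    times  -- time to event or to censoring for each subject
    events -- 1 if the event (here, onset of illness) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # subjects still under observation at t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical data: four subjects, the second censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

A cumulative-incidence plot of the kind in the figure is simply one minus this curve.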

Key difference: all three Aralsk deaths were from hemorrhagic smallpox and only five in Kosovo were. The baseline for naturally occurring smallpox: Rao's study in Madras, India had 10,857 cases with only 240 hemorrhagic. Only two possible explanations seem to remain for the differences:
- host conditions (nutrition, genetic resistance, environment) differ greatly.
- Aralsk strain was an unusual type.

Obviously, it would be nice to claim strong evidence that the Soviet case resulted from escaped smallpox. We know the extent of the bioweapons program from Yeltsin's opening of the files, but we cannot establish with 100% certainty that the program was responsible for this outbreak.
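A rough sense of how surprising three hemorrhagic deaths out of three would be comes from a binomial calculation on the figures quoted above. This is only a sketch: it treats the Kosovo share of hemorrhagic deaths (5 of 35) as a per-death probability and assumes deaths are independent, which is doubtful for cases linked by a single transmission chain. That is exactly the small-sample difficulty raised below.

```python
from fractions import Fraction

# Figures quoted in the post.
KOSOVO_HEMORRHAGIC_DEATHS = 5
KOSOVO_DEATHS = 35
MADRAS_HEMORRHAGIC_CASES = 240
MADRAS_CASES = 10_857
ARALSK_DEATHS = 3  # all three were hemorrhagic

def prob_all_hemorrhagic(share: Fraction, n: int) -> Fraction:
    """P(all n deaths hemorrhagic) if each is independently hemorrhagic with probability `share`."""
    return share ** n

kosovo_share = Fraction(KOSOVO_HEMORRHAGIC_DEATHS, KOSOVO_DEATHS)  # reduces to 1/7
madras_share = Fraction(MADRAS_HEMORRHAGIC_CASES, MADRAS_CASES)    # a share of cases, not deaths

p_kosovo = prob_all_hemorrhagic(kosovo_share, ARALSK_DEATHS)
print(f"Kosovo baseline: {p_kosovo} (about {float(p_kosovo):.4f})")
print(f"Madras hemorrhagic share of cases: {float(madras_share):.2%}")
```

Under the Kosovo baseline the probability is 1/343. The Madras figure is a share of all cases, so it cannot be plugged into the same calculation without knowing how many Madras cases died, which the post does not give.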

This is just a motivating (and interesting) example; the real question is about testing really small samples when exact inference doesn't seem appropriate. So what other approaches would readers suggest for making comparisons with these types of cumulative data besides simple Kaplan-Meier comparisons? Typical correlational analysis (polychoric, multichoric, etc.) obviously won't work, and standard tabular approaches are not going to be effective either.

Posted by Jeff Gill at 2:48 PM

### Naming Conventions

This discussion came up yesterday in the Bayes course. There is a plethora of names for multilevel models. Sociologists seem to prefer "hierarchical," many statisticians say "mixed effects," and usage in economics is heterogeneous. It seems reasonable to standardize, but this is unlikely to happen. Perhaps the most common usage is the following. Given two data matrices, $x_{ij}$ for individual $i$ in cluster $j$ and $z_j$ for cluster $j$, there are perhaps four canonical models:

"Pooled": $y_{ij} = \alpha + x_{ij}'\beta + z_j'\gamma + e_{ij}$

"Fixed Effect": $y_{ij} = \alpha_j + x_{ij}'\beta + e_{ij}$

"Random Effect": $y_{ij} = \alpha_j + x_{ij}'\beta + z_j'\gamma + e_{ij}$

"Random Intercept and Random Slope": $y_{ij} = \alpha_j + x_{ij}'\beta_j + z_j'\gamma + e_{ij}$

Some prefer "random intercepts" for "fixed effects," and perhaps we can consider these all to be members of a larger family in which subscripts are switched on and off systematically. On the other hand, maybe it's just terminology and not worth worrying about too much. Thoughts?
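One way to see that the subscripts matter more than the labels: the "fixed effect" model, with a separate intercept for each cluster, is usually estimated by demeaning x and y within each cluster. When the cluster intercepts are correlated with x, that can give a very different slope from the "pooled" model. Here is a minimal sketch with a single covariate, no cluster-level z, and two hypothetical clusters constructed so that the within-cluster slope is exactly 1.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with a common intercept (pooled)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def within_slope(clusters):
    """Fixed-effect (within) slope: demean x and y inside each cluster, then pool."""
    dx, dy = [], []
    for x, y in clusters:
        mx, my = mean(x), mean(y)
        dx += [xi - mx for xi in x]
        dy += [yi - my for yi in y]
    # After demeaning, a regression through the origin gives the within slope.
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Hypothetical data: slope 1 inside each cluster, but cluster 2 has a much lower intercept.
clusters = [
    ([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]),     # alpha_1 = 0
    ([3.0, 4.0, 5.0], [-7.0, -6.0, -5.0]),  # alpha_2 = -10
]
all_x = [xi for x, _ in clusters for xi in x]
all_y = [yi for _, y in clusters for yi in y]

pooled = ols_slope(all_x, all_y)
fixed = within_slope(clusters)
print(f"pooled slope: {pooled:.3f}, fixed-effect slope: {fixed:.3f}")
```

The pooled slope comes out negative because it mixes the between-cluster difference in intercepts into the estimate of beta, while the within estimator recovers the slope of 1.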

Posted by Jeff Gill at 10:23 AM

11 December 2006

### Iowa Redistricting: The Maytag Repairman of States?

From a forthcoming paper on legislative redistricting commissions: Iowa has used the same scheme for the past three redistricting cycles. A commission draws three maps, and the legislature selects one of those.

The attached seats-votes graphs are for the 2000 and 2004 state house elections, before and after the 2001 cycle. As we can see, responsiveness (the slope of the curve) is high and remains high afterwards, suggesting that the fraction of contested seats is high, and justifying its reputation as a model for redistricting.

However, the curve is clearly below 50% at the median vote, meaning that an equal vote will almost always split the seats unevenly. (In this case, the Republican party gains the advantage.) This suggests that the process is less effective at producing an unbiased map than it is at producing a responsive one.
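Seats-votes curves are commonly summarized by partisan bias (seat share minus 50% when the vote is split evenly) and responsiveness (the slope near 50%). The sketch below estimates both using uniform partisan swing, which shifts every district's vote share by the same amount. That is a simpler device than the methods usually used in redistricting research, and the district vote shares are hypothetical, not Iowa's.

```python
from statistics import mean

def seat_share(district_votes, statewide_vote):
    """Party's seat share if every district swings uniformly to an average of `statewide_vote`."""
    swing = statewide_vote - mean(district_votes)
    won = sum(1 for v in district_votes if v + swing > 0.5)
    return won / len(district_votes)

def bias_and_responsiveness(district_votes, h=0.02):
    """Partisan bias at a 50% vote, and a finite-difference slope around 50%."""
    bias = seat_share(district_votes, 0.5) - 0.5
    slope = (seat_share(district_votes, 0.5 + h) - seat_share(district_votes, 0.5 - h)) / (2 * h)
    return bias, slope

# Hypothetical district-level vote shares for one party.
districts = [0.40, 0.45, 0.48, 0.52, 0.60]
bias, resp = bias_and_responsiveness(districts)
print(f"bias: {bias:+.2f}, responsiveness: {resp:.1f}")
```

A negative bias corresponds to a curve that sits below 50% at an even vote, as described for Iowa, while a responsiveness above 1 means seat share moves faster than vote share.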

Given Iowa's reputation as a well-run redistricter, one wonders how much it is deserved. It's also fair to wonder what would happen if this system were applied to another state where voting was racially polarized.

Posted by Andrew C. Thomas at 1:40 PM

7 December 2006

### NIPS highlights

Amy Perfors

I've just spent this week at the annual NIPS conference; though its main focus seems to be machine learning, there are always interesting papers on the intersection of computational/mathematical methods in cognitive science and neuroscience. I thought it might be interesting to mention the highlights of the conference for me - which obviously tends to focus heavily on the cognitive science end of things. (Be aware that links (pdf) are to the paper pre-proceedings, not final versions, which haven't been released yet).

From Daniel Navarro and Tom Griffiths, we have A Nonparametric Bayesian Method for Inferring Features from Similarity Judgments. The problem, in a nutshell, is that if you're given a set of similarity ratings about a group of objects, you'd like to be able to infer the features of the objects from that. Additive clustering assumes that similarity is well-approximated by a weighted linear combination of common features. However, the actual inference problem -- actually finding the features -- has always been difficult. This paper presents a method for inferring the features (as well as figuring out how many features there are) that handles the empirical data well, and might even be useful for figuring out what sorts of information (i.e., what sorts of features) we humans represent and use.
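The additive clustering model described above can be written as s_ij ≈ Σ_k w_k f_ik f_jk: the similarity of two objects is the summed weight of the features they share. The forward direction, from features to similarities, is easy to sketch. The hard part the paper addresses, inferring the binary features and how many there are, is not attempted here. The objects, features, and weights below are hypothetical, and the additive constant that some formulations include is omitted.

```python
from typing import List

def additive_similarity(features: List[List[int]], weights: List[float]) -> List[List[float]]:
    """Predicted similarity matrix: s[i][j] = sum_k weights[k] * f[i][k] * f[j][k]."""
    n = len(features)
    return [
        [sum(w * fi * fj for w, fi, fj in zip(weights, features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical binary features for three objects: (has_wings, lives_on_water)
features = [
    [1, 0],  # sparrow
    [1, 1],  # duck
    [0, 1],  # fish
]
weights = [0.5, 0.3]

sim = additive_similarity(features, weights)
print(sim[0][1], sim[1][2], sim[0][2])  # sparrow-duck, duck-fish, sparrow-fish
```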

From Mozer et al. comes Context Effects in Category Learning: An Investigation of Four Probabilistic Models. Some interesting phenomena in human categorization are the so-called push and pull effects: when shown an example from a target category, the prototype gets "pulled" closer to that example, and the prototypes of other related categories get pushed away. It's proven difficult to explain this computationally, and this paper considers four obvious candidate models. The best one uses a distributed representation and a maximum likelihood learning rule (and thus tries to find the prototypes that maximize the probability of being able to identify the category given the example); it's interesting to speculate about what this might imply about humans. The main shortcoming of this paper, to my mind, is that they use very idealized categories; but it's probably a necessary simplification to begin with, and future work can extend it to categories with a richer representation.

The next is work from my own lab (though not me): Kemp et al. present an account of Combining causal and similarity-based reasoning. The central point is that researchers have developed accounts of reasoning about causal relationships between properties (say, having wings causes one to be able to fly) and accounts of reasoning about objects on the basis of similarity (say, if a monkey has some gene, an ape is more likely to have it than a duck is). But many real-world inferences rely on both: if a duck has gene X, and gene X causes enzyme Y to be expressed, it is likely that a goose has enzyme Y. This paper presents a model that intelligently combines causal- and similarity-based reasoning, and is thus able to predict human judgments more accurately than either of them alone.

Roger Levy and T. Florian Jaeger have a paper called Speakers optimize information density through syntactic reduction. They explore the (intuitively sensible, but hard to study) idea that people -- if they are rational -- should try to communicate in the information-theoretically optimal way: they should try to give more information at highly ambiguous points in a sentence, but not bother doing so at less ambiguous points (since adding information has the undesirable side-effect of making utterances longer). They examine the use of reduced relative clauses (saying, e.g., "How big is the family you cook for" rather than "How big is the family THAT you cook for" - the word "that" is extra information which reduces the ambiguity of the subsequent word "you"). The finding is that speakers choose to reduce the relative clause -- to say the first type of sentence -- when the subsequent word is relatively unambiguous; in other words, their choices are correlated with information density. One of the reasons this is interesting to me is that it motivates the question of why exactly speakers do this: is it a conscious adaptation to try to make things easier for the listener, or a more automatic/unconscious strategy of some sort?
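"Information" in this line of work is usually measured as surprisal, -log2 p(word | context), in bits. The toy sketch below computes surprisal and applies a hypothetical threshold rule for keeping the optional "that". The probabilities and the threshold are invented for illustration, and this fixed cutoff is not the paper's model.

```python
import math

def surprisal(prob: float) -> float:
    """Information carried by an event of probability `prob`, in bits."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)

def keep_that(p_next_word: float, threshold_bits: float = 2.0) -> bool:
    """Toy rule (hypothetical): say 'that' when the following word would otherwise be surprising."""
    return surprisal(p_next_word) > threshold_bits

# Hypothetical probabilities for the word right after the relative-clause onset.
for p in (0.5, 0.25, 0.125):
    print(f"p={p}: {surprisal(p):.0f} bits, keep 'that': {keep_that(p)}")
```

The intuition behind the rule is that adding "that" spreads the same information over more words, lowering the peak density exactly where the next word would otherwise be hard to predict.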

There are a number of other papers that I found interesting -- Chemudugunta et al. on Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model; Roy et al. on Learning Annotated Hierarchies from Relational Data, and Greedy Layer-wise Training of Deep Networks by Bengio et al., to name a few -- so if this sort of thing interests you, I suggest checking out the NIPS proceedings when they come out. And if any of you went to NIPS also, I'd be curious what you really liked and think I should have included on this list!

Posted by Amy Perfors at 4:07 PM

6 December 2006

### Applied Statistics - Imbens and Ridder

This week the Applied Statistics Workshop will present a talk by Guido Imbens, Professor of Economics at Harvard University, and Geert Ridder, Professor of Economics at the University of Southern California.

Professor Imbens has recently rejoined the Department of Economics at Harvard and is one of the faculty sponsors of the Applied Statistics Workshop, so we are delighted that he will be speaking at the Workshop. He received his Ph.D. from Brown University and served on the faculties of Harvard, UCLA, and Berkeley before returning to Harvard. He has published widely, with a particular focus on questions relating to causal inference. Professor Imbens has been the recipient of numerous National Science Foundation grants and teaching awards. His work has appeared in Econometrica, Journal of Econometrics, Journal of the Royal Statistical Society, and Biostatistics among many others.

Geert Ridder is Professor of Economics at the University of Southern California. Before coming to the United States he was Professor of Econometrics at the Rijksuniversiteit Groningen and the Vrije Universiteit in Amsterdam in The Netherlands. In the United States he was Professor of Economics at the Johns Hopkins University and visiting professor at Cornell University, the University of Iowa, and Brown University. He received his Ph.D. from the University of Amsterdam. Professor Ridder’s research area is econometrics, in particular microeconometrics, and its applications in labor economics, public finance, economic development, economic demography, transportation research, and the economics of sports. His methodological interests are the (nonparametric) identification of statistical and economic structures from observed distributions (mainly in duration data and discrete choice data), models and estimation methods for duration data and panel data, (selectively) missing data, causal inference, and errors-in-variables. His work has appeared in Econometrica, Economics of Education Review, Journal of the European Economic Association, and Journal of Econometrics among others.

Professors Imbens and Ridder will present a talk entitled "Complementarity and Aggregate Implications of Assortative Matching: A Nonparametric Analysis." The presentation will be at noon on Wednesday, December 6, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Eleanor Neff Powell at 10:34 AM

5 December 2006

### Causality in the Social Sciences Anybody?

Funny that there is no section on causal inference in the social sciences here. The article even carries a notice that, to meet Wikipedia's quality standards, it may require cleanup. Hopefully, somebody will find the time to contribute a social science section. Why not you? My guess is that readers of this blog know plenty about this topic, and the current entry is missing a lot of what statistics has to say about causality.

Posted by Jens Hainmueller at 10:00 AM

4 December 2006

### NYT on Improving the Peer Review Process

Wednesday's New York Times reports on recommendations by an independent panel on how the journal Science could improve its review process (see here). The panel was instituted after Science had to retract papers by Dr. Hwang Woo-suk that were based on fabricated results. The panel recommended four changes:

(1) Flag high-visibility papers for extra scrutiny in the review process
(2) Require authors to specify their individual contributions to a paper
(3) Make more raw data available online for replication
(4) Work with other journals to establish a common standard for the review process.

Recommendations 3 and 4 have previously been featured on this blog here and here. Recommendation 2 should produce interesting results for jointly authored publications. Maybe a logical extension would be to assess academic output using the contributions as weights?

Posted by Sebastian Bauhoff at 4:57 PM

1 December 2006

### Babel of Statistics

As a requirement of my doctoral program, I am taking a basic epidemiology class this semester. It's been interesting to see how the basic analytics in epi are the same as in, say, econometrics, but how much the language and preferences differ across the fields.

One striking difference is the preference for confidence intervals rather than coefficients and standard errors. Epidemiologists dislike p-values for the same reasons that economists dislike them when they are reported without additional information. But epidemiologists seem to be in love with confidence intervals. Obviously it's a handy statistic, but to me it seems to generate a misleading emphasis on the popular 5 percent level. It just pre-empts any thinking about the process of getting that interval. Yet most epi and medical publications report little else.
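The close link between the two presentations is easy to see. For an approximately normal estimate, a 95% interval is the coefficient plus or minus about 1.96 standard errors, and it excludes zero exactly when the two-sided p-value is below 0.05. The sketch below uses hypothetical coefficients and standard errors. (When epidemiologists report intervals for odds ratios, they typically exponentiate the endpoints computed on the log-odds scale.)

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def ci_and_p(coef: float, se: float, level: float = 0.95):
    """Wald confidence interval and two-sided p-value for an approximately normal estimate."""
    crit = Z.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower, upper = coef - crit * se, coef + crit * se
    p = 2 * (1 - Z.cdf(abs(coef) / se))
    return (lower, upper), p

# Hypothetical (coefficient, standard error) pairs.
for coef, se in [(0.5, 0.2), (0.3, 0.2), (-0.45, 0.2)]:
    (lo, hi), p = ci_and_p(coef, se)
    excludes_zero = lo > 0 or hi < 0
    print(f"coef={coef:+.2f}: 95% CI ({lo:+.3f}, {hi:+.3f}), p={p:.3f}, excludes 0: {excludes_zero}")
```

Read this way, the interval carries the same information as the coefficient and standard error, but it invites a reader to check only whether zero falls inside, which is the 5 percent threshold under another name.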

On the other hand, maybe other social sciences could benefit from what epidemiologists call "positive criteria for causality." These include the existence of plausible (gasp!) cause-and-effect mechanisms and dose-response relations (the dose of exposure is related to the level of disease). Other fields often rely too heavily on the strength of association alone, and it would be a good idea to take other positive criteria more seriously.

Other items are pure lingo. For example, epidemiologists seem to call misclassification what economists call measurement error. But at any rate, the differences in terminology and preferences are surprising. When did the academic tribes separate? Also, techniques accepted in one field often look like innovations in another. Why is there not more communication between the fields? It seems like all could benefit from a wider discussion and application, and it's an easy way to publish, so the incentives are right too.

Posted by Sebastian Bauhoff at 1:34 PM