March 2006

Authors' Committee


Matt Blackwell (Gov)


Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship


27 March 2006

Spring break, blog break

This week is spring break for both Harvard and MIT, so as per usual, we will be posting less this week. Enjoy the (sort of) spring sunshine!

Posted by Amy Perfors at 6:00 AM

24 March 2006

Another classroom demo: the scientific method

Gary's posts about teaching and breakfast cereal reminded me of a teaching experience I had while in the Peace Corps in Mozambique -- this time regarding the scientific method and hypothesis testing. It may be nothing particularly exciting to those of you who habitually teach pre-college-level science, but I was surprised at how well it worked.

My (secondary-level) students were extremely good at memorizing facts, but they had a very hard time learning and applying the scientific method (as many do, I think). Since I see the method as the root of what makes science actually scientific and I didn't want them to have the view that science was just a disconnected collection of trivia, this was deeply problematic -- all the more frustrating to me because I could see that, in real life, they used the scientific method routinely. We all do, whenever we try to explain people's behavior or solve any of the everyday puzzles that confront us. The trick was to demystify it, to make them see that as well.

The next day I brought in an empty coke bottle. It's not vital that this be done with a coke bottle; in fact I imagine if you have more choice of materials than I had in Africa, you could find something even better. Basically I wanted something that was very familiar to them, to underscore the point that scientific reasoning is something they did all the time.

I held up the empty coke bottle. "What do you suppose had been in it?" I asked. This was the PROBLEM. "Coke!" everyone replied. "Okay," said I, "but I could have used it after the coke was gone for something else, right? What else could it have held?" Once again, people had no trouble suggesting possibilities -- water, gasoline, tea, other kinds of soda. I pointed out that they had just GENERATED HYPOTHESES, and wrote them on the board, along with coke.

Now, I asked them, how could you find out if your hypothesis was correct? They'd ask me, they said, and I pointed out that this was one way of TESTING the hypothesis. But suppose I wasn't around, or lied to them - what else could they do? One student suggested smelling it, and another (thinking about the gasoline hypothesis) suggested throwing a match in and seeing if it caught fire. "Both of these are good tests," I said, "and you'll notice that each of them is good for certain specific hypotheses; the match one wouldn't tell the difference between tea and other kinds of soda, for instance, and smelling it wouldn't help if it were water."

Then I asked a volunteer to come up and actually perform the test - to smell it, since we didn't have any matches. He did, and reported back that it smelled like Fanta even though it was a coke bottle. This, I said, was the RESULT, and it enabled the class to draw the CONCLUSION - that I had put Fanta in the bottle after drinking all of the original coke.

The best part of this demo came when a student, seeking to "trap" me, pointed out that I could still have had water or tea in the bottle, just long enough ago that the Fanta smell was stronger. "Exactly!" I replied. This points out the two limitations of the scientific method -- the validity of your conclusion depends on your hypotheses and on how good your methods of testing are. There are always a potentially infinite number of hypotheses you haven't ruled out, and therefore we cannot draw any conclusion with 100% accuracy. Plus, if our test can't tell the difference between two hypotheses, then we can't decide between those two. For this reason it's very important to have hypotheses that you can test, and to work to develop better methods of testing so that you can eliminate more plausible hypotheses.

This led to a good discussion about the pros and cons of the scientific method and how it compared to other ways of understanding the world. If I had had more time, equipment, or room, I had hoped to make it more interactive, with stations where they had to apply the method to lots of simple real-world problems; but even as it was, it was valuable.

I was surprised at how well this demo worked... not only did they immediately understand how to apply the scientific method, but they also understood its limitations in a way that I think many people don't, even by college age. As the semester advanced, I found myself referring back to the lesson often ("remember the empty coke bottle") when I'd try to explain how we knew what we knew. And I think it was very freeing for them to realize that science wasn't some mysterious system of rules passed down from on high, but rather the best explanation we had so far (and the best way we knew of how to get that explanation). My favorite result of this demo was their realization that scientists were people just like themselves, and that they too could do it -- in fact, they already were.

Posted by Amy Perfors at 6:00 AM

23 March 2006

Control Groups for Breakfast, Revisited

A few months ago, I wrote an entry entitled The Value of Control Groups in Causal Inference (and Breakfast Cereal). It was a report on a fun experiment I did that worked well both in my daughter's kindergarten class and my graduate methods class at Harvard. There were a fair number of comments posted in the blog, and I also received dozens of other notes from parents and school teachers all over the country with many interesting questions and suggestions.

That correspondence covered four main points:

  1. Some people suggested a variety of interesting alternative experiments, which is great, but in designing these, many forgot that you must always have a control group. That's the main lesson of the experiment: you often learn nothing without some kind of control group, and teaching this to kids (and graduate students!) is quite important.
  2. Some people didn't squish the cereal enough, and so the magnet didn't pick up the pieces. The magnet will attract the cereal only when it is squished very well, since the bits of iron are very small.
  3. People then asked why the cereal doesn't stick to the magnet without squishing it up. The reason is the same reason a magnet won't pick up a nail driven into a log, but it will pick up the nail if not in the log.
  4. Finally, most people asked for other experiments they could run with their kids. For that, which I'm writing up now, please tune in next time!

Posted by Gary King at 6:00 AM

22 March 2006

Valid Standard Errors for Propensity Score Matching, Anyone?

Jens Hainmueller

Propensity Score Matching (PSM) has become an increasingly popular method to estimate treatment effects in observational studies. Most papers that use PSM also provide standard errors for their treatment effect estimates. I always wonder where these standard errors actually come from. To my knowledge there still exists no method to calculate valid standard errors for PSM. What do you all think about this topic?

The issue is this: Getting standard errors for PSM works out nicely when the true propensity score is known. Alberto and Guido have developed a formula that provides principled standard errors when matching is done with covariates or the true propensity score. You can read about it here. This formula is used by their nnmatch matching software in Stata and Jasjeet Sekhon’s matching package in R.

Yet, in observational settings we do not know the true propensity score so we first have to estimate it. Usually people regress the treatment indicator on a couple of covariates using a probit or logit link function. The predicted probabilities from this model are then extracted and taken as the estimated propensity score to be matched on in the second step (some people also match on the linear predictor, which is desirable because it does not tend to cluster so much around 0 and 1).
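For concreteness, the two-step procedure can be sketched in a few lines. This is a toy illustration on simulated data -- the data-generating process, the Newton-method logit, and the one-to-one matching rule are all my assumptions for the sketch, not any particular package's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: two covariates X, a treatment indicator D
# that depends on X, and an outcome Y with a true treatment effect of 1.
n = 500
X = rng.normal(size=(n, 2))
D = (X @ np.array([0.8, -0.5]) + rng.normal(size=n) > 0).astype(float)
Y = D + X @ np.array([1.0, 1.0]) + rng.normal(size=n)

def logit_linear_predictor(X, d, iters=25):
    """Fit a logit of d on X by Newton/IRLS; return the linear predictor."""
    Xb = np.column_stack([np.ones(len(d)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += np.linalg.solve(Xb.T @ (Xb * (p * (1.0 - p))[:, None]),
                                Xb.T @ (d - p))
    return Xb @ beta

# Step 1: estimate the propensity score (kept here on the linear-predictor
# scale, which avoids the clustering near 0 and 1 mentioned above).
score = logit_linear_predictor(X, D)

# Step 2: one-to-one nearest-neighbor matching with replacement on the
# estimated score, then the ATT as the mean treated-minus-matched gap.
treated = np.where(D == 1)[0]
control = np.where(D == 0)[0]
gaps = np.abs(score[treated][:, None] - score[control][None, :])
matches = control[gaps.argmin(axis=1)]
att = (Y[treated] - Y[matches]).mean()
print(f"estimated ATT: {att:.2f}")  # the true effect is 1 in this simulation
```

The point estimate itself is straightforward; the open question in this post is what standard error to attach to it.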

Unfortunately, the abovementioned formula does not work in the case of matching on the estimated propensity score, because the estimation uncertainty created in the first step is not accounted for. Thus, the confidence bounds on the treatment effect estimates in the second step will most likely not have the correct coverage.

This issue is not easily resolved. Why not just bootstrap the whole two-step procedure? Well, there is evidence to suggest that the bootstrap is likely to fail in the case of PSM. In the closely related problem of deriving standard errors for conventional nearest neighbor matching, Guido and Alberto show in a recent paper that even in the simple case of matching on a single continuous covariate (when the estimator is root-N consistent and asymptotically normally distributed with zero asymptotic bias) the bootstrap does not provide standard errors with correct coverage. This is due to the extreme non-smoothness of nearest neighbor matching, which causes the bootstrap variance to diverge from the actual variance.

In the case of PSM the same problem is likely to occur unless estimating the propensity score in the first step makes the matching estimator smooth enough for the bootstrap to work. But this is an open question. At least to my knowledge there exists no Monte Carlo evidence or theoretical justification for why the bootstrap should work here. I would be interested to hear opinions on this issue. It’s a critical question because the bootstrap for PSM is often used in practice: various matching packages (for example, pscore or psmatch2 in Stata) offer bootstrapped standard error options for matching on the estimated propensity score.
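To make the discussion concrete, here is what bootstrapping the whole two-step procedure looks like on simulated data. This is a toy sketch, not a recommendation -- the point above is precisely that the number it prints has no guaranteed coverage:

```python
import numpy as np

rng = np.random.default_rng(1)

def att_psm(X, D, Y):
    """Two-step PSM ATT: logit propensity score, then 1-NN matching on it."""
    Xb = np.column_stack([np.ones(len(D)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(25):  # Newton/IRLS steps for the first-step logit
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += np.linalg.solve(Xb.T @ (Xb * (p * (1.0 - p))[:, None]),
                                Xb.T @ (D - p))
    score = Xb @ beta
    t = np.where(D == 1)[0]
    c = np.where(D == 0)[0]
    m = c[np.abs(score[t][:, None] - score[c][None, :]).argmin(axis=1)]
    return (Y[t] - Y[m]).mean()

# Hypothetical simulated data with a true treatment effect of 1.
n = 400
X = rng.normal(size=(n, 2))
D = (X[:, 0] + rng.normal(size=n) > 0).astype(float)
Y = D + X.sum(axis=1) + rng.normal(size=n)

# Bootstrap the *entire* two-step procedure: resample units, re-estimate
# the score, re-match, and re-compute the ATT on each replicate.
boot = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    boot.append(att_psm(X[idx], D[idx], Y[idx]))
se = float(np.std(boot))
print(f"bootstrap SE of the ATT: {se:.3f}")
```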

Posted by Jens Hainmueller at 6:00 AM

Applied Statistics - Jeff Gill

Today at noon, the Applied Statistics Workshop will present a talk by Jeff Gill of the Department of Political Science at the University of California at Davis. Professor Gill received his Ph.D. from American University and served on the faculty at Cal Poly and the University of Florida before moving to Davis in 2004. His research focuses on the application of Bayesian methods and statistical computing to substantive questions in political science. He is the organizer for this year's Summer Methods Meeting sponsored by the Society for Political Methodology, which will be held at Davis in July. He will be a visiting professor in the Harvard Government Department during the 2006-2007 academic year.

Professor Gill will present a talk entitled "Elicited Priors for Bayesian Model Specifications in Political Science Research." This talk is based on joint work with Lee Walker, who is currently a visiting scholar at IQSS. The presentation will be at noon on Wednesday, March 22 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:

We explain how to use elicited priors in Bayesian political science research. These are a form of prior information produced by previous knowledge from structured interviews with subjective area experts who have little or no concern for the statistical aspects of the project. The purpose is to introduce qualitative and area-specific information into an empirical model in a systematic and organized manner in order to produce parsimonious yet realistic implications. Currently, there is no work in political science that articulates elicited priors in a Bayesian specification. We demonstrate the value of the approach by applying elicited priors to a problem in judicial comparative politics using data and elicitations we collected in Nicaragua.

Posted by Mike Kellermann at 12:01 AM

21 March 2006

World Health Surveys: Arriving Soon

Sebastian Bauhoff

Good data on health-related issues in developing countries is hard to find, especially if you need large samples and cross-country comparability. The latest round of the World Health Surveys (WHS) is starting to become available to researchers in the next months and might be one of the best surveys out there, in addition to the Demographic and Health Surveys (DHS).

The current WHS was conducted in 70 countries in 2000-2001. The survey is standardized and comes with several modules, including measures of the health states of populations; risk factors; responsiveness of health systems; coverage, access, and utilization of key health services; and health care expenditures. The instruments include several innovative features, such as anchoring vignettes and geocoding, and seem to collect more information on income and expenditure than the DHS does.

From the looks of it, the WHS could easily become the new standard dataset for cross-country comparisons of health indicators, though for some applications it might be more of a complement than a substitute for the DHS. As of now, the questionnaires and some country reports are online, and the micro-data is supposed to be available by the middle of the year at the latest.

Posted by Sebastian Bauhoff at 6:00 AM

20 March 2006

Making Diagnostics Mandatory

Jim Greiner

Teaching a class (see here) on the interaction between lawyers, most of whom lack quantitative training, and quantitative analysts has me thinking about the danger statistical techniques pose. As is true of those who study any branch of specialized knowledge, statisticians can all too easily abuse the trust that decision makers (judges, government officials, members of the public) place in us, and often with impunity. (Of course, "we" all know that "we" would never do any such thing, even though "we" know that "everyone else" does it all the time. Gee.)

If it’s of interest (or perhaps more accurately, unless a barrage of comments tells me I’m being boring), I’ll be blogging about ways "everyone else" abuses trust, and ways "we" can try to stop it. Here’s my first suggestion: make diagnostics mandatory.

Here’s what I mean. I’ve previously blogged (see here) on the double-edged sword posed by the recent trend towards academics’ writing free software to fit models they’ve developed. One way for software-writers to lessen the danger that their models will be abused is to write diagnostics into their programming . . . and make those diagnostics hard to turn off. Suppose, for example, that some analysts are writing code to implement a new model, and the fitting process requires fancy MCMC techniques. These analysts should write MCMC convergence diagnostics into the software, and should set their defaults so that the fitting process produces these diagnostics unless it’s told not to. Perhaps, the analysts should even make it a little tough to turn off the diagnostics. That way, even if the user doesn’t look at the diagnostics, someone else (perhaps an opposing expert in a court case?) might have easier access to them.
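As a toy sketch of this design (every name here is hypothetical, not any real package's API), a fitting routine might run a Geweke-style convergence check by default, so that the user has to actively opt out of seeing it:

```python
import numpy as np

def fit_model(data, n_draws=2000, diagnostics=True):
    """Toy random-walk Metropolis sampler for the mean of a N(mu, 1) model
    with a flat prior.  The point is the interface, not the model: the
    convergence check runs by default and must be explicitly switched off."""
    rng = np.random.default_rng(0)
    draws = np.empty(n_draws)
    mu = 0.0
    for i in range(n_draws):
        prop = mu + rng.normal(scale=0.5)
        # log acceptance ratio for a N(mu, 1) likelihood with a flat prior
        logr = 0.5 * np.sum((data - mu) ** 2) - 0.5 * np.sum((data - prop) ** 2)
        if np.log(rng.uniform()) < logr:
            mu = prop
        draws[i] = mu
    if diagnostics:
        # Geweke-style check: compare the mean of the early part of the
        # chain with the mean of the late part (naive variance estimates).
        a, b = draws[: n_draws // 10], draws[n_draws // 2:]
        z = (a.mean() - b.mean()) / np.sqrt(a.var() / len(a) + b.var() / len(b))
        print(f"Geweke-style z: {z:.2f} (|z| > 2 hints at non-convergence)")
    return draws

draws = fit_model(np.array([1.2, 0.8, 1.1, 0.9, 1.0]))
```

Even a user who never asked for the z-score now has it sitting in the output, where an opposing expert can find it.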

The worry, of course, is that the output from all new software will end up looking like it came out of SAS (a package I wouldn’t wish on my worst enemy). Still, as our cognitive psychologist could probably tell us, people are incredibly lazy. Even if a user of software just has to go to a drop-down menu to look at a diagnostic, chances are he/she won’t bother.

Posted by James Greiner at 6:00 AM

16 March 2006

Are people Bayesian? (and what does that mean?) Part II

In my last post I talked about computational vs. algorithmic level descriptions of human behavior, and I argued that most Bayesian models of reasoning are examples of the former -- and thus make no claims about whether and to what extent the brain physically implements them.

A common statement at this point is that "of course your models don't say anything about the brain -- they are so complicated, how could they? Do people really do all that math?" I share the intuition: the models do look complex, and I am certainly not aware of doing anything like this when I think, but I don't think the possibility can be rejected out of hand. In other words, while it's certainly possible that human brains do nothing like, say, MCMC [insert complicated computational technique here], it's not a priori obvious. Why?

I have three reasons. First of all, we really don't have any good conception of what the brain is capable of computationally -- it has billions of neurons, each of which has thousands of connections, and (unlike modern computers) is a massively parallel computing device. State of the art techniques like MCMC look complicated when written out as mathematical equations -- particularly to those who don't come from that background -- but that doesn't mean, necessarily, that they are complicated in the brain.

Secondly, every model I've seen generally gets its results after running for at most a week, usually for only a few minutes -- much less time than a human has to go about and form theories of the world. If you are studying how long-term theories or models of the world form, it's not at all clear how to compare the time a computer takes to the time a human takes: not only are the scales really different, so is the data they get (models generally have cleaner data, but far less) and so is the speed of processing (computers are arguably faster, but if a human can do in parallel what a computer does serially, this might mean nothing). The point is that comparing a computer after 5 minutes to a human over a lifetime might not be so silly after all.

Thirdly, both the strength and weakness of studying cognitive science is that we have clear intuitions about what cognition and thinking are. It's a strength in that it helps us judge hypotheses and have good intuitions -- but it's a weakness in that it causes us to accept or reject ideas based on these intuitions when maybe we really shouldn't. There's a big difference between conscious and unconscious reasoning, and most (if not all) of our intuitions are based on how we see ourselves reason consciously. But just because we aren't aware of, say, doing Hebbian learning doesn't mean we aren't. It's striking to me that people who make Bayesian models of vision rarely have to deal with questions like "but people don't do that! it's so complicated!" This in spite of the fact that it's the same brain. I think this is probably because we don't have conscious awareness of the process of vision, and so don't think we know how it works. But to the extent that higher cognition is unconscious, the same point applies. It's just easy to forget.

Anyway, I'd be delighted to hear objections to any of these three reasons. As I said in the last post, I'm still sorting out these issues to myself, so I'm not really dogmatically arguing any of this.

Posted by Amy Perfors at 6:00 AM

15 March 2006

Incompatibility: Are You Worried?

Jim Greiner

I’m a teaching fellow for a course in missing data this semester, and one topic keeps coming up peripherally in the course, even though we haven’t tackled it head-on just yet. That topic is incompatible conditional distributions. And here’s my question for blog readers: how much does it bother you?

Reduced to its essence, here’s the issue. Suppose I have a dataset with three variables, A, B, and C. There are multiple missing data patterns, and suppose (although it’s not essential to the problem) that I want to use multiple imputation to create six or seven complete analysis datasets. Suppose also that it’s very difficult to conceive of a minimally plausible joint distribution p(A, B, C). Perhaps A is semi-continuous (e.g., income), B is categorical with 5 possible values, and C has support only over the negative integers. What (as I understand it) is often done in this case is to assume conditional distributions, for example, p*(A|B, C), p*(B|A, C), and p*(C|A, B). The idea is that one does a "Gibbs" with these three conditional distributions, as follows. Find starting values for the missing Bs and Cs. Draw missing As from p*(A|B, C). Then draw new Bs from p*(B|A, C) using the newly drawn As and the starting Cs. Continue as though you were doing a real "Gibbs." Stop after a certain number of iterations and call the result one of your multiply imputed datasets.

The incompatibility problem is that there may be no joint distribution that has conditional distributions p*(A|B, C), p*(B|A, C), and p*(C|A, B). Remember, (proper) joint distributions determine conditional distributions, but conditional distributions do not determine joint distributions, and in some cases, one can actually prove mathematically that no joint distribution has a particular set of conditionals. If you ran your "Gibbs" long enough, eventually your draws would wander off to infinity or become absorbed into a boundary of the parameter space. In other words, your computer would complain; exactly how it would complain depends on how you programmed it.
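A toy numerical illustration of that divergence (my own example, not one from the course): take two normal conditionals whose regression slopes multiply to more than one, so that no joint distribution can possess both, and run the pseudo-"Gibbs":

```python
import random

random.seed(0)

# Two normal conditionals that no joint distribution shares: for any joint
# with linear conditional means, the product of the two regression slopes
# equals a squared correlation and so cannot exceed 1.  Here the product
# is 1.5 * 1.5 = 2.25, so these conditionals are provably incompatible.
a, b = 0.0, 0.0
path = []
for _ in range(200):
    a = random.gauss(1.5 * b, 1.0)   # "draw A given B"
    b = random.gauss(1.5 * a, 1.0)   # "draw B given A"
    path.append(abs(a))

# The pseudo-Gibbs chain has no stationary distribution to converge to:
# each full sweep multiplies the state by roughly 2.25, and the draws
# wander off toward infinity.
print(path[9], path[99], path[199])
```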

I confess this incompatibility problem bothers me more than it appears to bother some of my mentors. If the conditional distributions are incompatible, then I KNOW that the "model" I’m fitting could not have generated the data I see. It seems like even highly improbable models are better than impossible ones. On the other hand, I am sympathetic to the idea of doing the best one can, and what else is there to do in (say) large datasets with multiple, complicated missing data patterns and unusual variable types?

How much does incompatibility bother you?

Posted by James Greiner at 6:00 AM

14 March 2006

So You Want to Do a Survey?

"I'm doing a survey. I've never done this before, taken any classes on survey research, or read any books on the subject, and a friend suggested that I get some advice. Can you help me? I'm going in the field next week."

Someone has asked me versions of this question almost every month since I was a graduate student, and every time I have to convey the bad news: doing survey research right is extremely difficult. The reason the question keeps coming up is that it seems like such a reasonable question: what could be hard about asking questions and collecting some answers? What could someone do wrong that couldn't be fixed in a quick conversation? Don't we ask questions informally in casual conversation all the time? Why can't we merely write up some questions, get some quick advice from someone who has tried this before, and go do a survey?

Well, it may seem easy, but survey research requires considerable expertise, no less than heart surgery or flying military aircraft. Survey research should not be done casually if you care about the results. Survey research seems easy because it's possible to learn a little without much expertise, whereas doing a little heart surgery with a dinner knife, or grabbing the keys to a B-2 after seeing Top Gun, wouldn't accomplish anything useful.

Survey research is not easy; in fact, it's a miracle it works at all. Think about it this way. When was the last time you had a misunderstanding with your spouse, a miscommunication with your parent or child, or your colleague thought you were saying one thing when you meant another? That's right: you've known these people for decades and your questions are still misunderstood. When was the last time your carefully worded and extensively rewritten article or book was misunderstood? This happens all the time. And yet you think you can walk into the home of someone you've never met, or make a cold call on the phone, and in five minutes elicit their inner thoughts without error? It's hard to imagine a more arrogant, unjustified assumption.

So what's a prospective survey researcher to do? Taking a course, reading some books, etc., would be a good start. Our blog has discussed some issues in survey research before, such as in this entry and this one on using anchoring vignette technology to deal with the problem of survey respondents who may interpret survey questions differently from each other and from the investigator. Issues of missing data arise commonly in survey research too. I'm sure we'll discuss lots of other survey-related issues on this blog in the future as well.

A more general facility for information on the subject is the Institute for Quantitative Social Science's Survey Research Program, run by Sunshine Hillygus. This web site has a considerable amount of information on the art and science of questioning people you don't know. If readers are aware of any resources not listed on this site that may be of help to survey researchers, please post a comment!

Posted by Gary King at 6:00 AM

13 March 2006

Applied Statistics - Felix Elwert

This week, the Applied Statistics Workshop will present a talk by Felix Elwert, a Ph.D. candidate in the Harvard Department of Sociology. Felix received a B.A. from the Free University of Berlin and an M.A. in sociology from the New School for Social Research before joining the doctoral program at Harvard. His article on widowhood and race, joint work with Nicholas Christakis, is forthcoming in the American Sociological Review. He is also a fellow blogger on the Social Science Statistics blog. On Wednesday, he will present a talk entitled "Trial Marriage Reconsidered: The effect of cohabitation on divorce". The presentation will be at noon on Wednesday, March 15 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Mike Kellermann at 9:37 PM

Non-Independence in Competing Risk Models

Felix Elwert

A central assumption in competing risk analysis is the conditional independence of the risks under analysis. Suppose we are interested in cause-specific mortality due to causes A, B, and C. If we assume that the process leading to death from A is independent (conditional on covariates) of the process leading to death from B, then the likelihood factors nicely, and estimation via a series of standard 0/1 hazard models is straightforward. For example, it may be reasonable to assume that death from lung cancer (cause A) is independent of death from being struck by a meteorite (cause B). But it is much less reasonable to assume that death from lung cancer (A) is independent of the risk of dying from emphysema (C), unless we are lucky enough to have, say, appropriate covariate information on smoking history.
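The factorization is easy to see in code. Here is a toy sketch on simulated single-period discrete-time data (the functional forms and coefficients are my assumptions for the illustration): under conditional independence, each cause-specific hazard can be fit with its own standard 0/1 model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-period data: covariate x and an outcome coded
# 0 = survived, 1 = died of cause A, 2 = died of cause B, generated from
# two conditionally independent cause-specific hazards.
n = 2000
x = rng.normal(size=n)
pA = 1.0 / (1.0 + np.exp(-(-2.0 + 1.0 * x)))   # true cause-A hazard
pB = 1.0 / (1.0 + np.exp(-(-2.5 - 0.5 * x)))   # true cause-B hazard
u = rng.uniform(size=n)
outcome = np.where(u < pA, 1, np.where(u < pA + pB, 2, 0))

def fit_hazard(x, event, iters=25):
    """Logit of a 0/1 event on x by Newton's method (cause-specific model)."""
    Xb = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += np.linalg.solve(Xb.T @ (Xb * (p * (1.0 - p))[:, None]),
                                Xb.T @ (event - p))
    return beta

# Because the likelihood factors under conditional independence, each
# cause gets its own standard 0/1 model; deaths from the competing cause
# simply count as non-events in that cause's model.
beta_A = fit_hazard(x, (outcome == 1).astype(float))
beta_B = fit_hazard(x, (outcome == 2).astype(float))
print(np.round(beta_A, 2), np.round(beta_B, 2))
```

When the independence assumption fails, it is exactly this cause-by-cause factoring that breaks down.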

The problem is partly rhetorical. The independence assumption in competing risk analysis is exactly the same as the assumption of independent censoring in standard hazard models. Few applied papers even mention the latter (unfortunately). In competing risk analysis, however, the assumption becomes quite a bit more visible, and thus harder to hide…

There are a small number of strategies, none particularly popular, to cope with dependence. Sanford C. Gordon recently contributed a new strategy in "Stochastic Dependence in Competing Risks," AJPS 46(1), 2002, which builds on an earlier idea of drawing random effects. Rather than drawing individual-specific random effects, as suggested earlier by Clayton (1978), Gordon draws risk- and individual-specific random effects. Thus, a K-risk model on a sample of N individuals may contain up to KxN separate random effects, one for each risk-individual pair.

The advantage of this strategy is that it allows for estimation of the direction of dependence (previous work had to assume a specific direction). The disadvantage is that estimation via conditional logit models is very expensive, on the order of several days for moderate-size samples of a few thousand cases.

Posted by Felix Elwert at 6:00 AM

10 March 2006

Are people Bayesian? (and what does that mean?) Part I

Anyone who is interested in Bayesian models of human cognition has to wrestle with the issue of whether people use the same sort of reasoning (and, if so, to what extent this is true, and how our brains do that). I'll be doing a series of posts exploring what I think about this issue (which isn't really set in stone yet -- so think of this as "musing out loud" rather than saying "this is the way it is").

First: what does it mean to say that people are (or are not) Bayesian?

In many ways the question of whether people do the "same thing" as the model is a red herring: I use Bayesian models of human cognition in order to provide computational-level explanations of behavior, not algorithmic-level explanations. What's the difference? A computational-level explanation seeks to explain a system in terms of the goals of the system, the constraints involved, and the way those various factors play out. An algorithmic-level explanation seeks to explain how the brain physically does this. So any single computational explanation might have a number of possible different algorithmic implementations. Ultimately, of course, we would like to understand both: but I think most phenomena in cognitive science are not well enough understood on the computational level to make understanding on the algorithmic level very realistic, at least not at this stage.

To illustrate the difference between computational and algorithmic, I'll give an example. People given a list of words to memorize show certain regular types of mistakes. If the list contains many words on the same theme -- say, all having to do with sports, but never the specific word "sport" -- people will nevertheless often incorrectly "remember" seeing "sport". One possible computational-level explanation of what is going on might suggest, say, that the function of memory is to use the past to predict the future. It might further say that there are constraints on memory deriving from limited capacity and limited ability to encode everything in time, and that as a result the mind seeks to "compress" information by encoding the meaning of words rather than their exact form. Thus, it is more likely to "false positive" on words with similar meanings but very different forms.

That's one of many possible computational-level explanations of this specific memory phenomenon. The huge value of Bayesian models (and computational models in general) is that they make this type of explanation rigorous and testable - we can quantify "limited capacity" and what is meant by "prediction" and explore how they interact with each other, so we're not just throwing words around. There is no claim in most computational cognitive science, implicit or explicit, that people actually implement the same computations our models do.

There is still the open question of what is going on algorithmically. Quite frankly, I don't know. That said, in my next post I'll talk about why I don't think we can reject out of hand the idea that our brains are implementing something (on the algorithmic level) that might be similar to the computations our computers are doing. And then in another post or two I'll wrap up with an exploration of the other possibility: that people are adopting heuristics that approximate our models, at least under some conditions. All this, of course, is only true to the extent that the models are good matches to human behavior -- which is probably variable given the domain and the situation.

Posted by Amy Perfors at 6:00 AM

9 March 2006

Comparative Politics Dataset Award

As you know, Alan Greenspan retired from the Fed about a month ago (and already has an $8M book deal, but I digress...). Jens' post below reminded me of one of my favorite Greenspan quotes: "I suspect greater payoffs will come from more data than from more technique." He was speaking to economists about models for forecasting economic growth, but I suspect his comments apply at least as strongly to political science and other social sciences. You might have the most cutting-edge, whiz-bang, TSCS-2SLS-MCMC evolutionary Bayesian beta-beta-beta-binomial model that will tell you the meaning of life and wash your car at the same time, but if the data that you put in are either non-existent or garbage, it isn't going to do you a lot of good. Unfortunately, the incentives in the profession do not seem sufficient to reward the long, tedious efforts required to collect high-quality data and to make them publicly available to the academic community. Most scholars would surely like to have better data; they would just prefer that someone else collect it.

Having said all that, it is worth noting efforts that make data collection and dissemination a more rewarding pursuit. One such effort is the Dataset Award given by the APSA Comparative Politics section for "a publicly available data set that has made an important contribution to the field of comparative politics." This year's request for nominations hits the nail on the head:

The interrelated goals of the award include a concern with encouraging development of high-quality data sets that contribute to the shared base of empirical knowledge in comparative politics; acknowledging the hard work that goes into preparing good data sets; recognizing data sets that have made important substantive contributions to the field of comparative politics; and calling attention to the contribution of scholars who make their data publicly available in a well-documented form.

The section is currently accepting nominations for the 2006 award, with a deadline of April 14. Information about nominating a dataset can be found here.

Posted by Mike Kellermann at 1:32 PM

8 March 2006

EM And Multi-level Models

Jim Greiner

One of the purposes of this blog is to allow us to share quantitative problems we’re currently considering. Here’s one that arose in my research, and I’d love any comments and suggestions readers might have: can one apply the EM algorithm to help with missing data in multi-level models?

Schematically, the problem I ran into is as follows: A_ij | B_i follows some distribution, call it p1_i, and I had n_i observations of A_ij. A_ij was a random vector, and some parts of some observations were missing. B_i | C follows some other distribution, call it p2. Suppose I’m a frequentist, and I want to make inferences about C. The problem I kept running into was that I couldn’t figure out how to use EM without integrating the B_i’s out of the likelihood, a mathematical task that exceeded my skills. I ended up switching to a Bayesian framework and using a Gibbs sampler, i.e., drawing from the distribution of the missing data given the current value of the parameters, then from the distribution of the parameters given the now-complete data. But I couldn’t help wondering, are hardnosed frequentists just screwed in this situation, do they have to resort to something like Newton-Raphson, or is there an obvious way to use EM that I just missed?
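For concreteness, here is a minimal sketch of the Gibbs alternation just described, on a toy version of the setup: A_ij | B_i ~ N(B_i, 1), B_i | C ~ N(C, 1), a flat prior on C, and 20% of the A_ij missing at random. The specific distributions, known unit variances, and all numbers are invented for illustration; this is not the actual model from the problem above.

```python
# Toy Gibbs sampler: alternate between imputing missing A_ij given the
# current B_i, and drawing (B_i, C) given the now-complete data.
import random, math

random.seed(0)

# Simulate grouped data: A_ij ~ N(B_i, 1), B_i ~ N(C_true, 1).
C_true = 2.0
m, n = 20, 30                              # groups, observations per group
B_true = [random.gauss(C_true, 1) for _ in range(m)]
A = [[random.gauss(b, 1) for _ in range(n)] for b in B_true]

# Mark 20% of the observations as missing at random.
miss = [[random.random() < 0.2 for _ in range(n)] for _ in range(m)]

B = [0.0] * m
C = 0.0
draws = []
for it in range(2000):
    # 1. Impute each missing A_ij from its conditional N(B_i, 1).
    for i in range(m):
        for j in range(n):
            if miss[i][j]:
                A[i][j] = random.gauss(B[i], 1)
    # 2. Draw B_i given complete data and C (conjugate normal update).
    for i in range(m):
        prec = n + 1                       # n/sigma^2 + 1/tau^2, both = 1
        mean = (sum(A[i]) + C) / prec
        B[i] = random.gauss(mean, 1 / math.sqrt(prec))
    # 3. Draw C given the B_i (flat prior => N(mean(B), 1/m)).
    C = random.gauss(sum(B) / m, 1 / math.sqrt(m))
    if it >= 500:                          # discard burn-in
        draws.append(C)

post_mean = sum(draws) / len(draws)        # should land near C_true = 2.0
```

This is exactly the "draw missing data, then draw parameters" scheme described above; a frequentist analogue would be the Monte Carlo EM variant in which step 1 is repeated to approximate the E-step and steps 2-3 are replaced by maximization.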

Posted by James Greiner at 6:00 AM

7 March 2006

Applied Statistics - Roland Fryer

This week, the Applied Statistics Workshop will present a talk by Roland Fryer, a Junior Fellow of the Harvard Society of Fellows, resident in the Economics Department. Dr. Fryer received his Ph.D. in economics from The Pennsylvania State University in 2002, and was an NSF post-doctoral fellow before coming to Harvard. His work has appeared in several journals, including the Quarterly Journal of Economics and the Review of Economics and Statistics. Dr. Fryer will present a talk entitled "Measuring the Compactness of Political Districting Plans". The presentation will be at noon on Wednesday, March 8 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.

Posted by Mike Kellermann at 11:34 AM

Data Availability

Sebastian Bauhoff

Currently most students in Gov 2001 are preparing for the final assignment of the course: replicating and then improving on a published article. While scouting for a suitable piece myself, I came across the debate about whether (and how) data should be made available.

It is somewhat surprising that nowadays one can get all sorts of scholarly research off the web, except for the data that produced the results. Given that methods already exist to ensure that data remains proprietary and confidential, omitting the data from publication seems rather antiquated, unnecessary and counter-productive to scientific advance. Some health datasets -- such as AddHealth, which arguably contains some of the most sensitive information -- have successfully been public for a few years already. There's of course an intriguing debate about this which Gary's website partly documents.

It seems that we are slowly coming within reach of universal data publication. Apart from projects like ICPSR, several major journals recently started to require that authors submit their data and code. The JPE explained to me that they expect to have data for some articles from April 2006, and that 'only the rare article will not include the relevant datasets' from early 2007.

Since debating the robustness of existing results seems like good research, making data and code available could spur quite a lot of articles. I wonder what the effects on journal content will be. Rather than publishing various replications in print, maybe journals will post them only online? Or will specialized journals emerge to do that, to keep the major publications from being jammed?

Posted by Sebastian Bauhoff at 6:00 AM

6 March 2006

An Unintended Potential Consequence of School Desegregation

Felix Elwert

One goal of school desegregation is to promote racial understanding by fostering interracial contact. In an article in the American Journal of Sociology (1998, Vol. 103[5]), Scott Feld and William Carter develop a simple combinatorial argument about a surprising potential consequence of school desegregation.

They argue that under certain (not so outlandish) circumstances, school desegregation may actually decrease rather than increase opportunities for interracial contact.

Here is their argument by way of a stylized example. Suppose there are four schools, one with capacity C1=400, and three schools with capacities C2=C3=C4=200 students. Under segregation, all 100 black students in the district attend the big school. The 900 other students are white. Assuming that students only interact with students in their own school, there are thus 300*100=30,000 possible interracial, intra-school ties. Now desegregate such that the percentage of black students is the same in all four schools. Then there are 360*40 potential interracial, intra-school friendships in the big school, and 180*20 potential interracial, intra-school friendships in each of the three small schools. Hence, the total number of potential interracial friendships post-desegregation is 25,200, as compared to 30,000 pre-desegregation.
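The arithmetic above can be checked directly; a few lines of code (not from the article) reproduce both totals:

```python
# Count potential interracial, intra-school ties before and after
# desegregation, under the stylized example's assumption that students
# interact only within their own school.

def potential_ties(schools):
    """schools: list of (black, white) enrollment pairs, one per school."""
    return sum(b * w for b, w in schools)

# Segregated: all 100 black students attend the big school (capacity 400).
segregated = [(100, 300), (0, 200), (0, 200), (0, 200)]

# Desegregated: 10% black enrollment in every school.
desegregated = [(40, 360), (20, 180), (20, 180), (20, 180)]

assert potential_ties(segregated) == 30_000
assert potential_ties(desegregated) == 25_200
```

The drop comes from the product form b*w: concentrating the minority group in the largest school maximizes the number of cross-group pairs.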

Whether this decrease in potential ties will actually result in a decrease in realized ties is an empirical question, dependent on factors spelled out in the article. Feld and Carter go on to show that this particular instance is an example of the so-called Class Size Paradox, known from various applications in sociology.

Posted by Felix Elwert at 6:00 AM

3 March 2006

On communication

Jim's entry about the use of the word "parameter" got me thinking about a related issue I wrestle with all the time: communicating the importance and value of computational models in psychology to traditional psychologists.

There is a certain subset of the cognitive science community that is interested in computational/statistical models of human reasoning, dating from the 70s and 80s, first with Strong AI and then with the rise of connectionism. Nowadays, I think more people are becoming interested in Bayesian models, though admittedly it's hard to tell how large this group is because of sampling bias: since it's what my lab does, I don't have a clear sense of how many people don't know or care about this approach, since they are the very people I'm least apt to converse with.

Nevertheless, I think I can say with some confidence that a not inconsequential number of psychologists just don't see the value of computational models. Though I think some of that is for good reasons (some of which I share), I'm ever more convinced that a lot of this is because we, the computational and quantitative people, do such a lousy job of explaining why they are important, in terms that a non-computationally trained person can understand.

Part of it is word choice: as Jim says, we have absorbed jargon to the point that it is second nature to us, and we don't even realize how jargony it might be ("parameters", "model", "Bayesian", "process", "generative", "frequentist", "likelihood" - and I've deliberately tried to put on this list some of the least-jargony terms we habitually use). But I think it also relates to deeper levels of conceptualization -- we have trained ourselves to the point that when something is described mathematically, we can access the intuition fairly easily, and thus forget that the mathematical description doesn't have the same effect for other people. I was recently at a talk geared toward traditional psychologists in which the speaker described what a model was doing in terms of coin flipping and mutation processes. It was perfectly accurate and certainly less vague than the corresponding intuition, but I think he lost a few people right there: since they couldn't capture the intuition rapidly enough, the model felt both arbitrary and too complicated to them. I don't think it's a coincidence that arbitrariness and "too much" complexity are two of the most common criticisms leveled at computational modelers by non-modelers.

The point? Though we shouldn't sacrifice accuracy in order to make vague, handwavy statements, it's key to accompany accurate statistical descriptions with the corresponding intuitions that they capture. It's a skill that takes practice to develop (learning this is one of the reasons I blog, in fact), and it requires constantly staying aware of which parts of your knowledge are specialized and may be unfamiliar to your listener. But it's absolutely vital if we want quantitative approaches to be taken seriously by more non-quantitative folks.

Posted by Amy Perfors at 6:00 AM

2 March 2006

Freaks And "Parameter"

Jim Greiner

In a previous post, I briefly described the joint Law School/Department of Statistics course I’m currently co-teaching in which law students act as lawyers and quantitative students act as experts in simulated litigation. I’ll be writing about some of the lessons learned from this course in blog entries, especially lessons about what is quickly becoming the course’s central challenge for the students: communication between those with quantitative training and those without. Here’s my first lesson for the quantitatively adept: avoid the word “parameter.”

Of course it isn’t the word “parameter” so much as it is any of the jargon that we in the quantitative social science business use every day. And everyone knows that if you’re speaking to persons from another field, you have to speak in regular English (if that’s what you’re speaking). The hard part is remembering what regular English sounds like. We in quantitative social science don’t realize what freaks we become.

Here’s the vignette. In a recent session of the class, a student sought to explain to some lawyers how simulation can be used to test whether a model is doing what it’s supposed to do. She got as far as explaining how one could use a computer to simulate data, but when she began to explain checking to see whether an interval produced by the model covered the known truth, she used the word “parameter.” The change in expression on the law students’ faces resembled air going out of a balloon.

Of course, every first-year statistics undergraduate knows what a “parameter” is, and as far as jargon goes, “parameter” is a lot less threatening than some other terms. But it was enough to cause the lawyers in the room to give up on following her. The recovery period was longer than it might otherwise have been because this episode occurred early in the class, when the lawyers and experts were still getting a feel for each other. The lesson for us is that, when communicating with the rest of the world, even the most seemingly innocuous words can make a difference. We have to recognize that we’ve become freaks.

Posted by James Greiner at 6:00 AM

1 March 2006

Thoughts on SUTVA (Part II)

Alexis Diamond, guest blogger

In part I (yesterday), I introduced the subject of SUTVA (the stable unit treatment value assumption), an assumption associated with Rubin's causal model. Well, why have SUTVA in the first place? What work is it actually doing? What does it require? "The two most common ways in which SUTVA can be violated appear to occur when (a) there are versions of each treatment varying in effectiveness or (b) there exists interference between units" (Rubin 1990, p. 282).* But this two-part SUTVA shorthand is frequently implausible in the context of many important and interesting causal questions.

SUTVA allows for a precise definition of causal effects for each unit. When SUTVA obtains, the inference under investigation relates to the difference between what would have been observed in a world in which units received the treatment and what would have been observed in a world in which treatment did not exist. SUTVA makes the inference, the causal question under investigation, crystal clear.

But SUTVA is not necessary to perform inference in the context of Rubin's causal model--what is necessary is to precisely define causal effects of interest in terms of potential outcomes and to adhere to the principle that for every set of allowable treatment allocations across units, there is a corresponding set of fixed (non-stochastic) potential outcomes that would be observed. In my peacekeeping analysis, I define units as country-episodes; each unit is an episode during which a country experienced civil war and was either treated/not-treated by a UN peacekeeping mission.

I define my causal effects precisely: I am interested in causal effects for treated units, and I define the causal effect for each treated unit as the difference between the observed outcome and what would have been observed had that unit's treatment been turned off and peacekeeping had not occurred. There are many other potential outcomes one could contemplate and utilize to make other causal inferences; these others are beyond the scope of my investigation. I don't need SUTVA or other exclusion restrictions to exclude them. I exclude them in the way I pose my causal question.

I am not claiming that all peacekeeping missions are exactly the same—that would be silly. I also do not claim non-interference across units—after all, how could this be true, or even approximately true? History matters. Peacekeeping missions affect subsequent facts on the ground within and across countries. So SUTVA is going to be violated. But what allows me to proceed with an attempt at analysis is that my causal question is, nevertheless, well-defined. Clearly, I mean only one thing when referring to the "estimated effect of peacekeeping": the difference between the observed outcome for each and every treated unit and what would have been observed for each unit under the control regime of no-peacekeeping. I define the average treatment effect on the treated (ATT), my ultimate estimand of interest, to be the average of these estimated unit-level effects.
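In code, the estimand just defined amounts to a simple average over treated units. The outcomes and counterfactuals below are fabricated purely to show the arithmetic; in practice the counterfactuals are of course unobserved and must be estimated:

```python
# Illustrative ATT computation with made-up numbers: the average, over
# treated units only, of (observed outcome) minus (outcome under the
# no-peacekeeping control regime).

# (observed outcome Y, counterfactual Y0 under control, treated?) per unit
units = [
    (5.0, 3.0, True),
    (4.0, 4.5, True),
    (6.0, 2.0, True),
    (3.0, 3.0, False),   # control units do not enter the ATT
]

treated_effects = [y - y0 for y, y0, treated in units if treated]
att = sum(treated_effects) / len(treated_effects)
```

The entire inferential burden falls on filling in the Y0 column for treated units, which is where the estimation assumptions in the caveats below come in.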

Three caveats apply: (1) I am not claiming this ATT represents what it does under SUTVA, namely the average difference in potential outcomes that would have been observed given all selected units experiencing treatment vs. all experiencing control; (2) I must assume there is only one version of the control intervention; (3) estimation will require additional assumptions, and if estimating treatment effects under exogeneity (e.g., via matching), one must still make the case for ignorable assignment. This last caveat is very different from, and subsequent to, the others, in the sense that estimation and analysis via matching (or any other method) only makes sense if the first two caveats obtain and the causal question is well-defined.

As social science moves increasingly toward adoption of the Rubin causal model, I predict that political scientists (and social scientists more generally) will frame their SUTVA-like assumptions and inferential questions in this way. I think this is consistent with what Gary King and his coauthors were doing in Epstein et al. (2005)**, when they asked about the effect of war on Supreme Court decision-making. They were not claiming that occurrences of treatment (war) had no effect on subsequent Supreme Court decisions; they were asking about what would have happened if each episode of treatment had been turned off, one at a time. And in many cases, this is the only kind of question there is any hope of answering—the only kind of question close enough to the data to allow for plausible inference. As long as these causal questions themselves are interesting, this general approach seems to me to be a coherent and sensible way forward.

*Rubin, Donald B. Formal Modes of Statistical Inference For Causal Effects. Journal of Statistical Planning and Inference. 25 (1990), 279-292.

** Epstein, Lee; Daniel E. Ho; Gary King; and Jeffrey A. Segal. The Supreme Court During Crisis: How War Affects only Non-War Cases, New York University Law Review, Vol. 80, No. 1 (April, 2005): 1-116.

Posted by James Greiner at 6:00 AM