
February 18, 2009

Is Height Contagious? Detecting Implausible Social Network Effects

Some of you may be familiar with the recent work on social network effects in public health, where several studies have found significant network effects on outcomes such as obesity, smoking, and alcohol use. We have blogged about some of this work here and here. One key question about these findings is whether the observed relationship between one's own health outcome and the health status of other individuals in one's reference group is indeed causal or driven primarily by selection effects. Many have argued that confounding seems like a serious concern given that one's friends are not chosen at random. But at the end of the day it remains an empirical question whether the study design is able to account for these selection effects or not.

In "Detecting implausible social network effects in acne, height, and headaches: longitudinal analysis" Ethan Cohen-Cole and Jason Fletcher add to this debate with a series of interesting placebo studies. They demonstrate that the same empirical specification used in previous studies (ie. logistic regression of own health ~ friends' health + X) also "detects" significant and fairly large network effects for implausible outcomes such as acne, height, and headaches. For example, having a friend with headache problems increases the respondent's chances of headache problems by about 47% on average. These implausible placebo findings suggest that previous findings may have been driven by confounding. Similar placebo tests have been used in a variety of papers such as DiNardo and Pishke (1997), Rosenbaum (2002), Abadie and Gardeazabal (2003), Angrist and Krueger (1999), and Auld and Grootendorst (2004) to name just a few, but this study is another great example that demonstrates the power of such tests for social science research. I will use this as a teaching example I think.

Interestingly, Cohen-Cole and Fletcher also show that their implausible effects go away once they augment the standard model by adjusting for environmental confounders that may affect both an individual and her friends simultaneously. They conclude that "There is a need for caution when attributing causality to correlations in health outcomes between friends using non-experimental data."

I wonder how this debate will evolve. The ultimate test to disentangle correlation and causation would be to find a good natural experiment or to run a field experiment where social ties are exogenously assigned. Does anybody know of ongoing research that does this? Of course, it seems difficult to get something like this approved by research review boards.

Posted by Jens Hainmueller at 9:50 AM

October 3, 2008

Regression Discontinuity Reversed

I recently came across a new paper by David Card, Alexandre Mas, and Jesse Rothstein entitled "Tipping and the Dynamics of Segregation." What's interesting from a methodological standpoint is that the authors use what may be called "inverted" regression discontinuity methods to test for race-based tipping in neighborhoods in American cities.

In a classic regression discontinuity design, researchers commonly exploit the fact that treatment assignment changes discontinuously as a function of one or more underlying variables. For example, scholarships may be assigned based on whether students exceed a test score threshold (as in the classic paper by Thistlethwaite and Campbell (1960)). Unlucky students who just miss the threshold are assumed to be virtually identical to lucky ones who score just above the cutoff value, so that the threshold offers a clean identification of the counterfactual of interest (assuming no sorting).

In the Card et al. paper, the situation is slightly different because the authors have no hard-and-fast decision rule, but a theory that posits that whites' willingness to pay for homes depends on the neighborhood minority share and exhibits tipping behavior: if the minority share exceeds a critical threshold, all the white households will leave. Since the location of the (city-specific) tipping point is unknown, the authors estimate it from the data and find that there are indeed significant discontinuities in the white population growth rate at the identified tipping points. Once the tipping point is located, they go on to examine whether rents or housing prices exhibit non-linearity around the tipping point, but find no effects. They also try to explain the location of the tipping points by looking at survey data on whites' racial attitudes. Cities with more tolerant whites appear to have higher tipping points.
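To make the "inverted" logic concrete, here is a rough R sketch of one way to locate a candidate tipping point by grid search over thresholds in the minority share; the data frame tracts and its variables are hypothetical, and the paper's actual procedure is considerably more careful:

# For each candidate threshold, fit a model with a jump at that point and keep
# the threshold that fits best; then test for a discontinuity there.
candidates <- seq(0.05, 0.60, by = 0.01)
rss <- sapply(candidates, function(m_star) {
  fit <- lm(white_growth ~ minority_share + I(minority_share > m_star), data = tracts)
  sum(resid(fit)^2)
})
tipping_point <- candidates[which.min(rss)]
tipping_point
# Testing the jump on a hold-out sample guards against overstating significance
# after searching over thresholds.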

I think this is a very creative paper. The general approach could be useful in other contexts so take a look!


Posted by Jens Hainmueller at 8:10 AM

August 23, 2008

Dancing in the Moonlight? MPs with Outside Interests Vote Less

In Britain (and many other democracies) Members of Parliament often have "outside interests" and draw extra income from paid directorships, consulting gigs, or journalistic work. There has been plenty of controversy about MPs having such outside interests, ranging from possible ethical issues to concerns over the impact on MPs' legislative behavior. One concern is that MPs may be less effective as representatives if they "moonlight" from their Westminster jobs.

In a recent paper about the financial returns to serving in the House of Commons, Andy Eggers and I consider the relationship between outside interests and MPs' vote attendance (the percent of eligible votes personally attended or told) for the 2005-2007 period. We find that for both the Conservative and the Labour party, MPs with at least one (self-)reported outside interest (directorships, consultancies, and work in journalism) attended fewer votes than MPs with no outside interests; attendance rates are around 4-6 percentage points lower and the differences are all significant at conventional levels. The results are summarized in the jittergrams below (we excluded MPs who hold office as minister, speaker, whip, or chairman of a standing committee, since they are not allowed to vote).


fig.PNG


The exception is directorships for Labour MPs, where we cannot reject the null of no difference. We also find no such difference when comparing MPs with and without regular employment (such as work as a barrister, medical doctor, etc.).

There are obviously several other factors that may contribute to low attendance rates such as absence on constituency business, illness, paternity/maternity leaves, etc., but overall the results do suggest that outside interests distract MPs from their legislative work. This finding is also consistent with earlier work by Muller (1977), who found that sponsored Labour MPs were more active than other MPs on issues close to the interests of their sponsors (e.g. mining or railway issues), but that on the whole they were less active members of Parliament, participating in question time, standing committees, and debates far less than non-sponsored members.

If this has whetted your curiosity, you can analyze the voting data for yourself at publicwhip.

Posted by Jens Hainmueller at 5:03 PM

May 9, 2008

Adventures in Identification III: The Indiana Jones of Economics

There is a fabulous three-part series on further adventures in identification on the Freakonomics blog here, here, and here. The story features Kennedy School Professor Robert Jensen and his five-year quest to achieve rigorous identification of Giffen effects. After finding correlational evidence for Giffen goods in survey data, he and his co-author followed up by running an experiment in China and, guess what, they do find evidence of Giffen behavior. Impressive empirics and a funny read, enjoy!

Posted by Jens Hainmueller at 2:16 PM

April 7, 2008

A Case Against Evidence Based Medicine?

smig95752.f1.gif

Seb just sent this very amusing paper (which he found in a comment to a post on Andrew Gelman's blog):

Objectives: To determine whether parachutes are effective in preventing major trauma related to gravitational challenge. Design: Systematic review of randomised controlled trials. Data sources: Medline, Web of Science, Embase, and the Cochrane Library databases; appropriate internet sites and citation lists. Study selection: Studies showing the effects of using a parachute during free fall. Main outcome measure: Death or major trauma, defined as an injury severity score > 15. Results: We were unable to identify any randomised controlled trials of parachute intervention. Conclusions: As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.

Funny how such a lampoon can trigger a flame war on the BMJ website. Makes me understand why Gary writes about Misunderstandings between experimentalists and observationalists about causal inference...


Posted by Jens Hainmueller at 7:16 PM

April 5, 2008

Political Economy Students Conference

Dear students and colleagues,

We would like to invite you to attend the Political Economy Student Conference, to be held on April 17th at the NBER premises in Cambridge, MA. The conference is an opportunity for students interested in political economy and related fields to get together and discuss the open issues in the field, learn what other people are working on, and share ideas. The program of the conference can be found at:

http://www.stanford.edu/group/peg/april_2008_conference/conference_program

This year, some members of the NBER Political Economy Group will be joining us for the conference. We are sure that we will greatly benefit from their comments and suggestions during the discussions.

We hope that those of you interested will attend the conference. The success of the conference largely depends on students' attendance and participation. Given that we have limited seats for the conference, please e-mail leopoldo (at) mit (dot) edu as soon as possible if you are interested in attending so that we can secure a spot for you.

Best regards,

Leopoldo Fergusson
Marcello Miccoli
Pablo Querubin

Posted by Jens Hainmueller at 5:04 PM

March 28, 2008

Visualizing Data with Processing

A friend just referred me to Processing, a powerful language for visualizing data:


Processing is an open source programming language and environment for people who want to program images, animation, and interactions. It is used by students, artists, designers, researchers, and hobbyists for learning, prototyping, and production. It is created to teach fundamentals of computer programming within a visual context and to serve as a software sketchbook and professional production tool. Processing is developed by artists and designers as an alternative to proprietary software tools in the same domain.

Their exhibition shows some very impressive results. For example, I liked the visualization of the London Tube map by travel time. I lived in Russell Square once, so this brought back pleasant memories:
carden.jpg.
If you can spare a minute, also take a look at the other exhibited pieces. Most are art rather than statistics. For chess friends I especially recommend the piece called "Thinking Machine 4" by Martin Wattenberg, who gave a talk at the IQSS applied stats workshop in the fall. Enjoy!

thinking.jpg.

Posted by Jens Hainmueller at 7:43 AM

October 29, 2007

Visualizing Electoral Data

Andy Eggers and I are currently working on a project on UK elections. We have collected a new dataset that covers detailed information on races for the House of Commons between 1950 and 1970; seven general elections overall. We have spent some time thinking about new ways to visualize electoral data and Andy has blogged about this here and here. Today, I'd like to present a new set of plots that we came up with to summarize the closeness of constituency races over time. This is important for our project because we exploit close district races as a source of identification.

Conventional wisdom holds that in Britain about one-quarter of all seats are 'marginal', i.e., decided by majorities of less than 10 percentage points. To visualize this, Andy and I came up with the following plot. Constituencies are on the x axis and elections are on the y axis. Colors indicate the closeness of the district race (i.e., vote majority / vote sum), categorized into bins as indicated in the color key on top. Color scales are from ColorBrewer. We have ranked the constituencies from close to safe, left to right. Please take a look:

closewide.png

The same plot is available as a pdf here. The conventional wisdom seems to hold. About 30 percent of the races are close. Also some elections are closer than others.
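For readers who want to play with this kind of display, here is a bare-bones R sketch of the idea behind the plot, using simulated data (our actual plots involve more careful binning and labeling):

# Heatmap of race closeness: constituencies (ranked from close to safe) on the x axis,
# elections on the y axis, colors from a ColorBrewer sequential palette.
library(RColorBrewer)
set.seed(1)
elections <- c(1950, 1951, 1955, 1959, 1964, 1966, 1970)
n_const <- 600
closeness <- matrix(runif(n_const * length(elections)), nrow = n_const)  # margin / vote sum
closeness <- closeness[order(rowMeans(closeness)), ]                     # rank close -> safe
bins <- matrix(cut(closeness, breaks = c(0, .05, .10, .20, .30, 1), labels = FALSE),
               nrow = n_const)
image(x = 1:n_const, y = seq_along(elections), z = bins,
      col = brewer.pal(5, "Blues"), xlab = "Constituency (ranked)", ylab = "", axes = FALSE)
axis(2, at = seq_along(elections), labels = elections, las = 1)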

A long format of the plot is available here. It makes it possible to identify individual districts, but requires some scrolling. We are considering developing an interactive version using javascript so that additional info pops up as one mouses over the plot. Notice that both plots exclude the 50 or so districts that changed names as a result of the 1951 redistricting wave.

Andy and I also care about districts that swing between the two major parties. To visualize this, we have produced similar plots where the color now indicates the vote share margin as seen by the Conservative party: ((Conservative vote - Labour vote) / vote sum). Negative values indicate a Labour victory and positive values a Conservative victory. We only look at districts where Labour and the Conservatives took first and second place. Here it is:

conswide.png

The partisan swings from election to election are really clear. Finally, the long format is here. The latter plot makes it easy to identify the party strongholds during this time period. Comments and suggestions are highly welcome. We wonder whether anybody has done such plots before or whether we can legitimately coin them as Eggmueller plots (lol).


Posted by Jens Hainmueller at 8:13 PM

October 18, 2007

R Quiz Anybody?

Perl has the Perl quiz, Python has the Python challenges, Ruby has the Ruby quiz, but what about our good old friend R? Does such a thing exist anywhere? It would be a nice idea, I think...

Posted by Jens Hainmueller at 8:52 PM

September 20, 2007

Random Walks by Young Economists

I just came across this interesting article by Angus Deaton, who reflects on changing fashions in graduate work in recent years based on the recruiting for junior positions at Princeton's economics department. Princeton had eighteen candidates come visit this year, and Deaton is impressed by "the breadth of topic that currently falls within the ambit of applied economics." While twenty years ago applied theses mostly focused on "traditional topics such as applied price theory and generally agreed-upon (preferably 'frontier') econometric methods", today's candidates seem to use much less theory and simpler econometrics, but work on topics as wide-ranging as HIV/AIDS in Africa, child immunization in India, political bias of newspapers, child soldiering, racial profiling, rain and leisure choices, mosquito nets, malaria, treatment for leukemia, stages of child development, special education, war and democracy, etc. He also observes a trend towards experimental methods in field settings; apparently one candidate even persuaded a Mexican city to pave a random selection of its streets.

I wonder whether other social science disciplines exhibit similar trends. In political science, it seems to me that there is still a strong focus on traditional topics and a reluctance to investigate more "exotic" (but socially important) topics because they apparently have "little to do with political science." However, one could argue that just as economics is everywhere, politics has its role to play in most social phenomena. There is also still very little work using field experiments (apart from important exceptions such as here or here). The same is true for quasi-experimental designs, which, it seems to me, are still rarely used. How about other disciplines?

Posted by Jens Hainmueller at 9:27 AM

April 10, 2007

What determines which statistical software you use?

I was recently involved in a discussion among fellow grad students about what determines which statistical software package people use to analyze their data. For example, this recent market survey lists 44 products from 31 vendors, and it does not even include packages like R that many people around Harvard seem to use. Another survey, conducted by Alan Zaslavsky, lists 15 packages while "just" looking at software for the analysis of surveys with complex sample designs. So how do people pick their packages given the plethora of options? Obviously, many factors go into this decision (departmental teaching, ease of use, type of methods used, etc.). One particularly interesting factor in our discussion was the importance of academic discipline. It seems that different packages are popular in different disciplines, but how exactly usage patterns vary across fields remains unclear. We wondered whether any systematic data exist on this issue. For example, how many political scientists use R compared to other programs? What about statisticians, economists, sociologists, etc.? Any information would be highly appreciated.

Posted by Jens Hainmueller at 10:12 PM

March 13, 2007

Which Color for your Figure?

Ever wondered what the best color for your graphs would be? While common in the sciences, it may be fair to say that the use of color in graphs is still under-appreciated in many social science fields. Color can be a very effective tool for visualizing data in many forms, because color is essentially a three-dimensional concept:

- hue (red, green, blue)
- value/lightness (light vs. dark)
- saturation/chroma (dull vs. vivid)

From my limited understanding of this topic, not much scientific knowledge exists about how color is best used. However, a few general principles have emerged from the literature. For example, sequential information (an ordering) is often best indicated through differences in lightness. The tricky part is that indicating sequence with colors requires the viewer to remember the color ordering, so a small number of colors should be used. One principle that is sometimes advocated is the use of a neutral color midpoint, which makes sense when there is a "natural" midpoint in the data. If so, you may want to distinguish above and below the midpoint and use dark color1 -> light color1 -> white -> light color2 -> dark color2 (e.g., dark blue to dark red). If no natural midpoint exists, one option is to use a single hue and just vary lightness (e.g., white/pink to dark red). Another idea is that categorical distinctions are best indicated through hue (e.g., red = higher than average, blue = lower than average). Read Edward Tufte and the citations therein for more ideas on the use of color. In addition, a nice online tool that helps you choose colors in a principled way is ColorBrewer, a website definitely worth a visit. Many of the color schemes advocated there are also available in R in the RColorBrewer package. Good luck!
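As a tiny R illustration of these two cases, using palettes from the RColorBrewer package just mentioned (the data are made up):

library(RColorBrewer)
display.brewer.all()                  # browse all ColorBrewer schemes
seq_cols <- brewer.pal(7, "Reds")     # sequential: one hue, varying lightness
div_cols <- brewer.pal(7, "RdBu")     # diverging: dark red -> white -> dark blue
x <- matrix(rnorm(100), 10, 10)       # made-up data with a natural midpoint at zero
image(x, col = rev(div_cols))         # diverging palette around the midpoint
image(abs(x), col = seq_cols)         # sequential palette when there is no midpoint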

Posted by Jens Hainmueller at 11:14 PM

February 27, 2007

Adventures in Identification II: Exposing Corrupt Politicians

Today we continue our voyage in the treasure quest for identification in observational studies. After our sojourn in Spain two weeks ago, the next stopover is in Brazil, where in a recent paper Claudio Ferraz and Frederico Finan discovered a nice natural experiment that makes it possible to estimate the effect of transparency on political accountability. Many in the policy world are agog over the beneficial impact of transparency on good governance. Yet empirical studies of this subject are often bedevilled by selection problems, for obvious reasons. Ideally, we would like to find a situation in which changes in transparency are randomly assigned, which (also for obvious reasons) tends to be a low-probability event. But it does happen. It turns out that in a recent anti-corruption program in Brazil, the federal government randomly audits 60 municipalities every month and then discloses the findings of the report to the municipality and the media. The authors exploit this variation and find that the dissemination of information on corruption, which is facilitated by the media, does indeed have a detrimental impact on the incumbent's electoral performance.

Here is the abstract of the paper:

Exposing Corrupt Politicians: The Effects of Brazil's Publicly Released Audits on Electoral Outcomes

This paper examines whether access to information enhances political accountability. Based upon the results of Brazil’s recent anti-corruption program that randomly audits municipal expenditures of federally-transferred funds, it estimates the effects of the disclosure of local government corruption practices upon the re-election success of incumbent mayors. Comparing municipalities audited before and after the elections, we show that the audit policy reduced the incumbent’s likelihood of re-election by approximately 20 percent, and was more pronounced in municipalities with radio stations. These findings highlight the value of information and the role of the media in reducing informational asymmetries in the political process.

Posted by Jens Hainmueller at 12:48 PM

February 13, 2007

Adventures in Identification I: Voting After the Bomb

Jens Hainmueller

I've decided to start a little series of entries under the header "Adventures in Identification." The title is inspired by the increasing trend in the social sciences, in particular economics and public health, but also political science, sociology, and other fields, to look for natural or quasi-experiments to identify causal effects in observational settings. Although there are of course plenty of bad examples of this type of study, I think the general line of research is very promising and the rising interest in issues of identification is commendable. Natural experiments often provide the only credible way to answer many of the questions we care about in the social sciences, where real experiments are often unethical or infeasible (or both) and observational data usually have selection bias written all over them. Enough said, let's jump right into the material: "Adventures in Identification I: Voting After the Bomb," a macabre natural experiment in electoral politics.

A recent question in political science and also in economics is how terrorism affects democratic elections. Clearly, this is a fairly tricky question to get an (identification) handle on. Heretic graduate students riding their Rubin horses around IQSS will tell you two minutes into your talk that you can't just run a regression and call it "causal." One setting where an answer may be (partly) possible is the Spanish congressional elections of 2004. The incumbent conservative party, led by Prime Minister Jose Maria Aznar, had been favored to win by a comfortable margin according to opinion polls. On March 11, however, Islamic terrorists deposited nine backpacks full of explosives in several commuter trains in Madrid. The explosions killed 191 people and wounded 1,500. Three days later, Spain's socialists, led by Jose Luis Rodriguez Zapatero, scored a stunning victory in the elections. Turnout was high, and many have argued that voters seemingly expressed anger with the government, accusing it of provoking the Madrid attacks by supporting the U.S.-led war in Iraq, which most Spaniards opposed.

Now the question is how (if at all) the terrorist attacks affected the election result. As usual, only one potential outcome is observed, and the crucial question is what the election results would have been in the absence of the attacks. One could do a simple before-and-after study, imputing this missing potential outcome from some extrapolated pre-attack trend in opinion polls. But then the question remains whether these opinion polls are an accurate representation of how people would have voted on election day. A difference-in-differences design seems better suited, but given that the attacks probably affected all voters, a control group is hard to come by.

In a recent paper, Jose G. Montalvo actually found a control group. It turns out that at the time the attacks hit, Spanish residents abroad had already cast their absentee ballots, so their decisions were not affected by the attacks. The author then sets up a diff-in-diffs exploiting voting trends in the treated group (Spanish residents) and the control group (Spanish citizens living abroad). He finds that the attacks had a large effect on the result, to the benefit of the opposition party. Interestingly, this result seems to differ from the findings of other simple before-and-after studies on the topic (although I can't say for sure because I have not read the other papers cited).
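In regression form, the design boils down to an interaction between residency status and the post-attack election, roughly along the following lines in R (the data frame and variable names are hypothetical, not Montalvo's actual setup):

# Difference-in-differences sketch. resident = 1 for voters in Spain (exposed to the
# attacks before voting), 0 for absentee voters abroad (who had already voted);
# post = 1 for the 2004 election, 0 for earlier elections.
did_fit <- lm(socialist_share ~ resident * post, data = ballots)
summary(did_fit)
coef(did_fit)["resident:post"]   # the DID estimate of the attacks' effect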

Of course, the usual disclaimers about DID estimates apply. Differential trends between the groups may exist if foreign residents' perceptions of terrorism evolved differently over time than those of Spanish residents, and foreign residents are probably very different from Spanish residents to begin with. But in the author's defense, the results seem fairly robust given the checks he presents. And hey, it's a tough question to ask, and this provides a more appropriate way to get a handle on identifying the counterfactual outcome than simply comparing before and after.

Posted by Jens Hainmueller at 8:00 AM

February 6, 2007

Ask why...why, why, why

askwhy1.jpeg

Posted by Jens Hainmueller at 10:11 PM

January 30, 2007

The Role of Sample Size and Unobserved Heterogeneity in Causal Inference

Jens Hainmueller

Here is a question for you: Imagine you are asked to conduct an observational study to estimate the effect of wearing a helmet on the risk of death in motorcycle crashes. You have to choose one of two different data-sets for this study: Either a large, rather heterogeneous sample of crashes (these happened on different roads, at different speeds, etc.) or a smaller, more homogeneous sample of crashes (let's say they all occurred on the same road). Your goal is to unearth a trustworthy estimate of the treatment effect that is as close as possible to the `truth', i.e. the effect estimate obtained from an (unethical) experimental study on the same subject. Which sample do you prefer?

Naturally, most people tend to choose the large sample. Larger sample, smaller standard error, less uncertainty, better inference... we've heard it all before. Interestingly, in a recent paper entitled "Heterogeneity and Causality: Unit Heterogeneity and Design Sensitivity in Observational Studies," Paul Rosenbaum comes to the opposite conclusion. He demonstrates that heterogeneity, and not sample size, matters for the sensitivity of your inference to hidden bias (a topic we blogged about previously here and here). He concludes that:

“In observational studies, reducing heterogeneity reduces both sampling variability and sensitivity to unobserved bias—with less heterogeneity, larger biases would need to be present to explain away the same effect. In contrast, increasing the sample size reduces sampling variability, which is, of course useful, but it does little to reduce concerns about unobserved bias.”

This basic insight about the role of unit heterogeneity in causal inference goes back to John Stuart Mill's 1864 System of Logic. In this regard, Rosenbaum's paper makes a nice comparison to Jas's view on Mill's methods. Of course, Fisher dismissed Mill's plea for unit homogeneity because in experiments, when you have randomization working for you, hidden bias is not a real concern, so you may as well go for the larger sample.

Now you may say: well, it all depends on the estimand, no? Do I care about the effect of helmets in the US as a whole or only on a single road? This point is well taken, but keep in mind that for causal inference from observational data we often care about internal validity first and not necessarily generalizability (most experiments are also done on highly selective groups). In any case, Rosenbaum's basic intuition stands and has real implications for the way we gather data and judge inferences. Next time you complain about a small sample size, you may want to think about heterogeneity first.

So finally back to the helmet example. Rosenbaum cites an observational study that deals with the heterogeneity issue in a clever way: “Different crashes occur on different motorcycles, at different speeds, with different forces, on highways or country roads, in dense or light traffic, encountering deer or Hummers. One would like to compare two people, one with a helmet, the other without, on the same type of motorcycle, riding at the same speed, on the same road, in the same traffic, crashing into the same object. Is this possible? It is when two people ride the same motorcycle, a driver and a passenger, one helmeted, the other not. Using data from the Fatality Analysis Reporting System, Norvell and Cummings (2002) performed such a matched pair analysis using a conditional model with numerous pair parameters, estimating approximately a 40% reduction in risk associated with helmet use.”
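As a rough R illustration of that matched-pair design (with a hypothetical data frame standing in for the FARS data), conditional logistic regression with one stratum per motorcycle absorbs all the crash-level heterogeneity:

# Each crash pair (driver plus passenger on the same motorcycle) is its own stratum,
# so speed, road, traffic, and the object hit are held fixed within pairs.
library(survival)
fit <- clogit(died ~ helmet + strata(pair_id), data = crashes)
summary(fit)
exp(coef(fit)["helmet"])   # within-pair odds ratio of death, helmeted vs. unhelmeted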

Posted by Jens Hainmueller at 8:30 AM

December 5, 2006

Causality in the Social Sciences Anybody?

Funny that there is no section on causal inference in the social sciences here. It says that to meet Wikipedia's quality standards, this article may require cleanup. Hopefully, somebody will find the time to contribute a social science section. Why not you? My guess is that readers of this blog know plenty about this topic... and the current entry is missing a lot of what statistics has to say about causality.

Posted by Jens Hainmueller at 10:00 AM

November 27, 2006

Designing and Analyzing Randomized Experiments in Political Science

I just read a paper by Yusaku Horiuchi, Kosuke Imai, and Naoko Taniguchi (HIT) on "Designing and Analyzing Randomized Experiments." HIT draw upon the longstanding statistics literature on this topic and attempt to "pave the way for further development of more methodologically sophisticated experimental studies in political science." While experiments are becoming more frequent in political science, HIT observe that a majority of recent studies do not randomize effectively and still ignore problems of noncompliance and/or nonresponse.

Specifically, they offer four general recommendations:

(I) Researchers should obtain information about background characteristics of experimental subjects that can be used to predict their noncompliance, nonresponse, and the outcome.

(II) Researchers should conduct efficient randomization of treatments by using, for example, randomized-block and matched-pair designs.

(III) Researchers must make every effort to record the precise treatment received by each experimental subject.

(IV) Finally, a valid statistical analysis of randomized experiments must properly account for noncompliance and nonresponse problems simultaneously.

Take a look. I agree with HIT that these issues are not new, yet they are too often ignored in political science (exceptions acknowledged). HIT illustrate their recommendations using a carefully crafted online experiment on Japanese elections. Statistically, they employ a Bayesian approach within the general statistical framework of randomized experiments with noncompliance and nonresponse (Angrist, Imbens, and Rubin 1996; Imbens and Rubin 1997; Frangakis and Rubin 1999, 2002). There is also interesting new work on modeling causal heterogeneity in this framework (a big topic in and of itself).

Posted by Jens Hainmueller at 12:19 PM

November 21, 2006

Back to the Drawing Board?

thumb.jpg

Have you ever been to a social science talk and heard somebody say something like "I guess I will have to go back to the drawing board..."? I always wondered what that really meant, until an engineering friend of mine suggested taking a look at this.

Maybe we can get one for the IQSS?

Posted by Jens Hainmueller at 11:34 AM

November 10, 2006

Chernoff Faces

We haven't had much on graphics on this blog yet, partly because there are several specialized fora for this peculiar aspect of statistics: for instance, junkcharts, the R-gallery, information aesthetics, the Statistical Graphics and Data Visualization blog, the Data Mining blog, Edward Tufte's forum, Andrew Gelman's blog, and others. Yet I assume readers of this blog wouldn't mind a picture every once in a while, so here are some Chernoff faces for you. In the spirit of Mike's recent entry, they illustrate team statistics from the 2005 baseball season:

faces.png

I recently came across Chernoff faces while looking for a neat way to display multivariate data, in order to compare several cities along various dimensions in a single plot. Chernoff faces, introduced by Herman Chernoff (Professor Emeritus of Applied Mathematics at MIT and of Statistics at Harvard) in 1971, are a method for converting multivariate data into cartoon faces whose features are controlled by the variable values. So, for example, in the above graph each team's winning percentage is represented by face height, smile curve, and hair styling; hits are represented by face width, eye height, and nose height; etc. (for details and extensions see here).

The key idea is that humans are well trained to recognize faces and discern small changes without difficulty. Chernoff faces therefore allow for easy outlier detection and pattern recognition despite the multiple dimensions of the data. Since the features of the faces vary in perceived importance, the way in which variables are mapped to the features should be carefully chosen.

Mathematica and R have canned algorithms for Chernoff faces (see here and here). I haven't seen a Chernoff plot in a social science journal yet, but maybe I am reading the wrong journals. Does anyone know of articles that use this technique? And do you think this is an effective way of displaying data that should be used more often? Obviously there are also problems with this type of display, but even if you don't like the key idea, you have to admit that they look much funnier than the boring bar graphs or line plots we see all the time.
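If you want to try them yourself, one R implementation I am aware of is faces() in the aplpack package (the data below are random, not the baseball statistics shown above):

# Chernoff faces via aplpack::faces(): each row of the matrix becomes one face,
# and each column is mapped to a facial feature (height, width, smile, ...).
library(aplpack)
set.seed(42)
teams <- matrix(rnorm(30 * 8), nrow = 30,
                dimnames = list(paste("team", 1:30), NULL))
faces(teams)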

Posted by Jens Hainmueller at 10:29 AM

November 2, 2006

Incumbency as a Source of Contamination in Mixed Electoral Systems

Jens Hainmueller

Since the early 1990s, more than 30 countries have adopted mixed electoral systems that combine single-member districts (SMD) in one tier with proportional representation (PR) in a second tier. Political scientists like this type of electoral system because each voter gets to cast two votes, the first according to one set of institutional rules and the second according to another. Some have argued that this allows for causal inference because it offers a controlled comparison of voting patterns under different electoral rules. But does it really?

The more recent literature on so-called contamination effects undermines this claim. Several papers (Herron and Nishikawa 2001; Cox and Schoppa 2002; Ferrara, Herron, and Nishikawa 2005) have found evidence of interaction effects between the two tiers in mixed electoral systems. For example, small parties are able to attract more PR votes in those districts in which they run SMD candidates. The argument is that running an SMD candidate gives a human face to the party and thus enables it to attract additional PR votes.

In a recent paper, Holger Kern and I attempt to add to this debate by identifying incumbency as a source of contamination in mixed electoral systems. It is well known that incumbents who run in single-member district (SMD) races have a significant advantage over non-incumbents (Gelman and King 1990). It thus seems plausible that this advantage carries over to the proportional representation (PR) tier, and that incumbents are able to attract additional PR votes for their party in the district. In our paper we identify such an effect using a regression-discontinuity design that exploits the local random assignment of incumbency in close district races (building on an earlier paper by Lee 2006). The RD design allows us to separate a subpopulation of district races in which treatment is assigned as good as randomly from the rest of the data, which is tainted by selection effects. We find that incumbency causes a gain of 1 to 1.5 percentage points in PR vote share. We also present simulations of Bundestag seat distributions, demonstrating that contamination effects caused by incumbency have been large enough to trigger significant shifts in parliamentary majorities.
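In its simplest form, the comparison can be sketched in R as a local regression in the district-level margin around zero (variable names are hypothetical, and the paper uses more careful bandwidth and specification checks):

# Sharp RD sketch: incumbency is assigned by whether the party's margin in the
# previous race (margin_t1, centered at zero) is positive.
bw <- 0.05                                   # look only at races decided within 5 points
close <- subset(districts, abs(margin_t1) < bw)
rd_fit <- lm(pr_share_t ~ I(margin_t1 > 0) * margin_t1, data = close)
summary(rd_fit)
coef(rd_fit)["I(margin_t1 > 0)TRUE"]         # jump at the threshold = incumbency effect on PR share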

Needless to say, any feedback is highly appreciated.

Posted by Jens Hainmueller at 12:00 PM

October 29, 2006

America by the Numbers

Reading the Data Mining blog, I just learned about this cool visualization of the US population density presented by Time magazine.

Take a closer look here. Cute, isn't it?

Posted by Jens Hainmueller at 3:14 PM

October 12, 2006

Causation and Manipulation VII: The Cartoon Version

Dodging Bill Collectors

As Tailor (A) fits customer (B) and calls out measurements, college boy (C) mistakes them for football signals and makes a flying tackle at clothing dummy (D). Dummy bumps head against paddle (E) causing it to pull hook (F) and throw bottle (G) on end of folding hat rack (H) which spreads and pushes head of cabbage (I) into net (J). Weight of cabbage pulls cord (K) causing shears (L) to cut string (M). Bag of sand (N) drops on scale (O) and pushes broom (P) against pail of whitewash (Q) which upsets all over you causing you to look like a marble statue and making it impossible for you to be recognized by bill collectors. Don't worry about posing as any particular historical statue because bill collectors don't know much about art (more on causal chains in cartoons click here).

Posted by Jens Hainmueller at 11:00 PM

October 3, 2006

Causation and Manipulation II: The Causal Effect of Gender?

Jens Hainmueller

In a recent post, Jim Greiner asked whether we adhere to the principle of "no causation without manipulation." This principle, if true, raises the question of whether it makes sense to talk about the causal effect of gender.

The Rubin/Holland position on this is clear: it makes no sense to talk about the causal effect of gender because the manipulation, and thus the counterfactual, one has in mind (a sex-transformation surgery?) is ill-defined. One can ask related questions, such as sending resumes with randomized female and male names to employers and seeing whether one gender is more likely to be invited to a job interview, but it makes no sense to think about a causal effect of gender per se.

The contrasting view is presented by one of their main foils, James Heckman, who writes in a recent paper (Andrew Gelman also had a blog post on this): "Holland claims that there can be no causal effect of gender on earnings. Why? Because we cannot randomly assign gender. This confused statement conflates the act of definition of the causal effect (a purely mental act) with empirical difficulties in estimating it. This type of reasoning is prevalent in statistics. As another example of the same point, Rubin (1978, p. 39) denies that it is possible to define a causal effect of sex on intelligence because a randomization cannot in principle be performed. In this and many other passages in the statistics literature, a causal effect is defined by a randomization. Issues of definition and identification are confused. [...] the act of definition is logically separate from the acts of identification and inference." Heckman sees this as a "view among statisticians that gives rise to the myth that causality can only be determined by randomization, and that glorifies randomization as the 'gold standard' of causal inference."

So what do you make of this? Does it make sense to think about a causal effect of gender or not? Does it make sense to try to estimate it, i.e., to interpret a gender gap in wages as causal (given balance on all confounders except gender)? How about the causal effect of race, etc.? Just to be precise, notice that Rubin/Holland admit that "even though it may not make much sense to talk about the 'causal' effect of a person being a white student versus being a black student, it can be interesting to compare whites and blacks with similar background characteristics to see if there are differences" in some outcome of interest.

Posted by Jens Hainmueller at 10:00 PM

September 10, 2006

The Tenth Dimension

The semester is about to start, which means it is math camp time at the Government Department. The very first topic is usually an introduction to dimensions, starting from R1 (lines), to R2 (planes), to R3 (three-dimensional space), to R4 (space plus time). Here is a nice flash animation (click on "imagining ten dimensions" on the left) that takes you a step further, from zero to ten dimensions in less than 5 minutes (including cool visual and acoustic effects). It doesn't necessarily become more graspable as you ascend ... :-)

Posted by Jens Hainmueller at 8:26 AM

May 9, 2006

Running Statistics On Multiple Processors

Jens Hainmueller

You just bought a state-of-the-art PC with dual processors and yet your model still runs forever? Well, your statistical software is probably not multi-threaded, meaning that even though your computer has two processors, the whole computation runs on only one of them. Don't believe me? Check your CPU usage; it's probably stuck at 50 percent (or less).

You might ask why statistical software doesn't use both processors simultaneously. The fact is that splitting computations across two or more processors is a non-trivial problem that many software packages cannot yet handle. This may change in the near future, however, as the advent of dual processors in regular PCs puts increasing pressure on statistical software producers to support multi-threading.


In fact, Stata Corp. has recently released Stata/MP, a new version of Stata/SE that runs on multiprocessor computers. Their website proclaims that: "Stata/MP provides the most extensive support for multiple-processor computers and dual-core computers of any statistics and data-management package." So this bodes well for Stata users.

What’s in it for Non-Stataists? People at S-PLUS told me yesterday that there is "currently an enhancement request to add functionality to S-PLUS that will allow it to use multiple processors. This request has been submitted to our developers for further review." Unfortunately no further information is available at this point.

In my favourite software, R, there are efforts to add concurrency and potentially parallelism. Currently, the snow package allows for simple parallel computing.
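For the curious, here is what a minimal snow session looks like: a toy bootstrap split across two worker processes (this is coarse-grained parallelism across independent replications, not multi-threading within a single model fit):

library(snow)
cl <- makeCluster(2, type = "SOCK")   # start two R worker processes
x <- rnorm(1e4)                       # toy data
clusterExport(cl, "x")                # ship the data to the workers
boot_means <- parSapply(cl, 1:1000, function(i) mean(sample(x, replace = TRUE)))
stopCluster(cl)
quantile(boot_means, c(0.025, 0.975))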

It will be interesting to see how other statistical software producers like SAS, LIMDEP, etc. will react to this trend toward dual processing. Does anybody have more information about this issue?

Posted by Jens Hainmueller at 6:00 AM

May 4, 2006

Detecting Attempted Election Theft

At the Midwest conference last week I saw Walter Mebane presenting his new paper entitled "Detecting Attempted Election Theft: Vote Counts, Voting Machines and Benford's Law." The paper is really fun to read and contains many cool ideas about how to statistically detect vote fraud in situations where only minimal information is available.

With the advent of voting machines that replace traditional paper ballots, physically verifying vote counts becomes impossible. As Walter Mebane puts it: "To steal an election it is no longer necessary to toss boxes of ballots in the river, stuff the boxes with thousands of phony ballots, or hire vagrants to cast repeated illicit votes. All that may be needed nowadays is access to an input port and a few lines of computer code."

How does Mebane utilize statistical tools to detect voting irregularities? He relies on two sets of tests:

The first test relies on Benford's Law. The idea here is that if individual votes originate from a mix of at least two statistical distributions, there is reason to expect that the distribution of digits in the reported vote counts should satisfy the second-digit Benford's law. Walter provides simulations showing that the Benford's law test is sensitive to some kinds of manipulation of vote counts but not others.

The second set of tests relies on randomization. The idea is based on the assumption that in each precinct (especially crowded ones) voters may be randomly and independently assigned to each machine used in the precinct. The test involves checking whether the split of the votes is the same on all the machines used in a precinct. If some of the machines were indeed hacked, the distribution of the votes among candidates would differ on the affected machines. Mebane tests these expectations against data from three Florida counties with very interesting findings.
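To give a flavor of the first test, here is a small R sketch comparing the observed second digits of precinct-level vote counts with the second-digit Benford distribution (the vote counts are simulated; Mebane's actual tests are more elaborate):

# Second-digit Benford check; works for counts of at least 10.
second_digit <- function(n) (n %/% 10^(nchar(n) - 2)) %% 10
benford2 <- sapply(0:9, function(d) sum(log10(1 + 1 / (10 * (1:9) + d))))  # expected frequencies
votes <- rpois(500, lambda = 400)                    # hypothetical precinct vote counts
obs <- table(factor(second_digit(votes), levels = 0:9))
chisq.test(obs, p = benford2)                        # observed vs. Benford-expected second digits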

In general, the paper was very well received by the audience. Some attendees raised concerns about the randomization test, arguing that voters may not be randomly assigned to voting machines (for example, older voters may be more likely to go to the first machine in line). The discussant, Jonathan Wand, raised the idea of actually using random assignment of voters to voting machines as an administrative tool to facilitate fraud detection ex post. He also proposed using sampling techniques to make recounts more feasible (though that would require voting machines that leave a paper trail). Another comment alluded to the fact that somebody smart who wants to steal an election might anticipate some of Walter's tests and design the manipulation so that it passes them.

Overall, my impression is that although his research is admittedly still at an early stage, Mebane is onto something very cool here, and I am eager to see the redrafts and more results in the future. This is a very important topic given that more and more voting machines will be used in the future. Everybody interested in vote fraud should read this paper.

Posted by Jens Hainmueller at 6:00 AM

April 20, 2006

Explaining Individual Attitudes Toward Immigration in Europe

Jens Hainmueller and Michael Hiscox

We have written a paper that investigates individual attitudes toward immigration in 22 European countries. In line with our research on individual attitudes toward trade policies (see previous blog entries here, here, and here), we find that a simple labour market model (a la Heckscher-Ohlin) does not do very well in accounting for preferences at the individual level. This finding resonates well with economic theory, given that more recent economic models are actually quite equivocal about whether immigrants will have an adverse impact on the wages or employment opportunities of local workers with similar skills (see our discussion of these models here).

Please find our abstract after the jump. Here is the link to the paper. As always, comments are highly appreciated.

Educated Preferences: Explaining Attitudes Toward Immigration In Europe:

Recent studies of individual attitudes toward immigration emphasize concerns about labor market competition as a potent source of anti-immigrant sentiment, in particular among less-educated or less-skilled citizens who fear being forced to compete for jobs with low-skilled immigrants willing to work for much lower wages. We examine new data on attitudes toward immigration available from the 2003 European Social Survey. In contrast to predictions based upon conventional arguments about labor market competition, which anticipate that individuals will oppose immigration of workers with similar skills to their own, but support immigration of workers with different skill levels, we find that people with higher levels of education and occupational skills are more likely to favor immigration regardless of the skill attributes of the immigrants in question. Across Europe, higher education and higher skills mean more support for all types of immigrants. These relationships are almost identical among individuals in the labor force (i.e., those competing for jobs) and those not in the labor force. Contrary to the conventional wisdom, then, the connection between the education or skill levels of individuals and views about immigration appears to have very little, if anything, to do with fears about labor market competition. This finding is consistent with extensive economic research showing that the income and employment effects of immigration in European economies are actually very small. We find that a large component of the effect of education on attitudes toward immigrants is associated with differences among individuals in cultural values and beliefs. More educated respondents are significantly less racist and place greater value on cultural diversity than their counterparts; they are also more likely to believe that immigration generates benefits for the host economy as a whole.

Posted by Jens Hainmueller at 6:00 AM

April 6, 2006

New Immigrant Survey (NIS 2003) Online

Jens Hainmueller

Great news for people studying immigration: the first full-cohort module of the New Immigrant Survey (NIS 2003) is now online. The NIS is "a nationally representative multi-cohort longitudinal study of new legal immigrants and their children to the United States based on nationally representative samples of the administrative records, compiled by the U.S. Immigration and Naturalization Service (INS), pertaining to immigrants newly admitted to permanent residence."

The sampling frame consists of new-arrival and adjustee immigrants. The Adult Sample covers all immigrants who are 18 years of age or older at admission to Lawful Permanent Residence (LPR). There is also a Child Sample, which covers immigrants with child-of-U.S.-citizen visas who are under 18 years of age and adopted orphans under five years of age. Overall, 8,573 adults and 810 children were interviewed, a response rate of about 65%.

The NIS features a wide variety of questions regarding demographics, pre-immigration experiences, employment, health, health and life insurance, health care utilization and daily activities, income, assets, transfers, social variables, migration history, etc. There is also the controversial and much-discussed skin color scale, where the survey measured respondents' skin color using an 11-point scale ranging from zero to 10, with zero representing albinism, or the total absence of color, and 10 representing the darkest possible skin. The scale was memorized by the interviewers, so the respondent never saw the chart. Check out the ten shades of skin color corresponding to the points 1 to 10 and a description of the skin color test here.


Posted by Jens Hainmueller at 6:00 AM

March 22, 2006

Valid Standard Errors for Propensity Score Matching, Anyone?

Jens Hainmueller

Propensity Score Matching (PSM) has become an increasingly popular method for estimating treatment effects in observational studies. Most papers that use PSM also report standard errors for their treatment effect estimates. I always wonder where these standard errors actually come from: to my knowledge, there still exists no method to calculate valid standard errors for PSM. What do you all think about this topic?

The issue is this: getting standard errors for PSM works out nicely when the true propensity score is known. Alberto and Guido have developed a formula that provides principled standard errors when matching is done on covariates or on the true propensity score. You can read about it here. This formula is used by their nnmatch software in Stata and by Jasjeet Sekhon's Matching package in R.

Yet in observational settings we do not know the true propensity score, so we first have to estimate it. Usually people regress the treatment indicator on a set of covariates using a probit or logit link function. The predicted probabilities from this model are then extracted and taken as the estimated propensity score to be matched on in the second step (some people instead match on the linear predictor, which is desirable because it does not cluster so much around 0 and 1).
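In R, that two-step procedure might look roughly like this, using Jasjeet Sekhon's Matching package (the data frame df and its variables are hypothetical):

# Step 1: estimate the propensity score with a logit model.
library(Matching)
ps_fit <- glm(treat ~ age + educ + income, family = binomial(link = "logit"), data = df)
df$score <- predict(ps_fit)            # linear predictor, which clusters less near 0 and 1

# Step 2: match on the estimated score and estimate the ATT.
m_out <- Match(Y = df$outcome, Tr = df$treat, X = df$score, estimand = "ATT")
summary(m_out)                         # reports a standard error for the matched estimate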

Unfortunately, the abovementioned formula does not work in the case of matching on the estimated propensity score, because the estimation uncertainty created in the first step is not accounted for. Thus, the confidence bounds on the treatment effect estimates in the second step will most likely not have the correct coverage.

This issue is not easily resolved. Why not just bootstrap the whole two-step procedure? Well, there is evidence to suggest that the bootstrap is likely to fail in the case of PSM. In the closely related problem of deriving standard errors for conventional nearest neighbor matching, Guido and Alberto show in a recent paper that even in the simple case of matching on a single continuous covariate (when the estimator is root-N consistent and asymptotically normally distributed with zero asymptotic bias), the bootstrap does not provide standard errors with correct coverage. This is due to the extreme non-smoothness of nearest neighbor matching, which leads the bootstrap variance to diverge from the actual variance.

In the case of PSM the same problem is likely to occur, unless estimating the propensity score in the first step makes the matching estimator smooth enough for the bootstrap to work. But this is an open question. At least to my knowledge, there exists no Monte Carlo evidence or theoretical justification for why the bootstrap should work here. I would be interested to hear opinions on this issue. It's a critical question because the bootstrap for PSM is often used in practice; various matching codes (for example, pscore or psmatch2 in Stata) offer bootstrapped standard error options for matching on the estimated propensity score.

Posted by Jens Hainmueller at 6:00 AM

February 27, 2006

Resources for Multiple Imputation

Jens Hainmueller

As applied researchers, we all know this situation all too well. Like the alcoholic standing in front of a bar that is just about to open, you have just downloaded (or somehow compiled) a new dataset. You open your preferred statistical software and begin to investigate the data. And then you are struck by lightning: holy cow, I have missing data!! So what do you do about it? Listwise deletion as usual? In the back of your mind you recall your stats teacher saying that listwise deletion is unlikely to result in valid estimates, but hitherto you have simply ignored these caveats. Don't be a fool, you can do better -- use multiple imputation (MI).

As is well known in the statistical literature on the missing data problem, MI is not a silver bullet for dealing with missing values. In some cases, better (primarily more efficient) estimates can be obtained using weighted estimation procedures or specialized numerical methods (EM, etc.). Yet these methods are often complicated and problem-specific, and thus not for the faint-of-heart applied researcher. MI, in contrast, is relatively easy to implement and works well in most instances. Want to know how to MI? I suggest you take a look at www.multiple-imputation.com, a website that brings together various resources regarding the method, software, and literature citations that will help you add MI to your toolkit. A nice (non-technical) introduction is also provided on Joseph Schafer's multiple imputation FAQ page. Gary and co-authors have also written extensively on this subject, offering lots of practical advice for applied researchers. Last but not least, I recommend searching for "multiple imputation" on Andrew Gelman's blog; you will find many interesting entries on the topic. Good luck!
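For a sense of the basic workflow, here is a minimal multiple-imputation analysis in R using the mice package and its built-in toy dataset (other implementations, such as Amelia, follow the same impute-analyze-combine pattern):

# Impute m complete datasets, run the analysis model on each, and pool the results
# with Rubin's combining rules.
library(mice)
data(nhanes)                                # small example dataset shipped with mice
imp  <- mice(nhanes, m = 5, seed = 1)
fits <- with(imp, lm(chl ~ bmi + age))      # analysis model on each imputed dataset
pool(fits)                                  # combined estimates and standard errors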

Posted by Jens Hainmueller at 6:00 AM

February 17, 2006

Do People Think like Stolper-Samuelson? Part III

Jens Hainmueller and Michael Hiscox

In two previous entries here and here we wrote about a recent paper that re-examines the available evidence for the prominent claim that public attitudes toward trade follow the Stolper-Samuelson theorem (SST). We presented evidence that is largely at odds with this hypothesis. In this posting, we take issue with the last specific finding in this literature that has been interpreted as strong support for the SST: the claim that the skill effect on trade preferences is proportional to a country’s factor endowment. What the heck does this mean?

Recall that, according to the SST, skilled individuals will gain in terms of real wages (and thus should be likely to favor trade openness) in countries that are abundantly endowed with skilled labor, and the size of those gains should be proportional to the degree of skill abundance in each country. Of course, in countries that are actually poorly endowed with skilled labor relative to potential trading partners, those gains should turn into losses.

The seminal paper on this topic, Rodrik and Mayda (2004), shows evidence supporting the idea that the skill effect (proxied by education) is proportional to a country’s factor endowment: they find the largest positive effects in the richest (i.e. most skill abundant) countries and smaller positive effects in the somewhat poorer (skill scarce) countries in their sample. For the only really poor country in their survey sample, the Philippines, they even find a (significant) negative effect (i.e. the more educated are less likely to support trade liberalization). This finding constitutes R&M's smoking-gun evidence that preferences do indeed follow the SST, and it is the finding most often cited in the literature.

The central problem with the R&M findings, which are mainly based on data from the International Social Survey Programme (ISSP), is the lack of skill scarce countries in their sample. Their data thus do not allow for a comprehensive test of the claim that the skill effect on trade preferences is proportional to a country’s factor endowment, simply because most countries in their sample are skill abundant, relatively rich economies. In the supplement to a recent paper we specifically re-examine the R&M claim, using data from the Global Attitudes Project survey administered by Pew in 2002. The Pew data has not been examined by scholars interested in attitudes toward trade, although it has some key advantages over the other datasets that have been used (ISSP, etc.). Most importantly, it covers a much broader range of economies that are very heterogeneous in their levels of skill endowments. The Pew data covers not only the Philippines but 43 additional countries, many of which are skill scarce.

This figure summarizes our results from the Pew data. It plots the estimated marginal effect of an additional year of schooling on the probability of favouring free trade (evaluated at the sample means, using country-specific ordered probit models) against skill endowment as measured by the log of GDP per capita in 2002 (PPP). The solid diamonds denote the point estimates and the dashed lines show the .90 confidence envelopes.
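For readers who wonder what exactly goes into each point in the figure, here is a rough sketch (not our actual replication code) of how one can compute such a marginal effect as a finite difference from a country-specific ordered probit; the data frame and column names are hypothetical.

    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    def schooling_marginal_effect(df):
        """Effect of one extra year of schooling on P(most pro-trade category), at the covariate means."""
        X = df[["schooling", "age", "male", "income"]]
        res = OrderedModel(df["trade_support"], X, distr="probit").fit(method="bfgs", disp=False)
        at_means = X.mean().to_frame().T
        plus_one = at_means.copy()
        plus_one["schooling"] += 1
        p0 = np.asarray(res.predict(at_means))[0, -1]   # P(top category) at the means
        p1 = np.asarray(res.predict(plus_one))[0, -1]   # ...with one more year of schooling
        return p1 - p0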

Two main findings emerge here. First, there is no clear relationship between the marginal effect of education on support for trade among respondents and their countries’ skill endowments. The pattern more resembles a drawing by the expressionist painter Jackson Pollock than the clear upward-sloping line one would predict from a simple application of Stolper-Samuelson. Second, in all countries increased schooling has either a positive or zero effect on the probability of supporting free trade. This includes the Philippines, the only country abundant in low-skilled labor for which Mayda and Rodrik found a negative relationship. Moreover, most of the point estimates are positive; the exceptions are Canada, Ivory Coast, Mali, and Nigeria, hardly a cluster of countries with common skill endowments!

Overall these results strongly suggest that the impact of education levels on support for trade among individuals is not driven by differences in skill endowments across countries (and individual concerns about wage levels) as suggested by a simple application of the Stolper-Samuelson theorem.

Posted by Jens Hainmueller at 6:00 AM

February 1, 2006

Do People Think like Stolper-Samuelson? Part I

Jens Hainmueller and Michael Hiscox

In the face of the fierce political disagreements over free trade taking place in the US and elsewhere, it's critical that we try to understand how people think about trade policies. A growing body of scholarly research has examined survey data on attitudes toward trade among voters, focusing on individual determinants of protectionist sentiments. These studies have converged upon one central finding: fears about the distributional effects of trade openness among less-educated, blue-collar workers lie at the heart of much of the backlash against globalization in the United States and other advanced economies. Support for new trade restrictions is highest among respondents with the lowest levels of education (e.g., Scheve and Slaughter 2001a, 2001b; Mayda and Rodrik 2005; O’Rourke and Sinnott 2002). These findings are interpreted as strong support for the Stolper-Samuelson theorem, a classic economic treatment of the income effects of trade. It predicts that trade openness benefits those owning factors of production with which their economy is relatively well endowed (those with high skill levels in the advanced economies) while hurting others (low-skilled and unskilled workers).


But is it really true that people think like Stolper-Samuelson (i.e. that more educated people favour trade because it will increase their factor returns)? The positive relationship between education and support for trade liberalization might also – and perhaps primarily – reflect the fact that more educated respondents tend to be more exposed to economic ideas about the overall efficiency gains for the national economy associated with greater trade openness, and tend to be less prone to nationalist and anti-foreigner sentiments often linked with protectionism. In our recent paper “Learning to Love Globalization: Education and Individual Attitudes Toward International Trade” we try to shed light on this issue. More on this in a subsequent post tomorrow.

Posted by Jens Hainmueller at 6:00 AM

January 26, 2006

Stats Games

Jens Hainmueller

January is exam period at Harvard. Since exams are usually pretty boring, I sometimes get distracted from studying by online games. Recently, I found a game that may even be (partly) useful for exam preparation, at least for an intro stats class. Yes, here it is: a computer game about statistics, StatsGames. StatsGames is a collection of 20 games or challenges designed to playfully test and refine your statistical thinking. As the author of StatsGames, economist Gary Smith, admits: "These games are not a substitute for a statistics course, but they may give you an enjoyable opportunity to develop your statistical reasoning." The program is free and runs on all platforms. Although the graphical makeup somewhat reminds me of the days when games were still played on Atari computers, most of the games in the collection are really fun. My favorites include Batting Practice (a game that teaches you to use the binomial distribution to test whether you are equally likely to swing late or early) and the Stroop Effect (a simple cognitive-science-style experiment which is then evaluated using the F-test). I also enjoyed the simulation of Galton's apparatus. Go check it out! But don't waste too much exam preparation time of course - and good luck if you have any exams soon! I also wonder whether there are other computer games about statistics out there. Any ideas?
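Incidentally, the hypothesis behind the Batting Practice game can be checked with a one-line binomial test; here is a toy example with made-up counts (swinging late on 14 of 20 pitches):

    from scipy.stats import binomtest

    result = binomtest(k=14, n=20, p=0.5)   # H0: equally likely to swing late or early
    print(result.pvalue)                    # two-sided p-value, about 0.115 here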

Posted by Jens Hainmueller at 6:00 AM

January 16, 2006

Against All Odds-Ratios

Jens Hainmueller

I've decided this blog needs more humor: Check out some Statistics Movies that Never Made it to the Big Screen. Or even better: What's the question the Cauchy distribution hates the most? "Got a moment?"

Posted by Jens Hainmueller at 1:25 AM

December 8, 2005

What Did (and Do We Still) Learn from the La Londe Dataset (Part I)?

Jens Hainmueller

In a pioneering paper, Bob La Londe (1986) used experimental data from the National Supported Work Demonstration Program (NSW) as well as observational data from the Current Population Survey (CPS) and the Panel Study of Income Dynamics (PSID) to evaluate how reliably conventional estimators recover an experimental target estimate. He used the experimental data to establish a benchmark estimate of the average treatment effect, replaced the experimental controls with several comparison groups built from the general population surveys, and then re-estimated the effect with conventional estimators. His crucial finding was that conventional regression, as well as tweaks such as instrumental variables, gets it wrong: these estimators do not reliably recover the causal effect estimated in the experimental data. This is troubling, of course, because usually we do not know the correct answer; we simply accept the estimates that our conventional estimators spit out, not knowing how wrong we may be.
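The logic of the exercise is easy to mimic. Here is a stylized sketch (not La Londe's actual specification) of the comparison: establish the experimental benchmark from the randomized arms, then swap in a survey-based comparison group and see how far a conventional regression estimate drifts. The column names follow the usual distribution of the data, but everything else is just an illustrative assumption.

    import pandas as pd
    import statsmodels.api as sm

    def lalonde_style_check(experimental, survey_controls, covariates):
        """experimental: NSW rows with treat in {0,1}; survey_controls: CPS/PSID-style rows with treat = 0."""
        # Experimental benchmark: simple difference in mean post-program earnings.
        benchmark = (experimental.loc[experimental.treat == 1, "re78"].mean()
                     - experimental.loc[experimental.treat == 0, "re78"].mean())

        # Nonexperimental estimate: experimental treated plus survey comparisons, OLS-adjusted.
        obs = pd.concat([experimental[experimental.treat == 1], survey_controls])
        X = sm.add_constant(obs[["treat"] + covariates])
        regression_estimate = sm.OLS(obs["re78"], X).fit().params["treat"]
        return benchmark, regression_estimate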

This finding (and others) sparked a fierce debate in both econometrics and applied statistics. Several authors have used the same data to evaluate other estimators, such as various matching estimators and related techniques. In fact, today the La Londe data is THE canonical dataset in the causal inference literature. It has not only been used for many articles, it has also been widely distributed as a teaching tool. I think it’s about time we stand back for a second and ask two essential questions: (1) What have we learned from the La Londe debate? (2) Does it make sense to beat this dataset any further, or have we essentially exhausted the information that can be extracted from it and need to move on to new datasets? I wholeheartedly invite everybody to join the discussion. I will provide some suggestions in a subsequent post tomorrow.

Posted by Jens Hainmueller at 4:33 AM

November 29, 2005

Beyond Standard Errors, Part I: What Makes an Inference Prone to Survive Rosenbaum-Type Sensitivity Tests?

Jens Hainmueller

Stimulated by the lectures in Statistics 214 (Causal Inference in the Biomedical and Social Sciences), Holger Kern and I have been thinking about Rosenbaum-type tests for sensitivity to hidden bias. Hidden bias is pervasive in observational settings, and these sensitivity tests are a tool to deal with it. Once you are done with your inference, it seems constructive to replace the usual qualitative statement that hidden bias “may be a problem” with a precise quantitative statement like “in order to account for my estimated effect, a hidden bias would have to be of magnitude X.” No?

Imagine you are (once again) estimating the causal effect of smoking on cancer and you have successfully adjusted for differences in observed covariates. Then you estimate the “causal” effect of smoking and you’re done. But wait a minute. Maybe subjects who appear similar in terms of their observed covariates actually differ in terms of important unmeasured covariates. Maybe there exists a smoking gene that causes cancer and makes people smoke. Did you achieve balance on the smoking gene? You have no clue. Are your results sensitive to this hidden bias? How big must the hidden bias be to account for your findings? Again, you have no clue (and neither does the reader of your article).

Enter Rosenbaum-type sensitivity tests. These come in different forms, but the basic idea is similar in all of them. We have a measure, call it (for lack of LaTeX in the blog) Gamma, which gives the degree to which your particular smoking study may deviate from a study that is free of hidden bias. You assume that two subjects with the same X may nonetheless differ in terms of some unobserved covariates, so that one subject has odds of receiving the treatment that are up to Gamma ≥ 1 times greater than the odds for the other subject. So, for example, Gamma=1 would mean your study is indeed free of hidden bias (like a big randomized experiment), and Gamma=4 means that two subjects who are similar on their observed X can differ on unobservables such that one could be four times as likely as the other to receive the treatment.

The key idea of the sensitivity test is to specify different values of Gamma and check whether the inferences change. If your results break down at Gamma values just above 1, that is bad news. We probably should not trust your findings, because the difference in outcomes you found may not be caused by your treatment but may instead be due to an unobserved covariate that you did not adjust for. But if the inferences hold at large values of Gamma, say 7, then your results seem very robust to hidden bias. (That’s what happened in the smoking-on-cancer case, by the way.) Sensitivity tests allow you to shift the burden of proof back to the critic who laments about hidden bias: please, Mr. Know-it-all, go and find me this “magic omitted variable” that is so extremely imbalanced yet so strongly related to treatment assignment that it is driving my results.
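To make this concrete, here is a bare-bones sketch of such a bound for matched pairs with a binary outcome, essentially a sensitivity version of McNemar's test: for a given Gamma, the worst-case p-value assumes that within each discordant pair the treated unit has probability Gamma/(1+Gamma) of being the one with the event. The pair counts in the example are made up.

    from scipy.stats import binom

    def rosenbaum_bound_pvalue(t_events, discordant_pairs, gamma):
        """Upper bound on the one-sided p-value under hidden bias of magnitude gamma.
        t_events: number of discordant pairs in which the treated unit had the event."""
        p_worst = gamma / (1 + gamma)
        return binom.sf(t_events - 1, discordant_pairs, p_worst)

    # Toy example: 40 of 50 discordant pairs have the event on the treated side.
    for g in (1.0, 1.5, 2.0, 3.0):
        print(g, rosenbaum_bound_pvalue(40, 50, g))

Reading off the largest Gamma at which the bound stays below your significance threshold gives exactly the kind of quantitative statement about hidden bias described above.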

More on this subject in a subsequent post.

Posted by Jens Hainmueller at 4:24 AM

November 16, 2005

Fun with bad graphs

Just read this nice entry on Andrew Gelman's blog about junk graphs. Somebody complemented the entry by posting a link to another site by Karl Broman in the Biostatistics department at Johns Hopkins. In case you missed it, please take a look. We all make these mistakes, but it's actually really funny...


Posted by Jens Hainmueller at 2:09 PM

October 12, 2005

Estimating the Causal Effect of Incumbency

Jens Hainmueller

Since the early seventies, political scientists have been interested in the causal effects of incumbency, i.e. the electoral gain to being the incumbent in a district, relative to not being the incumbent. Unfortunately, these two potential outcomes are never observed simultaneously. Even worse, the inferential problem is compounded by selection on unobservables. Estimates are vulnerable to hidden bias because there probably is a lot of unobserved stuff that’s correlated with both incumbency and electoral success (such as candidate quality, etc.) that you cannot condition on. To identify the incumbency advantage, estimates had to rely on rather strong assumptions. In a recent paper entitled "Randomized Experiments from Non-random Selection in U.S. House Elections", economist David Lee took an innovative whack at this issue. He employs a regression discontinuity design (RDD) that tackles the hidden bias problem based on a fairly weak assumption.

Somewhat ironically, this technique is rather old. The earliest published example dates back to Thistlethwaite and Campbell (1960). They examine the effect of scholarships on career outcomes by comparing students just above and below a threshold for test scores that determines whether students were granted the award. The underlying idea is that in the close neighborhood of the threshold, assignment to treatment is as good as random. Accordingly, unlucky students that just missed the threshold are virtually identical to lucky ones who scored just above the cutoff value. This provides a suitable counterfactual for causal inference. Take a look at the explanatory graph for the situation of a positive causal effect and the situation of no effect.

See the parallel to the incumbency problem? Basically, the RDD works in settings in which assignment to treatment changes discontinuously as a function of one or more underlying variables. Lee argues that this is exactly what happens in the case of (party) incumbency. In a two party system, you become the incumbent if you exceed the (sharp) threshold of 50 percent of vote share. Now assume that parties usually do not exert perfect control over their observed vote share (observed vote share = true vote share + error term with a continuous density). The closer the race, the more likely that random factors determine who ends up winning (just imagine the weather had been different on election day).

Incumbents who barely won the previous election are thus virtually identical to non-incumbents who barely lost. Lee shows that as long as the covariate that determines assignment to treatment includes a random component with a continuous density, treatment status close to the threshold is (in the limit) statistically randomized. The plausibility of this identification assumption is a function of the degree to which parties are able to sort around the threshold. And the cool thing is that you can even test whether this identifying assumption holds (at least for the observed confounders) by using common covariate balance tests.
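For readers who want to see the mechanics, here is a bare-bones sketch of the estimator (a crude local linear version, not Lee's actual implementation): fit a line on each side of the 50 percent threshold within some bandwidth and take the jump at the cutoff. The variable names and the bandwidth are hypothetical, and pointing the same function at a pre-determined covariate (say, lagged vote share) gives the balance check just mentioned.

    import numpy as np
    import statsmodels.api as sm

    def rd_estimate(vote_share, outcome, cutoff=0.5, bandwidth=0.05):
        """Sharp RD with numpy arrays: separate linear fits on each side of the cutoff;
        the estimated effect is the jump in the fitted values at the cutoff."""
        def value_at_cutoff(mask):
            X = sm.add_constant(vote_share[mask] - cutoff)
            return sm.OLS(outcome[mask], X).fit().params[0]   # intercept = prediction at the cutoff

        below = (vote_share >= cutoff - bandwidth) & (vote_share < cutoff)
        above = (vote_share >= cutoff) & (vote_share <= cutoff + bandwidth)
        return value_at_cutoff(above) - value_at_cutoff(below)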

There is no free lunch, of course. One potential limitation of the RDD is that it identifies the incumbency effect only for close elections. However, one could argue that when looking at the incumbency advantage, marginal districts are precisely the subpopulation of interest. It is only in close elections that the incumbency advantage is likely to make any difference. Another potential limitation is that the RDD identifies the effect of “party” incumbency, which is not directly comparable to earlier estimates that focused on the “legislator” incumbency advantage. Party incumbency subsumes legislator incumbency, but it also contains a separate party effect, and there is no way to disentangle the two. So surely the RDD is no panacea. Yet it can be used to draw causal inferences from observational data based on weaker assumptions than those previously employed in this literature.

The Lee paper has led to a surge in the use of the RDD in political science. Incumbency effects have been re-estimated not only for US House elections, but also for other countries as diverse as India, Great Britain, and Germany. It has also been used to study split-party delegations in the Senate. There may be other political settings in which the RDD framework can be fruitfully applied. Think about it - before economists do :-)

Posted by Jens Hainmueller at 6:48 AM