February 2006

28 February 2006

Thoughts on SUTVA (Part I)

Alexis Diamond, guest blogger

I gave a talk on Wed, Feb 8 at the IQSS methods workshop where I described my efforts to estimate the effects of UN intervention and UN peacekeeping on peacebuilding success following civil war. One of my goals was to demonstrate how matching-based methods and the Rubin model of causal inference can be helpful for answering questions in political science, particularly in fields like comparative politics and international relations.

An important issue in this context relates to Rubin's SUTVA, the stable unit treatment value assumption that is typically invoked whenever matching-based methods are used. SUTVA requires that the potential outcome for any particular unit i following treatment t is stable, "in the sense that it would take the same value for all other treatment allocations such that unit i receives treatment t" (Rubin 1990, p. 282). This is a stronger form of a basic assumption at the heart of the Rubin causal model: that for every set of allowable treatment allocations across units, there is a corresponding set of fixed (non-stochastic) potential outcomes that would be observed.
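
In potential-outcomes notation (a minimal formal restatement, not a quotation from Rubin): write $Y_i(\mathbf{T})$ for unit $i$'s potential outcome under the full treatment allocation $\mathbf{T} = (T_1, \ldots, T_N)$. SUTVA requires

$$Y_i(\mathbf{T}) = Y_i(\mathbf{T}') \quad \text{whenever } T_i = T_i',$$

so that the potential outcome can be written simply as $Y_i(t)$, a function of unit $i$'s own treatment alone. The "versions" and "interference" problems discussed below are the two standard ways this equality fails.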

Rubin (1990) goes on to say that "The two most common ways in which SUTVA can be violated appear to occur when (a) there are versions of each treatment varying in effectiveness or (b) there exists interference between units" (ibid., p. 282).* But how exactly do "versions" and "interference" cause violations, and what are the consequences? Don't these violations occur frequently in political science and the other social sciences? In my research agenda, for example, treatment is peacekeeping, and peacekeeping is going to vary in effectiveness from country to country. Moreover, it is ridiculous to suppose a country's potential outcomes are independent of what is happening (or has already happened) to its neighbors, especially in the context of war and political conflict involving refugees, cross-border skirmishes, and so on (although this kind of independence is typically claimed—at least implicitly—whenever regression-based approaches are used).

Why do multiple versions of treatment pose SUTVA problems? Because SUTVA posits, for each unit and treatment, a single fixed potential outcome, not a distribution of potential outcomes. Thus, if there is a potential outcome for the weak version of treatment A and a different potential outcome for the strong version of treatment A, then one cannot speak of the potential outcome that would have been observed following treatment A: there are in fact two treatments. Note that a causal question framed in terms of a single type of treatment A (e.g., "What is the effect of the strong version of treatment A?") does not present these problems. Similarly, as long as there is a single version of the control intervention, one could still coherently define causal effects for each unit in terms of the difference between (observed) potential outcomes under heterogeneous treatment interventions and (unobserved) potential outcomes under control. One might wonder if these causal effects are substantively interesting, and if and how they could be reliably estimated. But these critically important issues are separate from, and subsequent to, the question of whether the inferential investigation is well defined.

The problem posed by interference across units is very similar: if unit i's potential outcome under treatment A depends upon another unit j's assignment status, then there are really multiple (compound) treatments involving A for unit i, each of which involves a different assignment for unit j. Each of these compound treatments is associated with its own potential outcome. Note that this kind of interference across units does not necessarily present a problem for defining the effect of any single one of these compound treatments involving A. It just means that asking "What is the effect of treatment A?" makes no sense; it is not a well-posed causal question.

Because SUTVA is so frequently discussed in the context of matching-based methods, people often assume that the two are inextricably linked: that whatever SUTVA is useful for, it is useful only for matching-based analyses. A crucial point often missed is that SUTVA is useful for the discipline it imposes on study design. Prior to the choice of analytical methodology (e.g., regression, matching, etc.), SUTVA works to nail down the precise question under investigation.

Given these issues, can the peacekeeping question be addressed within Rubin's causal model? I return to this question in post II of this series.

*Rubin, Donald B. (1990). "Formal Modes of Statistical Inference for Causal Effects." Journal of Statistical Planning and Inference 25: 279-292.

Posted by James Greiner at 6:00 AM

27 February 2006

Resources for Multiple Imputation

Jens Hainmueller

As applied researchers, we all know this situation all too well. Like the alcoholic standing in front of the bar that is just about to open, you have just downloaded (or somehow compiled) a new dataset. You open your preferred statistical software and begin to investigate the data. And there it strikes you like lightning: Holy cow - I have missing data! So what do you do about it? Listwise deletion as usual? In the back of your mind you recall your stats teacher saying that listwise deletion is unlikely to result in valid estimates, but hitherto you have simply ignored these caveats. Don't be a fool - you can do better: use multiple imputation (MI).

As is well known in the statistical literature on the missing data problem, MI is not a silver bullet for dealing with missing values. In some cases, better (primarily more efficient) estimates can be obtained using weighted estimation procedures or specialized numerical methods (EM, etc.). Yet these methods are often complicated and problem-specific, and thus not for the faint-of-heart applied researcher. MI, in contrast, is relatively easy to implement and works well in most instances. Want to know how to MI? I suggest you take a look at www.multiple-imputation.com, a website that brings together various resources regarding the method, software, and literature citations that will help you add MI to your toolkit. A nice (non-technical) introduction is also provided on Joseph Schafer's multiple imputation FAQ page. Gary and co-authors have also written extensively on this subject, offering lots of practical advice for applied researchers. Last but not least, I recommend searching for "multiple imputation" on Andrew Gelman's blog; you will find many interesting entries on the topic. Good luck!
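
To make the basic workflow concrete, here is a minimal sketch of MI with Rubin's combining rules (my illustration, not any particular package's implementation), using scikit-learn's IterativeImputer as the imputation engine; the dataset and variable names are hypothetical:

```python
# Multiple imputation (sketch): create m completed datasets, fit the analysis
# model on each, and pool point estimates and standard errors with Rubin's rules.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

def mi_ols(df, outcome, predictors, m=5):
    predictors = list(predictors)
    estimates, variances = [], []
    for i in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=i)
        completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
        X = sm.add_constant(completed[predictors])
        fit = sm.OLS(completed[outcome], X).fit()
        estimates.append(fit.params.values)
        variances.append(fit.bse.values ** 2)
    q_bar = np.mean(estimates, axis=0)           # pooled point estimates
    w = np.mean(variances, axis=0)               # within-imputation variance
    b = np.var(estimates, axis=0, ddof=1)        # between-imputation variance
    se = np.sqrt(w + (1 + 1 / m) * b)            # Rubin's total variance
    return pd.DataFrame({"coef": q_bar, "se": se}, index=["const"] + predictors)
```

In practice you would normally reach for dedicated MI software (Amelia, mice, or Schafer's packages), which add diagnostics and handle categorical variables more gracefully; the sketch is just meant to show how the pieces fit together.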

Posted by Jens Hainmueller at 6:00 AM

26 February 2006

Applied Statistics - Janet Rosenbaum

This week, the Applied Statistics Workshop will present a talk by Janet Rosenbaum, a Ph.D. candidate in the Program on Health Policy at Harvard. She majored in physics as an undergraduate at Harvard College and received an AM in statistics last year. Janet will present a talk entitled "Do virginity pledges cause virginity? Estimating the efficacy of sexual abstinence pledges." She has a publication forthcoming in the American Journal of Public Health on related research. The presentation will be at noon on Wednesday, March 1 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:

Objectives: To determine the efficacy of virginity pledges in delaying sexual debut for sexually inexperienced adolescents in the National Longitudinal Study of Adolescent Health (Add Health).

Methods: Subjects were virgin respondents without wave 1 pledge who reported their attitudes towards sexuality and birth control at wave 1 (n=3443). Nearest-neighbor matching within propensity score calipers was used to match wave 2 virginity pledgers (n=291) with non-pledgers, based on wave 1 attitudes, demographics, and religiosity. Treatment effects due to treatment assignment were calculated.

Results (Preliminary): 17% of virginity pledgers are compliant with their pledge, and do not recant at wave 3 their earlier report of having taken a pledge. Similar proportions of virginity pledgers and non-pledgers report having had pre-marital sex (54% and 61%, p=0.16) and test positive for chlamydia (2.7% and 2.9%, p=0.89).

Conclusions: Five years after taking a virginity pledge, most virginity pledgers fail to report having pledged. Virginity pledges do not affect the incidence of self-reported pre-marital sex or assay-determined chlamydia.
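
For readers unfamiliar with the matching procedure mentioned in the Methods section, here is a rough sketch of 1-to-1 nearest-neighbor matching within propensity score calipers (a generic illustration, not the speaker's code; it assumes the data frame already contains an estimated propensity score column "ps" and a 0/1 indicator "pledger", both hypothetical names):

```python
# Nearest-neighbor matching within a propensity score caliper (sketch).
# df is a pandas DataFrame with columns "ps" (propensity score) and "pledger" (0/1).
def caliper_match(df, caliper=0.2):
    max_dist = caliper * df["ps"].std()    # caliper width as a fraction of the PS std. dev.
    treated = df[df["pledger"] == 1]
    controls = df[df["pledger"] == 0]
    pairs = []
    for i, row in treated.iterrows():
        dist = (controls["ps"] - row["ps"]).abs()
        j = dist.idxmin()                  # closest control on the propensity score
        if dist[j] <= max_dist:            # accept only matches inside the caliper
            pairs.append((i, j))           # (treated index, matched control index)
    return pairs  # note: this simple version matches with replacement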

Posted by Mike Kellermann at 4:20 PM

24 February 2006

Unobservable Quantities in Competing Risks

Felix Elwert

As I remarked in an earlier entry, some researchers are troubled by the potential outcomes framework of causality because it makes explicit reference to unobservable quantities. The implication, of course, is that science should stick to what’s observable.

This position strikes me as needlessly restrictive. In any case, unobservable quantities are by no means exclusive to the potential outcomes framework of causal inference.

I hasten to add, of course, that I'm a stranger to the philosophical discourse on the issue. Interestingly, A. P. Dawid has advanced the argument that many results from the potential outcomes framework of causality can be obtained without reference to unobservable quantities by sticking to conditional probabilities. Doing that, however, the math gets quite a bit uglier than in the standard potential-outcomes way of presenting these results. Not coincidentally, I suppose, this is why some statisticians like Jamie Robins stress the pedagogic and heuristic value of thinking in potential outcomes, which appears to be uncontested even among those with philosophical objections to causal inference.

Heuristics aside, I’m a bit at a loss over the steadfast opposition to dealing with unobservable quantities in certain quarters. Didn’t we ditch the insistence on (and belief in) direct observation with the Wiener Kreis? And don’t references to unobservable quantities suffuse the way we think? Take, for example, the irrealis, or hypothetical subjunctive mood in English (If my wife were queen of Thebes…). Or, even more glaringly, the Konjunktiv II mood in German. Is the notion of potential outcomes really such a stretch?

Interestingly, unobservable quantities also pop up in other areas of statistics, not just in causal inference. Competing risk analysis, a branch of survival analysis, has been dealing in unobservables more or less since its inception in the 1960s. Within the first two or three pages of any treatment of competing risk analysis, the authors will discuss the interpretation of risk-specific failure times, hazards, and survival functions. The most popular interpretation of risk-specific survival times is "the time at which a case would fail due to this risk, if it hadn't failed due to some other risk before." An unobservable eventuality if I've ever seen one.

This is not to say that everybody is happy with this interpretation. Kalbfleisch and Prentice (2002), for example, in what's easily the most authoritative text on survival analysis, banish this interpretation to a supplementary section because they want to "consider primarily statistical models for observable quantities only and avoid reference to hypothetical and unobserved times to failure" (p. 249). Too bad. But even they seem to consider the interpretation a helpful heuristic.

Posted by Felix Elwert at 6:00 AM

23 February 2006

Making Votes Honest: Part I

Drew Thomas

First, apologies for my delay in posting to the blog. I've spent most of the last two months involved in the Canadian federal election as a candidate in my home riding. That I lost wasn't unexpected, nor was winning necessarily my goal. I wanted to talk about ideas that weren't being brought up by other candidates. First and foremost on the list was how an election shapes the debate - and why electoral reform is necessary to allow more ideas into the public forum.

While it's clear to me that, first and foremost, Canadians value our right to vote, how that valuation takes place depends directly on what a vote means. As in many party systems, there are two main interpretations for what a vote represents: a belief in the best candidate for the local job, and a belief in the best national party to lead the country. Quite often these two goals do not coincide.

In addition, "tactical" voting, in which a second-choice candidate is chosen merely to block a (much) less desirable candidate, reflects neither of these qualifications.

These problems, among others, anchor my belief that electoral reform is a must for Canada, as well as any multiparty democracy using single member districts and First Past the Post. But band-aid solutions, like the addition of proportionally allocated at-large seats to a FPTP single-member district scheme, would do little to explore the issue. The question before electoral reform revolves not around which of the two focuses - the candidate or the party - is most important to the voters, but rather whether the public can truly express their will through a system that encourages dishonest voting.

So here is my first quantitative question: How does one measure the "strategic effect" from vote counts alone? Survey data are commonly used, but, much as in the Ecological Inference problem, drawing this tactical inference from the vote data themselves would be a huge step towards determining how to reduce strategic voting - and what level of it we could consider acceptable.

Posted by Andrew C. Thomas at 6:00 AM

22 February 2006

Experimental prudence in political science (Part II)

Mike Kellermann

As I posted the other day, experiments in political science have great potential, but they have some unique risks as well, particularly when the manipulation may change the output of some political process. What happens if your experiment is so successful (in the sense of having a large causal effect) that it changes the outcome of some election? How would you explain such an outcome when you report your results? "The manipulation produced an estimated increase in party A's support of 5000 votes, with a standard error of 250. (Party A's margin of victory was 2000 votes. Sorry about that.)" This seems like a good way to alienate the public once word got out, not to mention your colleagues working with observational data who now have another variable that they have to account for in their studies.

Having said that, I am just an observer in this field, and I'm sure that many people reading this blog have thought a lot more about these issues than I have. So, to continue the conversation, I'd like to propose the following questions:

At what point does an experimental manipulation become so significant that researchers have an obligation to inform subjects that they are, in fact, subjects?

Do researchers have an obligation to design experiments such that the net effect of any particular experimental manipulation on political outcomes is expected to be zero?

Would it be appropriate for a researcher to work consistently with one party on a series of experiments designed to determine what manipulations increase the probability that the party will win elections? Do the researcher's personal preferences matter in this regard?

To what extent are concerns mitigated by the fact that, in general, political actors could conduct these experiments on their own initiative? What if those actors agree to fund the research themselves, as was the case in the 2002 Michigan experiments?

If a university were to fund experimental research that was likely to promote one political outcome over another, would it risk losing its tax-exempt status? This one is for our resident lawyer....

Posted by Mike Kellermann at 6:00 AM

21 February 2006

IQ and Risk-taking

A recent study by Shane Frederick at MIT, published in the Journal of Economic Perspectives [pdf], has gotten press attention in the last few weeks for its claim that performance on a simple math test predicted risk-taking behavior. I'm a bit skeptical about the conclusions Frederick draws (and I'll explain why), but regardless, the study itself is quite interesting.

The study begins by asking subjects to take the Cognitive Reflection Test (CRT), which consists of three simple math questions:

1. A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
2. If it takes 5 machines 5 minutes to make 5 widgets, how long does it take 100 machines to make 100 widgets?
3. In a lake, there is a patch of lily pads. Every day the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half the lake?

Then subjects are asked two other types of questions:

(a) Would you rather have $3400 now or $3800 in two weeks?
(b) Would you rather have a guaranteed $1000, or a 90% chance of $5000?

Questions of type (a) provide some measure of your "time preference" - how patient you are when it comes to money matters - while questions of type (b) provide a measure of your degree of risk-taking; people who prefer the more certain but lower-expected-value item are more risk-averse than those who choose the opposite. Interestingly, Frederick found that subjects who scored well on the CRT also tended to be more "patient" on questions like (a) and more risk-taking on questions like (b). Much of the discussion in the paper is centered around why and to what extent cognitive abilities, as measured by the CRT, would have an impact on these two things.
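
To make the terms concrete (my arithmetic, not from the paper): in question (b) the gamble has the higher expected value,

$$E[\text{gamble}] = 0.9 \times \$5000 = \$4500 > \$1000,$$

so preferring the guaranteed $1000 is the risk-averse choice, while a risk-neutral or risk-seeking respondent takes the gamble.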

It's fascinating work, except it seems to me that there's an alternative explanation for these results that has little to do with cognitive abilities. One strand of such an explanation (which Frederick mentions himself) is that, in addition to mathematical skills, the test measures the ability to overcome impulsive answers. Each of the questions has an "obvious" answer (10 cents, 100 minutes, 24 days) that is incorrect (the correct answers are 5 cents, 5 minutes, and 47 days); high scorers thus need to be able to inhibit the wrong answer as well as calculate the correct one; they tend to be more patient and methodical as well as better at math. It's easy to see how these abilities, not cognitive ability per se, might account for the differential performance on questions like (a).

The deeper problem is that the study failed to control for socioeconomic differences between subjects. The high-performing subjects were drawn from universities like Harvard, MIT, and Princeton; the lower-performing subjects were drawn from universities like the University of Michigan and Bowling Green. People at the latter universities are likely to be in a far more precarious financial situation than those at the former. Why does this matter? One of the principal findings of Kahneman and Tversky's prospect theory is that as you have less money, you become more risk-averse. Thus it seems entirely possible to me that the difference between subjects was due to differences in their financial situation and had nothing to do with cognitive abilities at all (except possibly indirectly, as mediated through socioeconomic factors). I'd be interested in seeing whether this finding still holds up when SES is controlled for.

Posted by Amy Perfors at 6:00 AM

20 February 2006

Applied Statistics - Rustam Ibragimov

This week, the Applied Statistics Workshop will present a talk by Rustam Ibragimov of the Harvard Department of Economics. Professor Ibragimov received a Ph.D. in mathematics from the Institute of Mathematics of the Uzbek Academy of Sciences in 1996 and a Ph.D. in economics from Yale University in 2005 before joining the Harvard faculty at the beginning of this academic year. Professor Ibragimov will present a talk entitled "A tale of two tails: Peakedness properties in inheritance models of evolutionary theory." The presentation will be at noon on Wednesday, February 22 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:

In this paper, we study transmission of traits through generations in multifactorial inheritance models with sex- and time-dependent heritability. We further analyze the implications of these models under heavy-tailedness of traits' distributions. Among other results, we show that in the case of a trait (for instance, a medical or behavioral disorder or a phenotype with significant heritability affecting human capital in an economy) with not very thick-tailed initial density, the trait distribution becomes increasingly more peaked, that is, increasingly more concentrated and unequally spread, with time. But these patterns are reversed for traits with sufficiently heavy-tailed initial distributions (e.g., a medical or behavioral disorder for which there is no strongly expressed risk group or a relatively equally distributed ability with significant genetic influence). Such traits' distributions become less peaked over time and increasingly more spread in the population.

In addition, we study the intergenerational transmission of the sex ratio in models of threshold (e.g., polygenic or temperature-dependent) sex determination with long-tailed sex-determining traits. Among other results, we show that if the distribution of the sex determining trait is not very thick-tailed, then several properties of these models are the same as in the case of log-concave densities analyzed by Karlin (1984, 1992). In particular, the excess of males (females) among parents leads to the same pattern for the population of the offspring. Thus, the excess of one sex over the other one accumulates with time and the sex ratio in the total alive population cannot stabilize at the balanced sex ratio value of 1/2. We further show that the above properties are reversed for sufficiently heavy-tailed distributions of sex determining traits. In such settings, the sex ratio of the offspring oscillates around the balanced sex ratio value and an excess of males (females) in the initial period leads to an excess of females (males) offspring next period. Therefore, the sex ratio in the total living population can, in fact, stabilize at 1/2. Interestingly, these results are related, in particular, to the analysis of correlation between human sex ratios and socioeconomic status of parents as well as to the study of the variation of the sex ratio due to parental hormonal levels.

The proofs of the results in the paper are based on general results on majorization properties of heavy-tailed distributions obtained recently in Ibragimov (2004) and several of their extensions derived in this work.

Posted by Mike Kellermann at 12:54 PM

17 February 2006

Do People Think like Stolper-Samuelson? Part III

Jens Hainmueller and Michael Hiscox

In two previous entries here and here we wrote about a recent paper that re-examines the available evidence for the prominent claim that public attitudes toward trade follow the Stolper-Samuelson theorem (SST). We presented evidence that is largely at odds with this hypothesis. In this posting, we take issue with the last specific finding in this literature that has been interpreted as strong support for the SST: the claim that the effect of skill on trade preferences is proportional to a country's factor endowment. What the heck does this mean?

Recall that, according to the SST, skilled individuals will gain in terms of real wages (and thus should be likely to favor trade openness) in countries that are abundantly endowed with skilled labor, and the size of those gains should be proportional to the degree of skill abundance in each country. Of course, in countries that are actually poorly endowed with skilled labor relative to potential trading partners, those gains should become losses.

The seminal paper on this topic, Rodrik and Mayda (2004), shows evidence supporting the idea that the skill effect (proxied by education) is proportional to a country's factor endowment: they find the largest positive effects in the richest (i.e., most skill-abundant) countries and smaller positive effects in the somewhat poorer (more skill-scarce) countries in their sample. For the only really poor country in their survey sample, the Philippines, they even find a significant negative effect (i.e., the more educated are less likely to support trade liberalization). This finding constitutes R&M's smoking-gun evidence that preferences do indeed follow the SST - a finding very often cited in the literature.

The central problem with the R&M findings, which are mainly based on data from the International Social Survey Programme (ISSP), is the lack of skill-scarce countries in their sample. Their data thus do not allow for a comprehensive test of the claim that the skill effect on trade preferences is proportional to a country's factor endowment, simply because most countries in their sample are skill-abundant, relatively rich economies. In the supplement to a recent paper we specifically re-examine the R&M claim, using data from the Global Attitudes Project survey administered by Pew in 2002. The Pew data have not been examined by scholars interested in attitudes toward trade, although they have some key advantages compared to the other datasets that have been used (ISSP, etc.). Most importantly, they cover a much broader range of economies that are very heterogeneous in terms of their levels of skill endowment. The Pew data cover not only the Philippines but an additional 43 countries, many of which are skill-scarce.

This figure summarizes our results from the Pew data. It plots the estimated marginal effect of an additional year of schooling on the probability of favouring free trade (evaluated at the sample means, using country-specific ordered probit models) against skill endowment as measured by the log of GDP per capita in 2002 (PPP). The solid diamonds denote the point estimates and the dashed lines show the .90 confidence envelopes.
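
For readers curious about the mechanics, here is a rough sketch of the plotted quantity for a single country (illustrative only, not the authors' code; the column names, the 0-3 support coding, and the finite-difference shortcut are all assumptions):

```python
# Marginal effect of schooling at the sample means (sketch): fit an ordered probit
# of trade support on schooling and controls, then approximate the effect of one
# extra year of schooling on Pr(top "favor free trade" category) by a finite difference.
from statsmodels.miscmodels.ordinal_model import OrderedModel

def schooling_effect_at_means(df, delta=1.0):
    exog_cols = ["schooling", "age", "female"]           # hypothetical covariates
    model = OrderedModel(df["support"], df[exog_cols], distr="probit")
    res = model.fit(method="bfgs", disp=False)

    x_bar = df[exog_cols].mean().to_frame().T            # covariates at sample means
    x_plus = x_bar.copy()
    x_plus["schooling"] += delta                          # one extra year of schooling

    p_bar = model.predict(res.params, exog=x_bar)[0]      # predicted category probabilities
    p_plus = model.predict(res.params, exog=x_plus)[0]
    return (p_plus[-1] - p_bar[-1]) / delta               # effect on Pr(top category)
```

Repeating this country by country and plotting the estimates against log GDP per capita gives the kind of figure described above.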

Two main findings emerge here: First, there is no clear relationship between the marginal effect of education on support for trade among respondents and their countries' skill endowments. The pattern more closely resembles a drawing by abstract expressionist painter Jackson Pollock than the clear upward-sloping line one would predict based upon a simple application of Stolper-Samuelson. Second, in all countries increased schooling has either a positive or a zero effect on the probability of supporting free trade. This includes the Philippines, the only skill-scarce country for which Rodrik and Mayda found a negative relationship. Moreover, most of the point estimates are positive; the exceptions are Canada, Ivory Coast, Mali, and Nigeria - not quite a cluster of countries with common skill endowments!

Overall these results strongly suggest that the impact of education levels on support for trade among individuals is not driven by differences in skill endowments across countries (and individual concerns about wage levels) as suggested by a simple application of the Stolper-Samuelson theorem.

Posted by Jens Hainmueller at 6:00 AM

16 February 2006

Two Objections to the Potential Outcomes Framework of Causality

Felix Elwert

Agreement with the Potential Outcomes Framework of Causality (counterfactual approach, Rubin model) is spreading like wildfire, but is still far from unanimous. Over the past few years I’ve had several conversations with friends in sociology, economics, statistics, and epidemiology who expressed considerable unease with the notion of potential outcomes, or even causality itself.

Two problems keep coming up.

The first is more of a public relations issue than an intellectual problem: Counterfactualists – I, at any rate – apparently come on a bit strong at times. I've heard the term "counterfascism" (and left the room). I am told that this has to do with offering a simple operational definition for a notion – causality – that has defied concise discourse for a few centuries too many. How can humble statistics propose a cure where respectable philosophy rails in confusion?

The second, more serious, issue relates to how far we want to go in dealing with the unobservable. The potential outcomes framework clearly and avowedly locates causal effects in the difference between potential outcomes, at least one of which remains unobservable (the "counterfactual" outcome). Direct observation of causal effects is thus impossible, although estimation is possible under certain well-defined circumstances. The exchange between A. P. Dawid ("Causal Inference without Counterfactuals"), Don Rubin, Jamie Robins, Judea Pearl, and others in JASA 1999 considers the problem at its most sophisticated. My conversations, shall we say, rarely reach such heights. But it's eminently clear that many researchers are troubled to various degrees by admitting unobservable quantities into "science." Positions here range from moderate empiricism to Vienna-style positivism: "you either observe directly or you lie."

I’m in no place to offer solutions. But I do offer this complaint whenever the two issues are combined into a single charge--that counterfactualist potential outcomers are arrogant because they fancy themselves scientists when they deal in unobservable quantities. I’d say that the opposite is true: the potential outcomes framework of causality offers a cutting lesson in humility because it demonstrates the necessity of relying on unobservable (but not necessarily unestimable) quantities, not to mention strong prior theory, for a great many tasks dear to the scientific enterprise.

Posted by Felix Elwert at 6:00 AM

15 February 2006

Simulated Goats?

Sebastian Bauhoff

In this week's Gov 2001 class, Gary was showing how to get around difficult statistical problems by simulation rather than using complex analytics. That got me thinking about the trade-offs between the two approaches.

One class example was the Monty Hall game that you can probably recite backwards in your sleep: a contestant is asked to choose among 3 doors, 1 of which has a car behind it. Once the choice is made, the game show host opens one of the remaining doors that has only a goat behind it. The contestant is then offered the chance to switch from her initial choice to the remaining door, and the question is whether switching is a good strategy.

One can solve this analytically by thinking hard about the problem. Alternatively, one can simulate the conditional probabilities of getting the prize given switching or not switching, and use this to build intuition for the result.
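
As a concrete illustration (my own sketch, not the code from class), the simulation takes only a few lines:

```python
# Monte Carlo estimate of the probability of winning the car under the
# "stay" and "switch" strategies in the Monty Hall game.
import random

def monty_hall(n_trials=100_000):
    stay_wins = switch_wins = 0
    for _ in range(n_trials):
        car = random.randrange(3)        # door hiding the car
        choice = random.randrange(3)     # contestant's initial pick
        # the host opens a door that is neither the contestant's pick nor the car
        opened = next(d for d in range(3) if d != choice and d != car)
        # the switching contestant takes the one remaining closed door
        switched = next(d for d in range(3) if d != choice and d != opened)
        stay_wins += (choice == car)
        switch_wins += (switched == car)
    return stay_wins / n_trials, switch_wins / n_trials

print(monty_hall())  # roughly (0.333, 0.667): switching wins about two-thirds of the time
```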

During the debate in class I was wondering whether simulations are really such a good thing. Sure, they solve the particular problem at hand, and simulation may be the only way to handle very complex problems quickly. But it contributes little to solving even closely related problems, whereas one could glean insights from the analytic approach.

Maybe the simulation is still useful since writing code structures one's thoughts. But it also seems like it might erode critical skills. (Apart from the very real possibility that one makes a mistake in the code and tries to convince oneself of the wrong result.) Imagine you show up at Monty's show and they have changed the game without telling you. Knowing how to implement a new simulation won't help if you can't actually run it. Having solid practice in the analytical approach might be more useful.

I don't want to suggest that simulations are generally evil, but maybe they come at a cost. Oh, and the answer is yes, switch.

Posted by Sebastian Bauhoff at 6:00 AM

14 February 2006

Is Military Spending Justified by Security Threats?

You, Jong-Sung

At the recent ASSA meeting in Boston, Linda Bilmes, a Kennedy School lecturer, and Joseph Stiglitz, Columbia professor and Nobel prize-winning economist, presented an interesting paper, "The Economic Costs of the Iraq War." They estimated that the total economic costs of the war, including direct costs and macroeconomic costs, lie between $1 trillion and $2 trillion. Interestingly, the "$2 trillion" figure was already projected by William Nordhaus, Yale professor of economics, even before the war. In his paper "The Economic Consequences of a War With Iraq" (2002), he predicted the costs of the Iraq war would range from $99 billion, if the war were short and favorable, to $1,924 billion, if the war were protracted and unfavorable.

At the same ASSA meeting, Nordhaus raised important questions about excessive military spending in his paper entitled "The Problem of Excessive Military Spending in the United States." I am providing some excerpts from the paper below.

Nordhaus notes, "The U.S. has approximately half of total national security spending for the entire world. Total outlays for 'defense' as defined by the Congressional Budget Office were $493 billion for FY2005, while the national accounts concept of national defense totaled around $590 billion for 2005. It constitutes about $5000 per family. By comparison, the Federal government current expenditures in 2004 were $14 billion for energy, $4.7 billion for recreation and culture, and $1.8 billion for transit and railroads." The question is whether the US is earning a good return on its national-security 'investment,' for it is clearly an investment in peace and safety. The bottom line, he argues, is probably not.

Nordhaus asks whether it is plausible that the United States faces a variety and severity of objective security threats equal to those of the rest of the world put together. He then points to the following facts: "Unlike Israel, no serious country wishes to wipe the U.S. off the face of the earth. Unlike Russia, India, China, and much of Europe, no one has invaded the U.S. since the nineteenth century. We have common borders with two friendly democratic countries with which we have fought no wars for more than a century."

He raises the issue of strategic and budgetary inertia. "Many costly programs are still in place a decade and a half after the end of the cold war. The U.S. has around 6000 deployed nuclear weapons, and Russia has around 4000 weapons. There can be little doubt that the world and the U.S. are more vulnerable rather than less vulnerable with such a large stock of weapons, yet they survive in the military budget. There is a kind of security Laffer curve in nuclear material, where more is less in the sense that the more nuclear material floating around, the more difficult it is to control it and the more likely it is that it can be stolen." He argues that today's slow decline in spending on obsolete systems arises largely because there are such weak budgetary and virtually non-existent political pressures on military spending - the 'loose budget constraints.'

He suggests that an excessive military budget is not just economic waste but also causes problems rather than solving them, by tempting leaders to use an existing military capability. "Countries without military capability cannot easily undertake 'wars of choice' or wars whose purposes evolve, as in Iraq, from dismantling weapons of mass destruction to promoting democracy. To the extent that Vietnam and Iraq prove to be miscalculations and strategic blunders, the ability to conduct them is clearly a cost of having a large military budget."

A final concern he raises is that the large national-security budget leads to loose budget constraints and poor control over spending and programs. "Congress exercises no visible oversight on defense spending and a substantial part is secret. Some of the abuses in recent military activities arise because Congress cannot possibly effectively oversee such a large operation where programs involving $24 billion are enacted as a single line item. Even worse, how can citizens or ordinary members of Congress understand the activities of an agency like the National Security Agency, whose spending level and justification are actually classified?"

Posted by Jong-sung You at 6:00 AM

13 February 2006

Applied Statistics - Mike Kellermann

This week, I will be giving the talk at the Applied Statistics Workshop; as they say, turnabout is fair play. The talk is entitled "Estimating Ideal Points in the British House of Commons." I've blogged a bit about this project here. An abstract of the talk appears on the jump:

Estimating the policy preferences of individual legislators is important for many studies of legislative and partisan politics. Unfortunately, existing ideal point methods do not perform well when applied to legislatures characterized by strong party discipline and oppositional politics, such as the British House of Commons. This project develops a new approach for estimating the preferences of British legislators, using Early Day Motions as an alternative data source. Early Day Motions are petitions that allow MPs to express their opinions without being bound by party whips. Unlike voting data, however, EDMs do not allow legislators to express opposition to a particular policy. To deal with the differences between voting data and EDMs, I adapt existing Bayesian ideal point models to allow for the possibility (supported in the data) that some Members of Parliament are more likely to sign EDMs than others, regardless of policy content. The estimates obtained have much greater face validity than previous attempts to estimate ideal points in the House of Commons, and have the usual benefits associated with Bayesian ideal point models, including natural estimates of uncertainty and the ability to calculate auxiliary quantities of interest directly from the posterior distribution.

Posted by Mike Kellermann at 9:37 PM

Experimental prudence in political science (Part I)

Mike Kellermann

We've talked a fair bit on the blog about the use of experimental data to make causal inferences. While the inferential benefits of experimental research are clear, experiments raise prudential questions that we rarely face in observational research; they require "manipulation" in more than one sense of that word. As someone who is an interested observer of the experimental literature rather than an active participant, I wonder how well the institutional mechanisms for oversight have adapted to field experimentation in the social sciences in general (and political science in particular). In medical experiments, the ability in principle to obtain informed consent from subjects is critical in determining what is ethically acceptable, but this is often not possible in a political context; external validity may depend on concealing the experimental nature of the manipulation from the "subjects." Moreover, the effects of the manipulation may be large enough to change large-scale political outcomes, thus affecting individuals outside of the nominal pool of subjects.

As an example, consider the turnout experiments I discussed here and here. The large-scale phone experiments in Iowa and Michigan are typical in that they involve non-partisan GOTV (get out the vote) efforts. Treated voters are contacted by phone (or by mail, or in person) and urged to vote, while control voters are not contacted; neither group, as far as I can tell, knows that they are experimental subjects. Such a design is possible because the act of voting is a matter of public record, and thus the cooperation of the subjects is not required to obtain the relevant data.

While the effects of such manipulations may provide some insight for political scientists as to the causes of voter turnout, their practical significance is a bit hard to measure; there are not that many genuinely non-partisan groups out there with both the means and the motivation to conduct large-scale voter mobilization efforts. There have been some recent efforts to study partisan voter mobilization strategies using field experiments. David Nickerson, Ryan Friedrichs, and David King have a forthcoming article reporting on an experiment in the 2002 Michigan gubernatorial campaign, in which a youth organization of the Michigan Democratic Party agreed to randomize their partisan GOTV efforts aimed at voters believed to be Democrats or independents. The authors find positive effects for all three of the common GOTV manipulations (direct literature, phone calls, and face-to-face canvassing). In the abstract, obtaining data from manipulations that are clearly relevant in the real world is good for the discipline. I have no doubt that both party activists and party scholars would love to do more such research, but it all makes me slightly uncomfortable. As researchers, should we be in a position where we are (potentially) influencing political outcomes not only through arguments based on the evidence that we collect, but through the process of collecting evidence as well?

Posted by Mike Kellermann at 6:00 AM

10 February 2006

What’s an Effect?

Felix Elwert

Though it hardly comports with my own views, there are plenty of people in the social sciences and economics who are troubled by the potential outcomes framework of causality. What intrigues me about this opposition is that most of those who object to the notion of causality appear comfortable with talk about regression "effects."

If you object to talk about causality, what do you mean by "effect"?

By way of preemptive self-defense, this question isn't about my inability to understand that regression coefficients provide a neat summary of the sample data in a purely descriptive sense (I do get that). But if the goal is getting descriptives, why call regression coefficients "effects"? Doesn't "effect" imply agency? Sure, the predicted Y might increase by b units if we change X by one unit (agency! ha!) but then that's really the analyst's doing (we shift X by one unit) - and didn't we want the analysis to speak to what's happening in the world outside of that scatter plot printout?

Here's the task: Can anybody provide an interpretation of the word "effect" that (a) doesn't just refer to what the analyst can do with that scatter plot on the desk, and that (b) does not take recourse to a manipulability (counterfactualist or potential outcomes) account of causality?

What's your preferred non-causal explanation for why one might call regression coefficients "effects"?

Posted by Felix Elwert at 6:00 AM

9 February 2006

Implicit learning and race

Since Martin Luther King Day was somewhat recent (okay - a month ago; still...), I thought I'd blog about human statistical learning and its possible implications for racism. Some of this is a bit speculative (and I'm no sociologist), but it's a fascinating exploration of how cutting-edge research in cognitive science has implications for deep real-world problems.

In today's society racism is rarely so blatant as it was 50 or 100 years ago. More often it refers to subtle but ubiquitous inconsistencies in how minorities are treated (or, sometimes, perceive themselves to be treated). Different situations are probably different mixtures of the two. Racism might often be small effects that the person doing the treating might not even notice -- down to slight differences in body language and tone of voice -- that could nevertheless have large impacts on the outcome of a job interview or the likelihood of being suspected of a crime.

One of the things studying statistical learning teaches us is that almost everyone has subtly different, usually more negative, attitudes toward minorities than toward whites - even minorities themselves. Don't believe me? Check out the online Implicit Association Test, which measures the amount of subconscious connection you make between different races and concepts. The premise is simple and has been validated over and over in psychology: if two concepts are strongly linked in our minds, we are faster to say so than if they are only weakly associated. For instance, you're faster to say that "nurse" and "female" are similar than "nurse" and "male", even though men can be nurses, too. I'm oversimplifying here, but in the IAT you essentially are called upon to link pictures of people of different races with descriptors like good/bad, dangerous/nice, etc. Horrifyingly, even knowing what the experiment measures, even taking it over and over again, most people are faster to link white faces with "good" words and black faces with "bad" ones.

Malcolm Gladwell's book "Blink" has an excellent chapter describing this, and it's worth quoting one of his paragraphs in detail: "The disturbing thing about this test is that it shows that our unconscious attitudes may be utterly incompatible with our stated values. As it turns out, for example, of the fifty thousand African Americans who have taken the Race IAT so far, about half of them, like me, have stronger associations with whites than with blacks. How could we not? We live in North America, where we are surrounded every day by cultural messages linking white with good." (85)

I think this is yet another example of where learning mechanisms that are usually helpful -- it makes sense to be sensitive to the statistical correlations in the environment, after all -- can go devastatingly awry in today's world. Because the media and gossip and stories are a very skewed reflection of "the real world", our perceptions formed by those sources (our culture, in other words) are also skewed.

What can we do? Two things, I think. #1: Constant vigilance! Our associations may be unconscious, but our actions aren't. If we know about our unconscious associations, we're more likely to watch ourselves vigilantly to make sure they don't come out in our actions; as enough people do that, slowly, the stereotypes and associations themselves may change. #2: This is the speculation part, but it may be possible to actually change our unconscious associations: not consciously or though sheer willpower, but by changing the input our brain receives. The best way to do that, I would guess, is to get to know people of the minority group in question. Suddenly your brain is receiving lots of very salient information about specific individuals with wholly different associations than the stereotypes: enough of this and your stereotype itself might change, or at least grow weaker. I would love to see this tested, or if someone has done so, what the results were.

Posted by Amy Perfors at 6:00 AM

8 February 2006

New Author's Committee Chair

I'd like to announce a change today in our Blog Authors' Committee Chair from Jim Greiner to Amy Perfors. Amy was a Stanford undergrad and is now a 3rd-year graduate student at MIT. She is interested in using Bayesian methods as models of how humans think, evolutionary linguistics, how humans learn, and a variety of other interesting topics. See her web site for lots more info. In addition to writing some of our most interesting blog entries, I especially recommend this great picture of her winning a line-out in rugby!

Jim, the first chair of our author's committee, led this group from a pretty good idea to, in my view and judging from our large and fast growing readership, an enormously successful and informative blog. He will continue on as a member of our Author's Committee, but he's busy this semester running his innovative class in the Law School and Statistics Department, Quantitative Social Science, Law, Expert Witnesses, and Litigation.

Jim Greiner graduated with a B.A. in Government from the University of Virginia and received a J.D. from the University of Michigan Law School in 1995. He clerked for Judge Patrick Higginbotham on the U.S. Court of Appeals and was a practicing lawyer in the Justice Department and private practice before joining the Harvard Statistics Department.

Posted by Gary King at 6:00 AM

7 February 2006

Do People Think like Stolper-Samuelson? Part II

Jens Hainmueller and Michael Hiscox

Last week, we introduced the question of whether the Stolper-Samuelson theorem, i.e., that more educated people favour trade because it will increase their factor returns, accurately reflects the way people think. We also introduced our recent paper on this subject, "Learning to Love Globalization: Education and Individual Attitudes Toward International Trade", in which we examine the alternative theory that more educated respondents tend to be more exposed to economic ideas about the overall efficiency gains for the national economy associated with greater trade openness, and tend to be less prone to nationalist and anti-foreigner sentiments often linked with protectionism.

Which of these very different interpretations of the link between education and pro-trade attitudes is more correct? We re-examine the available survey data on individual attitudes toward trade, conducting a simple test of the effects of education on support for trade that distinguishes clearly between the Stolper-Samuelson interpretation of this relationship and alternative ideational and cultural accounts. We find that the impact of education on attitudes toward trade is almost identical among respondents currently in the active labor force and among those who are not (even those who are retired). That the effects of education on trade policy preferences are not mediated by whether individuals are actually being paid for the employment of their skills strongly suggests that the relationship is not primarily a product of distributional concerns.

The analysis also reveals clear non-linearities in the relationship between education and trade preferences: while individuals who have been exposed to college or university education are far more likely to favor trade openness than those who have not, other types of educational attainment have no significant effects on attitudes, and some even reduce the likelihood that individuals support trade despite clearly contributing to skill acquisition. These findings indicate that the particular ideational and/or cultural effects associated with college education, and not the gradual accumulation of skills, are critical in shaping individual attitudes toward trade.

We conclude that the impact of education on how voters think about trade and globalization has more to do with exposure to economic ideas, and information about the aggregate and varied effects of these economic phenomena, than it does with individual calculations about how trade affects personal income or job security. This is not to say that the latter types of calculations are not important in shaping individuals' views of trade - just that they are not manifest in the simple association between education and support for trade openness. As we discuss in the concluding section, we think it is likely that concerns about the effects of trade on personal income and job security might actually hinge on the particular impact of trade openness in specific industries. One of the key implications of our findings is that future empirical tests of the determinants of individual trade preferences need to be substantially refined to identify the impact of distributional concerns on attitudes towards trade and globalization and distinguish these from the impact of ideational and cultural factors.

Posted by James Greiner at 6:00 AM

6 February 2006

Applied Statistics - Alexis Diamond

This week, the Applied Statistics Workshop will present a talk by Alexis Diamond, a Ph.D. candidate in Political Economy and Government. The talk is entitled "The Effect of UN Intervention after Civil War." An abstract of the talk appears on the jump:

A basic goal of political science is to understand the effects of political institutions on war and peace. Yet the impact of United Nations peacebuilding following civil war remains very much in doubt following King and Zeng (2006), which found that prior conclusions about these causal effects (Doyle and Sambanis 2000) had been based more on indefensible modeling assumptions than evidence. This paper revisits the Doyle and Sambanis causal questions and answers them using new matching-based methods that address issues raised by King and Zeng. The methods are validated for the Doyle and Sambanis data via their application to a dataset with similar features for which the correct answer is known. These new methods do not require assumptions that plagued prior work and are broadly applicable to important inferential problems in political science and beyond. When the methods are applied to the Doyle and Sambanis data, there is a preponderance of evidence to suggest that UN peacebuilding has a positive effect on peace and democracy in the aftermath of civil war.

Posted by Mike Kellermann at 11:41 AM

Another paradox of turnout? (Part II)

Mike Kellermann

Last week I highlighted a new article by Arceneaux, Gerber, and Green that suggests that matching methods have difficulty in replicating the experimentally estimated causal effect of a phone-based voter mobilization effort, given a relatively rich set of covariates and a large control pool from which to draw matches. Matching methods have been touted as producing experiment-like estimates from observational data, so this result is kind of disheartening. How might advocates of matching methods respond to this claim?

Let's assume that the results in the paper hold up to further scrutiny (someone should - and I have no doubt will - put these data through the wringer, although hopefully it won't suffer the fate of the NSW dataset). Why should turnout be problematic? Explaining voter turnout has presented quandaries and paradoxes in other branches of political science, so it is hardly surprising that it mucks up the works here. Turnout has been called "the paradox that ate rational choice," due to the great difficulty in finding a plausible model that can justify turnout on instrumental terms. To my mind, the most reasonable (and least interesting) rational choice models of turnout resort to the psychic benefits of voting or "civic duty" - the infamous "D" term - to account for the fairly solid empirical generalization that some people do, in fact, vote. What, exactly, the "D" term represents is something of a mystery, but it seems reasonable that people who feel a duty to go to the polls are also more likely to listen to a phone call urging them to vote, even conditional on things like age, gender, and voting behavior in the previous two elections.

The authors are somewhat pessimistic about the possibility of detecting such problems when researchers do not have an experimental estimate to benchmark their results (and, hence, when matching or some other technique is actually needed). They ask, "How does one know whether matched observations are balanced in terms of the unobserved causes of the dependent variable?" That is indeed the question, but I think that they may be a little too skeptical about the ability to ferret out such problems, especially in this particular context. If the matched data are truly balanced on both the observed and unobserved causes of turnout, then there should be no difference in the expected value of some auxiliary variable (excluded from the matching process) that was observed before the treatment was applied, unless we want to start thinking in terms of reverse temporal causation. The authors could have dropped, say, turnout in 2000 from their matching procedure, matched on the other covariates, and then checked for a difference in the turnout in 2000 between the treatment and control groups in 2002. My guess is that they would find a pretty big difference. Of course, since these matches are not the same as those used in the analysis, any problems that result could be "fixed" by the inclusion of 2000 voter turnout in the matching procedure, but that is putting a lot of weight on one variable.
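
Here is a rough sketch of that placebo check (my illustration, not code from the paper; all column names are hypothetical): estimate propensity scores without the held-out variable, match, and then compare the held-out pre-treatment variable across the matched groups.

```python
# Placebo check for matching: match on covariates that exclude a pre-treatment
# variable (here, turnout in 2000), then test whether the matched treatment and
# control groups differ on that held-out variable.
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def placebo_check(df, treat="contacted", heldout="voted_2000",
                  covars=("age", "female", "voted_1998")):
    covars = list(covars)
    ps = (LogisticRegression(max_iter=1000)
          .fit(df[covars], df[treat])
          .predict_proba(df[covars])[:, 1])        # propensity scores from matching covariates
    is_treated = (df[treat] == 1).values
    treated, control = df[is_treated], df[~is_treated]
    # 1-to-1 nearest-neighbor matching on the propensity score (with replacement)
    nn = NearestNeighbors(n_neighbors=1).fit(ps[~is_treated].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[is_treated].reshape(-1, 1))
    matched_control = control.iloc[idx.ravel()]
    # if matching has also balanced the unobservables, this pre-treatment variable
    # should not differ systematically between the matched groups
    return stats.ttest_ind(treated[heldout], matched_control[heldout])
```

A large, significant difference on the held-out variable would be a warning sign that the matched groups differ on unobserved determinants of turnout as well.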

Even if the prospects for identifying bias due to unobserved covariates are better than Arceneaux, Gerber, and Green suggest, it is not at all apparent that we can do anything about it. In this case, if we knew what "duty" was, we might be able to find covariates that would allow us to satisfy the unconfoundedness constraint. On the other hand, it is not obvious how we would identify those variables from observational studies, since we would likely have similar problems with confoundedness. No one said this was supposed to be easy.

Posted by Mike Kellermann at 6:00 AM

3 February 2006

To Your Health

Sebastian Bauhoff

A common excuse for wine lovers is that "a few glasses of wine are good for the heart". Well maybe for warming your heart but possibly not for preventing heart attacks.

A recent note in The Lancet (Vol 366, December 3, 2005, pages 1911-1912) suggests that earlier reports that light to moderate alcohol consumption can lower the risk of ischaemic heart disease were severely affected by confounders in non-randomized trials.

Some people believed that the early results were due to misclassification of former drinkers with cardiovascular disease ("CVD") as never-drinkers. This raised the CVD rate among the non-drinkers. Another possible story is that the studies didn't properly control for confounders - apparently some risk factors for CVD are more prevalent among non-drinkers, and the non-randomized studies didn't control well enough for those. But as the note points out, confounding could bias results either in favor of or against a protective effect: if heavy drinking offered really good protection but heavy drinkers led otherwise unhealthy lives, the health benefits would be obscured.

But don't fear, the British Heart Foundation says that low to moderate alcohol consumption probably doesn't do your heart any harm. For protection against CVD you should really quit smoking, do sports, and eat a balanced diet. Not quite as appealing as a good glass of wine, of course.

In any case, food for thought and a great 2-page piece for your next causal inference class. Cheers to that.

Posted by Sebastian Bauhoff at 6:00 AM

2 February 2006

Bayesian vs. frequentist in cogsci

Bayesian vs. frequentist - it's an old debate. The Bayesian approach views probabilities as degrees of belief in a proposition, while the frequentist says that a probability refers to a set of events, i.e., is derived from observed or imaginary frequency distributions. In order to avoid the well-trod ground comparing these two approaches in pure statistics, I'll consider instead how the debate changes when applied to cognitive science.

One of the main arguments made against using Bayesian probability in statistics is that it's ill-grounded and subjective. If probability is just "degree of belief", then the answer to even a question like "what is the probability of heads?" can change depending on who is asking the question and what their prior beliefs about coins are. Suddenly there is no "objective standard", and that's nerve-wracking. For this reason, most statistical tests in most disciplines rely on frequentist notions like confidence intervals rather than Bayesian notions like the relative probability of two hypotheses. However, there are drawbacks to doing this, even in non-cogsci areas. To begin with, many things we want to express statistical knowledge about don't make sense in terms of reference sets, e.g., the probability that it will rain tomorrow (since tomorrow happens only once). For another, some argue that the seeming objectivity of the frequentist approach is illusory, since we can't ever be sure that our sampling process hasn't biased or distorted the data. At least with a Bayesian approach, we can explicitly deal with and/or try to correct for that.
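
As a toy numerical illustration of the "degree of belief" point (my example, not from the original post): under a Beta prior on a coin's probability of heads, two observers with different priors draw different conclusions from the same ten flips.

```python
# Beta-Binomial updating: posterior mean of P(heads) after observing data,
# under two different priors (a toy example; the numbers are made up).
def posterior_mean_heads(heads, tails, prior_a=1.0, prior_b=1.0):
    """Posterior mean of P(heads) under a Beta(prior_a, prior_b) prior."""
    return (prior_a + heads) / (prior_a + prior_b + heads + tails)

heads, tails = 7, 3
print(posterior_mean_heads(heads, tails))                          # flat prior -> 0.667
print(posterior_mean_heads(heads, tails, prior_a=50, prior_b=50))  # strong "fair coin" prior -> 0.518
```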

But it's in trying to model the mind that we can really see the power of Bayesian probability. Unlike for other social scientists, this sort of subjectivity isn't a problem for us: we cognitive scientists are interested in degrees of belief. In a sense, we study subjectivity. In making models of human reasoning, then, an approach that incorporates subjectivity is a benefit, not a problem.

Furthermore, (unlike many statistical models) the brain generally doesn't just want to correctly capture the statistical properties of the world. Actually, its main goal is generalization -- prediction, not just estimation, in other words -- and one of the things people excel at is generalization based on very little data. Incorporating the Bayesian notion of prior beliefs, which act to constrain generalization in ways that go beyond the actual data, allows us to formally study this in ways that we couldn't if we just stuck to frequentist ideas of probability.

Posted by Amy Perfors at 6:00 AM

1 February 2006

Do People Think like Stolper-Samuelson? Part I

Jens Hainmueller and Michael Hiscox

In the face of the fierce political disagreements over free trade taking place in the US and elsewhere, it's critical that we try to understand how people think about trade policies. A growing body of scholarly research has examined survey data on attitudes toward trade among voters, focusing on individual determinants of protectionist sentiments. These studies have converged upon one central finding: fears about the distributional effects of trade openness among less-educated, blue-collar workers lie at the heart of much of the backlash against globalization in the United States and other advanced economies. Support for new trade restrictions is highest among respondents with the lowest levels of education (e.g., Scheve and Slaughter 2001a, 2001b; Mayda and Rodrik 2005; O'Rourke and Sinnott 2002). These findings are interpreted as strong support for the Stolper-Samuelson theorem, a classic economic treatment of the income effects of trade. It predicts that trade openness benefits those owning factors of production with which their economy is relatively well endowed (those with high skill levels in the advanced economies) while hurting others (low-skilled and unskilled workers).


But is it really true that people think like Stolper-Samuelson (i.e. that more educated people favour trade because it will increase their factor returns)? The positive relationship between education and support for trade liberalization might also – and perhaps primarily – reflect the fact that more educated respondents tend to be more exposed to economic ideas about the overall efficiency gains for the national economy associated with greater trade openness, and tend to be less prone to nationalist and anti-foreigner sentiments often linked with protectionism. In our recent paper "Learning to Love Globalization: Education and Individual Attitudes Toward International Trade" we try to shed light on this issue. More on this in a subsequent post tomorrow.

Posted by Jens Hainmueller at 6:00 AM