January 13, 2009
Like many of us, I'm always on the lookout for good examples to use in undergraduate methods courses. My high school chemistry teacher (a former nun) said that the best teaching examples involved sex, food, or money, and that seems like reasonable advice for statistics as well. In that vein, I noted a recent article on the "Axe effect" in Metro:
'Axe effect' really works, a new study swears
Researchers in the U.K. asked women to rate the attractiveness of men wearing Axe's British counterpart, Lynx, against those who were wearing an odorless placebo.
On a 7-point scale, men wearing Lynx scored a 4.2, 0.4 point higher than those wearing the placebo.
But here's the catch: The women did not meet the men face-to-face. They watched them on video.
So what explains the discrepancy in ratings? Men wearing Lynx reported feeling more confident about themselves. So the difference in attitude appears more responsible for getting you lucky than the scent itself.
This story was not just reported in a subway tabloid; a long article appeared in the Economist. (Although at least the Metro story reported an effect size, unlike the Economist).
Is there an Axe effect? The news stories are reporting on a study in the International Journal of Cosmetic Science, "Manipulation of body odour alters men's self-confidence and judgements of their visual attractiveness by women". The researchers recruited male students and staff members from the University of Liverpool and randomly assigned them to use either the deodorant or a placebo. They then took photographs of the men as well as videos of them pretending to chat up an attractive woman. The photos and videos of the men were evaluated by "a panel of eight independent female raters" for attractiveness and self-confidence.
|Stimulus||Attractiveness||Self-confidence|
|Photo||Not significant||(not asked)|
|Video, no sound||Significant!||Not significant|
|Video w/ sound||Not significant||Not significant|
There may be an Axe effect on women's perception of men's attractiveness (but not self-confidence) if they see the men on video but can't hear them. Or it might be a fluke. This seems like a classic multiple comparison problem. With five tests, it is not that unlikely that one of them would be (barely) statistically significant. The proposed mechanism for the one "effect" (which attracted all of the media attention) was increased self-confidence on the part of the male subjects, so it seems a little odd that an effect would be found on perceived attractiveness and not on self-confidence. We might be more confident that something is going on if the effect sizes were reported for the non-significant results, but they don't appear in the paper. So, the Axe effect may be for real, but only if you keep your mouth shut.
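To put a rough number on "not that unlikely", here is a minimal sketch of the family-wise error rate for five tests. It assumes the tests are independent and each run at the 0.05 level; neither assumption comes from the paper (the five tests share the same raters and subjects, so independence is only an approximation), and the function name is just for illustration.

```python
# Chance of at least one "significant" result among k tests when every
# null hypothesis is true. ASSUMPTION: independent tests at a common
# level alpha (a simplification, not something reported in the paper).
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    return 1.0 - (1.0 - alpha) ** k


fwer = familywise_error_rate(5)
bonferroni_alpha = 0.05 / 5  # per-test level that keeps the family near 0.05

print(f"P(at least one false positive in 5 tests) = {fwer:.3f}")
print(f"Bonferroni-adjusted per-test level = {bonferroni_alpha:.3f}")
```

Under these assumptions the chance of at least one spurious "significant" result is about 23%, which is why a single barely significant cell out of five is weak evidence on its own.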
June 26, 2008
A few bloggers at other sites (Concurring Opinions and Election Law Blog) have pointed out an interesting footnote in the Supreme Court's recent decision on punitive damages in the Exxon Valdez case. Justice Souter took note of experimental research on jury decisionmaking done by Cass Sunstein, Daniel Kahneman, and others, but then dismissed it for the purposes of the decision because Exxon had contributed funding for the research:
The Court is aware of a body of literature running parallel to anecdotal reports, examining the predictability of punitive awards by conducting numerous “mock juries,” where different “jurors” are confronted with the same hypothetical case. See, e.g., C. Sunstein, R. Hastie, J. Payne, D. Schkade, W. Viscusi, Punitive Damages: How Juries Decide (2002); Schkade, Sunstein, & Kahneman, Deliberating About Dollars: The Severity Shift, 100 Colum. L. Rev. 1139 (2000); Hastie, Schkade, & Payne, Juror Judgments in Civil Cases: Effects of Plaintiff’s Requests and Plaintiff’s Identity on Punitive Damage Awards, 23 Law & Hum. Behav. 445 (1999); Sunstein, Kahneman, & Schkade, Assessing Punitive Damages (with Notes on Cognition and Valuation in Law), 107 Yale L. J. 2071 (1998). Because this research was funded in part by Exxon, we decline to rely on it.
It will be interesting to see whether this position is taken up by the lower courts; if so, we might see less incentive for private actors to fund social science research. That could be good or bad, I suppose, depending on one's views of the likelihood that researchers will be unduly influenced by their funding sources.
June 13, 2008
Two awards given by the Society for Political Methodology were announced today, and both of them went to IQSS faculty members (and co-authors).
The Gosnell Prize is given to the "best paper on political methodology given at a conference", and this year's prize was awarded to Kevin Quinn for his paper "What Can be Learned from a Simple Table? Bayesian Inference and Sensitivity Analysis for Causal Effects from 2x2 and 2x2xK Tables in the Presence of Unmeasured Confounding." From the announcement:
Quinn's paper offers a set of steps to improve inference with binary independent and dependent variables and unmeasured confounds. He derives large sample, non-parametric bounds on the average treatment effect and shows how these bounds do not rely on auxiliary assumptions. He then provides a graphical way to depict the robustness of inferences as one changes assumptions about the confounds. Finally, he shows how one can use a Bayesian framework relying on substantive knowledge to restrict the set of assumptions on the confounds to improve inference.
The Warren Miller prize is given annually to the best paper appearing in Political Analysis. This year's prize has been awarded to Daniel E. Ho, Kosuke Imai, Gary King, and Elizabeth A. Stuart for their article, "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference." The abstract of their paper follows:
Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it is possible to find a specification that fits the author's favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.
May 19, 2008
Mark Blumenthal from pollster.com has been posting interviews with scholars at the 2008 AAPOR conference, including two with our very own Sunshine Hillygus and Chase Harrison from the Program on Survey Research:
April 3, 2008
It's a day or so past April 1, but if you haven't seen this post [Edit: link fixed] over at Andrew Gelman's blog, it is worth a look. It's about as good an apologia from a "born-again frequentist" as you are likely to find. An excerpt:
I like unbiased estimates and I like confidence intervals that really have their advertised confidence coverage. I know that these aren't always going to be possible, but I think the right way forward is to get as close to these goals as possible and to develop robust methods that work with minimal assumptions. The Bayesian approach--to give up even trying to approximate unbiasedness and to instead rely on stronger and stronger assumptions--that seems like the wrong way to go.
Fortunately, Gelman's conversion experience appears to have ended after about a day...
March 11, 2008
While the Democratic nomination contest drags on (and on and on...; Tom Hanks declared himself bored with the race last week), attention is turning to hypothetical general election matchups between Hillary Clinton or Barack Obama and John McCain. Mystery Pollster has a post up reporting on state-by-state hypothetical matchup numbers obtained from surveys of 600 registered voters in each state conducted by Survey USA. There is some debate about the quality of the data (Survey USA uses Interactive Voice Response to conduct its surveys, there is no likely voter screen, etc.). But we have what we have.
At this point, the results are primarily of interest to the extent that they speak to the "electability" question on the Democratic side; who is more likely to beat McCain? MP goes through the results state by state, classifying each state into Strong McCain, Lean McCain, Toss-up, etc. From this you can calculate the number of electoral votes in each category, which provides some information but isn't exactly what we're interested in.
This problem is a natural one for the application of some simple, naive Bayesian ideas. If we throw on some flat priors, make all sorts of unreasonably strong independence assumptions, and assume that the results were derived from simple random sampling, we can quickly get posterior distributions for the support for each candidate in each state and can calculate estimates of the probability of victory. From there, it is easy to calculate the posterior distribution of the number of electoral votes for each candidate and find posterior probabilities that Obama beats McCain, Clinton beats McCain, or the probability that Obama would receive more electoral votes than Clinton.
While I was sitting around at lunch yesterday, I ran a very quick analysis using the reported SurveyUSA marginals. Essentially, I took samples from 50 independent Dirichlet posteriors for both hypothetical matchups, assuming a flat prior and multinomial sampling density (to allow for undecideds); to avoid dealing with the posterior predictive distributions, I'm just going to assume that all registered voters will vote so I can just compare posterior proportions. When you run this, you obtain estimates (conditional on the data and, most importantly, the model) that the probability of an Obama victory over McCain is about 88% and the probability of a Clinton victory is about 72%. There is a roughly 70% posterior probability that Obama would win more electoral votes than Clinton.
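For anyone who wants to try something similar, here is a rough sketch of the kind of simulation described above. It is not the code I ran: the three states, their electoral votes, and their poll shares are made-up placeholders rather than the SurveyUSA marginals, and the function name is hypothetical. The only parts taken from the description are the flat Dirichlet prior, the multinomial likelihood with an undecided category, independence across states, and comparing posterior proportions directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# HYPOTHETICAL inputs: three invented states, NOT the SurveyUSA marginals.
# shares = (Democrat, McCain, undecided) among n respondents.
states = {
    "State A": {"ev": 10, "n": 600, "shares": (0.60, 0.32, 0.08)},
    "State B": {"ev": 20, "n": 600, "shares": (0.46, 0.46, 0.08)},
    "State C": {"ev": 15, "n": 600, "shares": (0.38, 0.55, 0.07)},
}


def simulate_ev(states, draws=10_000, rng=rng):
    """Draw Democratic electoral-vote totals from independent posteriors."""
    total = np.zeros(draws)
    for s in states.values():
        counts = np.array(s["shares"]) * s["n"]
        # Flat Dirichlet(1, 1, 1) prior + multinomial data
        # gives a Dirichlet(counts + 1) posterior for the three shares.
        post = rng.dirichlet(counts + 1.0, size=draws)
        dem_wins = post[:, 0] > post[:, 1]  # compare posterior proportions
        total += s["ev"] * dem_wins
    return total


ev_draws = simulate_ev(states)
total_ev = sum(s["ev"] for s in states.values())
p_win = (ev_draws > total_ev / 2).mean()
print(f"Posterior probability of an electoral-vote majority: {p_win:.2f}")
```

Running the same function on a second set of marginals (one per hypothetical matchup) and comparing the two vectors of draws gives the probability that one candidate would win more electoral votes than the other, under the same strong independence assumptions.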
As I mentioned, this is an extremely naive Bayesian approach. There are a lot of ways that one could make the model better: adding additional sources of uncertainty, allowing for correlations between the states, using historical information to inform priors, and imposing a hierarchical structure to shrink outlying estimates toward the grand mean. One place to start would be by modeling the pairs of responses to the two hypothetical matchup questions. Any of these things, however, is going to be much easier to do in a Bayesian framework, since calculating posterior distributions of functions of the model parameters is extremely easy.
March 5, 2008
The dramatic increase in cases of autism in children over the past few years has been in the news again in recent days. Most notably, presumptive Republican presidential nominee John McCain said at a recent stop, "there’s strong evidence that indicates that it’s got to do with a preservative in vaccines." Which would be fine if such strong evidence existed; unfortunately, that is a mischaracterization of the current state of the literature to say the least. McCain has since backed away from his initial comments (see this article in yesterday's New York Times), but the debate prompted by his comments will undoubtedly continue.
By coincidence, the Robert Wood Johnson program at Harvard is sponsoring a talk tomorrow on this topic. Professor Peter Bearman (chair of the Statistics Department at Columbia) will be speaking on "Early Thoughts on the Autism Epidemic." Professor Bearman is currently leading a project on the social determinants of autism. The talk is in N262 on the second floor of the Knafel Building at CGIS from 11:00 to 12:30.
February 5, 2008
Another Tuesday, another primary election or twenty, and another opportunity for things to go wrong with pre-election polls. The Super Tuesday states, which had not seen much attention from pollsters earlier in January, have seen a deluge of polls released in the last week, nicely summarized at the Mystery Pollster blog. As Mark Blumenthal's recent post points out, "Somebody's gonna be wrong". The amount of dispersion present in the recent polls on both sides of the election is far more than can be accounted for by sampling variability. There are always house effects present in any polling context, but this borders on the ridiculous. Unlike New Hampshire, no matter how the results turn out (barring a possible McCain collapse), we probably won't see as great a hue and cry about the pollsters this time because their pre-election predictions are all over the map.
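One quick way to check a claim like this is to compare the spread of the published estimates with the spread that simple random sampling alone would produce. The sketch below does that with invented numbers: the poll readings and the common sample size of 600 are placeholders, not the actual Super Tuesday polls, and real polls differ in sample size and design.

```python
import math
import statistics

# HYPOTHETICAL poll readings (percent support for one candidate in one
# state) and a common sample size. Placeholders, not real poll results.
polls = [38.0, 45.0, 31.0, 49.0, 36.0, 42.0]
n = 600

p = statistics.mean(polls) / 100.0
# Standard deviation (in points) expected from simple random sampling alone.
expected_sd = 100.0 * math.sqrt(p * (1.0 - p) / n)
# Standard deviation actually observed across the polls.
observed_sd = statistics.stdev(polls)

print(f"Expected SD from sampling alone: {expected_sd:.1f} points")
print(f"Observed SD across polls:        {observed_sd:.1f} points")
print(f"Ratio: {observed_sd / expected_sd:.1f}")
```

A ratio well above 1 suggests that something beyond sampling error (house effects, different likely-voter screens, movement over the field period) is driving the differences.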
Speaking of New Hampshire, Adam Berinsky from MIT e-mailed a few weeks ago to point to several studies that do look at errors in polling when a black candidate is on the ballot. These include Voting Hopes Or Fears?: White Voters, Black Candidates & Racial Politics by Keith Reeves, "Race-of-Interviewer Effects in a Preelection Poll: Virginia 1989" by Finkel, Guterbock, and Borg, and last but not least, Berinsky's Silent Voices: Public Opinion and Political Participation in America. This just goes to show me that a quick Google Scholar search on "Bradley effect" misses a lot of good stuff. We'll see if there is more evidence of such an effect tonight, but it is worth noting that in South Carolina things went the other way; Obama did much better than projected by the pre-election polls. Since they didn't get the outcome wrong, however, the pollsters didn't get nearly as much grief as they did after New Hampshire.
February 1, 2008
Abstracts are now being accepted for the 2008 useR! conference in Dortmund, Germany. This conference is designed to bring R users and developers together to trade ideas and find out what is new in the sprawling world of R. Several of us went to the Vienna conference a few years ago, and found it very useful. Previous editions have had a good mix of academic and private sector participants, and I learned more than I have at some of the more traditional academic conferences. The announcement from the useR webpage is below; the website is at http://www.statistik.uni-dortmund.de/useR-2008/
useR! 2008, the R user conference, takes place at the Fakultät Statistik, Technische Universität Dortmund, Germany from 2008-08-12 to 2008-08-14. Pre-conference tutorials will take place on August 11.
The conference is organized by the Fakultät Statistik, Technische Universität Dortmund and the Austrian Association for Statistical Computing (AASC). It is funded by the R Foundation for Statistical Computing.
Following the successful useR! 2004, useR! 2006, and useR! 2007 conferences, the conference is focused on
- R as the `lingua franca' of data analysis and statistical computing,
- providing a platform for R users to discuss and exchange ideas how R can be used to do statistical computations, data analysis, visualization and exciting applications in various fields,
- giving an overview of the new features of the rapidly evolving R project.
As for the predecessor conference, the program consists of two parts:
- invited lectures discussing new R developments and exciting applications of R,
- user-contributed presentations reflecting the wide range of fields in which R is used to analyze data.
A major goal of the useR! conference is to bring users from various fields together and provide a platform for discussion and exchange of ideas: both in the formal framework of presentations as well as in the informal part of the conference in Dortmund's famous beer pubs and restaurants.
Prior to the conference, on 2008-08-11, there are tutorials offered at the conference site. Each tutorial has a length of 3 hours and takes place either in the morning or afternoon.
Call for Papers
We invite all R users to submit abstracts presenting innovations or exciting applications of R on topics such as:
Applied Statistics & Biostatistics
Chemometrics and Computational Physics
Econometrics & Finance
Environmetrics & Ecological Modeling
High Performance Computing
Marketing & Business Analytics
Statistics in the Social and Political Sciences
Visualization & Graphics
and many more.
We recommend a length of about one page in pdf format. The program committee decided on the presentation format. There is no proceedings volume, but the abstracts are available in an online collection linked from the conference program and in a single pdf file.
Deadline for submission of abstracts: 2008-03-31.
January 9, 2008
New Hampshire voted last night, and managed to set off another frenzy of introspection among pollsters and pundits. On the Democratic side, public polls released after Iowa showed Obama leading Clinton by an average of about 10 points, but in the end Clinton of course edged out a narrow victory. The polls were much closer on the Republican side, but the "miss" on the Democratic side has already produced much concern about "New Hampshire's Polling Fiasco". Perhaps the witch-hunt that ensues whenever polls appear to be inaccurate in a major election should be viewed as a positive sign about the acceptance of survey research in the media and electorate; at the very least, these kinds of things keep a fair number of our colleagues gainfully employed. From my perspective, it would have been nice to have polls that were more consistent with the eventual outcome since we were planning to use them as examples in an undergraduate class; they will still be examples, but now the focus will be more on total survey error.
Why did the poll results diverge from the outcome? Several hypotheses are floating around. Jon Krosnick from Stanford has an opinion piece pointing to the ballot order effect; Hillary Clinton won the random draw to end up near the top of the ballot. There is certainly a lot of evidence that ballot-order effects matter, but my sense of the literature is that these effects tend to be smaller for better-known candidates, and it is hard to imagine a candidate better known than Hillary Clinton. Dan Ho and Kosuke Imai have written two articles on elections in California that take advantage of randomization to estimate ballot-order effects.
More comment has focused on the possibility that Obama suffered from the "Bradley effect", in which some white voters say that they will support a black candidate when responding to poll questions but end up voting for a white candidate at the ballot box. There is not much academic literature on this supposed effect; here is a Pew Research Center note from last year; ironically, it is titled "Can you trust what polls say about Obama's electoral prospects?"
Finally, many observers have pointed to political prediction markets as either a supplement or alternative to traditional polls for predicting election outcomes, on the idea that these can incorporate other sources of information and require participants to put their money where their mouth is. They didn't do so well, either, as John Tierney notes on his New York Times blog, although the market prices did begin to move during the day. There is an interesting research agenda regarding the relative merits of polls and markets (and how markets integrate the information from various polls); Bob Erikson and Justin Wolfers, who are leading contributors to this literature, have an interesting exchange on this question on Andrew Gelman's blog (posted a week before New Hampshire, but even more interesting today).
January 4, 2008
James Fowler sent the following message to the Polmeth list, regarding a conference that we will apparently be hosting in June that may be of interest:
The study of networks has exploded over the last decade, both in the social and hard sciences. From sociology to biology, there has been a paradigm shift from a focus on the units of the system to the relationships among those units. Despite a tradition incorporating network ideas dating back at least 70 years, political science has been largely left out of this recent creative surge. This has begun to change, as witnessed, for example, by an exponential increase in network-related research presented at the major disciplinary conferences.
We therefore announce an open call for paper proposals for presentation at a conference on "Networks in Political Science" (NIPS), aimed at _all_ of the subdisciplines of political science. NIPS is supported by the National Science Foundation, and sponsored by the Program on Networked Governance at Harvard University.
The conference will take place June 13-14. Preceding the conference will be a series of workshops introducing existing substantive areas of research, statistical methods (and software packages) for dealing with the distinctive dependencies of network data, and network visualization. There will be a $50 conference fee. Limited funding will be available to defray the costs of attendance for doctoral students and recent (post 2005) PhDs. Funding may be available for graduate students not presenting papers, but preference will be given to students using network analysis in their dissertations. Women and minorities are especially encouraged to apply.
The deadline for submitting a paper proposal is March 1, 2008. Proposals should include a title and a one-paragraph abstract. Graduate students and recent Ph.D.'s applying for funding should also include their CV, a letter of support from their advisor, and a brief statement about their intended use of network analysis. Send them to firstname.lastname@example.org. The final program will be available at www.ksg.harvard.edu/netgov.
January 3, 2008
The presidential caucuses in Iowa will be held tonight, giving us our first "official" measure of popular support for the candidates in each party. We've had lots of "unofficial" measures from polls taken over the past few months - over 50 polls taken since Labor Day are posted on pollster.com - but polling to predict the outcome of the caucuses (as opposed to polling designed to measure overall support for the candidates) presents a number of difficult problems.
The first of these problems, and the one that has received the most attention, is identifying likely caucus participants from the sample of respondents. The Iowa caucuses require a significant time commitment (on the order of two to three hours) in order to participate, and turnout has historically been much lower than it has in the New Hampshire primary, to say nothing of a general election. Identifying likely voters is a key challenge for any poll, but the low turnout levels make survey results unusually sensitive to the screening assumptions used. The recent Des Moines Register poll showing Obama at 32% ahead of Clinton at 25% prompted a great deal of discussion on this topic. These estimates were produced from a screen that implied that 60% of participants tonight would be first time caucus-goers and only 54% would be registered as Democrats before the caucus. While these could be valid estimates (we will soon see), this would be a very different population of caucus participants than we have seen in the past. For more discussion of the screening issue, see this post at Mystery Pollster, which is one of the best sites for coverage of the various polling-related issues in the current campaign.
On the Republican side, identifying likely caucus-goers is the main methodological problem. Iowa Republicans basically take a straw poll and report the results, so a random sample of participants (if they could be identified without error and did not change their minds) would produce an unbiased estimate of the outcome. Things are much more complicated on the Democratic side (their caucus guide is thirteen pages long); caucus participants first break into preference groups for each candidate. After this is accomplished, there is the opportunity for realignment subject to a "viability threshold" which varies from precinct to precinct but is always at least 15%. Pollsters may attempt to model this by asking respondents for their second choice and reallocating those respondents supporting candidates estimated to be below the threshold on a statewide basis. This, of course, assumes a uniform distribution of support across the state, which may or may not be a reasonable assumption (or may be more reasonable for some candidates than for others). If support for Kucinich, hypothetically, is concentrated in Ames and Iowa City, then he may be viable in the precincts where most of his support is found despite being well below the threshold statewide.
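The second-choice reallocation described above can be sketched in a few lines. This is only an illustration of the statewide approach, not any pollster's actual procedure: the candidates "A" through "E", their shares, and the second-choice splits are all invented, there is a single round of realignment, and supporters whose second choice is also non-viable are simply dropped.

```python
def reallocate(first_choice, second_choice, threshold=0.15):
    """One-round, statewide reallocation of non-viable candidates' support.

    first_choice:  candidate -> share of first preferences
    second_choice: candidate -> {other candidate: share of that
                   candidate's supporters naming them as second choice}
    """
    viable = {c for c, share in first_choice.items() if share >= threshold}
    result = {c: first_choice[c] for c in viable}
    for cand, share in first_choice.items():
        if cand in viable:
            continue
        for target, frac in second_choice.get(cand, {}).items():
            if target in viable:  # ASSUMPTION: other supporters go home
                result[target] += share * frac
    total = sum(result.values())
    return {c: s / total for c, s in result.items()}


# HYPOTHETICAL statewide numbers for illustration only.
first = {"A": 0.32, "B": 0.28, "C": 0.25, "D": 0.10, "E": 0.05}
second = {
    "D": {"A": 0.5, "C": 0.3, "E": 0.2},
    "E": {"B": 0.6, "D": 0.4},
}
final = reallocate(first, second)
for cand in sorted(final):
    print(cand, round(final[cand], 3))
```

Applying the threshold statewide is exactly the uniform-support assumption questioned above; a precinct-level version would need precinct-level estimates of support, which polls do not provide.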
Finally, the Democrats in Iowa do not act or report results on a "one-person-one-vote" basis. The precinct caucuses elect delegates pledged to each of the candidates, and the number of delegates is based on the historical support for Democratic candidates in that precinct, not the number of people who participate in the caucuses. The state party then takes these results and calculates the "state delegate equivalent" share. The raw vote totals are available to the state party (which would be the closest to the parameter that the pre-caucus polling is trying to estimate) but the party does not release those results to the media (an op-ed criticizing this practice appeared in the New York Times last month). To my knowledge, none of the groups polling in Iowa attempts to take this weighting into account. The degree to which this causes the reported results to diverge from the raw votes will depend largely on the degree to which turnout in the caucuses diverges from historical Democratic turnout; if the Register poll is correct, this divergence could be quite large (and Obama supporters might not be too happy).
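A toy calculation shows how the delegate weighting can pull the reported result away from the raw vote. The two precincts below are invented, and the sketch assumes delegates are split in exact proportion to support within each precinct (no rounding and no viability threshold), which is a simplification of the actual party rules.

```python
# HYPOTHETICAL precincts. Both have the same number of delegates (fixed by
# past Democratic turnout), but very different attendance on caucus night.
precincts = [
    {"delegates": 4, "attendees": 400, "share_x": 0.50},  # surge in turnout
    {"delegates": 4, "attendees": 100, "share_x": 0.20},  # typical turnout
]

# Raw vote share for candidate X: weight each precinct by who showed up.
raw_share = (
    sum(p["attendees"] * p["share_x"] for p in precincts)
    / sum(p["attendees"] for p in precincts)
)

# Delegate share: weight each precinct by its fixed delegate count.
# ASSUMPTION: delegates split exactly in proportion to support.
delegate_share = (
    sum(p["delegates"] * p["share_x"] for p in precincts)
    / sum(p["delegates"] for p in precincts)
)

print(f"Raw vote share for X:  {raw_share:.2f}")
print(f"Delegate share for X:  {delegate_share:.2f}")
```

In this made-up example candidate X has 44% of the raw vote but only 35% of the delegates, because the extra attendees in the high-turnout precinct earn no extra delegates.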
To sum up, polling the Iowa caucuses in order to predict the outcome for the Democrats is a serious problem: the population is hard to define, preferences are likely to be unusually malleable since the party rules require some participants to change their votes, and the results that are reported are not the quantity that would be estimated by a simple random sample of participants. If the polls appear to have gotten it wrong, it will be hard to parse out which of these factors (in addition to the normal sources of bias in any survey) were the main contributors.
December 11, 2007
A recent message to the Polmeth mailing list announced that a research group at the University of Pittsburgh is looking for beta testers for some new coding reliability software that they have developed:
The Coding Analysis Toolkit (or “CAT”) was developed in the summer of 2007. The system consists of a web-based suite of tools custom built from the ground-up to facilitate efficient and effective analysis of text datasets that have been coded using the commercial-off-the-shelf package ATLAS.ti (http://www.atlasti.com). We have recently posted a narrated slide show about CAT and a tutorial online. The Coding Analysis Toolkit was designed to use keystrokes and automation to clarify and speed-up the validation or consensus adjudication process. Special attention was paid during the design process to the need to eliminate the role of the computer mouse, thereby streamlining the physical and mental tasks in the coding analysis process. We anticipate that CAT will open new avenues for researchers interested in measuring and accurately reporting coder validity and reliability, as well as for those practicing consensus-based adjudication. The availability of CAT can improve the practice of qualitative data analysis at the University of Pittsburgh and beyond.
More information is available at this website: http://www.qdap.pitt.edu/cat.htm. This is far from my area of expertise, but it looks like it might be useful for some projects...
December 7, 2007
Via the ELS Blog, there is news of a new effort organized by the law libraries at UCLA and Cornell to construct a bibliography of empirical research looking at questions in the legal realm. As a political scientist, it's kind of hard for me to conceptualize what an equivalent bibliography would look like for our field (other than unwieldy), but it looks like it could be quite useful for researchers both inside and outside of the legal academy. Now all we need is a translation of the journal abbreviations used by law reviews...
November 30, 2007
IQSS is sponsoring a conference next Friday on the emerging area of computational social science. Below is the announcement:
The Conference on Computational Social Science (part of the Eric M. Mindich Conference series)
Friday, December 7, 2007
Center for Government and International Studies South, Tsai Auditorium (Room S010)
1730 Cambridge Street, Cambridge, MA
The development of enormous computational power and the capacity to collect enormous amounts of data has proven transformational in a number of scientific fields. The emergence of a computational social science has been slower than in the sciences. However, the combination of the still exponentially increasing computational power with a massive increase in the capturing of data about human behavior makes the emergence of a field of computational social science desirable, but not inevitable. The creation of a field of computational social science poses enormous challenges, but offers enormous promise to achieve the public good. The hope is that we can produce an understanding of the global network on which many global problems exist: SARS and infectious disease, global warming, strife due to cultural collisions, and the livability of our cities. That is, can sensing our society lead to a sensible society?
To solve these problems will require trading off privacy versus convenience, individual freedom versus societal benefit, and our sense of individuality versus group identity. How will we decide what the sensible society will look like? This conference brings together the wide array of individuals who are working in this emerging research area to discuss how we might address these global challenges, and to evaluate the potential emergence of a field of "computational social science."
Registration is required; more information is available here.
November 28, 2007
In political science, as in many other branches of social science, more attention is being paid to the genetic bases of political behavior (I won't say effects, because that opens a whole other barrel of worms). As I was looking around for an overview of some of the statistical issues involved, I came across a couple of blog posts by Cosma Shalizi at Carnegie Mellon that were both informative and amusing. An excerpt:
When we take our favorite population of organisms (e.g., last year's residents of the Morewood Gardens dorm at CMU), and measure the value of our favorite quantitative trait for each organism (e.g., their present zip code), we get a certain distribution of this trait:
(Note to our institutional review board: No undergraduates had their DNA sequenced in the writing of this essay.)
If we are limited to the tools of early 20th century statistics (in particular, if we are the great R. A. Fisher, and so simultaneously forging those tools while helping to found evolutionary genetics), we summarize the distribution with a mean and a variance. We can inquire as to where the variance in the population comes from. In particular, assuming the organisms are not all clones, it is reasonable to suppose that some of the variation goes along with differences in genes. The fraction of variance which does so is, roughly speaking, the "heritability" of the trait.
The most basic sort of analysis of variance (see also: Fisher) would make this conceptually simple, though practically unsuccessful. Simply take all the organisms in the population, and group them by their genotypes. For each group of genetically identical organisms, compute the average value of the trait. Compare the variance of these within-genotype averages (that is, the across-genotype variance) to the total population variance; this is the fraction of variation associated with genotypes. In most mammalian populations, where clones (identical twins, triplets, ...) are rare and every organism otherwise has a unique genotype, this would tell you that almost all of the variance of any trait is associated with genetic differences. On such an analysis, almost all of the variance in zip codes in my example would be "due to" genetic differences, and the same would be true of telephone numbers, social security numbers, etc.
To see why, look at my table again. With one exception (the twins who live in 15213 and 48104), in this population changing zip code means changing your genotype. The vast majority (81%) of the variance in zip codes is between genotypes, not within them. With real human data, a quarter of the people wouldn't be twins living apart, and the proportion of variance in zip codes "due to" genotype would be even higher.
Naively, then, on this analysis we would say that the "heritability" of zip code, the fraction of its variance which goes along with genetic variations, is 81%. It is crucial to be clear on what this means, which is merely and exactly this: in this population, if we take a random group of genetically identical people, the variance within that group should be 19% (=100-81) of the total variance in the population.
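The calculation described in the excerpt can be sketched in a few lines of Python. This is only an illustration: the table from the original post is not reproduced here, so the genotype labels and zip codes below are made-up stand-ins (only 15213 and 48104 appear in the text), and the function name is my own. The result will therefore not match the 81% figure; the point is the mechanics of comparing the variance of within-genotype averages to the total variance.

```python
from collections import defaultdict


def between_group_variance_share(genotypes, trait):
    """Fraction of the total population variance in `trait` that lies
    between genotype groups (the naive "heritability" of the excerpt)."""
    n = len(trait)
    grand_mean = sum(trait) / n
    total_var = sum((x - grand_mean) ** 2 for x in trait) / n

    # Group trait values by genotype.
    groups = defaultdict(list)
    for g, x in zip(genotypes, trait):
        groups[g].append(x)

    # Variance of the within-genotype averages, weighting each group
    # by the number of organisms that share that genotype.
    between_var = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    ) / n
    return between_var / total_var


# Hypothetical population (NOT the table from the post): six people,
# with one pair of identical twins (genotype "A") living in different zip codes.
genotypes = ["A", "A", "B", "C", "D", "E"]
zip_codes = [15213, 48104, 15213, 15217, 94720, 10027]

share = between_group_variance_share(genotypes, zip_codes)
print(f"Share of zip-code variance between genotypes: {share:.2f}")
```

If every organism has a unique genotype, each group average is just that organism's own value and the share is exactly 1, which is the sense in which almost any trait looks "heritable" on this analysis.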
November 19, 2007
There was a good non-technical article by Adam Liptak in the New York Times this weekend reviewing the renewed debate about the supposed deterrent effect of capital punishment (The web version of the article linked to seven different academic articles; many thanks to the editorial staff). I've blogged about this before (here) and tend to agree with those who say that there just isn't enough information in the data. In that context, I particularly liked the quote from Justin Wolfers at the end of the article:
Professor Wolfers said the answer to the question of whether the death penalty deterred was “not unknowable in the abstract,” given enough data.
“If I was allowed 1,000 executions and 1,000 exonerations, and I was allowed to do it in a random, focused way,” he said, “I could probably give you an answer.”
November 16, 2007
In general, my impression is that cutting-edge research in social science rarely makes the leap from academic interest to media coverage to popular culture, but there are always some studies that capture the public's attention. One such study was the recent article by Nicholas Christakis from Harvard and James Fowler from UCSD ("The Spread of Obesity in a Large Social Network over 32 Years"). They find evidence that clusters of obese individuals are present and that they do not appear to be driven entirely by selection effects. This received widespread media attention, and was picked up by the writers of Boston Legal. At the end of this promo, we see how the character Denny Crane (played by William Shatner, a man who does not appear to push away from the table all that often himself) interprets the results of this study:
November 14, 2007
There was an interesting article this weekend in the Washington Post reviewing research on the relationship between the time at which adolescents become sexually active and subsequent anti-social behaviors ("Study debunks theory on teen sex, delinquency", Nov. 11, 2007). To sum up, existing research shows a strong and stable correlation between early "sexual debut" and delinquency later in life. This has often been interpreted as a causal relationship by policy advocates, despite the obvious potential for confounding. It seems clear that unobserved characteristics - thrill-seeking, risk-taking preferences (or even a simple lack of adult supervision) - would encourage both early sexual activity and delinquency.
The WaPo article contrasts these existing results with a new study by researchers at the University of Virginia, who look at differences in the timing of sexual debut and delinquency among pairs of twins. As the Post reports, "Other things being equal, a more probing study has found, youngsters who have consensual sex in their early-teen or even preteen years are, if anything, less likely to engage in delinquent behavior later on." This is a fairly accurate and measured interpretation of the results of the paper, which is worth commending since we often give the media a hard time on this blog for over-selling the results of scientific papers. (Now if we can just get them to link to the papers from their website, as the New York Times does fairly regularly.)
The authors of the twin study are somewhat more ambitious in their claims. Here is the abstract to the paper:
Rethinking Timing of First Sex and Delinquency
K. Paige Harden, Jane Mendle, Jennifer E. Hill, Eric Turkheimer and Robert E. Emery
(1) Department of Psychology, University of Virginia, Charlottesville, VA 22904-4400, USA
Abstract The relation between timing of first sex and later delinquency was examined using a genetically informed sample of 534 same-sex twin pairs from the National Longitudinal Study of Adolescent Health, who were assessed at three time points over a 7-year interval. Genetic and environmental differences between families were found to account for the association between earlier age at first sex and increases in delinquency. After controlling for these genetic and environmental confounds using a quasi-experimental design, earlier age at first sex predicted lower levels of delinquency in early adulthood. The current study is contrasted with previous research with non-genetically informative samples, including Armour and Haynie (2007, Journal of Youth and Adolescence, 36, 141–152). Results suggest a more nuanced perspective on the meaning and consequences of adolescent sexuality than is commonly put forth in the literature.
The current study suggests that there may be positive functions for early initiation of sexual activity, in that the co-twin with earlier age at first sex demonstrated lower levels of delinquency in early adulthood.
Twin studies have been quite influential in a number of areas, and they have many benefits; they allow for balance on genetic and common environmental characteristics that would be exceedingly difficult to achieve in a typical observational study. Moreover, studies comparing identical and fraternal twins at least offer the possibility of teasing out the effects of genetic and environmental factors. At the same time, as twin studies move from biomedical to behavioral questions, there are some issues that deserve further consideration.
The first of these problems is selection within the sets of twins. If there is one thing that we know, it is that sex involves selection. Moreover, not to put too fine a point on it, but one of the parties involved in that selection process is choosing between twins (in many cases, identical twins!). The fact that the non-twin partner chose one twin over the other suggests that unobserved differences between the twins play an important role. The authors allude to this problem and describe their results as "quasi-causal", but they may be underestimating the importance of these "uncontrolled confounds" given the non-random character of the assignment process. Focusing on twins to achieve balance on genetic and shared environmental characteristics may end up increasing the overall bias of the estimates by increasing the imbalance in the unobserved unit-specific characteristics.
The second, and in my opinion more interesting, problem with the study is that it doesn't take into account the interaction within each set of twins. In effect, the researchers are conflating two treatments: the timing of each subject's sexual debut and the timing of the sexual debut for each subject's twin. This suggests an interference problem, because the delinquency outcome for subjects may depend on whether they became active before or after their twin did. One could easily imagine a scenario in which one twin becomes active and the other twin responds by acting out due to frustrations of one sort or another. In that case, it isn't so much that earlier sexual activity has a "positive function" for the twin engaging in it but rather a "negative function" for the twin that is not active; this is something that the data cannot answer.
I think that this is a general problem when using twin studies to estimate the effects of behavioral treatments. Some treatments will have a greater effect on the untreated twin than others, and my guess is the more the treatment is in the realm of social science, the more we should worry about these issues. At the very least, we should be skeptical about how the estimates obtained from twin studies would generalize to the population at large given the inherent interference problems.
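To make the interference point concrete, here is a toy sketch in Python. It is not the model used in the paper; the function name, the baselines, and the effect sizes are all invented for illustration. It compares two hypothetical worlds: one in which earlier sexual activity directly lowers the active twin's delinquency, and one in which it has no direct effect but raises the delinquency of the twin who is not yet active. The within-pair comparison is the same in both.

```python
def mean_within_pair_difference(baselines, direct_effect, spillover):
    """Average (earlier twin - later twin) delinquency across twin pairs.

    baselines:     shared delinquency level for each pair (hypothetical units)
    direct_effect: effect of earlier debut on the earlier twin's own outcome
    spillover:     effect of the earlier twin's debut on the co-twin's outcome
    """
    diffs = []
    for b in baselines:
        earlier_twin = b + direct_effect
        later_twin = b + spillover
        diffs.append(earlier_twin - later_twin)
    return sum(diffs) / len(diffs)


# Made-up baseline delinquency scores for five twin pairs.
baselines = [3, 5, 2, 8, 4]

# World 1: a "positive function" for the active twin, no interference.
world_1 = mean_within_pair_difference(baselines, direct_effect=-1, spillover=0)

# World 2: no direct effect, but a "negative function" for the co-twin.
world_2 = mean_within_pair_difference(baselines, direct_effect=0, spillover=1)

print(world_1, world_2)  # the two worlds yield the same within-pair difference
```

Differencing within pairs removes the shared baseline, but it only identifies the direct effect minus the spillover, so the two stories cannot be told apart from the within-pair contrast alone.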
November 1, 2007
I often share the mixed feelings about media coverage of scientific papers that Amy discussed in her post yesterday on the statistics of race. Apparently we aren't the only ones; Mark Liberman at Language Log linked to yesterday's Dilbert cartoon:
Language Log is one of my favorite blogs, and many of the posts there are relevant for those of us reporting our own statistical results and trying to promote better coverage in the media. Some of my favorites:
I know that I've committed some of these sins myself; in fact, I think I need to go reinterpret some odds ratios...
October 30, 2007
The Clay Mathematics Institute and the Harvard Mathematics Department are sponsoring a lecture by Terry Speed from the Department of Statistics at Berkeley on "Technology-driven statistics," with a focus on the challenges posed to statistical theory and practice by the massive amounts of data that are generated by modern scientific instruments (microarrays, mass spectrometers, etc.). These issues have not yet been as salient in the social sciences, but they are clearly on the horizon. The talk is at 7PM tonight (Oct. 30) in Science Center B at Harvard. The abstract for the talk is after the jump:
Terry Speed, UC Berkeley and WEHI in Melbourne, Australia
Tuesday, October 30, 2007, at 7:00 PM
Harvard University Science Center -- Hall B
Forty years ago, biologists collected data in their notebooks. If they needed help from a statistician in analyzing and interpreting it, they would pass over a piece of paper with numbers on it. The theory on which statistical analysis was built a couple of decades earlier seemed entirely adequate for the task. When computers became widely available, analyses became easier and a little different, with the term "computer intensive" entering the lexicon. Now, in contemporary biology and many other areas, new technologies generate data whose quantity and complexity stretches both our hardware and our theory. Genome sequencing, genechips, mass spectrometers and a host of other technologies are now pushing statistics very hard, especially its theory. Terry Speed will talk about this revolution in data availability, and the revolution we need in the way we theorize about it.
Terry Speed splits his time between the Department of Statistics at the University of California, Berkeley and the Walter & Eliza Hall Institute of Medical Research (WEHI) in Melbourne, Australia. Originally trained in mathematics and statistics, he has had a life-long interest in genetics. After teaching mathematics and statistics in universities in Australia and the United Kingdom, and a spell in Australia's Commonwealth Scientific and Industrial Research Organization, he went to Berkeley 20 years ago. Since that time, his research and teaching interests have concerned the application of statistics to genetics and molecular biology. Within that subfield, eventually to be named bioinformatics, his interests are broad, including biomolecular sequence analysis, the mapping of genes in experimental animals and humans, and functional genomics. He has been particularly involved in the low level analysis of microarray data. Ten years ago he took the WEHI job, and now spends half of his time there, half in Berkeley, and the remaining half in the air somewhere in between.
October 26, 2007
Andrew Gelman has an interesting post up about voting behavior in rich states and poor states, showing how voting patterns differ across the country when you condition on the income of the voters. There is not much of a relationship between per capita income and support for Democrats among poor voters, but there is a strong relationship among rich voters: rich voters in poor states are much more likely to support the Republicans than rich voters in rich states.
On a related note, Larry Bartels will be speaking at the Inequality seminar at Harvard on Monday, October 29 at noon in the Taubman Dining Room at the Kennedy School. His talk is entitled "Partisan Biases in Electoral Accountability," and draws on a forthcoming book. Much of his evidence focuses on differences in the reactions of lower, middle, and upper-income voters to economic performance. Gelman and Bartels are great examples of political scientists who are trying (with limited success, perhaps) to knock down some of the conventional wisdom about the "Red State, Blue State", "values voters" divide with careful data analysis, and are always well worth attention.
October 19, 2007
The Red Sox beat the Indians last night in Game 5 of the ALCS, sending the series back to Fenway and enabling the majority of us at Harvard who are (at least fair-weather) Sox fans to, as Kevin Youkilis said last night, come down off the bridge for a few more days. Why do I bring this up? Well, after Boston's loss in Game 4, a commenter on this blog asked the following question:
In the disastrous inning of the Red Sox game tonight, the announcer (maybe Tim McCarver?) said “One would think that a lead-off walk would lead to more runs than a lead-off home-run, but it’s not true. We’ve researched it and this year a lead-off home-run has led to more multi-run innings than have lead-off walks.”
I must not be "one", b/c I think a lead-off home-run is much more likely to lead to multiple-run innings, b/c after the home-run, you have a run and need only 1 more to have multiple, and the actions after the first batter are mostly independent of the results of the first batter. So, I think he has it totally backwards. I was a fair stats student, so I need confirmation. He was backwards, right?
The short answer is that it was Tim McCarver, and as an empirical matter he was wrong to be surprised. I don't have access to full inning-by-inning statistics over a long period of time, but the most convincing analysis I found in a quick search (here) suggests that between 1974 and 2002, the probability of a multi-run inning conditional on a leadoff walk is .242 and the probability of a multi-run inning after a leadoff home run is .276.
The blogosphere has had a lot of fun at McCarver's expense (not that it takes much to provoke such a reaction, granted): It's Math!, Zero > One, Tim McCarver Does Research, etc. His observation, though, is a good example of Bayesian updating at work: while I doubt that most baseball observers "would think that a lead-off walk would lead to more runs than a lead-off home-run," it is very clear that Tim McCarver thought that at some point. As evidence, in a 2006 game he made the following comment:
"There is nothing that opens up big innings any more than a leadoff walk. Leadoff home runs don't do it. Leadoff singles, maybe. But a leadoff walk. It changes the mindset of a pitcher. Since he walked the first hitter, now all of a sudden he wants to find the fatter part of the plate with the succeeding hitters. And that could make for a big inning."
In 2004, he said during the Yankees-Red Sox ALCS that "a walk is as good as a home run." And back in 2002, he made a similar comment during the playoffs; in fact, it was that comment that prompted the analysis that I linked to above! Clearly, he had a strong prior belief (from where, I don't know) that leadoff walks somehow get in the pitcher's head and produce more big innings. Now that he's been confronted by data, those beliefs are updating, but since his posterior has shifted so much from his prior it's not surprising that he thinks this is some great discovery. In a couple of years, he'll probably think that he always knew a leadoff home run was better.
As for the intuition, it looks like the commenter is also correct. Using the data cited above, the probability of scoring zero runs in an inning is approx. .723, while the probability of scoring no additional runs after a leadoff homer is approx. .724; the rest of the distribution is similar as well.
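The commenter's argument is easy to check with the figures quoted in this post. The short Python sketch below does nothing more than that arithmetic; the variable names are mine. After a leadoff homer there is one run in, nobody on, and nobody out, so a multi-run inning just requires at least one more run.

```python
# Figures quoted in the post (1974-2002 data).
p_zero_runs_in_inning = 0.723        # P(no runs in an ordinary inning)
p_no_more_after_leadoff_hr = 0.724   # P(no additional runs after a leadoff homer)
p_multi_given_leadoff_walk = 0.242   # P(multi-run inning | leadoff walk)
p_multi_given_leadoff_hr = 0.276     # P(multi-run inning | leadoff homer)

# Multi-run inning after a leadoff homer = at least one additional run.
implied_multi_given_hr = 1 - p_no_more_after_leadoff_hr

# The commenter's independence assumption: after the homer, the inning
# behaves like a fresh inning, so use the ordinary zero-run probability.
predicted_under_independence = 1 - p_zero_runs_in_inning

print(round(implied_multi_given_hr, 3))        # 0.276
print(round(predicted_under_independence, 3))  # 0.277
```

The independence prediction lands within a thousandth of the observed figure, and both are well above the .242 for a leadoff walk.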
The Martin-Quinn estimates of judicial preferences, developed by Andrew Martin and our own Kevin Quinn, are an interesting example of top-notch methods work that has received fairly widespread attention outside of the methods community. On SCOTUSBlog, there is an interview with Andrew; while it's aimed at legal practitioners rather than statisticians, it's good to see them getting some screen time.
October 16, 2007
The winners of the 2007 Economics prize were announced yesterday in Stockholm; the award will go to Leonid Hurwicz, Eric Maskin, and Roger Myerson "for having laid the foundations of mechanism design theory." Not quite as well known as this year's Peace prize winner, but big names in the world of economic theory. Marginal Revolution has much more detail on the winners and their work (here, here, and elsewhere). I don't have much to add, other than a few comments on why I'm blogging this on a statistics blog.
I think it's fair to say that the Bank of Sweden Prize in Economic Sciences in honor of Alfred Nobel (yes, that's more or less the official name) is the most visible award in the social sciences. The prize has occasionally been awarded to econometricians (Engle and Granger in 2003, Heckman and McFadden in 2000, Haavelmo in 1989, and Klein in 1980), but it is striking how rare it is for econometrics (or, for that matter, empirical work in economics) to be recognized by the prize committee. This is not true of other fields. To get a sense of the discrepancy, compare economics with physics, a discipline not known for being particularly atheoretical. Each award carries with it a citation recognizing the work for which the prize was given. If we look at the ratio of citations with the word "theory" to those with the word "discovery", in economics the ratio is 19 to 1 (and the one "discovery" is the Coase Theorem), while in physics the ratio is more like 1 to 3.8. I think this reflects the productive interplay between theory and empirics in physics, and the lack of a similar dynamic in economics (and social science generally). It will be interesting to see when and if the current movement toward behavioral economics will be recognized by the selection committee.
October 10, 2007
Today's applied stats talk by Fernanda Viegas and Martin Wattenberg covered a wide array of interesting data visualization tools that they and their colleagues have been developing over at IBM Research. One of the early efforts that they described is an applet called History Flow, which allows users to visualize the evolution of a text document that was edited by a number of people, such as Wikipedia entries or computer source code. You can track which authors contributed over time, how long certain parts of the text have remained in place, and how text moves from one part of the document to another. To give you a flavor of what is possible, here is a visualization of the history of the Wikipedia page for Gary King (who is the only blog contributor who has one at the moment):
This shows how the page became longer over time and that it was primarily written by one author. The applet also allows you to connect textual passages from earlier versions to their authors. We noticed this one from Gary's entry:
"Ratherclumsy"'s contribution to the article only survived for 24 minutes, and was deleted by another user with best wishes for becoming "un-screwed". All kidding aside, this is a really interesting tool for text-based projects. Leaving aside the possibility for analysis, this would be useful for people working on coding projects. I can think of more than one R function that I've worked on where it would be nice to know who wrote a particular section of code....
October 5, 2007
On a lighter note this Friday afternoon, there has been an interesting and largely good-natured debate on various blogs in response to a recent New York Times article on the happiness gap (or the change in the happiness gap, or reversal, or something) between men and women (He's happier, she's less so). Much of the discussion has been on the substantive significance of the results and how those results are likely to be interpreted by the (non-statistically minded) public. This post at Language Log summarizes the debate and provides links to previous entries on both sides. Most of these are quite serious, while a few are (ahem) less so. On the other hand, any time that Stata code appears on a pop culture website, it is worth noting...
October 3, 2007
Zachary Johnson sent along a link to his new comparative politics data visualization website, the World Freedom Atlas. This is how he describes the site:
The World Freedom Atlas is a geovisualization tool for world statistics. It was designed for social scientists, journalists, NGO/IGO workers, and others who wish to have a better understanding of issues of freedom, democracy, human rights, and good governance. It covers the years 1990 to 2006.
When I took a look around, I was impressed. The site allows you to pick variables, compare variables from different years (which makes it easy to compare, say, polity scores in 1995 with the level of corruption 5 years later), produce interactive scatterplots and boxplots, etc. The data is taken from existing published sources, some of it good and some of it less so (I have a particular beef with the Vanhanen "Index of Democratization", which has always struck me as possibly the silliest attempt to measure a concept yet produced in the comparative politics literature). A couple of suggestions to incorporate in the next version: When you brush a point in the scatterplot, it only brings up the name of one country. Given the lumpiness of the data, this often conceals several other country names. Also, it would be nice to incorporate a function that allows you to print out nice image files of particular map/scatterplots/etc. I don't know how hard that would be to do. All in all, it's worth a look.
September 25, 2007
I was reminded again the other day that the word “data” is plural, since it means more than one “datum”, and thus “data” requires a plural verb. The Economist style guide says so, as does the European Union translation manual. The Oxford English Dictionary doesn’t even have an entry for “data,” subsuming it under “datum,” and it identifies sentences with singular constructions as “irregular or confused usage.”
End of story, right? Maybe, maybe not. There are a couple of problems with the “data is the plural of datum” story. (These have been discussed widely on the web, and I’m drawing freely on those discussions). First, it is not quite right even in Latin to say that “data” is the plural of the singular count noun “datum”; both are conjugations of the verb dare, to give. Second, in English, we hardly ever refer to one piece of data as a datum; at least in political science it is an observation, a case, or perhaps a data point. When the word datum is used, it usually has a specialized meaning and takes the plural form “datums.”
The bigger problem, from my perspective, is that fully adhering to “data” as a plural count noun forces you into constructions like
How many data are enough?
How much data is enough?
The first of these, “How many data are…”, is correct for a plural count noun, while the second, “How much data is…”, is appropriate for a mass noun such as “gold” or “water.” The second sentence sounds much better to me. It also wins on a Google Scholar search by a margin of 10 to 1 (2120 to 198). There are also about 400 hits for “How much data are…”, no doubt from those who want to treat “data” as a mass noun but have been reminded that “data is plural.” It seems to me that data has come to be like the mass nouns described in this post from Language Log:
A great many M nouns denote collectivities of things, but small things, especially small things whose individual identities are not usually important to us: CORN, RICE, BARLEY, CHAFF, CONFETTI, etc. Some of these contrast minimally with C nouns of similar denotations, like BEAN, PEA, LENTIL. In any case, it would be easy to think of barley in "The barley was almost cooked" as "meaning more than one" in much the same way as lentils in "The lentils were almost cooked" does -- and in fact, every so often someone misidentifies little-thing M nouns as "plural".
I kind of like the idea of data as a collection of small things that aren’t that important to us as individual objects but that are meaningful when taken together.
So, in the end, is “data” a plural count noun or a mass noun? I would certainly prefer the latter, but at least on this side of the Atlantic it looks like it will be both. Here are some usage notes to ponder:
Data leads a life of its own quite independent of datum, of which it was originally the plural. It occurs in two constructions: as a plural noun (like earnings), taking a plural verb and plural modifiers (as these, many, a few) but not cardinal numbers, and serving as a referent for plural pronouns (as they, them); and as an abstract mass noun (like information), taking a singular verb and singular modifiers (as this, much, little), and being referred to by a singular pronoun (it). Both constructions are standard. The plural construction is more common in print, evidently because the house style of several publishers mandates it.
The word data is the plural of Latin datum, “something given,” but it is not always treated as a plural noun in English. The plural usage is still common, as this headline from the New York Times attests: “Data Are Elusive on the Homeless.” Sometimes scientists think of data as plural, as in These data do not support the conclusions. But more often scientists and researchers think of data as a singular mass entity like information, and most people now follow this in general usage. Sixty percent of the Usage Panel accepts the use of data with a singular verb and pronoun in the sentence Once the data is in, we can begin to analyze it. A still larger number, 77 percent, accepts the sentence We have very little data on the efficacy of such programs, where the quantifier very little, which is not used with similar plural nouns such as facts and results, implies that data here is indeed singular.
September 17, 2007
The applied statistics workshop begins this Wednesday (9/19) at 12:00pm in N-354. The applied stats workshop is billed as a tour of the applied statistics community at Harvard University, with scholars from Economics, Political Science, Public Health, Sociology, Statistics, and other fields coming together to present cutting edge research. We are happy to have Ben Goodrich (Government G-5) presenting his work on Semi-Exploratory Factor Analysis. Below is a summary of his talk:
I develop a new estimator called semi-exploratory factor analysis (SEFA) that is slightly more restrictive than exploratory factor analysis (EFA) and considerably less restrictive than confirmatory factor analysis. SEFA has three main advantages over EFA: the objective function has a unique global optimum, rotation is unnecessary, and hypotheses about models can easily be tested. SEFA represents a very difficult constrained optimization problem with nonlinear inequality constraints that, for all practical purposes, can only be solved with a genetic optimization algorithm, such as RGENOUD (Mebane and Sekhon 2007). This use of new features of RGENOUD is potentially fruitful for difficult optimization problems besides those in factor analysis.
We have a preliminary schedule posted on the course website; please contact me (Justin Grimmer, email@example.com) if you are interested in presenting in one of our few remaining open spots. And of course, a light lunch will be provided.
June 20, 2007
The Society for Political Methodology has announced the winner of its inaugural Career Achievement Award. The first recipient will be Chris Achen, currently the Roger Williams Straus Professor of Social Sciences at Princeton University. The award will be presented at the APSA meeting this summer at the society's business meeting. Chris was chosen to receive the award by a committee consisting of Simon Jackman, Mike Alvarez, Liz Gerber and Marco Steenbergen, and their citation does a fine job of summarizing his many accomplishments over the years.
On a personal note, Chris was my senior thesis advisor back in 00-01 when he was at Michigan. That came about through a bit of luck; I had never taken a class from him, and one of the other professors at Michigan asked him to meet with me as a favor. Despite this, he was unfailingly generous with both support and constructive criticism. At least at the time, Chris had the habit of working rather late in the evenings. When I was working on my thesis, I'd often send him an e-mail asking a few questions when I left the computer lab at night, and by the time I got home there would be an answer in my inbox pointing out what I had missed or suggesting some new approach to try. If Chris hadn't taken me on as an advisee back then, I probably would not be in graduate school today.
The citation follows after the jump:
Christopher H. Achen is the inaugural recipient of the Career Achievement Award of the Society for Political Methodology. Achen is the Roger Williams Straus Professor of Social Sciences in the Woodrow Wilson School of Public and International Affairs, and Professor of Politics in the Department of Politics, at Princeton University. He was a founding member and first president of the Society for Political Methodology, and has held faculty appointments at the University of Michigan, the University of California, Berkeley, the University of Chicago, the University of Rochester, and Yale University. He has a Ph.D. from Yale, and was an undergraduate at Berkeley.
In the words of one of the many colleagues writing to nominate Achen for this award, "Chris more or less made the field of political methodology". In a series of articles and books now spanning some thirty years, Achen has consistently reminded us of the intimate connection between methodological rigor and substantive insights in political science. To summarize (and again, borrowing from another colleague's letter of nomination), Achen's methodological contributions are "invariably practical, invariably forceful, and invariably presented with clarity and liveliness". In a series of papers in the 1970s, Chris basically showed us how to do political methodology, elegantly demonstrating how methodological insights are indispensable to understanding a phenomenon as central to political science as representation. Achen's "little green Sage book", Interpreting and Using Regression (1982) has remained in print for 25 years, and has provided generations of social scientists with a compact yet rigorous introduction to the linear regression model (the workhorse of quantitative social science), and is probably the most widely read methodological book authored by a political methodologist. Achen's 1983 review essay "Towards Theories of Data: The State of Political Methodology" set an agenda for the field that still powerfully shapes both the practice of political methodology and the field's self-conception. Achen's 1986 book The Statistical Analysis of Quasi-Experiments provides a brilliant exposition of the statistical problems stemming from non-random assignment to "treatment", a topic very much in vogue again today. Achen's 1995 book with Phil Shively, Cross-Level Inference, provides a similarly clear and wise exposition of the issues arising when aggregated data are used to make inferences about individual behavior ("ecological inference").
A series of papers on party identification, among them an influential 1989 conference paper, "Social Psychology, Demographic Variables, and Linear Regression: Breaking the Iron Triangle in Voting Research" (Political Behavior, 1992), and "Parental Socialization and Rational Party Identification" (Political Behavior, 2002), have helped formalize the "revisionist" theory of party identification outlined by Fiorina in his 1981 Retrospective Voting book, and now the subject of a lively debate among scholars of American politics.
In addition to being a productive and extremely influential scholar, Achen has an especially distinguished record in training graduate students in methodology, American politics, comparative politics, and international relations. His students at Berkeley in the late 1970s and early 1980s included Larry Bartels (now at Princeton), Barbara Geddes (UCLA), Steven Rosenstone (Minnesota), and John Zaller (UCLA), among many others. His students at Michigan in the 1990s include Bear Braumoeller (now at Harvard), Ken Goldstein (Wisconsin), Simon Hug (Texas-Austin), Anne Sartori (Princeton), and Karen Long Jusko (Stanford). In addition to being the founding president of the Society for Political Methodology, Chris has been a fellow at the Center for Advanced Study in the Behavioral Sciences, has served as a member of the APSA Council, has won campus-wide awards for both research and teaching, and is a member of the American Academy of Arts and Sciences.
June 13, 2007
A few days ago, the AP moved a story reporting on academic studies of the deterrent effect of the death penalty on potential murderers. Many media outlets picked up the story under headlines such as "Studies say death penalty deters crime", "Death penalty works: studies", and my favorite, "Do more executions mean fewer murders?" Presumably the answer to the last question is yes, at least in the limit; if the state were to execute everyone (except the executioner, of course), clearly there would be fewer murderers.
I was surprised when I read the article on Monday morning, since my sense of the state of play in this area is that it is probably impossible to tell one way or the other. Those are the findings of a recent study by Donohue and Wolfers, which finds most existing studies to be flawed and, more importantly, points out a variety of reasons why estimating the correct deterrent effect is difficult in principle. Here is some of what Andrew Gelman had to say about their study last year:
My first comment is that death-penalty deterrence is a difficult topic to study. The treatment is observational, the data and the effect itself are aggregate, and changes in death-penalty policies are associated with other policy changes.... Much of the discussion of the deterrence studies reminds me of a little-known statistical principle, which is that statisticians (or, more generally, data analysts) look best when they are studying large, clear effects. This is a messy problem, and nobody is going to come out of it looking so great.
My second comment is that a quick analysis of the data, at least since 1960, will find that homicide rates went up when the death penalty went away, and then homicide rates declined when the death penalty was re-instituted (see Figure 1 of the Donohue and Wolfers paper), and similar patterns have happened within states. So it's not a surprise that regression analyses have found a deterrent effect. But, as noted, the difficulties arise because of the observational nature of the treatment, and the fact that other policies are changed along with the death penalty. There are also various technical issues that arise, which Donohue and Wolfers discussed.
Given the tone of the article (and certainly the headlines), you would have thought that the Donohue and Wolfers paper had been overlooked by the reporter, but no: he cites it in the article, and he interviewed Justin Wolfers! He seems to have missed the point, however; the issue is not that some studies say that "there is a deterrent effect" and some say "we're just not sure yet". The problem is that we aren't sure, and we probably never will be unless someone gets to randomly assign death penalty policy to states or countries. This raises a problem that we often face in social science: there are questions that are interesting, and there are questions that we can answer, and the intersection of those two categories is probably a lot smaller than any of us would like. This doesn't seem to be a realization that has crept into the media as of yet, so it is no surprise that studies that purport to give answers to interesting questions will get more coverage than those pointing out why those answers probably don't mean very much.
June 7, 2007
Congratulations to the 2007 Gosnell Prize winners - Harvard's very own Alberto Abadie, Alexis Diamond, and Jens Hainmueller! They won for their paper "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program", which was presented at this year's MPSA conference in Chicago. We saw an earlier version of the paper this past semester at the Applied Stats workshop, and I have to say, the award is well deserved. The Gosnell Prize is awarded to the best paper presented at any political science conference in the preceding year. Alexis is a two-time recipient, having shared the award with Jas Sekhon in 2005 for their paper on genetic matching.
May 23, 2007
The New York Times has an article today ("For Drug Makers, a Downside to Full Disclosure") discussing the recent creation of archives for pharmaceutical clinical trial data, including data from trials that did not result in publications. This effort is an attempt to deal with the age-old problem of publication bias, a problem supposedly identified by the ancient Greeks, as described in a letter to the editor of Lancet by Mark Petticrew:
The writings of Francis Bacon (1561-1626) are a good starting point. In his 1605 book, The Advancement of Learning, he alludes to this particular bias by pointing out that it is human nature for "the affirmative or active to effect more than the negative or privative. So that a few times hitting, or presence, countervails oft-times failing or absence". This is a clear description of the human tendency to ignore negative results, and Bacon would be an acceptable father figure. Bacon, however, goes further and supports his claim with a story about Diagoras the Atheist of Melos, the fifth century Greek poet.
Diagoras was the original atheist and free thinker. He mocked the Eleusinian mysteries, an autumnal fertility festival which involved psychogenic drug-taking, and was outlawed from Athens for hurling the wooden statue of a god into a fire and sarcastically urging it to perform a miracle to save itself. In the context of publication bias, his contribution is shown in a story of his visit to a votive temple on the Aegean island of Samothrace. Those who escaped from shipwrecks or were saved from drowning at sea would display portraits of themselves here in thanks to the great sea god Neptune. "Surely", Diagoras was challenged by a believer, "these portraits are proof that the gods really do intervene in human affairs?" Diagoras' reply cements his claim to be the "father of publication bias": "yea, but . . . where are they painted that are drowned?"
While dealing with publication bias would seem to be a good thing, the Times article suggests (perhaps in an attempt to avoid publication bias itself) that some people are worried about this practice:
Some experts also believe that releasing the results of hundreds of studies involving drugs or medical devices might create confusion and anxiety for patients who are typically not well prepared to understand the studies or to put them in context.
“I would be very concerned about wholesale posting of thousands of clinical trials leading to mass confusion,” said Dr. Steven Galson, the director for the Center for Drug Evaluation and Research at the F.D.A.
It is a little hard for me to believe that this confusion would be worse than the litany of possible side effects given at the end of every pharmaceutical commercial, but that is a different issue. From a purely statistical point of view, it seems like this is a no-brainer, a natural extension of efforts to ensure that published results can be replicated. Whether you are a frequentist or a Bayesian, inferences should be better when conditioned on all of the data that has been collected, not just the data that researchers decided to use in their publications. There could be a reasonable argument about what to do with (and how to define) corrupted data - data from trials that blew up in one way or another - but this seems like a second-order consideration.
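To make the statistical point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption chosen for illustration: the true effect (0.1), the standard error of each trial (0.2), the number of trials, and the rule that only trials with z > 1.96 get "published". It simply compares the average estimate across all trials with the average among the published ones.

```python
import random
import statistics

def simulate_trials(n_trials, true_effect, se, rng):
    """Draw one effect estimate per hypothetical trial from N(true_effect, se)."""
    return [rng.gauss(true_effect, se) for _ in range(n_trials)]

def published_only(estimates, se, z_crit=1.96):
    """Keep only estimates that clear a (made-up) significance filter."""
    return [est for est in estimates if est / se > z_crit]

rng = random.Random(2007)  # fixed seed so the sketch is reproducible
estimates = simulate_trials(n_trials=2000, true_effect=0.1, se=0.2, rng=rng)
published = published_only(estimates, se=0.2)

mean_all = statistics.mean(estimates)        # uses every trial, as a full archive would
mean_published = statistics.mean(published)  # uses only the "significant" trials
print(f"all trials: {mean_all:.3f}  published only: {mean_published:.3f}")
```

Under these made-up settings every published estimate has to exceed 1.96 × 0.2 ≈ 0.39, so the published average sits far above the assumed true effect, while the average over the full archive should land close to it.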
It would be great if we could extend this effort into the social sciences. It would be easier to do this for experimental work since the data collection process is generally well defined. On the other hand, I suspect that there is less of a need for archives of experimental data in the social sciences, for two reasons. First, experimental work is still rare enough (at least in political science) that I think you have a decent chance of getting published even with "non-results". Second, my sense is that, with the possible exception of researchers closely associated with particular policy interventions, the incentives facing social scientists are not the same as those facing pharmaceutical researchers. Social scientists may have a preference for "significant" results, but in most cases they don't care as much about the direction.
The kind of data archive described above would be more useful for observational research, but much harder to define. Most social scientists have invested significant time and energy collecting observational data only to find that there are no results that reviewers would think were worth publishing. On the other hand, how do we define a trial for observational data? Should there be an obligation to make one's data available any time that it is collected, or should it be restricted to data that has been analyzed and found uninteresting? Or should we think of data and models together, and ask researchers to share both their data and their analysis? I'm not sure what the answer is, but it is something that we need to think about as a discipline.
May 22, 2007
Over at the Volokh Conspiracy, Professor Einer Elhauge from Harvard Law School has a post about the future of empirical legal studies, comparing the law today to baseball before the rise of sabermetrics. From the post:
In short, in law, we are currently still largely in the position of the baseball scouts lampooned so effectively in Moneyball for their reliance on traditional beliefs that had no empirical foundation. But all this is changing. At Harvard Law School, as traditional a place as you can get, we now have by my count 10 professors who have done significant statistical analysis of legal issues. We just hired our first JD with a PhD in statistics. The movement is not at all limited to Harvard, and seems to be growing at all law schools.
So we are hardly devoid of empirical analysis of law. We are just, rather, in our early Bill James era, and can expect the analysis to get more sophisticated and systematic as things progress. I expect within a couple of decades we will have our own book distilling the highlights of things we will know then that conflict with what is now conventional legal wisdom.
We are all pretty pleased that Harvard Law now has a stats Ph.D. on faculty. But one of the commenters raises an interesting question; if empirical legal studies are like sabermetrics, who is the legal equivalent of Joe Morgan?
May 10, 2007
And while we're doing announcements, the Society for Political Methodology is also soliciting nominations for the Gosnell Prize, awarded to the best paper in methods presented at any political science conference:
The Gosnell Prize for Excellence in Political Methodology is awarded for the best work in political methodology presented at any political science conference during the preceding year, 1 June 2006-31 May 2007.
The Award Committee also includes Michael Crespin and Patrick Brandt.
We look forward to submissions for this important award in the next few weeks, as our decision will be made toward the end of the month. Yes, this month. Right now it is a wide open field. There were a lot of great papers presented at APSA, MPSA, Methods, ISA, and elsewhere in the past year. Please send a short nomination paragraph along with the originally presented paper (not a revision) in PDF format to me or any of the committee members.
Thanks for your help in nominating worthy manuscripts.
Michael D. Ward, Professor of Political Science
University of Washington, Seattle, WA, 98195-3530, USA
The Program on Survey Research at Harvard is hosting an afternoon conference tomorrow on the challenges of surveying multiethnic populations:
Surveying Multiethnic America
May 11, 2007
12:30 – 5:00
Institute for Quantitative Social Science
1737 Cambridge St.
Cambridge, MA 02138
Across a variety of different academic disciplines, scholars are interested in topics related to multiethnic populations, and sample surveys are one of the primary means of studying these populations. Surveys of multiethnic populations face a number of distinctive methodological challenges, including issues related to defining and measuring ethnic identity, and locating, sampling, and communicating with the groups of interest.
This afternoon panel sponsored by the Program on Survey Research at Harvard University will look at recent survey research projects on multiethnic populations in the US. Researchers will discuss how they confronted the unique methodological challenges in their survey projects and will consider the implications of their approach for their key theoretical and empirical findings.
12:30 - 2:45
Sunshine Hillygus, Harvard University, Introduction
Manuel de la Puente, US Bureau of the Census, Current Issues in Multiethnic Survey Methods
Guillermina Jasso, New York University, New Immigrant Study
Deborah Schildkraut, Tufts University, The 21st Century Americanism Study
Yoshiko Herrera, Harvard University, Discussant
3:00 - 5:00
Tami Buhr, Harvard University, Harvard Multi-Ethnic Health Survey
Ronald Brown, Wayne State University, National Ethnic Pluralism Survey
Valerie Martinez-Ebers, Texas Christian University, National Latino Politics Survey
Kim Williams, Harvard University, Discussant
Simon Jackman sent around the following today on behalf of the Society for Political Methodology:
The Society for Political Methodology will award its first Political Methodology Career Award this year, to recognize an outstanding career of intellectual accomplishment and service to the profession in the Political Methodology field. The award committee -- Simon Jackman (chair), Elisabeth Gerber, Marco Steenbergen, Mike Alvarez -- is calling for nominations for this award, due no later than Monday May 28, 2007. Nominations may be sent to me. Needless to say, a brief statement in support of the nominee will greatly assist the committee in our deliberations.
May 7, 2007
Just as a reminder, the Applied Statistics Workshop has wrapped up for this academic year. Thanks to all who came to the talks, and we look forward to seeing you again in September.
May 1, 2007
The New York Times has an article discussing a working paper by Justin Wolfers and Joseph Price, looking at the rate at which white referees call fouls on black players (and black referees call fouls on white players). The paper can be found here. I haven't had a chance to read it yet, but if it uses "multivariable regression analysis" as it says in the Times article, then I'm sure it must be good.
April 30, 2007
The final session of the Applied Statistics workshop will be held this week. We will present a talk by Adam Glynn, assistant professor of political science at Harvard. Professor Glynn received his Ph.D. in statistics from the University of Washington. His research and teaching interests include political methodology, inference for combined aggregate and individual level data, causal inference, and sampling design. His current research involves optimal sampling design conditional on aggregate data and the use of aggregate data for the reduction of estimation error.
Professor Glynn will present a talk entitled "Alleviating Ecological Bias in Generalized Linear Models with Optimal Subsample Design." A background paper is posted on the course website. The presentation will be at noon on Wednesday, May 2 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided.
April 25, 2007
Tomorrow afternoon, the Harvard-MIT Positive Political Economy seminar will be presenting a talk by Robert Erikson, professor of Political Science at Columbia University. He will be giving a talk entitled "Are Political Markets Really Superior to Polls as Election Predictors?". The seminar will meet on Thursday, April 26 at 4:30 in room N354 at CGIS North (this is also the room where the Applied Statistics workshop meets on Wednesdays). An abstract follows on the jump:
Election markets have been praised for their ability to forecast election outcomes, and to forecast better than trial-heat polls. This paper challenges that optimistic assessment of election markets, based on an analysis of Iowa Electronic Market (IEM) data from presidential elections between 1988 and 2004. We argue that it is inappropriate to naively compare market forecasts of an election outcome with exact poll results on the day prices are recorded, that is, market prices reflect forecasts of what will happen on Election Day whereas trial-heat polls register preferences on the day of the poll. We then show that when poll leads are properly discounted, poll-based forecasts outperform vote-share market prices. Moreover, we show that win-projections based on the polls dominate prices from winner-take-all markets. Traders in these markets generally see more uncertainty ahead in the campaign than the polling numbers warrant—in effect, they overestimate the role of election campaigns. Reasons for the performance of the IEM election markets are considered in concluding sections.
April 24, 2007
Several units on campus are sponsoring a lecture series by Michael Stein, professor of statistics and director of the Center for Integrating Statistical and Environmental Science at the University of Chicago. He will be talking about issues in space-time statistical modeling. There will be three lectures from April 25-27, but the lectures are at different times and locations; click on the links for an abstract of the lecture:
Models and Diagnostics for Spatial and Spatial-Temporal Processes
Wednesday, April 25: 3:30-5:00
HSPH Kresge G1
(simulcast to CGIS N031)
Models and Diagnostics for Spatial and Spatial-Temporal Processes
Thursday, April 26: 3:30-5:00
HSPH Kresge G2
(simulcast to CGIS N031)
Statistical Processes on a Global Scale
Friday, April 27: 11:00-12:00
(simulcast to HSPH Kresge G3)
April 23, 2007
The American Economic Association has announced that this year's John Bates Clark Medal has been awarded to Susan Athey, professor of economics here at Harvard. The Clark Medal is awarded every other year to an American economist under the age of 40 who has made a significant contribution to economic thought. Previous winners include Kenneth Arrow, Dale Jorgenson, James Heckman, Jerry Hausman, and (most recently) Daron Acemoglu. Professor Athey stands out in one respect, however; she is the first woman to be awarded the Clark Medal (and about time, too!). For more information, see the AEA announcement or coverage in the Harvard Crimson.
This week, the Applied Statistics Workshop will present a talk by John Campbell, the Morton L. and Carole S. Olshan Professor of Economics at Harvard University. Professor Campbell received his Ph.D. from Yale University and served on the faculty at Princeton before coming to Harvard in 1994. He is the author or editor of four books, and he has published widely in journals in economics and finance, including the American Economic Review, Econometrica, and the Quarterly Journal of Economics. He recently served as the president of the American Finance Association.
Professor Campbell will present a talk entitled "Fight or Flight: Portfolio Rebalancing By Individual Investors." The talk is based on joint work with Laurent E. Calvet and Paolo Sodini; their paper is available from the course website. The presentation will be at noon on Wednesday, April 25 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the talk follows on the jump:
Fight Or Flight? Portfolio Rebalancing By Individual Investors
Laurent E. Calvet, John Y. Campbell and Paolo Sodini
This paper investigates the dynamics of individual portfolios in a unique dataset containing the disaggregated wealth and income of all households in Sweden. Between 1999 and 2002, stockmarket participation slightly increased but the average share of risky assets in the financial portfolio of participants fell moderately, implying little aggregate rebalancing in response to the decline in risky asset prices during this period. We show that these aggregate results conceal strong household level evidence of active rebalancing, which on average offsets about one half of idiosyncratic passive variations in the risky asset share. Sophisticated households with greater education, wealth, and income, and holding better diversified portfolios, tend to rebalance more aggressively. We also study the decisions to enter and exit risky financial markets. More sophisticated households are more likely to enter, and less likely to exit. Portfolio characteristics and performance also influence exit decisions. Households with poorly diversified portfolios and poor returns on their mutual funds are more likely to exit; however, consistent with the literature on the disposition effect, households with poor returns on their directly held stocks are less likely to exit.
April 19, 2007
Since Jim's post has brought us back to the SUTVA problem, here is another situation to consider. Let's say that I am interested in the effect of starting order on the performance of athletes in some competition. For the sake of argument, let's say cycling. We might conjecture that starting in the first position in a pack of cyclists conveys some advantage, since the leader can stay out of trouble in the back of the pack. On the other hand, there might be an advantage to starting in a lower position so that the cyclist can take advantage of the draft behind the leaders.
It is pretty clear what we would like to estimate in this case. If X is the starting position from 1 to n, and Y is the length of time that it takes the athlete to complete the race, then the most intuitive quantity for the causal effect of starting first instead of second is E[Y_i|X_i=1] - E[Y_i|X_i=2], etc. We still have the fundamental problem of causal inference in that we only observe one of the potential outcomes, but average treatment effects also make sense in this case, defining the ATE as E[Y|X=1] - E[Y|X=2]. Moreover, there is a clear manipulation involved (I can make you start first or I can make you start second) and such a manipulation would be easy to implement using a physical randomization to ensure balance on covariates in expectation. Indeed, this procedure is used in several sports; one example is the keirin race in cycling, which is a paced sprint competition among 6-9 riders.
So far, so good, but there is a problem...
It is pretty clear that we have a SUTVA violation here. It is not that if Cyclist A is assigned to start in position 2, then Cyclist B has to be assigned to start in some other position; SUTVA (as I understand it) doesn't require that it be possible for all subjects to be assigned to all values of the treatment. The problem is that the potential outcome for Cyclist A starting in position 2 may depend on whether Cyclist B is assigned to position 1 and Cyclist C is assigned to position 3 or vice versa. What if B is a strong cyclist who likes to lead from the front, enabling A to draft for most of the race, while C is a weak starter who invariably falls to the back of the pack? In that case, E[Y_A| X_A= 2, X_B = 1, X_C = 3] will not be equal to E[Y_A| X_B = 3, X_A = 2, X_C = 1]. In other words, in this case there is interference between units. So, the non-interference aspect of SUTVA is violated and therefore E[Y|X=1] - E[Y|X=2] isn't a Rubin causal effect. Bummer.
On the other hand, if we are able to run this race over and over again with the same cyclists, we are in a sense going to average over all of the assignment vectors. If we then take the observed data and plot E[Y|X = x], we are going to get a relationship in the data that is purely a function of the manipulation that we carried out. How should we think about this quantity? I would think that a reasonably informed lay person would interpret the difference in race times in a causal manner, but what, precisely, are we estimating and how should we talk about it? I'd love to hear any suggestions, particularly since it relates to a project that I've been working on (and might have more to say about in a few weeks).
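To make the interference concrete, here is a toy sketch in Python. The three riders, their base times, the penalty for starting further back, and the drafting bonus are all invented for illustration; nothing here is estimated from real races. It first shows that A's time when starting second depends on who starts first, and then averages over every possible assignment vector to get the mean time at each starting position.

```python
from itertools import permutations

# Hypothetical riders: a base time in seconds and whether they set a strong pace in front.
RIDERS = {
    "A": {"base": 60.0, "strong_leader": False},
    "B": {"base": 58.0, "strong_leader": True},
    "C": {"base": 62.0, "strong_leader": False},
}

def race_time(rider, order):
    """Toy potential outcome: `rider`'s time under the full starting order."""
    pos = order.index(rider)                  # 0 means starting first
    time = RIDERS[rider]["base"] + 0.5 * pos  # made-up penalty for starting further back
    if pos > 0 and RIDERS[order[pos - 1]]["strong_leader"]:
        time -= 2.0                           # made-up drafting bonus behind a strong leader
    return time

# Interference: A starts second in both orders, yet the outcomes differ.
y_a_behind_b = race_time("A", ("B", "A", "C"))
y_a_behind_c = race_time("A", ("C", "A", "B"))

def mean_time_by_position():
    """Average time at each starting position over all assignment vectors."""
    totals, counts = {}, {}
    for order in permutations(RIDERS):
        for pos, rider in enumerate(order, start=1):
            totals[pos] = totals.get(pos, 0.0) + race_time(rider, order)
            counts[pos] = counts.get(pos, 0) + 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(totals)}

means = mean_time_by_position()
print(y_a_behind_b, y_a_behind_c, means)
```

In this toy world the potential outcome for A in position 2 is not a single number, which is the SUTVA problem, but the position averages are still well defined once every assignment vector is weighted equally. Whether the difference between those averages deserves to be called a causal effect is exactly the question above.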
April 16, 2007
This week, the Applied Statistics Workshop will present a talk by Skyler Cranmer, a Ph.D. candidate in the Department of Political Science at the University of California - Davis and a visiting scholar at IQSS. He earned a BA in Criminal Justice and an MA in International Relations before starting the program at Davis. His research interests in political methodology include statistical computing, missing data problems, and formal theory.
Skyler will present a talk entitled "Hot Deck Imputation for Discrete Data." The paper is available from the course website. The presentation will be at noon on Wednesday, April 18 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract follows on the jump:
Hot Deck Imputation for Discrete Data
Skyler J. Cranmer
In this paper, I develop a technique for imputing missing observations in discrete data. The technique used is a variant of hot deck imputation called fractional hot deck imputation. Because the imputed value is a draw from the conditional distribution of the variable with the missing observation, the discrete nature of the variable is maintained as its missing values are imputed. I introduce a discrete weighting system to the fractional hot deck imputation method. I weight imputed values by the fraction of the original weight of the missing element assigned to the value of the donor observation based on its degree of affinity with the incomplete observation and am thus able to make confidence statements about imputed results; hot decking in the past has been limited by the inability to make such confidence statements.
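For readers who have not run into hot decking before, here is a minimal sketch of the basic idea in Python. It is a plain nearest-donor hot deck, not the fractional, weighted version described in the abstract; the affinity measure (share of matching covariates), the variable names, and the tiny data set are all assumptions made up for illustration.

```python
import random

def affinity(row, donor, keys):
    """Share of the matching covariates in `keys` on which two rows agree."""
    return sum(row[k] == donor[k] for k in keys) / len(keys)

def hot_deck_impute(rows, target, keys, rng):
    """Fill missing `target` values with a draw from the closest observed donors."""
    donors = [r for r in rows if r[target] is not None]
    if not donors:
        raise ValueError("need at least one complete row to act as a donor")
    completed = []
    for row in rows:
        filled = dict(row)  # copy so the input rows are left untouched
        if filled[target] is None:
            best = max(affinity(row, d, keys) for d in donors)
            pool = [d for d in donors if affinity(row, d, keys) == best]
            filled[target] = rng.choice(pool)[target]  # always an observed value
        completed.append(filled)
    return completed

# Hypothetical survey rows; None marks a missing response.
rows = [
    {"party": "D", "region": "N", "voted": 1},
    {"party": "D", "region": "S", "voted": 1},
    {"party": "R", "region": "N", "voted": 0},
    {"party": "R", "region": "S", "voted": None},
    {"party": "D", "region": "N", "voted": None},
]
completed = hot_deck_impute(rows, target="voted", keys=["party", "region"],
                            rng=random.Random(0))
print(completed)
```

Because every imputed value is copied from an observed donor, the variable stays discrete, which is the property the abstract emphasizes. This sketch does nothing about the uncertainty of the imputations, which is the part the paper sets out to address.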
April 9, 2007
This week, the Applied Statistics Workshop will present a talk by Gary King, the David Florence Professor of Government at Harvard and the Director of the Institute for Quantitative Social Science. He has published over 100 articles, and his work has appeared in journals in public health, law, sociology, and statistics, as well as in every major journal in political science. He is the author or co-author of seven books, many of which are standards in their field. His research has been recognized with numerous awards, and he is one of the most cited authors in political science. He is also the faculty convenor of this blog.
Professor King will present a talk entitled "How to Read 100 Million Blogs (and How to Classify Deaths without Physicians)." The talk is based on two papers, one co-authored with Dan Hopkins and the other with Ying Lu. The presentation will be at noon on Wednesday, April 11 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the talk and links to the papers follow on the jump:
How to Read 100 Million Blogs (and How to Classify Deaths without Physicians)
Gary King
We develop a new method of computerized content analysis that gives approximately unbiased and statistically consistent estimates of quantities of theoretical interest to social scientists. With a small subset of documents hand coded into investigator-chosen categories, our approach can give accurate estimates of the proportion of text documents in each category in a larger population. The hand coded subset need not be a random sample, and may differ in dramatic but specific ways from the population. Previous methods require random samples, which are often infeasible in social science text analysis applications; they also attempt to maximize the percent of individual documents correctly classified, a criterion which leaves open the possibility of substantial estimation bias for the aggregate proportions of interest. We also correct, apparently for the first time, for the far less-than-perfect levels of inter-coder reliability that typically characterize human attempts to classify documents, an approach that will normally outperform even population hand coding when that is feasible. We illustrate the effectiveness of this approach by tracking the daily opinions of millions of people about candidates for the 2008 presidential nominations in online blogs, data we introduce and make available with this article. We demonstrate the broad applicability of our approach through additional evaluations in a variety of available corpora from other areas, including large databases of movie reviews and university web sites. We also offer easy-to-use software that implements all methods described.
The methods for a key part of this paper build on King and Lu (2007), which the talk will also briefly cover. This paper offers a new method of estimating cause-specific mortality in areas without medical death certification from "verbal autopsy data" (symptom questionnaires given to caregivers). This method turned out to give estimates considerably better than the existing approaches which included expensive and unreliable physician reviews (where three physicians spend 20 minutes with the answers to the symptom questions from each deceased to decide on the cause of death), expert rule-based algorithms, or model-dependent parametric statistical models.
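As a rough illustration of how category proportions can be estimated without classifying individual documents, here is a minimal two-category sketch in Python. It reflects my reading of the general idea, namely that the distribution of word profiles in the population is a mixture, P(S) = P(S|D=1)π + P(S|D=0)(1-π), with P(S|D) learned from the hand-coded set; it is not the authors' software. The profile labels, counts, and function names are all hypothetical.

```python
from collections import Counter

def profile_freqs(profiles, support):
    """Relative frequency of each word profile in `support`."""
    counts = Counter(profiles)
    return [counts[s] / len(profiles) for s in support]

def estimate_share(labeled, unlabeled):
    """Least-squares estimate of the population share of category 1."""
    support = sorted({s for s, _ in labeled} | set(unlabeled))
    q1 = profile_freqs([s for s, d in labeled if d == 1], support)  # P(S | D=1)
    q0 = profile_freqs([s for s, d in labeled if d == 0], support)  # P(S | D=0)
    p = profile_freqs(unlabeled, support)                           # P(S) in population
    num = sum((ps - a) * (b - a) for ps, a, b in zip(p, q0, q1))
    den = sum((b - a) ** 2 for a, b in zip(q0, q1))
    if den == 0:
        raise ValueError("profiles do not distinguish the two categories")
    return num / den

# Hypothetical hand-coded set: half of it is category 1 (not a random sample).
labeled = ([("s0", 1)] * 6 + [("s1", 1)] * 3 + [("s2", 1)] * 1
           + [("s0", 0)] * 1 + [("s1", 0)] * 3 + [("s2", 0)] * 6)
# Hypothetical population of uncoded documents, built so that 80% are category 1.
unlabeled = ["s0"] * 50 + ["s1"] * 30 + ["s2"] * 20

naive_share = sum(d for _, d in labeled) / len(labeled)  # share in the coded set
estimated_share = estimate_share(labeled, unlabeled)     # share in the population
print(naive_share, estimated_share)
```

The coded set is deliberately unrepresentative here, yet the mixture estimate recovers the population share because only P(S|D) is borrowed from it. That is the sense in which the hand-coded subset "need not be a random sample".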
Copies of the two papers are available at:
It looks like those of us who would like more sophisticated reporting of statistical results in major media outlets have an ally in Byron Calame, the public editor for the New York Times. We've blogged before about his concerns about the Times' coverage of statistical data. This week, he's taking on the ubiquitous Nielsen television ratings, ratings generated from surveys yet never reported with uncertainty estimates. The best paragraph from the piece:
Why not at least tell readers that Nielsen didn’t provide the margin of error for its “estimates”? I put that question to Bruce Headlam, the editor in charge of the Monday business section, where charts of Nielsen’s audience data appear weekly. “If we run a large disclaimer saying, in effect, this company is withholding a critical piece of information, I imagine many readers would simply turn the page,” he wrote in an e-mail.
Imagine that; readers might want their news to be, well, news!
April 6, 2007
We've talked a lot on the blog about good ways of visualizing data. For something a little lighter this Friday, here is one of the more unusual visualizations that I've come across: a time series of real housing prices represented as a roller coaster, which you can 'ride'. It isn't perfect; they need a little ticker that shows you what year you are in, but it is a neat idea. It would be fun to do something similar with presidential approval.
(hat tip: Big Picture)
April 4, 2007
The Cambridge Colloquium on Complexity and Social Networks is sponsoring a talk tomorrow that may be of some interest to readers of this blog. Details below:
"Taking Person, Place, and Time Seriously in Infectious Disease Epidemiology and
Devon D. Brewer, University of Washington
Thursday, April 5, 2007
12:00 - 1:30 p.m.
CGIS North, 1737 Cambridge Street, Room N262
Abstract: Social scientists and field epidemiologists have long appreciated the role of social networks in diffusion processes. The cardinal goal of descriptive epidemiology is to examine "person, place, and time" in relation to the occurrence of disease or other health events. In the last 20 years, most infectious disease epidemiologists have moved away from the field epidemiologist's understanding of transmission as embedded in contact structures and shaped by temporal and locational factors. Instead, infectious disease epidemiologists have employed research designs that are best suited to studying non-infectious chronic diseases but unable to provide meaningful insight on transmission processes. A comprehensive and contextualized infectious disease epidemiology requires assessment of person (contact structure and individual characteristics), place, and time, together with measurement of specific behaviors, physical settings/fomites, and the molecular biology of pathogens, infected persons, and susceptible persons. In this presentation, I highlight examples of research that include multiple elements of this standard. From this overview, I show in particular how the main routes of HIV transmission in poor countries remain unknown as a consequence of inappropriate design in epidemiologic research. In addition, these examples highlight how diffusion research in the social sciences might be improved with greater attention to temporal and locational factors.
Devon D. Brewer, Ph.D., Director, has broad training and experience in the social and health sciences. Much of his past research has focused on social networks, research methods and design, memory and cognition, drug abuse, violence, crime, sexual behavior, and infectious disease (including sexually transmitted diseases, HIV, and hepatitis C). He earned his bachelor's degree in anthropology from the University of Washington and his doctorate in social science from the University of California, Irvine. Prior to founding Interdisciplinary Scientific Research, Dr. Brewer held research positions at the University of Washington, an administrative position with Public Health-Seattle and King County, and teaching positions at the University of Washington, Pacific Lutheran University, and Tulane University. He has been a principal investigator on federal research grants and authored/co-authored more than 60 scientific publications.
April 2, 2007
This week, the Applied Statistics Workshop will present a talk by Richard Berk, professor of criminology and statistics at the University of Pennsylvania. Professor Berk received his Ph.D. from Johns Hopkins University and served on the faculties of Northwestern, UC-Santa Barbara and UCLA before moving to Penn in 2006. He has published widely in journals in statistics and criminology. His research focuses on the application of statistical methods to questions arising in the criminal justice system. One of his current projects is the development and application of statistical learning procedures to anticipate failures on probation or parole and to forecast crime “hot spots” a week in advance.
Professor Berk will present a talk entitled "Counting the Homeless in Los Angeles County," which is based on joint work with Brian Kriegler and Donald Ylvisaker. Their paper is available through the workshop website. The presentation will be at noon on Wednesday, April 4 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows on the jump:
Counting the Homeless in Los Angeles County
Department of Criminology
University of Pennsylvania
Over the past two decades, a variety of methods have been used to count the homeless in large metropolitan areas. In this paper, we report on a recent effort to count the homeless in Los Angeles County. A number of complications are discussed including the need to impute homeless counts to areas of the County not sampled and to take the relative costs of underestimates and overestimates of the number of homeless individuals into account. We conclude that despite their imperfections, the estimated counts provided useful and credible information to the stakeholders involved. Of course, not all stakeholders agreed.
Joint work with Brian Kriegler and Donald Ylvisaker.
March 29, 2007
The mayor of New York, Michael Bloomberg, announced today that the city is proceeding with its plan to target poverty using cash incentives for school attendance, medical checkups and the like. The first phase of the plan is an experimental test of the efficacy of the incentives. From the NY Times:
Under the program, which is based on a similar effort in Mexico but is believed to be the first of its kind in the nation, families would receive payments every two months for meeting any of 20 or so criteria per individual. The payments would range from perhaps $25 for an elementary school student’s attendance to $300 for greatly improved performance on a standardized test, officials said.
Conceived as an experiment, the program, first announced last fall and set to begin in September, is to serve 2,500 randomly selected families whose progress will be tracked against another 2,500 randomly selected families who will not receive the assistance.
Now, I think most of us in the social science statistical community would be very much in favor of this kind of evaluation. In fact, the degree to which these kinds of designs are becoming the standard for policy evaluation is an impressive change from the way projects were evaluated even twenty years ago. Gary King and several graduate students here at IQSS have been working on the evaluation of a similar project in Mexico involving the roll-out of Seguro Popular, a health insurance scheme for low-income Mexicans.
On the other hand, the political scientist in me wonders if (when?) we are going to start to see pushback from those being experimented on (or, more likely, from the interest groups that purport to represent them). The image of 2,500 families randomly selected to not receive benefits probably doesn't do much to help the cause of people (like me) who would like to see more of this. How can we in the statistical community make these kinds of randomized field experiments more palatable (beyond saying, "you need to do this if you want the right answer")?
March 27, 2007
The government released its report on new home sales for the month of February; here is how the story was reported by Reuters (as seen on the New York Times website):
WASHINGTON, March 26 (Reuters) — Sales of new homes unexpectedly fell in February, hitting their lowest level in nearly seven years, according to a report released on Monday. New-home sales slid 3.9 percent, to an annual rate of 848,000 units, the lowest since June 2000, from a downwardly revised pace of 882,000 in January, the Commerce Department said. Sales for November and December were revised down as well.
And here is the Census Bureau press release:
Sales of new one-family houses in February 2007 were at a seasonally adjusted annual rate of 848,000, according to estimates released jointly today by the U.S. Census Bureau and the Department of Housing and Urban Development. This is 3.9 percent (±17.4%)* below the revised January rate of 882,000 and is 18.3 percent (±12.2%) below the February 2006 estimate of 1,038,000.
There are several amazing things about this. First, with all of the resources of the federal government, we can't get better than a 17.4% half-width for a 90% confidence interval? Second, people treat these point estimates like they mean something; the DJIA dropped by about 50 points after this "news" hit the wires. And finally, why can't I get stuff published with confidence intervals that wide?
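To make the width of those intervals concrete, here is a quick back-of-the-envelope check in Python. It uses only the figures quoted from the press release and, following the reading above, treats the ± values as half-widths of 90% confidence intervals; the function names are made up for illustration.

```python
def interval(point_estimate, half_width):
    """Return (lower, upper) for a symmetric interval, rounded to one decimal."""
    return (round(point_estimate - half_width, 1),
            round(point_estimate + half_width, 1))


def excludes_zero(bounds):
    """True if the interval lies entirely on one side of zero."""
    lower, upper = bounds
    return lower > 0 or upper < 0


# Percent change in new-home sales, as quoted in the Census Bureau release.
monthly = interval(-3.9, 17.4)    # February vs. revised January
yearly = interval(-18.3, 12.2)    # February 2007 vs. February 2006

print("Month over month:", monthly, "excludes zero:", excludes_zero(monthly))
print("Year over year:  ", yearly, "excludes zero:", excludes_zero(yearly))
```

On these numbers, the month-over-month interval comfortably includes zero (and a sizable increase), while the year-over-year decline does not; the headline "unexpected fall" rests on the first of the two.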
March 26, 2007
In light of Jim's post below, it is worth pointing out an ongoing conversation at the Northwestern Law Review on ideological change on the Supreme Court. The discussion was prompted by a forthcoming article entitled "Ideological Drift among Supreme Court Justices: Who, When, and How Important?", authored by a who's who of empirical court scholars: Lee Epstein, Andrew Martin, Jeffrey Segal, and our own Kevin Quinn. In addition to their comments on the article, there is a response by Linda Greenhouse, who covers the Supreme Court for the New York Times. (It also got a plug in the Washington Post this morning).
I'm more sympathetic to the project of modelling judicial decisions than I take Jim to be; I think that the ideal point framework gives us a useful way of thinking about the preferences of political actors, including judges. On the other hand, his points about precedent and interference across units are well-taken. Consider the following graph, which appears in the Epstein et al. paper:
It is explained as the estimated probability of a "liberal" vote by Justice O'Connor on two of the key social policy cases decided by the court in the past few years: Lawrence (which struck down Texas' anti-sodomy law) and Grutter (upholding the University of Michigan's law school admissions policy; the undergraduate policy was struck down in Gratz v. Bollinger). I assume that these probabilities were calculated using the posterior distribution of the case parameters in Lawrence and Grutter and combining them with the posterior distribution for O'Connor's ideal points in each year. Fair enough, but what does this actually mean? If Grutter had come before the court in 1985, it would not have been Grutter. I don't say this to be flippant; the University of Michigan used different admissions policies in the 1980s (in fact, when I went to Michigan as an undergrad, I was admitted under a different policy than the procedure struck down in Gratz); Adarand, Hopwood, and related cases would not have been on the books, etc. I just don't see how the implied counterfactual ("What is the probability that O'Connor would cast a liberal vote if Grutter had been decided in year X") makes any sense.
As many of you know, Harvard is on spring break this week, so the Applied Statistics Workshop will not meet. Please join us next Wednesday, April 4, for a presentation by Professor Richard Berk of the University of Pennsylvania. And for those of you at Harvard, enjoy some time off (or at least some time without students!).
March 21, 2007
March 19, 2007
This week, the Applied Statistics Workshop will present a talk by Ken Kleinman, associate professor in the Department of Ambulatory Care and Prevention at the Harvard Medical School. Professor Kleinman received his Sc.D. from the Harvard School of Public Health. He has published widely in journals in medicine and epidemiology. His statistical research centers mainly on methods for clustered and longitudinal repeated measures data. Much of his recent work focuses on spatial surveillance for public health, with a particular interest in applications related to problems in detecting bioterrorism.
Professor Kleinman will present a talk entitled "Statistical issues (and some solutions) around spatial surveillance for public health." The presentation will be at noon on Wednesday, March 21 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided.
March 16, 2007
As we often say, one of the goals of this blog is to share the conversations that take place around the halls of IQSS. Well, the conversations at the Institute (along with just about every other office in the country) have been heavily slanted toward college basketball this week. As I've posted here before, the relationship between sports and statistics has been profitable for both sides. And so, in that spirit, here are links to some recent papers on the NCAA Men's Basketball Tournament:
These authors (biostatisticians associated with the University of Minnesota) tackle one of the most important questions surrounding March Madness: how do I maximize my chances of winning the office pool? They find that, in pools that do not reward picking upsets, strategies that maximize the expected score in the pool do not necessarily maximize the chances of winning the pool, since these brackets look too much like the brackets of other players. Too late for this year, but maybe you'll get some pointers for next year. For another paper that comes to a similar conclusion, take a look at Optimal Strategies for Sports Betting Pools.
Since it is too late to change your picks for this year, is there a way to tell when you don't need to pay attention anymore because you have no chance of winning? A group of computer scientists from MIT consider this question, and show that the generic problem of determining whether a particular participant has been mathematically eliminated is NP-complete. "Even if a participant were omnipotent in the sense that he could control the outcome of any remaining games, he still would not be able to efficiently determine whether doing so would allow him to win the pool." Of course, in a finite tournament with a finite number of players in the pool, it is possible to determine who could still win the pool. I haven't been eliminated yet, but things aren't looking too good.
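For a small pool, the "finite tournament" observation can be checked by brute force. The sketch below is only an illustration, not the MIT authors' method: it enumerates every combination of outcomes for the remaining games and records who finishes first in at least one of them. The participants, picks, and point values are made-up toy data, and for simplicity the remaining games are treated as independent (a real bracket would tie later matchups to earlier winners). The loop visits 2^k outcomes for k remaining games, which is exactly why this stops being practical as the number of games grows.

```python
from itertools import product


def can_still_win(current_scores, picks, games, points):
    """Return the participants who finish first (ties count) in at least one outcome.

    current_scores: dict mapping name -> points earned so far
    picks:          dict mapping name -> list of predicted winners, one per remaining game
    games:          list of (team, team) tuples for the remaining games
    points:         list of points awarded for a correct pick in each remaining game
    """
    alive = set()
    for outcome in product(*games):  # one winner chosen for each remaining game
        totals = {}
        for name, score in current_scores.items():
            gained = sum(
                value
                for pick, winner, value in zip(picks[name], outcome, points)
                if pick == winner
            )
            totals[name] = score + gained
        best = max(totals.values())
        alive.update(name for name, total in totals.items() if total == best)
    return alive


# Hypothetical toy pool: two games left, three participants.
games = [("A", "B"), ("C", "D")]
points = [2, 2]
current_scores = {"ann": 10, "bob": 9, "cat": 5}
picks = {"ann": ["A", "C"], "bob": ["B", "C"], "cat": ["B", "D"]}

alive = can_still_win(current_scores, picks, games, points)
eliminated = set(current_scores) - alive
print("Still alive:", sorted(alive))
print("Eliminated: ", sorted(eliminated))
```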
March 12, 2007
This week, the Applied Statistics Workshop will present a talk by Christopher Zorn, associate professor of political science at the University of South Carolina. Professor Zorn received his Ph.D. from The Ohio State University and was on the faculty at Emory University from 1997 to 2005. He has served as program director for the NSF Program on Law and Social Science. His work has appeared in numerous journals, including the American Political Science Review and Political Analysis. While much of his work has looked at judicial politics in the United States, his interests are broad, extending from "The etiology of public support for the designated hitter rule" (joint with Jeff Gill) to “Agglomerative Clustering of Rankings Data, with an Application to Prison Rodeo Events.”
Professor Zorn will present a talk entitled "Measuring Supreme Court Ideology," which is based on joint work with Greg Caldeira. The slides are available from the workshop website. The presentation will be at noon on Wednesday, March 14 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided.
March 9, 2007
...particularly when the data keeps changing. The ability to replicate results is essential to the scientific enterprise. One of the great benefits of experimental research is that, in principle, we can repeat the experiment and generate a fresh set of data. While this is impossible for many questions in social science, at a minimum one would hope that we could replicate our original results using the same dataset. As many students in Gov 2001 can tell you, however, social science often fails to clear even that low bar.
Of course, even this type of replication is impossible if someone else has changed the dataset since the original analysis was conducted. But that would never happen, right? Maybe not. In an interesting paper, Alexander Ljungqvist, Christopher Malloy, and Felicia Marston take a look at the I/B/E/S dataset of analyst stock recommendations "made" during the period from 1993 to 2002. Here is what they found:
Comparing two snapshots of the entire historical I/B/E/S database of research analyst stock recommendations, taken in 2002 and 2004 but each covering the same time period 1993-2002, we identify tens of thousands of changes which collectively call into question the principle of replicability of empirical research. The changes are of four types: 1) The non-random removal of 19,904 analyst names from historic recommendations (“anonymizations”); 2) the addition of 19,204 new records that were not previously part of the database; 3) the removal of 4,923 records that had been in the data; and 4) alterations to 10,698 historical recommendation levels. In total, we document 54,729 ex post changes to a database originally containing 280,463 observations.
Our main contribution is to document the characteristics and effects of these pervasive changes. The academic literature on analyst stock recommendations, using I/B/E/S data, is truly vast: As of December 12, 2006, Google Scholar identifies 565 articles and working papers using the keywords “I/B/E/S”, “analysts”, and “recommendations”. Given this keen academic interest, as well as the intense scrutiny that research analysts face in the marketplace and the growing popularity of trading strategies based on analyst output, changes to the historical I/B/E/S database are of obvious interest to academics and practitioners alike. We demonstrate that the changes have a significant effect on the distribution of recommendations, both overall and for individual stocks and individual brokerage firms. Equally important, they affect trading signal classifications, back-testing inferences, the track records of individual analysts, and models of analysts’ career outcomes in the years since the changes occurred. Regrettably, none of the changes can easily be “undone” by researchers, which makes replicating extant studies difficult. Our findings thus have potentially important ramifications for existing and future empirical studies of equity analysts.
Not surprisingly, they find that these changes typically make it appear as if analysts were (a) more cautious and (b) more accurate in their predictions. The clear implication from the paper is that analysts and their employers had a vested interest in selectively editing this particular dataset; while I doubt that anyone cares enough about most questions in political science to do something similar, it is an important cautionary tale. The rest of their paper, "Rewriting History," is available from SSRN. (Hat tip: Big Picture)
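As a sanity check on the scale involved, the four counts quoted in the abstract can be added up and compared with the size of the original database. This is simple arithmetic on the published figures; the variable names are illustrative, and the ratio is the number of documented changes relative to the original record count, not a claim about how many distinct records were touched (the additions, for instance, are new records).

```python
# Counts of ex post changes, as quoted in the abstract.
changes = {
    "anonymizations": 19_904,
    "added records": 19_204,
    "removed records": 4_923,
    "altered recommendation levels": 10_698,
}
original_size = 280_463  # observations in the original database

total_changes = sum(changes.values())
share = total_changes / original_size

print(f"Total documented changes: {total_changes:,}")
print(f"Changes relative to original database size: {share:.1%}")
```

The total matches the 54,729 reported by the authors, which works out to roughly one documented change for every five original observations.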
March 8, 2007
Janet Rosenbaum (guest blogger)
The public often reports being confused by contradictory diet studies, and there is some effort to find the "best diet", but is that the right question to be asking? A study released today in JAMA compared four common diets in 311 overweight/obese women over a 12-month period.
Most weight loss occurred within the first two months, with no visible change for three of the four diets between months 2 and 6. Given the amount of weight that these women could lose, some have commented that the effect sizes seem to be
While I'm of course a fan of randomized controlled trials, I'm not sure that an RCT is answering the most salient question. An RCT answers the question of how much weight people will lose on average on each diet. While understanding average behavior may have implications for our understanding of human biology, in practice the most important question for an overweight person and their health care provider is which diet will be best for them, given their assessment of why they are overweight, which diets have worked for them in the past, and their personal tastes.
People may differ substantially across these factors. Someone who eats 100 calories too many at every meal may need to employ different strategies than someone who eats a 500-calorie snack every other day, even though they have roughly the same calorie surplus. Likewise, someone with a tendency to eat too much of a given food category needs to know whether moderation or total abstinence is the best long-term strategy. My sense of the research is that there is quite a lot of psychological research on strategies for good short-term outcomes, but no RCTs focus on the medical questions of long-term outcomes.
Weight loss plans employ different strategies --- for instance, Weight Watchers tries for moderation, while Atkins advocates abstinence --- but studying the individual plans confounds the question of which strategies are best with other characteristics across which the plans differ, and it averages effects over groups of individuals with heterogeneous reasons for their overweight.
It seems to me that weight loss research needs to determine if there are in fact distinct groups of overweight, and focus studies more narrowly on these groups.
Studying more homogeneous groups on a more limited set of questions would answer the questions that are most relevant for clinicians and individuals, although it would be more expensive.
March 5, 2007
This week, the Applied Statistics Workshop will present a talk by Anna Mikusheva, a Ph.D. candidate in the Economics Department at Harvard. Before joining the graduate program at Harvard, she received a Ph.D. in mathematics from Moscow State University. She will present a talk entitled "Uniform inferences in autoregressive processes." The paper is available from the workshop website. The presentation will be at noon on Wednesday, March 7 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows on the jump:
UNIFORM INFERENCE IN AUTOREGRESSIVE MODELS
The purpose of this paper is to provide theoretical justification for some existing methods of constructing confidence intervals for the sum of coefficients in autoregressive models. We show that the methods of Stock (1991), Andrews (1993), and Hansen (1999) provide asymptotically valid confidence intervals, whereas the subsampling method of Romano and Wolf (2001) does not. In addition, we generalize the three valid methods to a larger class of statistics. We also clarify the difference between uniform and point-wise asymptotic approximations, and show that a point-wise convergence of coverage probabilities for all values of the parameter does not guarantee the validity of the confidence set.
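For readers who have not met the distinction in the abstract's last sentence, it can be written out as follows. This is a standard textbook formulation rather than the paper's own notation, and every symbol is an assumption introduced here: $C_n$ is a confidence set built from a sample of size $n$, $\rho$ is the parameter of interest, $\Theta$ is the parameter space, and $1-\alpha$ is the nominal coverage level.

```latex
% Point-wise validity: coverage converges separately at each fixed parameter value.
\text{for every } \rho \in \Theta: \qquad
\liminf_{n \to \infty} P_{\rho}\left( \rho \in C_n \right) \ge 1 - \alpha

% Uniform validity: the worst-case coverage over the parameter space converges.
\liminf_{n \to \infty} \, \inf_{\rho \in \Theta} P_{\rho}\left( \rho \in C_n \right) \ge 1 - \alpha
```

Uniform validity implies point-wise validity but not the reverse: under the point-wise condition alone, for any given sample size there may still be parameter values at which the actual coverage falls well below $1-\alpha$.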
February 26, 2007
This week, the Applied Statistics Workshop will present a talk by Donald Rubin, the John Loeb Professor of Statistics at Harvard. Professor Rubin has published widely on numerous topics in statistics, and is perhaps best known for his work on missing data and causal inference. His articles have appeared in over thirty journals, and he is the author or co-author of several books on missing data, causal inference, and Bayesian data analysis, many of which are the standards in their fields. In 1995, Professor Rubin received the Samuel S. Wilks Memorial Award from the American Statistical Association.
Professor Rubin will present a talk entitled "Principal Stratification for Causal Inference with Extended Partial Compliance," which is based on joint work with Hui Jin. Their paper is available from the workshop website. The presentation will be at noon on Wednesday, February 28 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows on the jump:
Principal Stratification for Causal Inference with Extended Partial Compliance
Hui Jin and Donald B. Rubin
Many double-blind placebo-controlled randomized experiments with active drugs suffer from complications beyond simple noncompliance. First, the compliance with assigned dose is often partial, with patients taking only part of the assigned dose, whether active or placebo. Second, the blinding may be imperfect in the sense that there may be detectable positive or negative side-effects of the active drug, and consequently, simple compliance has to be extended to allow different compliances to active drug and placebo. Efron and Feldman (1991) presented an analysis of such a situation and discussed inference for dose-response from the non-randomized data in the active treatment arm, which stimulated active discussion, including concerning the role of the intention-to-treat principle in such studies. Here, we formulate the problem within the principal stratification framework of Frangakis and Rubin (2002), which adheres to the intention-to-treat principle, and we present a new analysis of the Efron-Feldman data within this framework. Moreover, we describe precise assumptions under which dose-response can be inferred from such non-randomized data, which seem debatable in the setting of this example. Although this article only deals in detail with the specific Efron-Feldman data, the same framework can be applied to various circumstances in both natural science and social science.
February 19, 2007
This week, the Applied Statistics Workshop will present a talk by Dan Hopkins, a Ph.D. candidate in the Government Department at Harvard. Dan has a long-standing association with Harvard, having graduated from the College in 2000. His research focuses on political behavior, state and local politics, and political methodology. His work has appeared in the American Political Science Review. He will present a talk entitled "Flooded Communities: Estimating the Post-Katrina Migration's Impact on Attitudes towards the Poor and African Americans." The paper is available from the workshop website. The presentation will be at noon on Wednesday, February 21 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows on the jump:
Flooded Communities: Estimating the Post-Katrina Migration's Impact on Attitudes towards the Poor and African Americans
This paper uses the post-Katrina migration as a quasi-experiment to confront concerns of selection bias and measurement error that have long plagued research on environmental effects. Drawing primarily on a phone survey of 3,879 respondents, it demonstrates that despite the attention to issues of race and poverty following Hurricane Katrina, people in communities that took in evacuees actually became less supportive of the poor, of African Americans, and of policies to help those groups. The patterns uncovered suggest that the key mechanism is not direct contact, physical proximity, or persuasion by local elites. Instead, the empirical observations accord with a new theory of environmental effects emphasizing the interaction of changing demographics and the media environment. Under the theory of politicized change, sudden changes in local demographics make demographics salient to local residents. Media coverage can convey information about these shifts and can also frame people's thinking on issues related to them.
Gary Langer, the director of polling for ABC News, has posted an interesting piece on some recent coverage (or mis-coverage) of social science and medical research. One of his targets is an article that appeared on the front page of the New York Times announcing that "51% of Women Are Now Living Without Spouse." I had heard a lot about this particular story, not least because one of my colleagues has it posted on the bulletin board in our office. As it turns out, the magic 51% number was obtained by including women aged 15-17 in the data, something that was not particularly transparent in the article. So, while there is nothing necessarily wrong with the data itself, it is not clear that these are the numbers that you should be looking at (unless you are concerned about the national epidemic of unwed teenagers living with their parents).
In addition to leading the polling unit, Langer serves as a kind of "statistical watchdog" for ABC News. He was on a panel here at IQSS about a year ago and told some great stories about the amount of garbage that crosses their desks on a regular basis. It would be nice if all of the major news organizations had similar arrangements in place to vet their statistical reportage. (Hat tip: Mystery Pollster)
February 16, 2007
The Initiative in Innovative Computing, an interdisciplinary program that aims to "foster the creative use of computational resources to address issues at the forefront of data-intensive science," is hosting a talk by Edward Tufte next week. It is easy to forget that Tufte began his career as a political scientist, long before he became known for his work on the visual representation of evidence. His 1975 article on "Determinants of the Outcomes of Midterm Elections" is one of the 20 most-cited articles published in the first 100 years of the APSR. I don't know that I would want to leave political science in the way that Tufte did, but having a job entitled "Senior Critic" sounds like a lot of fun. The details of the talk follow:
February 21, 2007; 7:00pm
Biolabs Room 1068, 16 Divinity Avenue
Edward Tufte, Professor Emeritus of Political Science, Statistics, and Computer Science, and Senior Critic in the School of Art at Yale
An Academic and Otherwise Life, An N = 1
Edward Tufte will talk about his education and careers in statistics, political economy, analytical design, landscape sculpture, book publishing, and consulting. A question session will follow the talk.
Edward Tufte's most recent book is Beautiful Evidence. He taught at Princeton and Yale for 32 years, and is Professor Emeritus of Political Science, Statistics, and Computer Science, and Senior Critic in the School of Art at Yale.
February 12, 2007
This week, the Applied Statistics Workshop will present a talk by Jens Hainmueller, a Ph.D. candidate in the Government Department at Harvard. Prior to joining the department, he received degrees from the London School of Economics and the Kennedy School of Government. His work has appeared in International Organization and the Journal of Legislative Studies. He will present a talk entitled "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." This talk is based on joint work with Alberto Abadie and Alexis Diamond; their paper and supporting software are available from the workshop website. The presentation will be at noon on Wednesday, February 14 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows on the jump:
Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program
Alberto Abadie – Harvard University and NBER
Alexis Diamond – Harvard University
Jens Hainmueller – Harvard University
Building on an idea in Abadie and Gardeazabal (2003), this article investigates the application of synthetic control methods to comparative case studies. We discuss the advantages of these methods and apply them to study the effects of Proposition 99, a large-scale tobacco control program that California implemented in 1988. We demonstrate that following Proposition 99 tobacco consumption fell markedly in California relative to a comparable synthetic control region. We estimate that by the year 2000 annual per-capita cigarette sales in California were about 26 packs lower than what they would have been in the absence of Proposition 99. Given that many policy interventions and events of interest in social sciences take place at an aggregate level (countries, regions, cities, etc.) and affect a small number of aggregate units, the potential applicability of synthetic control methods to comparative case studies is very large, especially in situations where traditional regression methods are not appropriate. The methods proposed in this article produce informative inference regardless of the number of available comparison units, the number of available time periods, and whether the data are individual (micro) or aggregate (macro). Software to compute the estimators proposed in this article is available at the
February 5, 2007
This week, the Applied Statistics Workshop will present a talk by Jim Greiner, a Ph.D. candidate in the Statistics Department. The talk is entitled "Potential Outcomes and Immutable Characteristics," and is based on joint work with Don Rubin from the Statistics Department. An abstract of the talk follows on the jump.
Jim graduated with a B.A. in Government from the University of Virginia in 1991 and then received a J.D. from the University of Michigan Law School in 1995. He clerked for Judge Patrick Higginbotham on the U.S. Court of Appeals for the Fifth Circuit and was a practicing lawyer in the Justice Department and private practice before joining the Statistics Department here at Harvard. His research interests focus on causal inference and ecological inference, particularly as they relate to issues arising in the legal process. He is also the former chair of and a current contributor to this blog.
The Applied Statistics Workshop will meet in Room N354 in the CGIS Knafel Building (next to the Design School) at 12:00 on Wednesday, February 7th. Everyone is welcome, and lunch is provided. We hope to see you there!
Potential Outcomes And Immutable Characteristics
D. James Greiner & Donald B. Rubin
In the United States legal system, various directives attempt to reduce the relevance of "immutable characteristics" (e.g., race, sex) in specified societal settings (e.g., employment, voting, capital punishment). Typically, the directive is phrased in terms of a prohibition on action taken "because of" or "on account of" a prohibited trait, suggesting some kind of causal inquiry. Some researchers, however, have suggested that causal reasoning is inappropriate in such settings because immutable characteristics cannot be manipulated or randomized. We demonstrate that a shift in focus from "actual" characteristics to perceptions of traits allows application of the potential outcomes framework of causation to some (but not all) civil rights concerns. We articulate assumptions necessary for such an application to produce well-posed questions and believable answers. To demonstrate the principles we discuss, we reanalyze data from one of the most famous empirical studies in the law, the so-called "Baldus Study" of the role of race in the administration of capital punishment in Georgia.
January 29, 2007
The Applied Statistics Workshop resumes this week with a talk by Holger Lutz Kern, a Ph.D. candidate at Cornell University currently visiting at Harvard. His research focuses on comparative political economy and behavior with a focus on causal inference from observational data. His work has appeared in the Journal of Legislative Studies. He will present a talk entitled "The Effect of Free Foreign Media on the Stability of Authoritarian Regimes: Evidence from a Natural Experiment in Communist East Germany." The talk is based on joint work with Jens Hainmueller. The presentation will be at noon on Wednesday, January 31 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the talk follows on the jump:
A common claim in the democratization literature is that free foreign media undermine authoritarian rule. In this case study, we exploit a natural experiment to estimate the causal effect of exposure to West German television on East Germans' political attitudes. While one could receive West German television in most parts of East Germany, residents of Dresden were cut off from West German television due to Dresden's location in the Elbe valley. Using an ordered probit LARF instrumental variable estimator and survey data collected in 1988/9, we find that East Germans who watched West German television were *more* satisfied with life in East Germany and the communist regime. To explain this surprising finding, we demonstrate that East Germans consumed West German television primarily for its entertainment value and not because of its unbiased news reporting. Archival data on the internal debates of the East German regime corroborate our argument.
January 22, 2007
... it may extend your life by up to two years, according to a new paper by Matthew Rablen and Andrew Oswald from the University of Warwick, as reported in this week's Economist. They suggest that the increase in status associated with winning a Nobel Prize increases longevity compared to those who are nominated but never win. As the authors note, looking at Nobel nominees and laureates presents some problems because nominees have to be alive at the time of their nomination (and, since 1971, have to remain alive until the prize is awarded). This implies that living longer increases your chances of winning the prize in the first place.
One way that the authors try to deal with this problem is by matching Nobel winners to nominees who were nominated at the same age but never won. Now we obviously like matching here at Harvard, but my sense is that this doesn't quite take care of the problem. By dividing the groups into "winners" and "never winners", you still have the problem that some of the "never winners" stay in that category because they don't live long enough to be recognized with a prize. It seems to me that a better approach would be to compare winners to individuals who were unsuccessful nominees at the same age, whether or not they went on to win a Nobel later in life. I think this is closer to the actual treatment, which is not "win or don't win", but rather "win now or stay in the pool." My guess is that this comparison would reduce the matching-based estimate of the increase in lifespan.
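To make the concern concrete, here is a rough simulation sketch in Python. Everything in it is hypothetical (the uniform lifespans, the 3% yearly chance of winning, the function names), and winning is constructed to have no effect on lifespan at all. Even so, the "winners versus never winners" comparison shows a large gap, while the "win now or stay in the pool" comparison at a fixed year does not.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_nominees(n=50000, win_prob=0.03, max_years=40, seed=0):
    """Simulate nominees for whom winning has NO effect on lifespan.

    All numbers are made up for illustration: everyone is first nominated
    at the same age, lives a further 1..max_years years (uniform), and in
    each year alive wins with probability win_prob, independent of lifespan.
    Returns a list of (years_lived_after_nomination, win_year_or_None).
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        years_left = rng.randint(1, max_years)
        win_year = None
        for year in range(1, years_left + 1):
            if rng.random() < win_prob:
                win_year = year
                break
        people.append((years_left, win_year))
    return people


def winners_vs_never_winners(people):
    """Difference in mean lifespan: ever-winners minus never-winners."""
    winners = [life for life, win in people if win is not None]
    never = [life for life, win in people if win is None]
    return mean(winners) - mean(never)


def winners_vs_pool(people, year=10):
    """Difference in mean lifespan: those who win in `year` minus those who
    are alive and still unsuccessful in `year`, whether or not they win later."""
    winners_now = [life for life, win in people if win == year]
    still_in_pool = [
        life for life, win in people
        if life >= year and (win is None or win > year)
    ]
    return mean(winners_now) - mean(still_in_pool)


people = simulate_nominees()
print("winners vs never winners:", round(winners_vs_never_winners(people), 2))
print("winners vs pool at year 10:", round(winners_vs_pool(people), 2))
```

The first gap arises only because nominees have to survive long enough to win; the second comparison conditions both groups on being alive and unrewarded at the same point, so under these assumptions it should hover around zero.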
On the other hand, there doesn't appear to be any evidence that winning a Nobel shortens your lifespan, so tell your friends that they should go ahead and nominate you (unless you agree with Andrew Gelman on this...).
January 16, 2007
The Applied Statistics Workshop will resume for the spring semester on January 31, 2007. We will continue to meet in the CGIS Knafel Building, Room N354 on the third floor at noon on Wednesdays. The Workshop has a new website that has the tentative schedule posted for the semester. We will be moving the archives of papers from the previous semesters to the new site in the coming weeks, so you can track down your favorite talks from years past. As a preview of what's to come, here are the names and affiliations of some of the speakers presenting in the next month:
Holger Lutz Kern
Department of Government
Department of Statistics
Alberto Abadie, Alexis Diamond, and Jens Hainmueller
Kennedy School of Government and Department of Government
Department of Government
December 19, 2006
It's that time of year again; there is snow on the ground, a fire in the hearth, classes at Harvard end today, and things at IQSS are settling down for a brief winter's nap (at least that is the way I imagine it - the fluorescent lights in my office ensure that seasons have no meaning), so posts to the blog will be irregular for the next couple of weeks. We'll be back to our regularly scheduled programming in early January, but in the meantime, Happy New Year!
December 15, 2006
News that two clinical trials in Africa have been halted because the preliminary results were so strong that it was considered unethical to continue them has received major play in the media (New York Times, Guardian, Washington Post). The reason: the experimental treatment was male circumcision and the outcome of interest was the risk of female-to-male transmission of HIV. This is a topic that has been discussed previously in the Applied Statistics workshop (see posts here and here). The two studies suggest that circumcision reduces the probability of transmission by about 50%, which is similar to an earlier randomized trial in South Africa (and, it should be noted, the estimated effect is also consistent with the results from a number of observational studies, see this NIAID FAQ for more details on the studies). In short, the evidence seems overwhelming at this point that, from a biomedical perspective, circumcision is effective at reducing transmission.
Is the same true from a policy perspective? In other words, would a policy promoting circumcision reduce the number of new HIV cases? The answer to that question is much less obvious, the concern being that the men who were circumcised would engage in riskier behavior given their newfound knowledge. This is a classic moral hazard problem; the people implementing the policy cannot control the actions taken by the treated individuals. Indeed, the researchers behind the study were falling all over themselves to emphasize the need for continued prevention measures. Despite this, it seems likely to me that one of the effects of the study (as opposed to the effect of the treatment) is going to be an increase in HIV transmission, at least at the margin, among the male subpopulation that is already circumcised.
This study thus highlights a couple of issues that face us as social scientists. First, the scientific quantity of interest (does circumcision reduce the risk of HIV transmission) need not be, and often isn't, the policy quantity of interest (will circumcision reduce the number of new HIV cases). Second, unlike our colleagues in the natural sciences, we do have to worry that the behavior of our subjects (broadly defined) will be influenced by the results of our research. A biologist doesn't have to worry that the dolphins she is studying are reading Marine Mammal Science (although to the extent that the modal number of times that a political science article is cited is still zero, we may not have to worry about our subjects reading the results of our research either!). From my perspective, the possibility of feedback - that behavior will change in response to research, in ways that could either reinforce or mitigate the conclusions that we draw - is one of the key characteristics that distinguish the social sciences from the natural sciences, a distinction that seems underappreciated and that makes our jobs as researchers substantially harder.
November 28, 2006
The Harvard Program on Survey Research is hosting a talk by Mark Blumenthal (aka the Mystery Pollster):
December 1, 2006
3:00 - 5:00 p.m. with reception to follow
December 1, 2006, Mark Blumenthal will join us to discuss surveys and polls in the 2006 elections. Blumenthal is the founder of the influential blog and website MysteryPollster.com, and one of the developers of the more recent website Pollster.com. His analysis of political polling and survey methodology is widely read and admired. Blumenthal has more than 20 years experience as a survey researcher, conducting and analyzing political polls and focus groups for Democratic candidates and market research surveys for major corporations. His experience includes work with pollsters Harrison Hickman, Paul Maslin, Kirk Brown, Celinda Lake, Stan Greenberg and the last 15 with his partners David Petts and Anna Bennett in the firm Bennett, Petts and Blumenthal (BPB).
1737 Cambridge St.
Cambridge, MA 02138
November 17, 2006
The Economist is agog over the increasing prominence of instrumental variables in econometrics ("Winds of Change", November 4, 2006). While it is always nice to get some square inches in a publication with a circulation greater than a few thousand, I'm afraid that I tend to sympathize more with the "instrument police" than the "instrumentalists."
For a variable to be a valid instrument, it must be (a) correlated with the variable for which we are trying to estimate a causal effect, and (b) only affect the outcome through the proposed causal variable, such that an exclusion restriction is satisfied. This is true for every estimation in which a proposed instrument is used; one must make a separate case for the validity of the exclusion restriction with respect to each analysis. Leaving aside what should be the second-order problem of actually carrying out an IV analysis, which may be a first-order problem in practice ("what do you mean it has no mean?"), our inability to verify the exclusion restriction in the case of naturally occurring instruments forces us to move from the substance of the problem we are trying to investigate to a duel of "just-so stories" for or against the restriction, a debate that typically cannot be resolved by looking at the empirical evidence.
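For readers who have not seen the mechanics, here is a minimal sketch of the simplest IV estimator (the Wald ratio for a binary instrument) on simulated data. The data-generating process, the effect size of 2.0, and all function names are assumptions made up for illustration. Note that the exclusion restriction holds here only because the simulation was written that way, which is exactly what we cannot check with a naturally occurring instrument.

```python
import random


def mean(values):
    return sum(values) / len(values)


def wald_estimate(z, d, y):
    """IV (Wald) estimate with a binary instrument z:
    (E[y|z=1] - E[y|z=0]) / (E[d|z=1] - E[d|z=0])."""
    reduced_form = (mean([yi for zi, yi in zip(z, y) if zi == 1])
                    - mean([yi for zi, yi in zip(z, y) if zi == 0]))
    first_stage = (mean([di for zi, di in zip(z, d) if zi == 1])
                   - mean([di for zi, di in zip(z, d) if zi == 0]))
    if first_stage == 0:
        raise ValueError("instrument is uncorrelated with the treatment")
    return reduced_form / first_stage


def naive_estimate(d, y):
    """Simple difference in mean outcomes between treated and untreated."""
    return (mean([yi for di, yi in zip(d, y) if di == 1])
            - mean([yi for di, yi in zip(d, y) if di == 0]))


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical data: u is an unobserved confounder, z is randomly
    assigned (condition a: it shifts d) and enters y only through d
    (condition b: exclusion restriction, true by construction)."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        zi = rng.randint(0, 1)
        di = 1 if 2.0 * zi + u + rng.gauss(0, 1) > 1.0 else 0
        yi = true_effect * di + 1.5 * u + rng.gauss(0, 1)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


z, d, y = simulate()
print("naive difference in means:", round(naive_estimate(d, y), 2))
print("IV (Wald) estimate:", round(wald_estimate(z, d, y), 2))
```

In this setup the naive comparison is pushed upward by the confounder while the Wald ratio lands near the assumed effect. If z were allowed to enter the outcome equation directly, the ratio would be biased and nothing in the observed data would flag it.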
Consider the two papers described in the Economist article. The first attempts to estimate the effect of colonialism on current economic outcomes. The authors propose wind speed and direction as an instrument for colonization, arguing (plausibly) that Europeans were more likely to colonize an island if they were more likely to encounter it while sailing. So far so good. Then they argue that, while colonization in the past has an effect on economic outcomes in the present, being situated in a location favorable for sailing in the past (i.e., before steam-powered ships) does not. Is this really plausible? The authors think so, I don't, and it isn't obvious that there is a way to resolve the matter. In the second example, the failure of ruling dynasties to produce an heir in Indian princely states is used as an instrument for the imposition of direct rule by the British. Here the exclusion restriction may be more plausible (or - shameless plug - maybe not, if it is the shift from a hereditary to a non-hereditary regime rather than colonialism per se that affects outcomes). One way or the other, is this really what we should be arguing about?
None of this is to say that instrumental variable models can never be useful. When we can be more confident that the exclusion restriction is satisfied (usually because we designed the instrument ourselves), then IV approaches make a lot of sense. Unfortunately (or fortunately), we can't go back and randomly assign island discoveries using something like a coin flip rather than the trade winds. Despite this, nothing seems to slow down the pursuit of more and more tortured instruments. The observation that "the instrumental variable now enjoys an almost imperial grip on the imagination of economists" carries more irony than was perhaps intended.
November 7, 2006
As everyone must know (unless you are lucky enough to not own a television), today is Election Day in the US. I always think of analyzing elections (and pre-election polling) as the quintessential statistical problem in political science, so I'm sure that many of us are eagerly waiting to get our hands on the results. Recent elections in the U.S. have been somewhat controversial, to say the least, which is probably bad for the country but unquestionably good for the discipline (see the Caltech/MIT Voting Technology Project for one example), and my guess is that this election will continue the trend. Law professor Rick Hasen of electionlawblog.org sets the threat level for post-election litigation at orange; anyone looking for an interesting applied statistics project would be well advised to check out his site in the coming weeks. In the meantime, the Mystery Pollster (Mark Blumenthal) has an interesting post on the exit polling strategy for today's election; apparently we shouldn't expect preliminary and incomplete results to be leaked until 5pm this year.
October 27, 2006
Here's a question (alright, a bleg) for any economist-types out there: can you recommend any articles or books that integrate the potential outcomes framework for causal inference with the type of equilibrium analysis that is usually used in microeconomic modeling? I'm not exactly looking for cases where someone says "my comparative statics say the effect should be positive and, voila, it is!", but rather an applied article in which the potential outcomes arise naturally from the structure of the model. Or, even better, something more philosophical that attempts to integrate the potential outcomes approach with equilibriumist models of behavior. A quick Google Scholar search on "potential outcomes", "causal inference", and "equilibrium" only brings up about 80 hits, many of which appear to be by James Heckman, so any pointers to more sympathetic treatments would be appreciated!
October 20, 2006
With the World Series about to get underway, featuring the rubber match between the Detroit Tigers and the St. Louis Cardinals (Round 1 went to the Cardinals in 1934, Round 2 to the Tigers in 1968, but maybe this is a best of five and we won't see the end until 2076), it is worth reflecting on the influence baseball has had on statistics and vice versa. I mentioned Frederick Mosteller's analysis after the 1946 World Series in a previous post, but many statisticians share his interest in baseball. Dozens of baseball-related articles have appeared in statistical journals over the years, attempting to answer substantive questions ("Did Shoeless Joe Jackson throw the 1919 World Series?") or to motivate statistical techniques ("Parametric Empirical Bayes Inference: Theory and Applications", with application to Ty Cobb's batting average). Within political science, more than one methodologist has told me about the hours that they spent tracking batting averages and OBPs when they were growing up (OK, so it may have been cricket in a few cases). Going in the other direction, there is no question that the Moneyball approach to baseball has been enormously influential, even if the jury is still out about its implications for the post-season. As Harry Reasoner once said, "Statistics are to baseball what a flaky crust is to Mom's apple pie." To which I can only add, Go Tigers!
October 5, 2006
People who read this blog regularly know that few things get authors and commentators as worked up as questions about causal inference, either philosophical (here, here, and here) or technical (here, here, here, etc.). I wouldn't want to miss out on the fun this time around -- and how could I pass up the opportunity to have the IV post on causation and manipulation?
Jens and Felix have both discussed whether non-manipulable characteristics such as race or gender ("attributes" for Holland) can be considered causes within the potential outcomes framework. I agree with them that, at least as far as Holland is concerned, the answer is (almost always) no - no causation without manipulation. The fact that we are having this discussion 20 years later suggests (to me, at least) that this answer is intuitively unsatisfying. It is worth remembering a comment made by Clark Glymour in his discussion of the Holland (1986) article:
People talk as they will, and if they talk in a way that does not fit some piece of philosophical analysis and seem to understand each other well enough when they do, then there is something going on that the analysis has not caught.
Identifying perceptions of an attribute (rather than the attribute itself) as the factor subject to manipulation makes a lot of sense in situations where the potential outcomes are to a certain degree out of the control of the individual possessing the attribute, as in the discrimination example. Extending this idea to situations in which outcomes are generated by the subject possessing the attribute (in which "self-perceptions" would be manipulated) would commit researchers to a very particular understanding of attributes such as race and gender that would hardly be uncontroversial.
In these cases, I think that it makes more sense to look at the differences in well-specified Rubin-Holland causal effects (i.e. the results of manipulation) conditional on values of the attribute rather than identifying a causal effect as such. So, for example, in the gender discrimination example we could think of the manipulation as either applying or not applying for a particular job. This is clearly something that we could randomize, so the causal effect would be well defined. We could calculate the average treatment effect separately for men and women and compare those two quantities, giving us the difference in conditional causal effects. I'm sure that there is a catchy name for this difference out there in the literature, but I haven't run across it.
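As a small illustration of the quantity described above, the following sketch simulates a randomized manipulation, computes the average treatment effect separately within each value of the attribute, and takes the difference. The effect sizes (1.0 and 0.4), the group labels, and the function names are all invented for the example; nothing here comes from real data.

```python
import random


def mean(values):
    return sum(values) / len(values)


def average_effect(treated, outcomes):
    """Difference in mean outcomes between treated and untreated units."""
    t = [y for d, y in zip(treated, outcomes) if d == 1]
    c = [y for d, y in zip(treated, outcomes) if d == 0]
    return mean(t) - mean(c)


def conditional_effects(groups, treated, outcomes):
    """Average treatment effect computed separately within each group."""
    effects = {}
    for g in sorted(set(groups)):
        rows = [(d, y) for gi, d, y in zip(groups, treated, outcomes) if gi == g]
        effects[g] = average_effect([d for d, _ in rows], [y for _, y in rows])
    return effects


def simulate(n=20000, effect_men=1.0, effect_women=0.4, seed=0):
    """Hypothetical experiment: the manipulation d is assigned by coin flip,
    so each within-group effect is a well-defined causal effect; the
    attribute itself is never manipulated."""
    rng = random.Random(seed)
    groups, treated, outcomes = [], [], []
    for _ in range(n):
        g = rng.choice(["men", "women"])
        d = rng.randint(0, 1)
        effect = effect_men if g == "men" else effect_women
        groups.append(g)
        treated.append(d)
        outcomes.append(effect * d + rng.gauss(0, 1))
    return groups, treated, outcomes


effects = conditional_effects(*simulate())
print("effect among men:", round(effects["men"], 2))
print("effect among women:", round(effects["women"], 2))
print("difference in conditional effects:",
      round(effects["men"] - effects["women"], 2))
```

Each of the two conditional effects is causal because the manipulation was randomized; the difference between them is a descriptive comparison across values of the attribute, not itself the effect of the attribute.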
So, is this quantity (the difference in conditional causal effects) of interest to applied researchers in the social sciences? I would argue that it is, if for nothing else than giving us a more nuanced view of the consequences of something that we can manipulate. Is it a Rubin-Holland causal effect? No, but that is a problem only to the extent that we privilege "causal" over other useful forms of inference.
September 29, 2006
With the 2006 election coming up soon, here are a couple of blogs that might appeal to both the political junkie and the methods geek in all of us. Political Arithmetik, a blog by Charles Franklin from Wisconsin, is full of cool graphs that illustrate the power of simple visualization and non-parametric techniques, something that we spend a lot of time talking about in the introductory methods courses in the Gov Department. (On a side note, I think that the plots like this of presidential approval poll results that you find on his site and others have to be one of the best tools for illustrating sampling variability to students who are new to statistics.) Professor Franklin also contributes to another good polling blog, Mystery Pollster, run by pollster Mark Blumenthal. It just moved to a new site, which now has lots of state-level polling data for upcoming races. All in all, plenty of good stuff to distract you from the "serious" work of arguing about causal inference, etc.
July 30, 2006
C. Frederick Mosteller, the first chairman of the Statistics Department at Harvard, passed away last week at the age of 89. He served as chair of the Statistics Department from 1957 to 1969, and later chaired the departments of Biostatistics and Health Policy and Management at the Harvard School of Public Health. His obituary in the New York Times mentions his work reviewing the performance of pollsters in the Dewey-Truman election of 1948 and his explanation of the Red Sox's inexplicable loss in the 1946 World Series ("There should be no confusion here between the 'winning team' and the 'better team'"), but doesn't say that he took a leave of absence in the early sixties to record a lecture series for NBC. According to one history of the Statistics Department, 75,000 students took the course for credit and 1.20 million (give or take) watched the lectures on television. Imagine doing that today....
May 1, 2006
This week the Applied Statistics Workshop will present a talk by Ben Hansen, Assistant Professor of Statistics at the University of Michigan. Professor Hansen received his Ph.D. from the University of California at Berkeley and was an NSF Post-doctoral Fellow before joining the faculty at Michigan in 2003. His research interests include optimal matching and stratification, causal inference in comparative studies, and length-optimal exact confidence procedures. His work has appeared in JASA and the Journal of Computational and Graphical Statistics, among others.
Professor Hansen will present a talk entitled "Matching with prognosis scores: A new method of adjustment for comparative studies." The corresponding paper is available from the course website. The presentation will be at noon on Wednesday, May 3 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. An abstract of the paper appears on the jump:
In one common route to causal inferences from observational data, the statistician builds a model to predict membership in treatment and control groups from pre-treatment variables, X, in order to obtain propensity scores, reductions f(X) of the covariate possessing certain favorable properties. The prediction of outcomes as a function of covariates, using control observations only, produces an alternate score, the prognosis score, with favorable properties of its own. As with propensity scores, stratification on the prognosis score brings to uncontrolled studies a concrete and desirable form of balance, a balance that is more familiar as an objective of experimental control. In parallel with the propensity score, prognosis scores reduce the dimension of the covariate; yet causal inferences conditional on them are as valid as are inferences conditional only on the unreduced covariate. They suggest themselves in certain studies for which propensity score adjustment is infeasible. Other settings call for a combination of prognosis and propensity scores; as compared to propensity scores alone, the pairing can be expected to reduce both the variance and bias of estimated treatment effects. Why have methodologists largely ignored the prognosis score, at a time of increasing popularity for propensity scores? The answer lies in part with older literature, in which a similar, somewhat atheoretical concept was first celebrated and then found to be flawed. Prognosis scores avoid this flaw, as emerges from theory presented herein.
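A bare-bones sketch of the basic recipe in the abstract (fit an outcome model on control observations only, score every unit, then stratify on the score) might look like the following. This is not Professor Hansen's implementation: the linear outcome model, the ten equal-sized strata, the simulated data, and every function name are assumptions chosen to keep the example short, and it ignores refinements the paper may discuss.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def prognosis_scores(x, treated, y):
    """Fit the outcome model on CONTROLS only, then score every unit."""
    controls = [(xi, yi) for xi, di, yi in zip(x, treated, y) if di == 0]
    beta = fit_ols([c[0] for c in controls], [c[1] for c in controls])
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], xi)) for xi in x]


def stratified_effect(scores, treated, y, n_strata=10):
    """Size-weighted average of within-stratum differences in means."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    total, weight = 0.0, 0
    for s in range(n_strata):
        idx = order[s * len(order) // n_strata:(s + 1) * len(order) // n_strata]
        t = [y[i] for i in idx if treated[i] == 1]
        c = [y[i] for i in idx if treated[i] == 0]
        if t and c:  # skip strata lacking either arm
            total += len(idx) * (mean(t) - mean(c))
            weight += len(idx)
    return total / weight


def simulate(n=5000, effect=1.0, seed=1):
    """Hypothetical observational data with confounding through x1 and x2."""
    rng = random.Random(seed)
    x, d, y = [], [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1 / (1 + math.exp(-0.8 * (x1 + x2)))
        di = 1 if rng.random() < p else 0
        x.append([x1, x2])
        d.append(di)
        y.append(x1 + 2 * x2 + effect * di + rng.gauss(0, 1))
    return x, d, y


x, d, y = simulate()
naive = (mean([yi for di, yi in zip(d, y) if di == 1])
         - mean([yi for di, yi in zip(d, y) if di == 0]))
adjusted = stratified_effect(prognosis_scores(x, d, y), d, y)
print("naive:", round(naive, 2), "stratified on prognosis score:", round(adjusted, 2))
```

In this simulation the unadjusted comparison is far from the assumed effect of 1.0, while stratifying on the fitted score removes most of the gap. Coarse strata leave some residual imbalance, so the adjusted figure should not be expected to match the assumed effect exactly.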
April 24, 2006
This week the Applied Statistics Workshop will present a talk by Brian Ripley, Professor of Applied Statistics at the University of Oxford. Professor Ripley received his Ph.D. from the University of Cambridge and has been on the faculties of Imperial College, Strathclyde, and Oxford. His current research interests are in pattern recognition and related areas, although he has worked extensively in spatial statistics and simulation, and continues to maintain an interest in those subjects. New statistical methods need good software if they are going to be adopted rapidly, so he maintains an interest in statistical computing. He is the co-author of Modern Applied Statistics with S, currently in its fourth edition. Professor Ripley is also a member of the R core team, which coordinates the R statistical computing project, a widely adopted open-source language for statistical analysis.
Professor Ripley will present a talk entitled "Visualization for classification and clustering." Slides for the talk are available from the course website. The presentation will be at noon on Wednesday, April 26 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
April 17, 2006
This week the Applied Statistics Workshop will present a talk by Gerard van den Berg, Professor of Labor Economics at the Free University of Amsterdam. Before joining the faculty at Amsterdam in 1996, he worked at Northwestern University, New York University, Stockholm School of Economics, Tilburg University, Groningen University, and INSEE-CREST. From 2001 to 2004, he was Joint Managing Editor of The Economic Journal, and has published in Econometrica, Review of Economic Studies, American Economic Review, and other journals. He is currently a visiting scholar at the Center for Health and Wellbeing at Princeton University. His research interests are in the fields of econometrics, labor economics, and health economics, notably duration analysis, treatment evaluation, and search theory.
Professor van den Berg will present a talk entitled "An Economic Analysis of Exclusion Restrictions for Instrumental Variable Estimation." The paper is available from the course website. The presentation will be at noon on Wednesday, April 19 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
April 10, 2006
This week the Applied Statistics Workshop will present a talk by Matthew Harding, a Ph.D. candidate in the Department of Economics at MIT. He received his BA from University College London and an M.Phil in economics from the University of Oxford. His work in econometrics focuses on stochastic eigenanalysis with applications to economic forecasting, modeling of belief distributions, and international political economy. He also works on modeling heterogeneity in nonlinear random coefficients models, duration models, and panels.
Matt will present a talk entitled "Evaluating Policy Counterfactuals in Voting Models with Aggregate Heterogeneity." A link to a background paper for the presentation is available from the workshop website. The presentation will be at noon on Wednesday, April 12 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
April 3, 2006
This week the Applied Statistics Workshop will present a talk by L.J. Wei and Tianxi Cai of the Department of Biostatistics at the Harvard School of Public Health. Professor Wei received his Ph.D. in statistics from the University of Wisconsin at Madison and has served on the faculty of several universities before coming to Harvard in 1991. Professor Cai received her Sc.D. from the Harvard School of Public Health in 1999 and was a faculty member at the University of Washington before returning to HSPH in 2002. Professors Wei and Cai will present a talk entitled "Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models." The presentation will be at noon on Wednesday, April 5 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:
Suppose that we are interested in establishing simple, but reliable rules for predicting future t-year survivors via censored regression models. In this article, we present inference procedures for evaluating such binary classification rules based on various prediction precision measures quantified by the overall misclassification rate, sensitivity and specificity, and positive and negative predictive values. Specifically, under various working models we derive consistent estimators for the above measures via substitution and cross validation estimation procedures. Furthermore, we provide large sample approximations to the distributions of these nonsmooth estimators without assuming that the working model is correctly specified. Confidence intervals, for example, for the difference of the precision measures between two competing rules can then be constructed. All the proposals are illustrated with two real examples and their finite sample properties are evaluated via a simulation study.
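For readers unfamiliar with the precision measures named in the abstract, here is a minimal sketch that computes them for a binary rule when the true t-year survival status of every subject is known. That is a simplifying assumption: the point of the paper is to handle censoring, which this toy example ignores entirely. The labels and the function name are made up.

```python
def classification_measures(truth, predicted):
    """Precision measures for a binary rule (1 = t-year survivor).

    Assumes every subject's true status is observed, i.e. no censoring,
    and that each cell of the confusion matrix is non-empty."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "misclassification_rate": (fp + fn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of survivors predicted to survive
        "specificity": tn / (tn + fp),  # share of non-survivors predicted not to
        "ppv": tp / (tp + fp),          # share of predicted survivors who survive
        "npv": tn / (tn + fn),          # share of predicted non-survivors who do not
    }


# Toy labels, invented for illustration only.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
measures = classification_measures(truth, predicted)
for name, value in measures.items():
    print(name, value)
```

Comparing two competing rules then amounts to computing these measures for each and looking at the differences; the inference procedures in the talk concern how to attach confidence intervals to such differences when survival times are censored.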
March 22, 2006
Today at noon, the Applied Statistics Workshop will present a talk by Jeff Gill of the Department of Political Science at the University of California at Davis. Professor Gill received his Ph.D. from American University and served on the faculty at Cal Poly and the University of Florida before moving to Davis in 2004. His research focuses on the application of Bayesian methods and statistical computing to substantive questions in political science. He is the organizer for this year's Summer Methods Meeting sponsored by the Society for Political Methodology, which will be held at Davis in July. He will be a visiting professor in the Harvard Government Department during the 2006-2007 academic year.
Professor Gill will present a talk entitled "Elicited Priors for Bayesian Model Specifications in Political Science Research." This talk is based on joint work with Lee Walker, who is currently a visiting scholar at IQSS. The presentation will be at noon on Wednesday, March 22 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:
We explain how to use elicited priors in Bayesian political science research. These are a form of prior information produced by previous knowledge from structured interviews with subjective area experts who have little or no concern for the statistical aspects of the project. The purpose is to introduce qualitative and area-specific information into an empirical model in a systematic and organized manner in order to produce parsimonious yet realistic implications. Currently, there is no work in political science that articulates elicited priors in a Bayesian specification. We demonstrate the value of the approach by applying elicited priors to a problem in judicial comparative politics using data and elicitations we collected in Nicaragua.
March 13, 2006
This week, the Applied Statistics Workshop will present a talk by Felix Elwert, a Ph.D. candidate in the Harvard Department of Sociology. Felix received a B.A. from the Free University of Berlin and an M.A. in sociology from the New School for Social Research before joining the doctoral program at Harvard. His article on widowhood and race, joint work with Nicholas Christakis, is forthcoming in the American Sociological Review. He is also a fellow blogger on the Social Science Statistics blog. On Wednesday, he will present a talk entitled "Trial Marriage Reconsidered: The effect of cohabitation on divorce". The presentation will be at noon on Wednesday, March 15 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
March 9, 2006
As you know, Alan Greenspan retired from the Fed about a month ago (and already has an $8M book deal, but I digress...). Jens' post below reminded me of one of my favorite Greenspan quotes: "I suspect greater payoffs will come from more data than from more technique." He was speaking to economists about models for forecasting economic growth, but I suspect his comments apply at least as strongly to political science and other social sciences. You might have the most cutting-edge, whiz-bang, TSCS-2SLS-MCMC evolutionary Bayesian beta-beta-beta-binomial model that will tell you the meaning of life and wash your car at the same time, but if the data that you put in is either non-existent or garbage, it isn't going to do you a lot of good. Unfortunately, the incentives in the profession do not seem sufficient to reward the long, tedious efforts required to collect high-quality data and to make it publicly available to the academic community. Most scholars would surely like to have better data; they would just prefer that someone else collect it.
Having said all that, it is worth noting efforts that make data collection and dissemination a more rewarding pursuit. One such effort is the Dataset Award given by the APSA Comparative Politics section for "a publicly available data set that has made an important contribution to the field of comparative politics." This year's request for nominations hits the nail on the head:
The interrelated goals of the award include a concern with encouraging development of high-quality data sets that contribute to the shared base of empirical knowledge in comparative politics; acknowledging the hard work that goes into preparing good data sets; recognizing data sets that have made important substantive contributions to the field of comparative politics; and calling attention to the contribution of scholars who make their data publicly available in a well-documented form.
March 7, 2006
This week, the Applied Statistics Workshop will present a talk by Roland Fryer, a Junior Fellow of the Harvard Society of Fellows, resident in the Economics Department. Dr. Fryer received his Ph.D. in economics from The Pennsylvania State University in 2002, and was an NSF post-doctoral fellow before coming to Harvard. His work has appeared in several journals, including the Quarterly Journal of Economics and the Review of Economics and Statistics. Dr. Fryer will present a talk entitled "Measuring the Compactness of Political Districting Plans". The presentation will be at noon on Wednesday, March 8 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
February 26, 2006
This week, the Applied Statistics Workshop will present a talk by Janet Rosenbaum, a Ph.D. candidate in the Program on Health Policy at Harvard. She majored in physics as an undergraduate at Harvard College and received an AM in statistics last year. Janet will present a talk entitled "Do virginity pledges cause virginity?: Estimating the efficacy of sexual abstinence pledges". She has a publication forthcoming in the American Journal of Public Health on related research. The presentation will be at noon on Wednesday, March 1 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:
Objectives: To determine the efficacy of virginity pledges in delaying sexual debut for sexually inexperienced adolescents in the National Longitudinal Study of Adolescent Health (Add Health).
Methods: Subjects were virgin respondents without wave 1 pledge who reported their attitudes towards sexuality and birth control at wave 1 (n=3443). Nearest-neighbor matching within propensity score calipers was used to match wave 2 virginity pledgers (n=291) with non-pledgers, based on wave 1 attitudes, demographics, and religiosity. Treatment effects due to treatment assignment were calculated.
Results (Preliminary): 17% of virginity pledgers are compliant with their pledge, and do not recant at wave 3 their earlier report of having taken a pledge. Similar proportions of virginity pledgers and non-pledgers report having had pre-marital sex (54% and 61%, p=0.16) and test positive for chlamydia (2.7% and 2.9%, p=0.89).
Conclusions: Five years after taking a virginity pledge, most virginity pledgers fail to report having pledged. Virginity pledges do not affect the incidence of self-reported pre-marital sex or assay-determined
February 22, 2006
As I posted the other day, experiments in political science have great potential, but they have some unique risks as well, particularly when the manipulation may change the output of some political process. What happens if your experiment is so successful (in the sense of having a large causal effect) that it changes the outcome of some election? How would you explain such an outcome when you report your results? "The manipulation produced an estimated increase in party A's support of 5000 votes, with a standard error of 250. (Party A's margin of victory was 2000 votes. Sorry about that.)" This seems like a good way to alienate the public once word got out, not to mention your colleagues working with observational data who now have another variable that they have to account for in their studies.
Having said that, I am just an observer in this field, and I'm sure that many people reading this blog have thought a lot more about these issues than I have. So, to continue the conversation, I'd like to propose the following questions:
At what point does an experimental manipulation become so significant that researchers have an obligation to inform subjects that they are, in fact, subjects?
Do researchers have an obligation to design experiments such that the net effect of any particular experimental manipulation on political outcomes is expected to be zero?
Would it be appropriate for a researcher to work consistently with one party on a series of experiments designed to determine what manipulations increase the probability that the party will win elections? Do the researcher's personal preferences matter in this regard?
To what extent are concerns mitigated by the fact that, in general, political actors could conduct these experiments on their own initiative? What if those actors agree to fund the research themselves, as was the case in the 2002 Michigan experiments?
If a university were to fund experimental research that was likely to promote one political outcome over another, would it risk losing its tax-exempt status? This one is for our resident lawyer....
February 20, 2006
This week, the Applied Statistics Workshop will present a talk by Rustam Ibragimov of the Harvard Department of Economics. Professor Ibragimov received a Ph.D. in mathematics from the Institute of Mathematics of Uzbek Academy of Sciences in 1996 and a Ph.D. in economics from Yale University in 2005 before joining the Harvard faculty at the beginning of this academic year. Professor Ibragimov will present a talk entitled "A tale of two tails: peakedness properties in inheritance models of evolutionary theory". The presentation will be at noon on Wednesday, February 22 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:
In this paper, we study transmission of traits through generations in multifactorial inheritance models with sex- and time-dependent heritability. We further analyze the implications of these models under heavy-tailedness of traits' distributions. Among other results, we show that in the case of a trait (for instance, a medical or behavioral disorder or a phenotype with significant heritability affecting human capital in an economy) with not very thick-tailed initial density, the trait distribution becomes increasingly more peaked, that is, increasingly more concentrated and unequally spread, with time. But these patterns are reversed for traits with sufficiently heavy-tailed initial distributions (e.g., a medical or behavioral disorder for which there is no strongly expressed risk group or a relatively equally distributed ability with significant genetic influence). Such traits' distributions become less peaked over time and increasingly more spread in the population.
In addition, we study the intergenerational transmission of the sex ratio in models of threshold (e.g., polygenic or temperature-dependent) sex determination with long-tailed sex-determining traits. Among other results, we show that if the distribution of the sex determining trait is not very thick-tailed, then several properties of these models are the same as in the case of log-concave densities analyzed by Karlin (1984, 1992). In particular, the excess of males (females) among parents leads to the same pattern for the population of the offspring. Thus, the excess of one sex over the other one accumulates with time and the sex ratio in the total alive population cannot stabilize at the balanced sex ratio value of 1/2. We further show that the above properties are reversed for sufficiently heavy-tailed distributions of sex determining traits. In such settings, the sex ratio of the offspring oscillates around the balanced sex ratio value and an excess of males (females) in the initial period leads to an excess of females (males) offspring next period. Therefore, the sex ratio in the total living population can, in fact, stabilize at 1/2. Interestingly, these results are related, in particular, to the analysis of correlation between human sex ratios and socioeconomic status of parents as well as to the study of the variation of the sex ratio due to parental hormonal levels.
The proof of the results in the paper is based on the general results on majorization properties of heavy-tailed distributions obtained recently in Ibragimov (2004) and several of their extensions derived in this work.
February 13, 2006
Applied Statistics - Mike Kellermann
This week, I will be giving the talk at the Applied Statistics Workshop; as they say, turnabout is fair play. The talk is entitled "Estimating Ideal Points in the British House of Commons." I've blogged a bit about this project here. An abstract of the talk appears on the jump:
Estimating the policy preferences of individual legislators is important for many studies of legislative and partisan politics. Unfortunately, existing ideal point methods do not perform well when applied to legislatures characterized by strong party discipline and oppositional politics, such as the British House of Commons. This project develops a new approach for estimating the preferences of British legislators, using Early Day Motions as an alternative data source. Early Day Motions are petitions that allow MPs to express their opinions without being bound by party whips. Unlike voting data, however, EDMs do not allow legislators to express opposition to a particular policy. To deal with the differences between voting data and EDMs, I adapt existing Bayesian ideal point models to allow for the possibility (supported in the data) that some Members of Parliament are more likely to sign EDMs than others, regardless of policy content. The estimates obtained have much greater face validity than previous attempts to estimate ideal points in the House of Commons, and have the usual benefits associated with Bayesian ideal point models, including natural estimates of uncertainty and the ability to calculate auxiliary quantities of interest directly from the posterior distribution.
Experimental prudence in political science (Part I)
We've talked a fair bit on the blog about the use of experimental data to make causal inferences. While the inferential benefits of experimental research are clear, experiments raise prudential questions that we rarely face in observational research; they require "manipulation" in more than one sense of that word. As someone who is an interested observer of the experimental literature rather than an active participant, I wonder how well the institutional mechanisms for oversight have adapted to field experimentation in the social sciences in general (and political science in particular). In medical experiments, the ability in principle to obtain informed consent from subjects is critical in determining what is ethically acceptable, but this is often not possible in a political context; external validity may depend on concealing the experimental nature of the manipulation from the "subjects." Moreover, the effects of the manipulation may be large enough to change large-scale political outcomes, thus affecting individuals outside of the nominal pool of subjects.
As an example, consider the turnout experiments I discussed here and here. The large-scale phone experiments in Iowa and Michigan are typical in that they involve non-partisan GOTV (get out the vote) efforts. Treated voters are contacted by phone (or by mail, or in person) and urged to vote, while control voters are not contacted; neither group, as far as I can tell, knows that they are experimental subjects. Such a design is possible because the act of voting is a matter of public record, and thus the cooperation of the subjects is not required to obtain the relevant data.
While the effects of such manipulations may provide some insight for political scientists as to the causes of voter turnout, their practical significance is a bit hard to measure; there are not that many genuinely non-partisan groups out there with both the means and the motivation to conduct large-scale voter mobilization efforts. There have been some recent efforts to study partisan voter mobilization strategies using field experiments. David Nickerson, Ryan Friedrichs, and David King have a forthcoming article reporting on an experiment in the 2002 Michigan gubernatorial campaign, in which a youth organization of the Michigan Democratic Party agreed to randomize their partisan GOTV efforts aimed at voters believed to be Democrats or independents. The authors find positive effects for all three of the common GOTV manipulations (direct literature, phone calls, and face-to-face canvassing). In the abstract, obtaining data from manipulations that are clearly relevant in the real world is good for the discipline. I have no doubt that both party activists and party scholars would love to do more such research, but it all makes me slightly uncomfortable. As researchers, should we be in a position where we are (potentially) influencing political outcomes not only through arguments based on the evidence that we collect, but through the process of collecting evidence as well?
February 6, 2006
Applied Statistics - Alexis Diamond
This week, the Applied Statistics Workshop will present a talk by Alexis Diamond, a Ph.D. candidate in Political Economy and Government. The talk is entitled "The Effect of UN Intervention after Civil War." An abstract of the talk appears on the jump:
A basic goal of political science is to understand the effects of political institutions on war and peace. Yet the impact of United Nations peacebuilding following civil war remains very much in doubt following King and Zeng (2006), which found that prior conclusions about these causal effects (Doyle and Sambanis 2000) had been based more on indefensible modeling assumptions than evidence. This paper revisits the Doyle and Sambanis causal questions and answers them using new matching-based methods that address issues raised by King and Zeng. The methods are validated for the Doyle and Sambanis data via their application to a dataset with similar features for which the correct answer is known. These new methods do not require assumptions that plagued prior work and are broadly applicable to important inferential problems in political science and beyond. When the methods are applied to the Doyle and Sambanis data, there is a preponderance of evidence to suggest that UN peacebuilding has a positive effect on peace and democracy in the aftermath of civil war.
Another paradox of turnout? (Part II)
Last week I highlighted a new article by Arceneaux, Gerber, and Green that suggests that matching methods have difficulty in replicating the experimentally estimated causal effect of a phone-based voter mobilization effort, given a relatively rich set of covariates and a large control pool from which to draw matches. Matching methods have been touted as producing experiment-like estimates from observational data, so this result is kind of disheartening. How might advocates of matching methods respond to this claim?
Let's assume that the results in the paper hold up to further scrutiny (someone should - and I have no doubt will - put this data through the wringer, although hopefully it won't suffer the fate of the NSW dataset). Why should turnout be problematic? Explaining voter turnout has presented quandaries and paradoxes in other branches of political science, so it is hardly surprising that it mucks up the works here. Turnout has been called "the paradox that ate rational choice," due to the great difficulty in finding a plausible model that can justify turnout on instrumental terms. To my mind, the most reasonable (and least interesting) rational choice models of turnout resort to the psychic benefits of voting or "civic duty" - the infamous "D" term - to account for the fairly solid empirical generalization that some people do, in fact, vote. What, exactly, the "D" term represents is something of a mystery, but it seems reasonable that people who feel a duty to go to the polls are also more likely to listen to a phone call urging them to vote, even conditional on things like age, gender, and voting behavior in the previous two elections.
The authors are somewhat pessimistic about the possibility of detecting such problems when researchers do not have an experimental estimate to benchmark their results (and, hence, when matching or some other technique is actually needed). They ask, "How does one know whether matched observations are balanced in terms of the unobserved causes of the dependent variable?" That is indeed the question, but I think that they may be a little too skeptical about the ability to ferret out such problems, especially in this particular context. If the matched data is truly balanced on both the observed and unobserved outcomes, then there should be no difference in expected value of some auxiliary variable (excluded from the matching process) that was observed before the treatment was applied, unless we want to start thinking in terms of reverse temporal causation. The authors could have dropped, say, turnout in 2000 from their matching procedure, matched on the other covariates, and then checked for a difference in the turnout in 2000 between the treatment and control groups in 2002. My guess is that they would find a pretty big difference. Of course, since these matches are not the same as those used in the analysis, any problems that result could be "fixed" by the inclusion of 2000 voter turnout in the matching procedure, but that is putting a lot of weight on one variable.
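To make the proposed check concrete, here is a minimal simulated sketch of that kind of placebo test. Everything in it is an assumption for illustration rather than anything taken from the paper: the unobserved "duty" variable, the probabilities, the single matching covariate (an age group), and the function names are all made up. It matches exactly on age, leaves 2000 turnout out of the matching, and then compares 2000 turnout between the "treated" and their matched controls.

```python
import random
from collections import defaultdict


def simulate(n=200_000, self_selected=True, seed=1):
    """Simulate voters as (age_group, voted_2000, treated) tuples.

    'duty' is a hypothetical unobserved trait that raises both past turnout
    and, when self_selected is True, the chance of listening to the call.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        age = rng.randrange(4)                       # observed, used for matching
        duty = rng.random()                          # unobserved
        voted_2000 = rng.random() < 0.2 + 0.6 * duty  # pre-treatment outcome
        p_treat = 0.5 * duty if self_selected else 0.25
        treated = rng.random() < p_treat
        units.append((age, voted_2000, treated))
    return units


def placebo_difference(units):
    """Exact-match on age only, then compare 2000 turnout (held out of matching)."""
    totals = defaultdict(lambda: [0, 0])  # age -> [sum of voted_2000, count] for controls
    for age, voted_2000, treated in units:
        if not treated:
            totals[age][0] += voted_2000
            totals[age][1] += 1
    control_mean = {age: s / c for age, (s, c) in totals.items()}
    diffs = [v - control_mean[age] for age, v, treated in units if treated]
    return sum(diffs) / len(diffs)


confounded = placebo_difference(simulate(self_selected=True))
randomized = placebo_difference(simulate(self_selected=False))
print(f"placebo difference, self-selected treatment: {confounded:.3f}")
print(f"placebo difference, randomized treatment:    {randomized:.3f}")
```

In this toy setup the treatment cannot have caused 2000 turnout, so a clearly non-zero placebo difference in the self-selected case is a signal that something unobserved differs between the matched groups, while the randomized case should hover near zero.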
Even if the prospects for identifying bias due to unobserved covariates are better than Arceneaux, Gerber, and Green suggest, it is not at all apparent that we can do anything about it. In this case, if we knew what "duty" was, we might be able to find covariates that would allow us to satisfy the unconfoundedness constraint. On the other hand, it is not obvious how we would identify those variables from observational studies, since we would likely have similar problems with confoundedness. No one said this was supposed to be easy.
January 31, 2006
Another paradox of turnout? (Part I)
Those of you who have followed this blog know that making reasonable causal inferences from observational data usually presents a huge challenge. Using experimental data where we "know" the right answer, in the spirit of Lalonde (1986), provides one way for researchers to evaluate the performance of their estimators. Last month, Jens posed the question (here and here) "What did (and do we still) learn from the Lalonde dataset?" My own view is that we have beaten the NSW data to death, buried it, dug it back up, and whacked it around like a piñata. While I'm sure that others would disagree, I think that we would all like to see other experiment-based datasets with which to evaluate various methods.
In that light, it is worth mentioning "Comparing experimental and matching methods using a large-scale voter mobilization experiment" by Kevin Arceneaux, Alan Gerber, and Donald Green, which appears in the new issue of Political Analysis. Much in the spirit of Lalonde's original paper, they base their analysis on a voter turnout experiment in which households were randomly selected to receive non-partisan phone calls encouraging them to vote in the 2002 mid-term elections. This type of mobilization experiment suffers from a classic compliance problem; some voters either don't have phones or refuse to take unsolicited calls. As a result, in order to determine the average causal effect of the treatment on those who would receive it, they need to find a method to compare the compliers who received treatment to compliers in the control group. Since assignment to treatment was random, they use assignment as an instrument in the spirit of Angrist, Imbens, and Rubin (1996). Using a 2SLS regression with assignment in the first stage, their estimates of the ATT are close to zero and statistically insignificant. While one might quibble with various choices (why not a Bayesian estimator instead of 2SLS?), it is not obvious that there is a problem with their experimental estimate, which in the spirit of this literature we might call the "truth".
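For readers who have not seen the instrumental variables logic before, a rough simulated sketch may help. It is not the authors' code or data: the "duty" variable, the probabilities, the assumed true effect of 0.05, and the function names are all hypothetical. With a single binary instrument and no covariates, 2SLS reduces to the Wald ratio (the intent-to-treat effect on turnout divided by the intent-to-treat effect on contact), which is what the sketch computes and compares with a naive contacted-versus-uncontacted difference.

```python
import random


def simulate(n=400_000, tau=0.05, seed=0):
    """Simulate (assigned, contacted, voted) for each voter.

    Assumptions for illustration only: 'duty' is unobserved, drives both
    answering the phone and baseline turnout, and the call adds tau to the
    turnout probability of those actually contacted. Nobody in the control
    group can be contacted (one-sided noncompliance).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        assigned = rng.random() < 0.5             # randomized assignment
        duty = rng.random()                       # unobserved
        would_answer = rng.random() < duty        # complier status
        contacted = assigned and would_answer
        p_vote = 0.2 + 0.5 * duty + (tau if contacted else 0.0)
        voted = rng.random() < p_vote
        rows.append((assigned, contacted, voted))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_estimate(rows):
    """ITT effect on turnout divided by ITT effect on contact."""
    itt_y = mean(y for z, d, y in rows if z) - mean(y for z, d, y in rows if not z)
    itt_d = mean(d for z, d, y in rows if z) - mean(d for z, d, y in rows if not z)
    return itt_y / itt_d


def naive_estimate(rows):
    """Contacted vs. everyone not contacted, ignoring who selects into contact."""
    return mean(y for z, d, y in rows if d) - mean(y for z, d, y in rows if not d)


rows = simulate()
print(f"Wald (IV) estimate: {wald_estimate(rows):.3f}")
print(f"Naive estimate:     {naive_estimate(rows):.3f}")
```

In this made-up world the Wald ratio should land near the assumed effect of 0.05, while the naive comparison is inflated because the people who pick up the phone are the ones who were more likely to vote anyway.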
The authors then attempt to replicate their experimental results using both OLS and various matching techniques. In this context, the goal of the matching process is to pick out people who would have listened to the phone call had they been contacted. The authors have a set of covariates on which to match, including age, gender, household size, geographic location, whether the voter was newly registered, and whether the voter turned out in each of the two previous elections. Because the control sample that they have to draw from is very large (almost two million voters), they don't have much difficulty in finding close matches for the treated group based on the covariates in their data. Unfortunately, the matching estimates don't turn out to be very close to the experimental baseline, and in fact are much closer to the plain-vanilla OLS estimates. Their conclusion from this result is that the assumptions necessary for causal inferences under matching (namely, unconfoundedness conditional on the covariates) are not met in this situation, and (at least by my reading) they seem to suggest that it would be difficult to find a dataset that was rich enough in covariates that the assumption would be met.
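A toy simulation shows how this can happen even with a huge control pool and exact matches. The sketch below is only an illustration under assumptions I have invented (an unobserved "duty" trait, made-up probabilities, a true treatment effect of exactly zero, and hypothetical function names); it is not a reconstruction of the authors' analysis. It matches treated voters exactly on age group and turnout in the two previous elections and compares the result with the raw, unmatched difference.

```python
import random
from collections import defaultdict


def simulate(n=200_000, seed=2):
    """Simulate (covariates, listened, voted_2002) with a true effect of zero.

    'duty' is unobserved; it raises turnout in every election and the chance
    of listening to the call. Covariates are (age_group, voted_1998, voted_2000).
    """
    rng = random.Random(seed)
    voters = []
    for _ in range(n):
        age = rng.randrange(4)
        duty = rng.random()
        p_vote = 0.2 + 0.6 * duty
        voted_1998 = rng.random() < p_vote
        voted_2000 = rng.random() < p_vote
        listened = rng.random() < 0.5 * duty
        voted_2002 = rng.random() < p_vote   # the call changes nothing here
        voters.append(((age, voted_1998, voted_2000), listened, voted_2002))
    return voters


def raw_difference(voters):
    """Unadjusted difference in 2002 turnout, listeners minus everyone else."""
    treated = [y for _, t, y in voters if t]
    control = [y for _, t, y in voters if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def exact_matching_estimate(voters):
    """Compare each listener with the mean of controls sharing all covariates.

    With 16 covariate cells and a large sample, every cell has controls;
    a sparse sample would need a guard against empty cells.
    """
    cells = defaultdict(lambda: [0, 0])  # covariates -> [sum of y, count] for controls
    for x, t, y in voters:
        if not t:
            cells[x][0] += y
            cells[x][1] += 1
    diffs = [y - cells[x][0] / cells[x][1] for x, t, y in voters if t]
    return sum(diffs) / len(diffs)


voters = simulate()
raw = raw_difference(voters)
matched = exact_matching_estimate(voters)
print(f"raw difference:    {raw:.3f}")
print(f"matching estimate: {matched:.3f}  (true effect is 0)")
```

Matching on past turnout soaks up some of the bias, since past turnout is a noisy proxy for duty, but in this setup the estimate stays well away from zero because the treated still differ from their matches on the part of duty that the covariates miss.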
As a political scientist, I have to say that I like this dataset, because (a) it is not the NSW dataset and (b) it is not derived from a labor market experiment. What do these results mean for matching methods in political science? I'll have some thoughts on that tomorrow.
January 30, 2006
Applied Statistics - Jim Greiner
This week, the Applied Statistics Workshop resumes for the spring term with a talk by Jim Greiner, a Ph.D. candidate in the Statistics Department. The talk is entitled "Ecological Inference in Larger Tables: Bounds, Correlations, Individual-Level Stories, and a More Flexible Model," and is based on joint work with Kevin Quinn from the Government Department. Jim graduated with a B.A. in Government from the University of Virginia in 1991 and then received a J.D. from the University of Michigan Law School in 1995. He clerked for Judge Patrick Higginbotham on the U.S. Court of Appeals for the Fifth Circuit and was a practicing lawyer in the Justice Department and private practice before joining the Statistics Department here at Harvard. As chair of the authors' committee, he is a familiar figure to readers of this blog.
As a reminder, the Applied Statistics Workshop meets in Room N354 in the CGIS Knafel Building (next to the Design School) at 12:00 on Wednesdays during the academic term. Everyone is welcome, and lunch is provided. We hope to see you there!