29 November 2010
We hope that you can join us for the Applied Statistics Workshop this Wednesday, December 1st when we will be happy to have Dan Carpenter from the Department of Government. You will find an abstract below. As always, we will serve a light lunch and the talk will begin around 12:15p.
Elections with small margins of victory represent an important form of electoral competition and, increasingly, an opportunity for causal inference. Scholars using regression discontinuity designs (RDD) have interpreted the winners of close elections as randomly separated from the losers, using marginal election results as an experimental assignment of office-holding to one candidate versus the other. In this paper we suggest that marginal elections may not be as random as RDD analysts suggest. We draw upon the simple intuition that elections that are expected to be close will attract greater campaign expenditures before the election and invite legal challenges and even fraud after the election. We present theoretical models that predict systematic differences between winners and losers, even in elections with the thinnest victory margins. We test predictions of our models on a dataset of all House elections from 1946 to 1990. We demonstrate that candidates whose parties hold structural advantages in their district are systematically more likely to win close elections at a wide range of bandwidths. Our findings call into question the use of close elections for causal inference and demonstrate that marginal elections mask structural advantages that may be troubling normatively. (Co-authored with Justin Grimmer, Eitan Hersh, and Brian Feinstein)
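The core check in the abstract can be sketched in a few lines: if close elections really were coin flips, the structurally advantaged party should win about half of the races decided by less than a given bandwidth. This is a minimal illustration, not the authors' code; the function name `close_election_balance`, the signed-margin input, and the normal-approximation z-statistic are assumptions, and the toy margins are made up.

```python
import math
from typing import Sequence, Tuple


def close_election_balance(margins: Sequence[float], bandwidth: float) -> Tuple[int, float, float]:
    """Among races decided by less than `bandwidth`, return
    (number of close races, share won by the advantaged party, z-statistic).

    Each entry of `margins` is the structurally advantaged party's signed
    vote margin (positive = advantaged party won). Exact ties are dropped.
    """
    close = [m for m in margins if 0 < abs(m) < bandwidth]
    n = len(close)
    if n == 0:
        return 0, float("nan"), float("nan")
    share = sum(1 for m in close if m > 0) / n
    # If close outcomes were as good as random, share ~ 0.5 with
    # binomial standard error sqrt(0.25 / n).
    z = (share - 0.5) / math.sqrt(0.25 / n)
    return n, share, z


if __name__ == "__main__":
    # Made-up margins (percentage points), for illustration only.
    toy_margins = [0.3, -0.8, 1.2, 0.4, -2.5, 0.9, 3.0, -0.2]
    for h in (0.5, 1.0, 2.0):
        print(h, close_election_balance(toy_margins, h))
```

Looping over several bandwidths mirrors the abstract's point that the imbalance persists "at a wide range of bandwidths"; with real data one would also want far more than a handful of races per bandwidth.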
22 November 2010
David Sparks has drawn up some isarithmic maps of the two-party presidential vote over the last 90 years. An isarithmic map is sometimes called a heat map, and you would most often see a rough version of them on your local weather report. David shows us the political weather over time:
As you can see, the votes have been smoothed over geographic space. David also has a video where he smooths across time, leading to very beautiful plots. You should also see the summary of how he made the plots. A good reminder of the death-by-1000-papercuts nature of data analysis:
Using a custom function and the interp function from akima, I created a spatially smoothed image of interpolated partisanship at points other than the county centroids. This resulted in inferred votes over the Gulf of Mexico, the Atlantic and Pacific Oceans, the Great Lakes, Canada and Mexico -- so I had to clip any interpolated points outside of the U.S. border using the very handy pinpoly function from the spatialkernel package.
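For readers who want to try the two steps David describes outside R (interpolate between county centroids, then clip points outside the border), here is a rough Python sketch. It swaps in inverse-distance weighting for akima's `interp` (which interpolates over a triangulation, so results will differ) and a hand-written ray-casting test in place of `pinpoly`; the function names and the `power=2.0` default are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def idw(x: float, y: float, pts: Sequence[Point], vals: Sequence[float],
        power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y) from known points."""
    num = den = 0.0
    for (px, py), v in zip(pts, vals):
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 == 0.0:
            return v  # query sits exactly on a data point
        w = d2 ** (-power / 2)  # weight = 1 / distance**power
        num += w * v
        den += w
    return num / den


def inside(x: float, y: float, poly: Sequence[Point]) -> bool:
    """Ray-casting point-in-polygon test (even-odd rule)."""
    hit = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def smoothed_grid(pts: Sequence[Point], vals: Sequence[float], border: Sequence[Point],
                  xs: Sequence[float], ys: Sequence[float]) -> List[List[Optional[float]]]:
    """Interpolate on a grid; cells outside `border` are None (clipped)."""
    return [[idw(x, y, pts, vals) if inside(x, y, border) else None for x in xs]
            for y in ys]
```

A call would look like `smoothed_grid(centroids, dem_shares, us_outline, lons, lats)` (all hypothetical names). A real U.S. border has many parts (islands, Alaska, Hawaii), so the clipping step would need to loop over each polygon.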
My only worry is that spatial geography might be the wrong dimension on which to smooth. With weather data, it makes obvious sense to smooth in space. But a suburb of Chicago might have more in common with a suburb of Cleveland than with Chicago itself, even though it is much closer to Chicago. Thus, this type of smoothing might understate real, stark differences between local communities (Clayton Nall has some work on how the interstate highway system has accelerated some of these divides). Basically, I think there is a political space that does not quite match up with geographic space. (Exploring what that political space looks like and why it arises would be an interesting research project, at least to me.)
You should really explore the rest of David's site. He has numerous awesome visualizations.
19 November 2010
Not to burst anyone’s bubble here, but if you really think that multiple regression involves assumptions that are too much for the average political scientist, what do you think is going to happen with topological clustering algorithms, neural networks, and the rest??
Gelman is responding to two of Schrodt’s seven sins: (1) kitchen-sink regressions with baseless control sets and (2) dependence on linear models to the exclusion of other statistical models. I think that Gelman misinterprets Schrodt’s criticism here. It is not that political scientists somehow lack the ability to comprehend multiple regression and its assumptions. It is that political scientists are being intellectually lazy (possibly incentivized by the discipline!) and fail to critically examine their analysis or their methods. It’s a failure of standards and work, not a failure of intellect. Thus, I fail to see the contradiction in Schrodt’s advice or his condemnation: it’s a call to think more about our data and how they fit with our models and their assumptions. Now, one may think that this is beyond the abilities of folks, but I fail to see that argument being made by Schrodt (and I am certainly not making it).
Gelman himself often calls for simplicity:
I find myself telling people to go simple, simple, simple. When someone gives me their regression coefficient I ask for the average, when someone gives me the average I ask for a scatterplot, when someone gives me a scatterplot I ask them to carefully describe one data point, please.
This seems more about presentation of results or a failure to know the data. There is a huge challenge in more complicated models since they often require more care and attention to how we present the results. All of the techniques Gelman describes should be essential parts of the data analysis endeavor. That people fail to do these simple tasks speaks more to Schrodt’s accusation of “intellectual sloth” than anything.
Finally, we can probably all get behind a couple of commandments:
I see both Gelman and Schrodt making these points, albeit differently. While Schrodt sees a violation of 2 as primarily due to intellectual laziness, Gelman sees it as primarily due to intellectual handicaps. Both are slightly unfair to academics, but sloth is at least curable.
18 November 2010
John Salvatier has a blog post on the future of MCMC algorithms, focusing on differential methods, which use derivatives of the posterior to inform where the algorithm should move next. This allows for greater step length, faster convergence, and better handling of multimodal posteriors. Gelman agrees with the direction. There has been some recent work on implementing automatic differentiation in R, which is the cornerstone of the algorithms Salvatier discusses. Perhaps we will see this moving into some of the more popular MCMC packages soon.
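To make "use derivatives of the posterior to decide where to move" concrete, here is a sketch of one textbook gradient-based sampler, the Metropolis-adjusted Langevin algorithm (MALA). It is chosen here for illustration and is not code from Salvatier's post or from any MCMC package; the function name, the fixed step size, and the one-dimensional setting are simplifying assumptions.

```python
import math
import random
from typing import Callable, List, Tuple


def mala(log_post: Callable[[float], float], grad_log_post: Callable[[float], float],
         x0: float, step: float, n_iter: int, seed: int = 0) -> Tuple[List[float], float]:
    """Metropolis-adjusted Langevin algorithm for a 1-D target.

    Returns the chain and the acceptance rate.
    """
    rng = random.Random(seed)

    def log_q(to: float, frm: float) -> float:
        # Log proposal density (up to a constant) of moving frm -> to:
        # N(frm + step**2 / 2 * grad(frm), step**2)
        mean = frm + 0.5 * step ** 2 * grad_log_post(frm)
        return -((to - mean) ** 2) / (2 * step ** 2)

    x, chain, accepted = x0, [], 0
    for _ in range(n_iter):
        # Gradient drift toward higher posterior density, plus Gaussian noise
        prop = x + 0.5 * step ** 2 * grad_log_post(x) + step * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction for the asymmetric proposal
        log_alpha = (log_post(prop) + log_q(x, prop)) - (log_post(x) + log_q(prop, x))
        if math.log(1.0 - rng.random()) < log_alpha:  # 1 - U avoids log(0)
            x = prop
            accepted += 1
        chain.append(x)
    return chain, accepted / n_iter


if __name__ == "__main__":
    # Target: standard normal, log p(x) = -x^2 / 2 (up to a constant)
    chain, rate = mala(lambda x: -0.5 * x * x, lambda x: -x, x0=3.0, step=1.0, n_iter=20_000)
    kept = chain[1000:]  # discard burn-in
    mean = sum(kept) / len(kept)
    var = sum((c - mean) ** 2 for c in kept) / len(kept)
    print(f"acceptance={rate:.2f} mean={mean:.3f} var={var:.3f}")
```

Compared with a random-walk proposal, the drift term pushes each proposal uphill, which is what allows larger steps; the gradient here is written by hand, which is exactly the part automatic differentiation would take over.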
On a slightly different Bayes front, SSS-pal and former blogger Justin Grimmer has a paper on variational approximation (VA), a method for deterministically approximating posteriors. This approach is often useful when MCMC is extremely slow or impossible, since VA converges quickly and is guaranteed to converge, although in general only to a local optimum.
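As an illustration of what "deterministically approximating" a posterior means, here is coordinate-ascent variational inference for the standard textbook example: a normal model with unknown mean and precision, a Normal-Gamma prior, and the mean-field factorization q(mu)q(tau). This is not the model in Justin's paper; the function name and prior defaults are assumptions.

```python
from typing import Sequence, Tuple


def cavi_normal(xs: Sequence[float], mu0: float = 0.0, lam0: float = 1e-3,
                a0: float = 1e-3, b0: float = 1e-3, tol: float = 1e-10,
                max_iter: int = 1000) -> Tuple[float, float, float, float]:
    """Mean-field VA for x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0 * tau)),
    tau ~ Gamma(a0, rate=b0).

    Returns (mu_n, lam_n, a_n, b_n) with q(mu) = N(mu_n, 1/lam_n) and
    q(tau) = Gamma(a_n, rate=b_n).
    """
    n = len(xs)
    xbar = sum(xs) / n
    # These two quantities do not depend on the other factor, so compute once.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2
    e_tau = 1.0  # initial guess for E_q[tau]
    b_n = b0
    for _ in range(max_iter):
        lam_n = (lam0 + n) * e_tau  # update q(mu) given current q(tau)
        # E_q(mu)[ sum_i (x_i - mu)^2 + lam0 * (mu - mu0)^2 ]
        e_sq = sum((x - mu_n) ** 2 for x in xs) + n / lam_n
        e_sq += lam0 * ((mu_n - mu0) ** 2 + 1 / lam_n)
        b_n = b0 + 0.5 * e_sq  # update q(tau) given current q(mu)
        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) < tol
        e_tau = new_e_tau
        if converged:
            break
    lam_n = (lam0 + n) * e_tau  # final q(mu) update so both factors agree
    return mu_n, lam_n, a_n, b_n


if __name__ == "__main__":
    mu_n, lam_n, a_n, b_n = cavi_normal([1.0, 2.0, 3.0, 4.0, 5.0])
    print(f"E[mu]={mu_n:.3f}  sd(mu)={lam_n ** -0.5:.3f}  E[tau]={a_n / b_n:.3f}")
```

Every update is a closed-form expectation, so there is no sampling and no burn-in: the same input always gives the same answer, which is the practical appeal over MCMC.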
15 November 2010
We hope that you can join us for the Applied Statistics Workshop this Wednesday, November 17th when we will be happy to have Matevž Raškovič from the University of Ljubljana, who is currently a Visiting Fellow in the Sociology Department. You will find an abstract below. As always, we will serve a light lunch and the talk will begin around 12:15p in CGIS K354 (1737 Cambridge St).
This talk is part of an ongoing PhD research project taking place at the Faculty of Economics and the Faculty of Social Sciences at the University of Ljubljana in Slovenia, and at the Technical University Eindhoven in the Netherlands. The research is motivated by the literature on the different 'mentalities' of international companies and by the fact that transnational companies, as a distinct type of international company mentality, are increasingly being understood as communities and spaces of social relationships. The research explores the management of supplier-buyer relationships within Danfoss, Denmark's biggest industrial organization. In particular, it looks at how Danfoss manages supplier-buyer relationships to be both globally efficient and flexible, while at the same time facilitating learning. Balancing these three strategic goals presents a considerable challenge for all internationally active companies. The talk will offer a short theoretical background for the research and then focus on the multi-method, mixed research design built around two separate two-mode egocentric cognitive social networks.
11 November 2010
According to a working paper by Greg Kaplan and Sam Schulhofer-Wohl at the Federal Reserve Bank of Minneapolis, recent estimated declines in interstate migration are simply artifacts of the imputation procedure used by the Census Bureau.
The bureau uses a “hot-deck” imputation procedure that matches people who fail to answer a question (called recipients) to people who did answer it (called donors) and fills in each recipient’s missing values with the donor’s observed values. For migration, the crucial question is where the respondent lived one year ago. Before 2006, the bureau effectively did not match on current location, even though current location is a strong predictor of past location. In 2006, it switched:
Using the most recently processed respondent as the donor to impute missing answers means that the order of processing can affect the results. Since 2006, respondents have been processed in geographic order. This ordering means that the donor usually lives near the recipient. Since long-distance migration is rare, the donor’s location one year ago is also usually close to the recipient’s current location. Thus, if the procedure imputes that the recipient moved, it usually imputes a local move. Before 2006, the order of processing was geographic but within particular samples. Therefore, on average, donors lived farther from recipients; donors’ locations one year ago were also on average farther from recipients’ current locations; and recipients were more likely to have imputed interstate moves.
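The order dependence described in the quote is easy to see in a toy sequential hot deck, where each missing answer is copied from the most recently processed respondent who did answer. This is a simplified sketch assuming a single variable and no matching cells (the Census Bureau's actual procedure is more elaborate); the record fields and names are hypothetical.

```python
from typing import Dict, List, Optional


def sequential_hot_deck(records: List[Dict], field: str) -> List[Dict]:
    """Fill missing `field` values (None) with the value from the most recently
    processed respondent who answered. Results depend on record order."""
    donor_value: Optional[str] = None
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not mutated
        if rec[field] is None:
            rec[field] = donor_value  # stays None if no donor seen yet
            rec["imputed"] = True
        else:
            donor_value = rec[field]  # this respondent becomes the next donor
            rec["imputed"] = False
        out.append(rec)
    return out


if __name__ == "__main__":
    # Hypothetical respondents; person 3 skipped the "where did you live" question.
    people = [
        {"id": 1, "state_now": "MA", "state_1yr_ago": "MA"},
        {"id": 2, "state_now": "TX", "state_1yr_ago": "TX"},
        {"id": 3, "state_now": "MA", "state_1yr_ago": None},
    ]
    # Geographic order: donor for person 3 is a Massachusetts neighbor, so no move.
    geo = sequential_hot_deck(sorted(people, key=lambda r: r["state_now"]), "state_1yr_ago")
    # Original (non-geographic) order: donor lives in Texas, so an interstate move.
    raw = sequential_hot_deck(people, "state_1yr_ago")
    print([r for r in geo if r["id"] == 3], raw[2])
```

The same missing answer becomes either "stayed in Massachusetts" or "moved from Texas" purely because of processing order, which is the mechanism Kaplan and Schulhofer-Wohl argue produced the apparent decline in interstate migration.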
9 November 2010
A recent New York Times piece by Nicholas Wade makes the point that research is an extremely risky enterprise with far more failures than successes.
"Nature yields her secrets with the greatest unwillingness, and in basic research most experiments contribute little to further progress, as judged by the rarity with which most scientific reports are cited by others."
In political science, it seems like most of this risk gets passed on directly to the researcher with possibly detrimental effects on the way we do research. In theory, if a project is a dead end, we should probably just walk away. In practice, however, projects can become "too big to fail". My sense is that the need to squeeze something out of a research project leads to a lot of poor statistical practice -- specification searches for something that is "significant", overly optimistic claims about causal identification, and other shady dealings. On the other hand, attempts to mitigate this risk by avoiding the cost of large data collection projects typically mean that we keep running model after model on the same 5 datasets that everyone else is using.
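The arithmetic behind specification searching is worth spelling out. If a researcher tries k specifications and each has a 5% chance of a false positive, the chance that at least one comes up "significant" is 1 - (1 - 0.05)^k, which for k = 20 is about 0.64. The sketch below computes this and checks it by simulation. It assumes independent tests of true null hypotheses, which real specification searches are not (the exact figure depends on how correlated the specifications are), so treat it as an illustration of direction rather than a calibrated estimate; the function names are hypothetical.

```python
import random


def fwer_independent(k: int, alpha: float = 0.05) -> float:
    """P(at least one 'significant' result) across k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k


def simulate_spec_search(k: int, alpha: float = 0.05, n_projects: int = 20_000,
                         seed: int = 1) -> float:
    """Monte Carlo check: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))  # did any specification "work"?
        for _ in range(n_projects)
    )
    return hits / n_projects


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(fwer_independent(k), 3), simulate_spec_search(k))
```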
Is the inherent riskiness of research at the root of these problems? How do you manage these risks?