16 March 2010
Last Halloween, I alerted readers of the social science statistics blog to cutting-edge research suggesting that if zombies attacked, humans faced serious risk of extinction.
It turns out that some of these conclusions may have been premature. Some recent research by Blake Messer suggests that if there is terrain that favors humans in some way, then humans may have a better shot at survival.
But it doesn't end there.
UCLA's Gabriel Rossman points out that Messer's model doesn't account for the possibility of human stupidity/sabotage (always a good thing to include in our models, I guess). Rossman's findings suggest that in the face of a zombie onslaught, distributing weapons across many small stockpiles might be better for the long-term survival of the human race than keeping a single cache -- perhaps the most important policy implication to come out of this renewed debate.
I think future research in this area will be worth following. First, I hear there is interesting work afoot on the spread of zombification through social networks, although getting the zombies to accurately report who bit whom can be difficult. I've also heard rumors of some machine learning research that attempts to classify zombie speech (early results suggest that there is only one category: "BRAINS!"), and I believe some economists are using the apparent exogeneity of zombie outbreaks to finally identify the effect of education on wages.
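For the curious, the flavor of these outbreak models is easy to see in a toy simulation. Below is a minimal susceptible-zombie-removed (SZR) sketch in Python; the difference equations and every parameter value are invented for illustration and are not taken from any of the papers above.

```python
# Toy SZR (susceptible, zombie, removed) model as difference equations.
# All parameters are made up for illustration.
def simulate_outbreak(S=500.0, Z=1.0, R=0.0,
                      bite_rate=0.0095,   # humans zombified per S-Z encounter
                      kill_rate=0.005,    # zombies destroyed per S-Z encounter
                      rise_rate=0.0001,   # fraction of the removed who rise again
                      steps=200):
    history = [(S, Z, R)]
    for _ in range(steps):
        bitten = min(bite_rate * S * Z, S)     # can't bite more humans than exist
        destroyed = min(kill_rate * S * Z, Z)  # can't destroy more zombies than exist
        risen = rise_rate * R
        S = S - bitten
        Z = Z + bitten - destroyed + risen
        R = R + destroyed - risen
        history.append((S, Z, R))
    return history

S_end, Z_end, R_end = simulate_outbreak()[-1]
```

With the bite rate exceeding the kill rate, the human population collapses within a handful of steps, which is the qualitative "extinction" result; terrain advantages or distributed stockpiles enter such models by boosting the effective kill rate.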
12 March 2010
Here's a neat article in the Wall Street Journal on a new putting statistic recently adopted by the PGA that was developed by researchers at MIT's Sloan School of Management. The article gives a great rundown on the deficiencies of the "putting average" traditionally used to rate pro golfers, then explains in detail how this new statistic improves upon it. Cool stuff!
10 March 2010
The Google Public Data Explorer just went up and it is worth a look. They have collected a number of large datasets and created a set of visualization tools to explore the data. Probably most interesting is the ability to show how the data changes over time using animation. This will be familiar to you if you have seen any of Hans Rosling's TED talks.
While it is fun to play around with the data, it can be a bit overwhelming. Content requires curation. One dataset that I found interesting was the World Bank data on net migration:
It's hard to read the colors and sizes at first, since size encodes only the magnitude of net migration (regardless of sign), while color encodes the direction, ranging from red (people coming) to blue (people going). This feels like the natural extension of programs like SPSS to the web.
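To make that encoding concrete, here is a tiny Python sketch of the size/color mapping just described. The mapping itself is my reading of the chart, and the example values are invented, not actual World Bank figures.

```python
def bubble_encoding(net_migration):
    """Map a net migration figure to the chart's visual encoding:
    bubble size shows magnitude only; color shows direction."""
    size = abs(net_migration)
    if net_migration > 0:
        color = "red"    # net inflow: people coming
    elif net_migration < 0:
        color = "blue"   # net outflow: people going
    else:
        color = "gray"
    return size, color

# Invented example values (thousands of migrants):
inflow = bubble_encoding(250)    # a country gaining people
outflow = bubble_encoding(-250)  # a country losing people
```

Note that `inflow` and `outflow` produce bubbles of identical size, which is exactly why the chart is hard to read without attending to color.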
8 March 2010
We hope you will join us this Wednesday, March 10th at the Applied Statistics workshop when we will be happy to have Tristan Zajonc (Harvard Kennedy School). Details and an abstract are below. A light lunch will be served. Thanks!
"Bayesian Inference for Dynamic Treatment Regimes"
Harvard Kennedy School
March 10th, 2010, 12 noon
K354 CGIS Knafel (1737 Cambridge St)
Policies in health, education, and economics often unfold sequentially and adapt to developing conditions. Doctors treat patients over time depending on their prognosis, educators assign students to courses given their past performance, and governments design social insurance programs to address dynamic needs and incentives. I present the Bayesian perspective on causal inference and optimal treatment choice for these types of adaptive policies or dynamic treatment regimes. The key empirical difficulty is dynamic selection into treatment: intermediate outcomes are simultaneously pre-treatment confounders and post-treatment outcomes, causing standard program evaluation methods to fail. Once properly formulated, however, sequential selection into treatment on past observables poses no unique difficulty for model-based inference, and analysis proceeds equivalently to a full-information analysis under complete randomization. I consider optimal treatment choice as a Bayesian decision problem. Given data on past treated and untreated units, analysts propose treatment rules for future units to maximize a policymaker's objective function. When policymakers have multidimensional preferences, the approach can estimate the set of feasible outcomes or the tradeoff between equity and efficiency. I demonstrate these methods through an application to optimal student tracking in ninth and tenth grade mathematics. An easy-to-implement optimal dynamic tracking regime increases tenth grade mathematics achievement by 0.1 standard deviations above the status quo, with no corresponding increase in inequality. The proposed methods provide a flexible and principled approach to causal inference for sequential treatments and optimal treatment choice under uncertainty.
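The dynamic-selection problem the abstract describes (intermediate outcomes acting as both confounders and outcomes) has a simple frequentist cousin in g-computation. The sketch below is not Zajonc's Bayesian machinery, and all the probabilities and outcome means are made up; it only illustrates the key move of averaging over the intermediate outcome's distribution rather than conditioning on its observed value.

```python
# Two-period setup: treatment A1, intermediate outcome L (which also
# confounds the second treatment A2), final outcome Y. Numbers invented.
p_L1 = {0: 0.3, 1: 0.7}   # P(L = 1 | A1 = a1)

mean_Y = {                 # E[Y | A1 = a1, L = l, A2 = a2]
    (0, 0, 0): 1.0, (0, 0, 1): 2.0, (0, 1, 0): 3.0, (0, 1, 1): 4.0,
    (1, 0, 0): 1.5, (1, 0, 1): 2.5, (1, 1, 0): 3.5, (1, 1, 1): 4.5,
}

def g_formula(a1, a2):
    """E[Y(a1, a2)]: average over L's distribution under treatment a1,
    instead of conditioning on the observed L."""
    return sum(
        (p_L1[a1] if l == 1 else 1.0 - p_L1[a1]) * mean_Y[(a1, l, a2)]
        for l in (0, 1)
    )
```

Under always-treat in both periods, E[Y(1, 1)] = 0.3(2.5) + 0.7(4.5) = 3.9, whereas naively conditioning on L = 1 would report 4.5; standard single-shot regression adjustment fails here precisely because L sits downstream of A1 but upstream of A2.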
In case you have not heard, Edward Tufte has been appointed to the Recovery Independent Advisory Panel by President Obama. The mission statement of the Panel is:
To promote accountability by coordinating and conducting oversight of Recovery funds to prevent fraud, waste, and abuse and to foster transparency on Recovery spending by providing the public with accurate, user-friendly information.
It is hard to imagine a better person for this panel than Tufte. As Feltron said, this is wonderful news for data nerds, designers, and the general public.
6 March 2010
Andrew Gelman has some good comments on the great Elizabeth Green article about teaching in the New York Times Magazine. The article is about how to improve both classroom management and subject instruction for K-12 teachers, but Gelman correctly points out that many of these struggles resonate with those of us teaching statistics at the undergraduate and graduate levels.
I used to be of the opinion that the teaching of children and the teaching of adults were two fundamentally different beasts and comparisons between the two were missing the point. The more I teach, though, the more I see teaching as a skill separate from the material being taught. Knowing a topic well does not imply being able to teach a topic well. This should have been obvious to me given the chasm between good research and good presentations.1 The article nails this as it talks about math instruction:
Mathematicians need to understand a problem only for themselves; math teachers need both to know the math and to know how 30 different minds might understand (or misunderstand) it. Then they need to take each mind from not getting it to mastery. And they need to do this in 45 minutes or less. This was neither pure content knowledge nor what educators call pedagogical knowledge, a set of facts independent of subject matter, like Lemov's techniques. It was a different animal altogether.
If this is true, how can we improve teaching? I think that Gelman is right in identifying student participation as important to teaching statistics. Most instructors would agree that statistics is all about learning by doing, but many of us struggle to identify how to actually implement this, especially in lectures. Cold-calling is extremely popular with law and business schools, but rare in the social sciences. Breaking off to do group work is another useful technique. In addition to giving up control of the class (which Gelman mentions), instructors have to really build the class around these breaks.
Reflecting on my own experience, both as a student and an instructor, I am starting to believe in three (related) fundamentals of statistics teaching:
There are probably more fundamentals that I am missing, but I think each of these is important and overlooked. Often this is simply because they are hard to implement, instructors have other commitments, and the value-added of improving instruction can be very low. In spite of these concerns and the usual red herrings2, I think that there are simple changes we can make to improve our teaching.
1Perhaps a more subtle point is that being a good presenter does not imply being a good instructor. They are related, though. Good public speakers have an advantage as teachers, since they are presumably more comfortable in front of crowds. But the goal of presenting (persuasion) and the goal of instruction (training people in a skill) are very different. People confuse the two because the medium is often so similar (lecture halls, podiums, etc.).↑
2Teaching evaluations are important, but they are often very coarse. Students know if they didn't understand something, but rarely know why. Furthermore, improving evaluations need not come from improving instruction. ↑
5 March 2010
Infochimps hosts what looks to be a growing number of datasets, most of them free. There seems to be some ability to sell your dataset (at a 50% commission rate!), but the real story is the ability to quickly browse data. It looks a little thin now, but as someone who is constantly looking for good examples for teaching, this could be a valuable resource. (via Gelman)
2 March 2010
Newsdot is a new tool from Slate that displays a "social network" for topics in the news, be they people, organizations, or locations. Here's a look:
It uses a product called Calais, which does automatic tagging of documents by finding keywords. You can try it out on any block of text with their viewer. Here is a sample output from an article in the New York Times about the primary elections in Texas:
You can see that Calais has been able to identify all the references to Gov. Perry and Sen. Hutchison, including pronouns and other phrases that refer to them.
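Calais's tagging is far more sophisticated than anything a few lines of code can do, but a crude sense of the task comes from simply counting literal name matches. The sketch below makes no attempt at the pronoun resolution Calais performs, and the sample text is invented, not from the Times article.

```python
import re
from collections import Counter

def count_mentions(text, entities):
    """Count literal mentions of each known entity name in the text.
    (A real tagger also discovers entities and resolves pronouns.)"""
    counts = Counter()
    for name in entities:
        counts[name] = len(re.findall(re.escape(name), text))
    return counts

sample = ("Gov. Perry criticized Sen. Hutchison on Tuesday. "
          "Hutchison responded that Perry had ignored the issue.")
mentions = count_mentions(sample, ["Perry", "Hutchison"])
```

Even this toy version misses any sentence that refers to Perry only as "he" or "the governor", which is exactly the hard part that makes automatic taggers like Calais interesting.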
Some thoughts are below the fold.
1 March 2010
We hope you will join us this Wednesday, March 3rd at the Applied Statistics workshop when we will be happy to have Thomas Steenburgh (Harvard Business School). Details, an abstract, and a link to the paper are below. A light lunch will be served. Thanks!
"Substitution Patterns of the Random Coefficients Logit"
Harvard Business School
March 3rd, 2010, 12 noon
K354 CGIS Knafel (1737 Cambridge St)
You can find the paper on SSRN.
Previous research suggests that the random coefficients logit is a highly flexible model that overcomes the problems of the homogeneous logit by allowing for differences in tastes across individuals. The purpose of this paper is to show that this is not true. We prove that the random coefficients logit imposes restrictions on individual choice behavior that limit the types of substitution patterns that can be found through empirical analysis, and we raise fundamental questions about when the model can be used to recover individuals' preferences from their observed choices.
Part of the misunderstanding about the random coefficients logit can be attributed to the lack of cross-level inference in previous research. To overcome this deficiency, we design several Monte Carlo experiments to show what the model predicts at both the individual and the population levels. These experiments show that the random coefficients logit leads a researcher to very different conclusions about individuals' tastes depending on how alternatives are presented in the choice set. In turn, these biased parameter estimates affect counterfactual predictions. In one experiment, the market share predictions for a given alternative in a given choice set range between 17% and 83% depending on how the alternatives are displayed both in the data used for estimation and in the counterfactual scenario under consideration. This occurs even though the market shares observed in the data are always about 50% regardless of the display.
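To see concretely what "random coefficients" means here, below is a minimal Monte Carlo sketch of mixed logit market shares in pure Python. It is not the authors' experimental design: a single made-up attribute x, individual tastes drawn as beta ~ N(mu, sigma), and shares obtained by averaging logit choice probabilities over the taste draws.

```python
import math
import random

def mixed_logit_shares(x, mu, sigma, n_draws=20000, seed=7):
    """Market shares for alternatives with attribute levels x, averaging
    logit probabilities over taste draws beta ~ N(mu, sigma)."""
    rng = random.Random(seed)
    shares = [0.0] * len(x)
    for _ in range(n_draws):
        beta = rng.gauss(mu, sigma)          # one individual's taste
        exps = [math.exp(beta * xj) for xj in x]
        total = sum(exps)
        for j, e in enumerate(exps):
            shares[j] += e / total           # this individual's choice probs
    return [s / n_draws for s in shares]

# sigma = 0 collapses to the homogeneous logit
homogeneous = mixed_logit_shares([1.0, -1.0], mu=1.0, sigma=0.0)
mixed = mixed_logit_shares([1.0, -1.0], mu=1.0, sigma=1.0)
```

Averaging over heterogeneous tastes pulls the leading alternative's share toward 1/2 relative to the homogeneous logit with the same mean taste, which illustrates that the mixing distribution shapes aggregate substitution patterns; the paper's point is that the restrictions it imposes on individual behavior are stronger than commonly appreciated.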