30 May 2007
The New York Times has an interesting article today on airlines overbooking flights. Apparently the number of people bumped off flights (voluntarily and involuntarily) has risen in recent years despite efforts to model no-shows.
The article mentions US Airways' team of "math nerds" who are trying to figure out how many seats the airline can sell without bumping too many people. One interesting aspect is that they seem unable to do a great job of predicting the number of no-shows for a given flight, which leads to overly aggressive overbooking. I wonder why that's so hard to get right? With all their historical data, airlines should be able to do a reasonable job. The article's charts show that the number of seats sold on the average flight has increased over the past few years, which leads to more bumping. I imagine this behavior is driven more by profit motives, with airlines accepting the risk of overselling more frequently. Unless model accuracy increased alongside, it's pretty much a given that they end up bumping more people.
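The basic calculation is easy enough to sketch. Under the simplest textbook model (my assumption here, not anything the article describes), each ticketed passenger independently no-shows with some probability, and the airline sells the most tickets it can while keeping the probability of a bump below some tolerance. All numbers below are invented for illustration:

```python
from math import comb

def bump_prob(tickets_sold, capacity, p_no_show):
    """P(more than `capacity` passengers show up) under a binomial model."""
    p_show = 1 - p_no_show
    return sum(
        comb(tickets_sold, k) * p_show**k * (1 - p_show) ** (tickets_sold - k)
        for k in range(capacity + 1, tickets_sold + 1)
    )

def max_tickets(capacity, p_no_show, bump_tolerance=0.05):
    """Largest number of tickets keeping the bump probability below tolerance."""
    n = capacity
    while bump_prob(n + 1, capacity, p_no_show) <= bump_tolerance:
        n += 1
    return n
```

Of course, the hard part the article describes is that the no-show probability is neither known nor constant, and people's behavior shifts in response to the policy itself.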
But the most interesting insight is how people respond to the increased bumping. Staff book fake passengers to prevent headquarters from overbooking flights (apparently Mickey Mouse is a favorite placeholder). Airlines bump more in the morning, because they can move passengers to flights later in the day. Passengers increasingly refuse to be bumped because they anticipate being stuck if they agree to wait. Whatever the reason the predictive accuracy is not great right now, people's responses have to be reckoned with and ought to be part of the model.
23 May 2007
The New York Times has an article today ("For Drug Makers, a Downside to Full Disclosure") discussing the recent creation of archives for pharmaceutical clinical trial data, including data from trials that did not result in publications. This effort is an attempt to deal with the age-old problem of publication bias, a problem supposedly identified by the ancient Greeks, as described in a letter to the editor of the Lancet by Mark Pettigrew:
The writings of Francis Bacon (1561-1626) are a good starting point. In his 1605 book, The Advancement of Learning, he alludes to this particular bias by pointing out that it is human nature for "the affirmative or active to effect more than the negative or privative. So that a few times hitting, or presence, countervails oft-times failing or absence". This is a clear description of the human tendency to ignore negative results, and Bacon would be an acceptable father figure. Bacon, however, goes further and supports his claim with a story about Diagoras the Atheist of Melos, the fifth century Greek poet.
Diagoras was the original atheist and free thinker. He mocked the Eleusinian mysteries, an autumnal fertility festival which involved psychogenic drug-taking, and was outlawed from Athens for hurling the wooden statue of a god into a fire and sarcastically urging it to perform a miracle to save itself. In the context of publication bias, his contribution is shown in a story of his visit to a votive temple on the Aegean island of Samothrace. Those who escaped from shipwrecks or were saved from drowning at sea would display portraits of themselves here in thanks to the great sea god Neptune. "Surely", Diagoras was challenged by a believer, "these portraits are proof that the gods really do intervene in human affairs?" Diagoras' reply cements his claim to be the "father of publication bias": "yea, but . . . where are they painted that are drowned?"
While dealing with publication bias would seem to be a good thing, the Times article suggests (perhaps in an attempt to avoid publication bias itself) that some people are worried about this practice:
Some experts also believe that releasing the results of hundreds of studies involving drugs or medical devices might create confusion and anxiety for patients who are typically not well prepared to understand the studies or to put them in context.
“I would be very concerned about wholesale posting of thousands of clinical trials leading to mass confusion,” said Dr. Steven Galson, the director for the Center for Drug Evaluation and Research at the F.D.A.
It is a little hard for me to believe that this confusion would be worse than the litany of possible side effects given at the end of every pharmaceutical commercial, but that is a different issue. From a purely statistical point of view, this seems like a no-brainer, a natural extension of efforts to ensure that published results can be replicated. Whether you are a frequentist or a Bayesian, inferences should be better when conditioned on all of the data that have been collected, not just the data that researchers decided to use in their publications. There could be a reasonable argument about what to do with (and how to define) corrupted data - data from trials that blew up in one way or another - but this seems like a second-order consideration.
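The statistical point is easy to demonstrate with a toy simulation (every number below is invented for illustration): if only trials whose estimates clear a crude significance threshold get "published," the average published effect overstates the true effect, while the average across all trials does not.

```python
import random
import statistics

def simulate_trials(true_effect=0.2, n_trials=500, n_per_trial=50, seed=1):
    """Simulate many small trials of the same drug; 'publish' only those
    whose estimated effect looks significant at a crude 1.96-SE cutoff."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        draws = [rng.gauss(true_effect, 1.0) for _ in range(n_per_trial)]
        estimates.append(statistics.mean(draws))
    se = 1.0 / n_per_trial ** 0.5
    published = [e for e in estimates if e > 1.96 * se]
    return statistics.mean(estimates), statistics.mean(published)

all_mean, pub_mean = simulate_trials()
```

Here `all_mean` hovers near the true effect, while `pub_mean` is inflated by the selection on significance - which is exactly why conditioning on all collected data, not just the published subset, gives better inferences.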
It would be great if we could extend this effort into the social sciences. It would be easier to do this for experimental work since the data collection process is generally well defined. On the other hand, I suspect that there is less of a need for archives of experimental data in the social sciences, for two reasons. First, experimental work is still rare enough (at least in political science) that I think you have a decent chance of getting published even with "non-results". Second, my sense is that, with the possible exception of researchers closely associated with particular policy interventions, the incentives facing social scientists are not the same as those facing pharmaceutical researchers. Social scientists may have a preference for "significant" results, but in most cases they don't care as much about the direction.
The kind of data archive described above would be more useful for observational research, but much harder to define. Most social scientists have invested significant time and energy collecting observational data only to find that there are no results that reviewers would think were worth publishing. On the other hand, how do we define a trial for observational data? Should there be an obligation to make one's data available any time that it is collected, or should it be restricted to data that has been analyzed and found uninteresting? Or should we think of data and models together, and ask researchers to share both their data and their analysis? I'm not sure what the answer is, but it is something that we need to think about as a discipline.
22 May 2007
Over at the Volokh Conspiracy, Professor Einer Elhauge from Harvard Law School has a post about the future of empirical legal studies, comparing the law today to baseball before the rise of sabermetrics. From the post:
In short, in law, we are currently still largely in the position of the baseball scouts lampooned so effectively in Moneyball for their reliance on traditional beliefs that had no empirical foundation. But all this is changing. At Harvard Law School, as traditional a place as you can get, we now have by my count 10 professors who have done significant statistical analysis of legal issues. We just hired our first JD with a PhD in statistics. The movement is not at all limited to Harvard, and seems to be growing at all law schools.
So we are hardly devoid of empirical analysis of law. We are just, rather, in our early Bill James era, and can expect the analysis to get more sophisticated and systematic as things progress. I expect within a couple of decades we will have our own book distilling the highlights of things we will know then that conflict with what is now conventional legal wisdom.
We are all pretty pleased that Harvard Law now has a stats Ph.D. on the faculty. But one of the commenters raises an interesting question: if empirical legal studies are like sabermetrics, who is the legal equivalent of Joe Morgan?
12 May 2007
This is a common question, commonly misunderstood. It certainly does seem like MI makes up data, since if you look at the 5 or so imputed data sets, the missing values are indeed filled in. But in fact, the point of MI has nothing to do with making up data, and everything to do merely with putting the data in a more convenient format.
The fact is that the vast majority of our statistical techniques require rectangular data sets, and so data that look like Swiss cheese are really hard to do anything sensible with directly. Listwise deletion, where you excise horizontal slices out of the cheese wherever you see holes, discards a lot of cheese! What MI does instead is fill in the holes in the data using all available information from the rest of the data set (thus moving some information around) and add uncertainty to these imputations in the form of variation in the values across the different imputed data sets (thus taking back assertions of knowledge wherever a missing value is not well predicted by the rest of the data or by duplication of the same information elsewhere in the data). If done properly, MI merely puts the data in a convenient rectangular format and enables the user (with some simple combining rules) to apply statistical techniques to the data as if they were fully observed. MI standard errors are then not too small, which would be the case if data were being made up.
The particular models for imputation can be used incorrectly or inappropriately (and so should be used with priors when additional information is available; see e.g., "What to do About Missing Values in Time Series Cross-Section Data"), but proper usage of MI makes up no information other than that genuinely available.
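For the curious, the "simple combining rules" mentioned above are Rubin's rules: run the analysis on each imputed data set, average the point estimates, and combine the within- and between-imputation variances. A minimal sketch (the function and variable names are mine):

```python
import statistics

def combine_mi(estimates, variances):
    """Rubin's rules: pool point estimates and variances from m analyses
    run on m imputed data sets."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)       # pooled point estimate
    u_bar = statistics.mean(variances)       # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b      # total variance
    return q_bar, total_var ** 0.5           # estimate and standard error

# e.g., a coefficient estimated on five imputed data sets
est, se = combine_mi([1.0, 1.1, 0.9, 1.05, 0.95],
                     [0.04, 0.05, 0.04, 0.05, 0.04])
```

The between-imputation term is precisely where MI "takes back" certainty: the more the imputed data sets disagree, the larger the pooled standard error.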
10 May 2007
And while we're doing announcements, the Society for Political Methodology is also soliciting nominations for the Gosnell Prize, awarded to the best paper in methods presented at any political science conference:
The Gosnell Prize for Excellence in Political Methodology is awarded for the best work in political methodology presented at any political science conference during the preceding year, 1 June 2006-31 May 2007.
The Award Committee also includes Michael Crespin and Patrick Brandt.
We look forward to submissions for this important award in the next few weeks, as our decision will be made toward the end of the month. Yes, this month. Right now it is a wide open field. There were a lot of great papers presented at APSA, MPSA, Methods, ISA, and elsewhere in the past year. Please send a short nomination paragraph along with the originally presented paper (not a revision) in PDF format to me or any of the committee members.
Thanks for your help in nominating worthy manuscripts.
Michael D. Ward, Professor of Political Science
University of Washington, Seattle, WA, 98195-3530, USA
The Program on Survey Research at Harvard is hosting an afternoon conference tomorrow on the challenges of surveying multiethnic populations:
Surveying Multiethnic America
May 11, 2007
12:30 – 5:00
Institute for Quantitative Social Science
1737 Cambridge St.
Cambridge, MA 02138
Across a variety of different academic disciplines, scholars are interested in topics related to multiethnic populations, and sample surveys are one of the primary means of studying these populations. Surveys of multiethnic populations face a number of distinctive methodological challenges, including issues related to defining and measuring ethnic identity, and locating, sampling, and communicating with the groups of interest.
This afternoon panel sponsored by the Program on Survey Research at Harvard University will look at recent survey research projects on multiethnic populations in the US. Researchers will discuss how they confronted the unique methodological challenges in their survey projects and will consider the implications of their approach for their key theoretical and empirical findings.
12:30 - 2:45
Sunshine Hillygus, Harvard University, Introduction
Manuel de la Puente, US Bureau of the Census, Current Issues in Multiethnic Survey Methods
Guillermina Jasso, New York University, New Immigrant Study
Deborah Schildkraut, Tufts University, The 21st Century Americanism Study
Yoshiko Herrera, Harvard University, Discussant
3:00 - 5:00
Tami Buhr, Harvard University, Harvard Multi-Ethnic Health Survey
Ronald Brown, Wayne State University, National Ethnic Pluralism Survey
Valerie Martinez-Ebers, Texas Christian University, National Latino Politics Survey
Kim Williams, Harvard University, Discussant
Simon Jackman sent around the following today on behalf of the Society for Political Methodology:
The Society for Political Methodology will award its first Political Methodology Career Award this year, to recognize an outstanding career of intellectual accomplishment and service to the profession in the Political Methodology field. The award committee -- Simon Jackman (chair), Elisabeth Gerber, Marco Steenbergen, Mike Alvarez -- is calling for nominations for this award, due no later than Monday May 28, 2007. Nominations may be sent to me. Needless to say, a brief statement in support of the nominee will greatly assist the committee in our deliberations.
9 May 2007
This may not be new to anybody but me, but recent news at UNC brought the so-called "Achievement Index" to my attention. The Achievement Index is a way of calculating GPA that takes into account not only how well one performs in a class, but also how hard the class is relative to others in the institution. It was first suggested by Valen Johnson, a professor of statistics at Duke University, in a paper in Statistical Science titled "An Alternative to Traditional GPA for Evaluating Student Performance." (The paper is available on his website; you can also find a more accessible pdf description here).
This seems like a great idea to me. The model, which is Bayesian, calculates "achievement index" scores for each student as latent variables that best explain the grade cutoffs for each class in the university. As a result, it captures several phenomena: (a) if a class is hard and full of very good students, then a high grade is more indicative of ability (and a low grade less indicative of lack of ability); (b) if a class is easy and full of poor students, then a high grade doesn't mean much; (c) if a certain instructor always gives As then the grade isn't that meaningful -- though it's more meaningful if the only people who take the class in the first place are the extremely bright, hard-working students. Your "achievement index" score thus reflects your actual grades as well as the difficulty level of the classes you have chosen.
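To make the intuition concrete, here is a back-of-the-envelope version - emphatically not Johnson's Bayesian model, which treats grades as ordinal outcomes with estimated cutoffs, but a crude additive scheme of my own devising that captures the same idea: a grade is informative relative to who else is in the class.

```python
def achievement_scores(grades, n_iter=50):
    """grades: dict mapping (student, course) -> numeric grade.
    Iteratively re-estimate student scores and course baselines, so that
    a course full of strong students raises the value of a grade earned there."""
    students = {s for s, _ in grades}
    courses = {c for _, c in grades}
    score = {s: 0.0 for s in students}
    for _ in range(n_iter):
        # course baseline = average grade minus average score of its enrollees
        baseline = {}
        for c in courses:
            pairs = [(s, g) for (s, cc), g in grades.items() if cc == c]
            baseline[c] = (sum(g for _, g in pairs) / len(pairs)
                           - sum(score[s] for s, _ in pairs) / len(pairs))
        # student score = average grade relative to course baselines
        for s in students:
            pairs = [(c, g) for (ss, c), g in grades.items() if ss == s]
            score[s] = sum(g - baseline[c] for c, g in pairs) / len(pairs)
    return score
```

Even in this toy version, an A in a course whose enrollees score well everywhere else counts for more than an A handed out to everyone; the real model does this coherently, with uncertainty, via latent variables.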
Why isn't this a standard measure of student performance? 10 years ago it was proposed at Duke but failed to pass, and at UNC they are currently debating it -- but what about other universities? The Achievement Index addresses multiple problems. There would be less pressure toward grade inflation, for one thing. For another, it would address the unfortunate tendency of students to avoid "hard" classes for fear of hurting their GPA. Students in hard majors or taking hard classes also wouldn't be penalized in university-wide, GPA-based awards.
One might argue that students shouldn't avoid hard classes simply because of their potential grade, and I tend to agree that they shouldn't -- it was a glorious moment in my own college career when I finally decided "to heck with it" and decided to take the classes that interested me, even if they seemed really hard. But it's not necessarily irrational for a student to care about GPA, especially if important things -- many of which I didn't have to worry about -- hinge on it: things like scholarships or admission to medical school. Similarly, instructors shouldn't inflate grades and create easy classes, but it is often strictly "rational" to do so: giving higher grades can often mean better evaluations and less stress due to students whinging for a higher grade, and easier classes are also easier to teach. Why not try to create a system where the rational thing to do within that system is also the one that's beneficial for the university and the student in the long run? It seems like the only ones who benefit from the current system are the teachers who inflate their grades and teach "gimme" courses and the students who take those easy courses. The ones who pay are the teachers who really seek to challenge and teach their students, and the students who want to learn, who are intellectually curious and daring enough to take courses that challenge them. Shouldn't the incentive structure be the opposite?
I found a petition against the Achievement Index online, and I'm not very persuaded by its arguments. One objection is that it's not transparent how it works, which I could see being a concern... but there are two kinds of transparency, and I think only one really matters. If it's not transparent because it's biased or subjective, that's bad; but if it's not transparent simply because it's complicated (as this is), while being totally objective with its workings published, then it's much less problematic. Sometimes complicated is better: other things that matter a great deal for our academic success -- such as SATs and GREs -- aren't all that transparent either, and they are still very valuable. The petition also argues that the AI system will make students more competitive with each other, but I confess I don't understand this argument at all: how would it increase competition above and beyond the standard GPA?
Anyway, it might seem like I'm being fairly dogmatic about the greatness of the Achievement Index, but I don't intend to be. I have no particular bone to pick, and I got interested in this issue originally mainly just because I wanted to understand the model. It's simply that I don't really see any true disadvantages and I wonder what I'm missing. Why don't more universities try to implement it? Can anyone enlighten me?
8 May 2007
We have blogged a fair bit about reproducibility standards and data-sharing for replication (see here and here). Some journals have required authors to make data sets and code available for a while now, and these policies are starting to show effects. The American Economic Review, for example, has required authors to submit their data since 2004, and this information is now available on its website. The AER provides a basic readme document and files with the variables used for an increasing number of articles going back to late 2002; some authors also provide their program code. There's a list of articles with available data here.
The 2006 Report of the Editor suggests that most authors now comply with the data posting requirements and that only a few exceptions are made. At this point the AER is pretty much alone among the top economics journals in offering this information. I wonder if authors substitute between the AER and other journals. Since the AER is still a very desirable place to publish, maybe this improves the quality of AER submissions if only confident authors submit? At least for now the submission statistics in the editor's report don't suggest that they are losing authors. Meanwhile hundreds of grad students can rejoice in a wealth of interesting papers to replicate.
7 May 2007
Just as a reminder, the Applied Statistics Workshop has wrapped up for this academic year. Thanks to all who came to the talks, and we look forward to seeing you again in September.
2 May 2007
OK, so now that I have a job, I feel like I can stick my foot in something smelly to see what happens. When I was on the market this past year, I was often asked about the difference (lawyers are always careful to ask about "the difference, if any") between a degree in statistics and a degree in something more "traditional" for a law scholar, such as economics or political science or sociology. Because of the prevalence and power of the Law & Economics movement in legal scholarship, there was particular interest in the difference between statistics and economics/econometrics. I had a certain amount of trouble answering the question. It was easy to point out that the best quantitative empiricists move within all fields and are able to read all literatures. As an aspiring statistician, it was also easy to give the statistical version of things, which is that statisticians invent data analysis techniques and methods that, after ten to twenty-five to forty years, filter into or are reinvented by other fields (whenever I said this, I clarified that this story was a caricature).
So what is the difference between an empirical, data-centered economist and an applied statistician? The stereotypes I've internalized from hanging out in an East Coast statistics department are that economists tend to focus more on parameter estimation, asymptotics, unbiasedness, and paper-and-pencil solutions to problems (which can then be implemented via canned software like Stata), whereas applied statisticians lean more toward imputation and predictive inference, Bayesian thinking, and computational solutions to problems (which require programming in packages such as R). Anyone care to disabuse me of these notions?
1 May 2007
The New York Times has an article discussing a working paper by Justin Wolfers and Joseph Price, looking at the rate at which white referees call fouls on black players (and black referees call fouls on white players). The paper can be found here. I haven't had a chance to read it yet, but if it uses "multivariable regression analysis" as it says in the Times article, then I'm sure it must be good.