26 September 2006
I'm a little late to the game with this, but it's interesting enough that I'll post anyway. Several folks have commented on, and linked to, this paper by Gerber and Malhotra about publication bias in political science. G&M counted how many published articles reported significant (p<0.05) vs. non-significant results, and found -- not surprisingly -- that there were more papers with significant results than would be predicted by chance; and, secondly, that many of the significant results had p-values suspiciously close to 0.05.
I guess this is indeed "publication bias" in the sense of "there is something causing articles with different statistical significance to be published differentially." But I just can't see this as something to be worried about. Why?
Well, first of all, there's plenty of good reason to be wary of publishing null results. I can't speak for political science, but in psychology, a result can be non-significant for many reasons far more boring than a genuine absence of any effect. (And I can't imagine why this would be different in poli sci.) For instance, suppose you want to prove that there is no relation between 12-month-olds' abilities in task A and task B. It's not sufficient to show a null result. Maybe your sample size wasn't large enough. Maybe you're not actually succeeding in measuring their abilities in either or both of the tasks (this is notoriously difficult with babies, but it's no picnic with adults either). Maybe A and B are related, but the relation is mediated by some other factor that you happen to have controlled for. And so on. Now, this is not to say that no null results are meaningful or that null results should never be published, but a researcher -- quite rightly -- needs to do a lot more work to make a null result pass the smell test. And so it's a good thing, not a bad thing, that there are fewer null results published.
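To put a rough number on the sample-size point, here is a minimal sketch rather than anything from the G&M paper. It assumes a deliberately simplified setup: each "study" draws N scores from a normal distribution with a known standard deviation of 1 and tests the mean against zero with a two-sided z-test. The effect size of 0.5, the number of simulated studies, and the function names are all made up for illustration.

```python
import math
import random


def two_sided_p(sample_mean, n, sd=1.0):
    """Two-sided p-value for a z-test of the mean against zero (known sd)."""
    z = sample_mean * math.sqrt(n) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def share_significant(n, true_effect, n_studies=2000, alpha=0.05, seed=0):
    """Fraction of simulated studies of size n that reach p < alpha.

    true_effect is the hypothetical population mean in sd units; all
    values here are illustrative assumptions, not estimates from data.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        mean = sum(rng.gauss(true_effect, 1.0) for _ in range(n)) / n
        if two_sided_p(mean, n) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for n in (12, 100):
        share = share_significant(n, true_effect=0.5)
        print(f"N={n}: share of studies with p<0.05 = {share:.2f}")
```

Under these assumptions, a sizeable share of the N=12 studies come out non-significant even though the effect is real by construction, which is exactly why a lone null result says so little.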
Secondly, I'm not even worried about the large number of studies that only just clear the significance threshold. Maybe I'm young and naive, but I think it's probably less an indication of fudging data than a reflection of (quite reasonable) resource allocation. Take those same 12-month-old babies. If I get significant results with N=12, then I'm not going to run more babies in order to get more significant results. Since, rightly or wrongly, the gold standard is the p<0.05 value (which is another debate entirely), it makes little sense to waste time and other resources running superfluous subjects. Similarly, if I've run, say, 16 babies and my result is just short of p<0.05, I'm not going to stop; I'll run 4 more. Obviously there is an upper limit on the number of subjects, but -- given the essential arbitrariness of the 0.05 value -- I can't see this as a bad thing either.
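The stopping rule described above can be sketched as a small simulation. This is a hedged illustration, not G&M's analysis: it assumes a z-test with known standard deviation of 1, a true effect of 0.5, batches of 4 added after an initial 12 subjects, and a cap of 40 subjects. The cap, the effect size, and every function name are assumptions chosen for the example.

```python
import math
import random


def two_sided_p(total, n):
    """Two-sided z-test p-value from a running sum of n scores (sd = 1)."""
    z = total / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def run_study(rng, true_effect, start_n=12, step=4, max_n=40, alpha=0.05):
    """Add subjects in batches; stop at p < alpha or at the max_n cap."""
    total, n, target = 0.0, 0, start_n
    while True:
        while n < target:
            total += rng.gauss(true_effect, 1.0)
            n += 1
        p = two_sided_p(total, n)
        if p < alpha or n >= max_n:
            return p, n
        target = min(n + step, max_n)


def simulate_p_values(true_effect, start_n, n_studies=2000, seed=1):
    """Final p-values from many simulated studies under one design."""
    rng = random.Random(seed)
    return [run_study(rng, true_effect, start_n=start_n)[0]
            for _ in range(n_studies)]


def share_just_under(p_values, alpha=0.05, lower=0.025):
    """Among significant results, the fraction with lower <= p < alpha."""
    sig = [p for p in p_values if p < alpha]
    if not sig:
        return 0.0
    return sum(1 for p in sig if p >= lower) / len(sig)


if __name__ == "__main__":
    stop_early = simulate_p_values(true_effect=0.5, start_n=12)
    fixed_n = simulate_p_values(true_effect=0.5, start_n=40)  # one look at N=40
    print("stop-when-significant:", round(share_just_under(stop_early), 2))
    print("fixed N=40:           ", round(share_just_under(fixed_n), 2))
```

Under these assumptions, stopping as soon as the result is significant leaves a larger share of the significant p-values sitting just below 0.05 than running all 40 subjects every time, without anyone touching the data. Calling `simulate_p_values` with `true_effect=0.0` shows how the same rule behaves when there is no effect at all.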
This week the Applied Statistics Workshop will present a talk by Ben Hansen, Assistant Professor of Statistics at the University of Michigan. Professor Hansen graduated from Harvard College, magna cum laude, with a degree in Mathematics and Philosophy. He went on to win a Fulbright Fellowship to study philosophy at the University of Oslo, Norway, after which he earned his Ph.D. in Logic and Methodology of Science at the University of California, Berkeley.
Professor Hansen’s primary research interests involve causal inference in comparative studies, particularly observational studies in the social sciences. His publications appear in the Journal of Computational and Graphical Statistics, Bernoulli, Journal of the American Statistical Association, and Statistics and Probability Letters. He is currently working on providing methods for statistical adjustment that enable researchers to mount focused, specific analogies of their observational studies to randomized experiments.
Professor Hansen will present a talk entitled "Covariate balance in simple, stratified and clustered comparative studies." The working paper that accompanies the talk is available from the course website. The presentation will be at noon on Wednesday, September 27, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
If you missed the workshop’s first meeting, you should check out the abstract of Jake Bowers’ talk, “Fixing Broken Experiments: A Proposal to Bolster the Case for Ignorability Using Subclassification and Full Matching”.