31 October 2006

More thoughts on publication bias and p-values

Amy Perfors

In a previous post about the Gerber & Malhotra paper on publication bias in political science, I rather optimistically opined that the findings (more significant results than chance would predict, many of them suspiciously close to 0.05) were probably not deeply worrisome, at least in fields where experimenters can vary the number of subjects run based on the significance level achieved so far.

Well, I now disagree with myself.

This change of mind comes as a result of reading about the Jeffreys-Lindley paradox (Lindley, 1957), a Bayes-inspired critique of significance testing in classical statistics. It says, roughly, that with a large enough sample size, a p-value can be arbitrarily close to zero even while the posterior probability of the null hypothesis is arbitrarily close to one. In other words, a classical statistical test might reject the null hypothesis at an arbitrarily low p-value even though the evidence that it should be accepted is overwhelming. [A discussion of the paradox can be found here.]
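To see the mechanics in the textbook normal setting (my formulation, not spelled out in Lindley's own notation): test H0: theta = 0 against H1: theta ~ N(0, tau^2), with known sigma, n observations, and sample mean x-bar. The Bayes factor in favor of the null works out to

\[
B_{01} \;=\; \frac{p(\bar{x} \mid H_0)}{p(\bar{x} \mid H_1)}
\;=\; \sqrt{1 + \frac{n\tau^2}{\sigma^2}}\;
\exp\!\left(-\frac{z^2}{2}\cdot\frac{\tau^2}{\tau^2 + \sigma^2/n}\right),
\qquad z = \frac{\sqrt{n}\,\bar{x}}{\sigma},
\]

so for any fixed z (and hence any fixed p-value), \(B_{01}\) grows like \(\sqrt{n}\): the same "significant" result counts more and more strongly in favor of the null as the sample grows.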

When I learned about this result a few years ago, it astonished me, and I still haven't fully figured out how to deal with all of its implications. (Evidently, since I forgot about it when writing the previous post!) As I understand the paradox, the intuitive idea is that with a larger sample size you will naturally see some data that appear unlikely; the more data you collect, the more likely you are to see some really unlikely data. If you never compare the probability of that data under the null hypothesis with its probability under the alternative hypotheses, you can end up with an arbitrarily low p-value (indicating that the data are unlikely under the null) even if the data are still more unlikely under every alternative. Thus, if you look only at the p-value, without taking into account effect size, sample size, or the comparative posterior probability of each hypothesis under consideration, you may wrongly reject the null hypothesis even when it is the most likely of all the possibilities.
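To make this concrete, here is a small numerical sketch in Python (mine, not from the post) of the Bayes factor formula above, holding the z-statistic fixed at 1.96 so that p stays at roughly 0.05 while n grows; the values sigma = tau = 1 and the 50/50 prior odds are illustrative assumptions.

# Numerical sketch of the Jeffreys-Lindley paradox (illustrative
# assumptions: known sigma = 1, prior mu ~ N(0, tau^2) with tau = 1,
# even prior odds, and a z-statistic pinned at 1.96 for every n).
import math

def bayes_factor_01(z, n, sigma=1.0, tau=1.0):
    """Bayes factor in favor of H0 when the sample mean has marginal
    N(0, sigma^2/n) under H0 and N(0, tau^2 + sigma^2/n) under H1."""
    ratio = 1.0 + n * tau**2 / sigma**2
    shrink = tau**2 / (tau**2 + sigma**2 / n)
    return math.sqrt(ratio) * math.exp(-0.5 * z**2 * shrink)

z = 1.96  # "just significant" at the 0.05 level, regardless of n
for n in (10, 100, 1_000, 100_000, 10_000_000):
    bf01 = bayes_factor_01(z, n)
    post_h0 = bf01 / (1.0 + bf01)  # posterior P(H0 | data) at even prior odds
    print(f"n = {n:>10,}   p ~ 0.05   BF01 = {bf01:8.2f}   P(H0|data) = {post_h0:.3f}")

With these numbers, the same "just significant" result that mildly favors the alternative at n = 10 favors the null overwhelmingly by n = 100,000.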

The tie-in with my earlier post, of course, is that the paradox implies it isn't necessarily "okay" practice to keep increasing sample size until you achieve statistical significance. In practice, admittedly, sample sizes rarely get larger than 24 or 32 (at the absolute outside, 50 to 100), which is much smaller than infinity. Does this practical consideration, then, mean that the practice is okay? As far as I can tell, it is fairly standard (but then, so is reliance on p-values to the exclusion of effect sizes, confidence intervals, etc., so "common" doesn't mean "okay"). Is this practice a bad idea only if your sample gets extremely large?
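The purely frequentist side of that question is easy to simulate. Below is a rough sketch (mine; the starting n of 10, the peek-every-2-subjects schedule, and the cap at 50 are illustrative assumptions) of running more subjects until p < .05 when the null hypothesis is actually true; even at the modest sample sizes above, the realized false-positive rate comes out well above the nominal 5%.

# Simulating "run more subjects until p < .05" under a true null.
import math
import random
import statistics

def p_value(xs):
    """Two-sided z-test of H0: mean 0, treating the sample s.d. as the
    known sigma (a rough approximation, adequate for illustration)."""
    z = statistics.mean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
trials, false_positives = 2000, 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(10)]  # start with n = 10
    # Peek after every 2 new subjects; stop at p < .05 or n = 50.
    while len(xs) < 50 and p_value(xs) >= 0.05:
        xs += [random.gauss(0, 1) for _ in range(2)]
    if p_value(xs) < 0.05:
        false_positives += 1

print(f"False-positive rate with optional stopping: "
      f"{false_positives / trials:.3f} (nominal: 0.050)")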

Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187-192.

Posted by Amy Perfors at October 31, 2006 10:00 AM