 

#  Studies that withhold replication data are more likely to have errors 

 





November 10, 2011

 

 

We already knew that scholars who provide replication data [get cited more](http://gking.harvard.edu/files/replvdc.pdf). Now we know that they are also more likely to be right! Paper by Wicherts, Bakker, and Molenaar [here](http://andrewgelman.com/wp-content/uploads/2011/11/PLoSONE_Wicherts.pdf). Blog post by Gelman [here](http://andrewgelman.com/2011/11/insecure-researchers-arent-sharing-their-data/).

The authors requested replication data for 49 psychology studies. Amazingly, many of the original authors did not comply *even though they were explicitly under contract with the journals to provide the data.*

1\) Papers whose authors withheld data had more reporting errors, meaning that the reported p-value differed from the correct p-value as recalculated from the coefficient and standard error reported in the paper. I'd really like to think that these were all just innocent typos, but in seven papers these typos reversed findings. **None** of those seven authors shared their data.

2\) Papers whose authors withheld data tended to have larger p-values, meaning that their results were not as "strong" in some sense. This interpretation tortures the idea of the p-value a little bit, but it certainly represents how many researchers think about p-values. It's striking that researchers who think their results are "weaker" were less likely to provide data. It also suggests that researchers who are getting a range of p-values from different, plausible models tend to pick the p-value just below 0.05 rather than the one just above. But then, we already [knew that](http://polmeth.wustl.edu/media/Paper/Publication%20Bias%20in%20Political%20Science%20(final).pdf).
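The consistency check described in point 1 can be sketched in a few lines. This is only an illustration, not the procedure used by Wicherts, Bakker, and Molenaar: it assumes a two-sided test with a standard normal reference distribution, and the function names, the tolerance `tol`, and the example numbers are all hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for coef / se, assuming a standard normal
    reference distribution (an assumption made for this sketch)."""
    z = abs(coef / se)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


def check_reported_p(coef, se, reported_p, alpha=0.05, tol=0.005):
    """Compare a reported p-value with the one implied by the reported
    coefficient and standard error. `tol` is an arbitrary rounding allowance."""
    recomputed = two_sided_p(coef, se)
    return {
        "recomputed_p": recomputed,
        # The reported value disagrees with the recomputed one beyond rounding.
        "mismatch": abs(recomputed - reported_p) > tol,
        # The two values fall on opposite sides of the significance threshold.
        "reverses_finding": (recomputed < alpha) != (reported_p < alpha),
    }


# Made-up example: the paper reports p = 0.04, but the reported
# coefficient and standard error imply a p-value well above 0.05.
result = check_reported_p(coef=0.30, se=0.20, reported_p=0.04)
print(result)
```

A mismatch that also flips `reverses_finding` corresponds to the kind of error that changed conclusions in the seven papers mentioned above. A real audit would need the test and degrees of freedom each paper actually used rather than a normal approximation.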

This is frightening, not least because most of these were **lab experiments**, where we tend to think that the results are less sensitive to analyst manipulation because of strong design. Also, these are only the problems that were obvious *without* access to the replication data.

Most responses to this study include appeals for better data-sharing standards, but I don't think they are necessary. As long as we know which authors provide replication data and which don't, we can all update accordingly.

Posted by [Richard Nielsen](http://www.iq.harvard.edu/blog/sss/archives/author/richard-nielsen-1/) at November 9, 2011 7:48 PM



 

 

 



 

 
