The New York Times has an article today ("For Drug Makers, a Downside to Full Disclosure") discussing the recent creation of archives for pharmecutical clinical trial data, including data from trials that did not result in publications. This effort is an attempt to deal with the age old problem of publication bias, a problem supposedly identified by the ancient Greeks, as described in a letter to the editor of Lancet by Mark Pettigrew:
The writings of Francis Bacon (1561-1626) are a good starting point. In his 1605 book, The Advancement of Learning, he alludes to this particular bias by pointing out that it is human nature for "the affirmative or active to effect more than the negative or privative. So that a few times hitting, or presence, countervails oft-times failing or absence". This is a clear description of the human tendency to ignore negative results, and Bacon would be an acceptable father figure. Bacon, however, goes further and supports his claim with a story about Diagoras the Atheist of Melos, the fifth century Greek poet.
Diagoras was the original atheist and free thinker. He mocked the Eleusinian mysteries, an autumnal fertility festival which involved psychogenic drug-taking, and was outlawed from Athens for hurling the wooden statue of a god into a fire and sarcastically urging it to perform a miracle to save itself. In the context of publication bias, his contribution is shown in a story of his visit to a votive temple on the Aegean island of Samothrace. Those who escaped from shipwrecks or were saved from drowning at sea would display portraits of themselves here in thanks to the great sea god Neptune. "Surely", Diagoras was challenged by a believer, "these portraits are proof that the gods really do intervene in human affairs?" Diagoras' reply cements his claim to be the "father of publication bias": "yea, but . . . where are they painted that are drowned?"
While dealing with publication bias would seem to be a good thing, the Times article suggests (perhaps in an attempt to avoid publication bias itself) that some people are worried about this practice:
Some experts also believe that releasing the results of hundreds of studies involving drugs or medical devices might create confusion and anxiety for patients who are typically not well prepared to understand the studies or to put them in context.
“I would be very concerned about wholesale posting of thousands of clinical trials leading to mass confusion,” said Dr. Steven Galson, the director for the Center for Drug Evaluation and Research at the F.D.A.
It is a little hard for me to believe that this confusion would be worse than the litany of possible side effects given at the end of every pharmecutical commercial, but that is a different issue. From a purely statistical point of view, it seems like this is a no-brainer, a natural extension of efforts to ensure that published results can be replicated. Whether you are a frequentist or a Bayesian, inferences should be better when conditioned on all of the data that has been collected, not just the data that researchers decided to use in their publications. There could be a reasonable argument about what to do with (and how do define) corrupted data - data from trials that blew up in one way or another - but this seems like a second-order consideration.
It would be great if we could extend this effort into the social sciences. It would be easier to do this for experimental work since the data collection process is generally well defined. On the other hand, I suspect that there is less of a need for archives of experimental data in the social sciences, for two reasons. First, experimental work is still rare enough (at least in political science) that I think you have a decent chance of getting published even with "non-results". Second, my sense is that, with the possible exception of researchers closely associated with particular policy interventions, the incentives facing social scientists are not the same as those facing pharmecutical researchers. Social scientists may have a preference for "significant" results, but in most cases they don't care as much about the direction.
The kind of data archive described above would be more useful for observational research, but much harder to define. Most social scientists have invested significant time and energy collecting observational data only to find that there are no results that reviewers would think were worth publishing. On the other hand, how do we define a trial for observational data? Should there be an obligation to make one's data available any time that it is collected, or should it be restricted to data that has been analyzed and found uninteresting? Or should we think of data and models together, and ask researcher to share both their data and their analysis? I'm not sure what the answer is, but it is something that we need to think about as a discipline.