25 April 2006
There was a big scandal in scientific research recently. Last June, Dr. Hwang Woo-suk of Seoul National University in Korea announced that he and his team had cloned human embryonic stem cells from 11 patients. It was a remarkable breakthrough in stem cell research, and many people expected that he would eventually win a Nobel Prize. According to a Seoul National University panel, however, Hwang's team was found to have intentionally fabricated key data in two landmark papers on human embryonic stem cells. Now the prosecution is probing his team's alleged fabrication of data and violation of bioethics law.
Remarkably, the prestigious journal Science failed to detect the fabrication either before or after publication of the articles. This is understandable, considering that peer reviewers typically examine the presented analysis of the data but do not receive or examine the data itself. Even more surprisingly, most of the 26 co-authors of the June 2005 article were unaware of the fabrication. It came to light only through an inside whistleblower, who was the second author of the earlier article, and through a team of investigative journalists.
This incident makes us aware of the vulnerability of the review system of academic journals. Indeed, there have been many fraud cases in the history of scientific research, and Dr. Hwang has just added one more. Although outright faking may not be very common, errors in data and data analysis are probably much more common than most people assume.
I was struck by the numerous errors found by students in Gov 2001 who replicated the analysis of an article published in a prominent social science journal. Many of the errors are probably benign and not critical to the key findings, but some may be critical and even deliberate. It can be tempting to distort the data or the results of an analysis when a researcher has spent much time and energy finding evidence to support his or her hypothesis, and the results are close but fall just short of significance.
In his entry entitled Citing and Finding Data, Gary King discussed the [in]ability to reliably cite, access, and find quantitative data, all of which remain in an entirely primitive state. Sebastian Bauhoff also stressed the need to make data available in his entry Data Availability. I could not agree with them more. If journals required authors to submit their data along with the manuscript and published the data used in each article as an online appendix, it would certainly reduce errors in data and data analysis, as well as spur further research. This should apply to qualitative data (such as interview transcripts) as well as quantitative data.