Privacy, Statistics, and the Debate over the Regulation of Social Science Research

(This is a guest post by Dr. Micah Altman, who is a Senior Research Scientist and Director of Data Archiving and Acquisitions at IQSS.)

The U.S. Office for Human Research Protections (OHRP) proposed a set of sweeping changes to the federal regulations that govern research involving human subjects (the “Common Rule”) in the form of an Advance Notice of Proposed Rule Making (ANPRM), and solicited comments from investigators, Institutional Review Boards (IRBs), and any other interested parties by October 26, 2011. The ANPRM posed 75 questions, many of which implicate the collection, storage, de-identification, and distribution of information about individual research subjects, as well as major questions about the types and nature of exempt and minimal-risk research. Together, these proposed changes could have a huge effect on the conduct of social science research and on the sharing of research results.

Over 1,100 comments have been submitted on the proposed rulemaking. The Data Privacy Lab, which is run by Latanya Sweeney and has now joined IQSS, organized a series of seminars at Harvard on the proposed changes, resulting in two responses being submitted. One response was drafted by the lab, joined by about 50 data privacy researchers, and supported by two national privacy groups. A second, complementary response was led by Salil Vadhan, Vicky Joseph Professor of Computer Science and Applied Mathematics and former Faculty Director of the Center for Research on Computation and Society at Harvard, and joined by other academics and researchers.

Among other issues, these responses draw attention to the key role of data sharing in social science research, and to the statistical and computational advances that have fundamentally changed both the analysis of informational risks and the opportunities to use statistical methods to disclose data safely.

For example, the HIPAA privacy rule standards that the proposal would adopt are implicitly tailored to traditional microdata, as Vadhan et al.’s response points out: “[T]here is increased interest in collecting and analyzing data sets that are not in the traditional microdata form. For example, social network data involves relationships between individuals. A “friendship” relationship or contact between two individuals on a social network does not entirely “belong” to either individual’s record; the relationship can have privacy implications for both parties. While this change from data about individuals to data about pairs may seem innocuous, it makes the task of anonymization much more difficult and one cannot expect standards developed for traditional microdata, like HIPAA, to apply.”
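To make this concrete, here is a toy sketch (not drawn from either response) of why record-level pseudonymization falls short for network data: even after names are replaced by arbitrary IDs, the structure of a person's friendships can act as a fingerprint that an adversary with outside knowledge can match. The edge list, pseudonyms, and adversary knowledge below are all invented for illustration.

```python
# Toy illustration: structural re-identification in a pseudonymized friendship graph.
from collections import defaultdict

# "Anonymized" friendship edges between pseudonymous IDs.
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]

neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def signature(node):
    """Degree of a node plus the sorted degrees of its neighbors."""
    return (len(neighbors[node]),
            tuple(sorted(len(neighbors[n]) for n in neighbors[node])))

# An adversary who knows the target has exactly two friends -- one with a
# single friend and one with three -- can match that outside knowledge
# against the pseudonymous graph, even though every node ID was scrubbed.
known = (2, (1, 3))
matches = [node for node in neighbors if signature(node) == known]
print(matches)  # a unique match re-identifies the target despite pseudonyms
```

Because the identifying information lives in the relationships rather than in any single record, stripping or generalizing fields one record at a time, as microdata-oriented standards prescribe, does not remove it.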

The response then goes on to highlight how advances in statistical and computational methods can provide safe, direct access to confidential data through dynamic interactive mechanisms for tabulation, visualization, and general statistical analysis; multiparty computation; and synthetic data generation. In many circumstances these techniques can yield both better privacy protection and better research utility than traditional “de-identification” techniques such as removing and generalizing fields.
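As one concrete illustration of a dynamic interactive mechanism, the sketch below implements the Laplace mechanism from differential privacy: researchers submit counting queries to a data curator and receive noisy answers, rather than a “de-identified” copy of the raw records. This is a minimal sketch of one such technique, not the specific systems described in the responses; the function name, toy records, and epsilon values are assumptions for illustration.

```python
import numpy as np

def dp_count(data, predicate, epsilon, rng=None):
    """Answer a counting query with Laplace noise (epsilon-differential privacy).

    A counting query has sensitivity 1: adding or removing one subject's
    record changes the true count by at most 1, so Laplace noise with
    scale 1/epsilon suffices.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for record in data if predicate(record))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical confidential records: (age, smoker) pairs.
records = [(34, True), (59, False), (41, True), (28, False), (73, True)]

# Researchers interact with the curator through noisy queries instead of
# receiving the raw or "de-identified" records themselves.
print(dp_count(records, lambda r: r[1], epsilon=0.5))        # noisy smoker count
print(dp_count(records, lambda r: r[0] >= 40, epsilon=0.5))  # noisy count of age >= 40
```

The trade-off is tunable: smaller values of epsilon add more noise and give stronger privacy guarantees, while larger values give more accurate answers, which is one reason such mechanisms can dominate one-size-fits-all field removal in both privacy and utility.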

Sweeney et al.’s response goes on to comment on the systemic issues, incentive problems, and policy issues associated with the proposed changes, the most important being:

First, that “The proposed ban on re-identification would drive re-identification methods further into hidden, commercial activities and deprive the public, the research community and policy makers of knowledge about re-identification risks and potential harms to the public.”

And second, that the proposed policy provides no incentive to develop or use statistical and computational methods that would improve both the privacy and the research utility of shared data: “[T]here needs to be a channel for NCHS, NIST or a professional data privacy body to operationalize research results so that real-world data sharing decisions rely on the latest guidelines and best practices.”

The Data Privacy Lab has collected these responses, along with related responses from Harvard University, major privacy groups, and social science research associations.

Posted by Matt Blackwell at November 2, 2011 5:26 PM