November 2011
Sun Mon Tue Wed Thu Fri Sat
    1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30      

Authors' Committee


Matt Blackwell (Gov)


Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

Weekly Workshop Schedule

Recent Comments

Recent Entries



SMR Blog
Brad DeLong
Cognitive Daily
Complexity & Social Networks
Developing Intelligence
The Education Wonks
Empirical Legal Studies
Free Exchange
Health Care Economist
Junk Charts
Language Log
Law & Econ Prof Blog
Machine Learning (Theory)
Marginal Revolution
Mixing Memory
Mystery Pollster
New Economist
Political Arithmetik
Political Science Methods
Pure Pedantry
Science & Law Blog
Simon Jackman
Social Science++
Statistical modeling, causal inference, and social science



Powered by
Movable Type 4.24-en

« October 31, 2011 | Main | November 6, 2011 »

2 November 2011

Privacy, Statistics, and the Debate over the Regulation of Social Science Research

(This is a guest post by Dr. Micah Altman, who is a Senior Research Scientist and Director of Data Archiving and Acquisitions at IQSS.)

The U.S. Office for Human Research Protections (OHRP) proposed a set of sweeping changes to the federal regulations that govern research involving human subjects (the “Common Rule”), in the form of an Advance Notice of Proposed Rule Making (ANPRM) and solicited comments from investigators, Institutional Review Boards (IRBs), and any other interested parties by October 26, 2011. The ANPRM posed 75 questions, among these were many that implicated the collection, storage, de-identification and distribution of information about individual research subjects, as well as major questions about the types and nature of exempt and minimal risk research. Together these proposed changes could have a huge effect on the conduct of social science research and on the sharing of research results.

There have been over 1100 comments submitted on the proposed legislation. The Data Privacy Lab, which is run by Latanya Sweeney and which has now joined IQSS, organized a series of seminars at Harvard on the proposed changes, resulting in two responses being submitted. One response was drafted by the lab and joined by about 50 data privacy researchers and supported, and by two national privacy groups. A second complementary response, was led by Salil Vadhan, Joseph Professor of Computer Science and Applied Mathematics, and former Faculty Director of the Center for Research on Computation and Society of Computer Science at Harvard, and joined by academics and researchers.

Among other issues these responses draw attention to the key role of data sharing in social science research, and to the statistical and computation advances that have fundamentally changed both the analysis of informational risks, and the opportunities available to use statistical methods to disclose data safely.

For example, the proposed privacy HIPAA privacy rule is implicitly tailored to traditional microdata, as Vadhan, et al.’s response points out: “[T]here is increased interest in collecting and analyzing data sets that are not in the traditional microdata form. For example, social network data involves relationships between individuals. A “friendship” relationship or contact between two individuals on a social network does not entirely “belong” to either individual’s record; the relationship can have privacy implications for both parties. While this change from data about individuals to data about pairs may seem innocuous, it makes the task of anonymization much more difficult and one cannot expect standards developed for traditional microdata, like HIPAA, to apply. ”

The response then goes on to highlight how advances in statistical and computational methods can direct access to confidential data safely through dynamic interactive mechanisms for tabulation, visualizations, and general statistical analysis; multiparty computation; and synthetic data generation. In many circumstances these techniques can yield both better privacy protections and better research utility than traditional “de-identification” techniques such as removing and generalizing fields.

Sweeney, et. al’s response goes on to comment on the systemic issues, incentive problems, and policy issues associated with the proposed changes, most important:

First, that “The proposed ban on re-identification would drive re-identification methods further into hidden, commercial activities and deprive the public, the research community and policy makers of knowledge about re-identification risks and potential harms to the public.”

And second, that the proposed policy provides no incentive to develop or use statistical and computational methods that would improve both privacy and research utility of data sharing:”[T]here needs to be a channel for NCHS, NIST or a professional data privacy body to operationalize research results so that real-world data sharing decisions rely on the latest guidelines and best practices.”

The DPL has collected these responses, along with the related responses from Harvard University, major privacy groups, and Social Science research association.

Posted by Matt Blackwell at 5:26 PM