16 October 2007
Nature had a news article and an accompanying editorial in its most recent issue on privacy in the developing field of computational social science. These pieces do a nice job of highlighting the need to develop a powerful institutional infrastructure to facilitate the growth of computational social science. If I were to point to one place where this burgeoning field, which has enormous potential to do good, could trip up, it would be in dealing with privacy. The key challenge is balancing individuals' need for privacy in their data against the benefits of improved knowledge about human behavior. This balance is manageable, but it requires a lot of work by social and computer scientists, as well as by the existing self-regulatory systems of the research world (e.g., IRBs). I will write more about this in the future; for now, below are excerpts from the article and editorial.
Excerpts from news article:
The hottest growth area in the field [of social science] is computational social science. This is often based on privileged access to electronic data sets such as e-mail records, mobile-phone call logs and web-search histories of millions of individuals. Such studies are ushering in a revolution in the social sciences, specialists say. But there is a trade-off between the scientific interest in working with such data and concerns about privacy. “It’s a huge issue,” says David Lazer, a researcher at the John F. Kennedy School of Government at Harvard University.
This work [referring to work by Jon Kleinberg that illustrates the potential to "de-anonymize" network data] reinforces the need for a systematic, institutional approach to improving the privacy rights of those whose data are used, says [Marshall] Van Alstyne [of Boston University]. That echoes the conclusions of a May study by the US National Academies, which said that safeguarding privacy cannot safely be left to individual researchers. It stated that: “Institutional solutions involve establishing tiers of risk and access, and developing data-sharing protocols that match the level of access to the risks and benefits of the planned research.” But [Myron] Gutmann [of the University of Michigan and co-author of the study] and other social scientists also stress that the risks should be kept in perspective. Scientists must meet strict rules on any research on human subjects. In contrast, private firms are largely free from such constraints, and already have wide latitude to snoop on, and data mine, their employees’ work habits.
Excerpts from editorial:
For a certain sort of social scientist, the traffic patterns of millions of e-mails look like manna from heaven. Such data sets allow them to map formal and informal networks and pecking orders, to see how interactions affect an organization's function, and to watch these elements evolve over time. They are emblematic of the vast amounts of structured information opening up new ways to study communities and societies. Such research could provide much-needed insight into some of the most pressing issues of our day, from the functioning of religious fundamentalism to the way behaviour influences epidemics....
But for such research to flourish, it must engender that which it seeks to describe. And so it is encouraging that computational social scientists are trying to anticipate threats to trust that are implicit in their work. Any data on human subjects inevitably raise privacy issues (see page 644), and the real risks of abuse of such data are difficult to quantify. But although the risks posed by researchers seem far lower than those posed by governments, private companies and criminals, their soul-searching is justified. Abuse or sloppiness could do untold damage to the emerging field.
Posted by David Lazer at October 16, 2007 8:25 AM