
November 3, 2011

International Cyber Security Challenges

It is widely recognized in government, business, military and scientific circles that physical and virtual infrastructure are becoming increasingly interconnected through information and communication technology (ICT). Many refer to the result as cyberspace: a pervasive and ubiquitous domain now considered strategically on a par with land, air, sea, and space. Yet from a security perspective, cyberspace differs from the other four domains. It has emergent properties and eludes state control. Beyond its impact on leadership, management or institutions, there is a common fear that growing interconnectedness offers ever more avenues for disruptions in the digital supply chain and thus increases the vulnerability of the information society, military power and the global economy to system failure.

The causes of disruptions to cyberspace and critical infrastructure - e.g. utilities, transport, telecoms, defense contractors, government institutions - can broadly be divided into three categories: natural disasters, accidents and intentional attacks. Cyber threats are risks arising from cyberspace and its technologies (e.g. hacking, denial-of-service attacks, viruses, malware). The impact of malicious cyber activities may remain without direct physical consequences (e.g. installing spyware) or may lead to tangible damage (e.g. loss of business or failure of control systems). Cybercrime is a common threat image in the business community, while cyber terrorism, cyber war and information warfare dominate the defense community's framing of the issue. The worst-case scenario, frequently mentioned but highly contested, is an "electronic Pearl Harbor" type of disruption.

Images of threats typically involve a broad range of adversaries and targets, including both state and non-state actors, dissolving the boundaries between the domestic and the international. Along these lines, non-state actors may pose challenges to security as well as provide it. Most observers focus on the transnational and network-based character of cyber threats. Adversaries are typically seen as operating in loosely organized networks consisting of relatively independent nodes of individuals. Research and media coverage of recent cyber attacks underline that, beyond the hype, the concerns are real: attempts to exploit or defeat existing cyber infrastructures are happening every second.

IT security researchers have shown that an unprotected computer connected to the Internet to collect intelligence on attack techniques and behaviors (a "honeypot") was compromised and recruited into a botnet within 15 minutes of going online. The sophistication of attacks and their consequences have reached a new level since 2010. Unlike the distributed denial-of-service (DDoS) attacks used to disrupt the Internet in Estonia and Georgia, the "Stuxnet" worm, likely developed by a state actor, targeted industrial control systems that use Siemens software and infected over 30,000 computers in Iran, including computers involved in running the country's nuclear facilities. The latter raises even greater concerns, not only about the threat to industrial control systems but about all components of our information and communication technologies generally. Unlike most intrusions, which go after financial and identity data, advanced persistent threat (APT) attacks tend to go after source code and other intellectual property. Early in 2011, RSA, a provider of SecurID two-factor authentication products that can be considered one of the core Internet security technologies, experienced an APT breach of its infrastructure similar to the one Google suffered in 2010. Intruders may linger in an organization's network, sometimes for years, even after it has taken corrective action.

Much has been written about the challenge of attribution in cyberspace. Who is intruding into a system, and who is behind the malicious activity? If we are attacked, will we know who is behind it, so that we can respond without incurring the wrath of the world community? All too often it remains difficult, if not impossible, to identify the parties involved, who hide behind the anonymity and global reach of the Internet and rely on a maze of enablers, both legitimate and illegitimate providers, to cover their tracks. Policy makers and military planners are only beginning to address these questions.

In fact, cyber security and cyber risk are inherently difficult to comprehend and communicate because of their socio-technical complexity and the relationships among stakeholders in the international community. There is, for example, little agreement on what the security issue in cyberspace actually is, what counts as critical, or how severe the threat level really is. From a European perspective, the cyber insecurity "hype" is much more prevalent in the U.S. than in Europe. Consequently, it is difficult for states such as the United States, the United Kingdom, France, and Germany, which often lead the adoption of broad-based agendas, to come together around a common, collective vision in international bodies such as the UN or NATO. An additional challenge derives from the existing cyber security institutional ecosystem, a broad set of international, national, and private organizations with unclear and overlapping boundaries and differing capacities. Finally, the government resources available are tied up with national cyber security efforts such as critical infrastructure protection, where state and industry objectives are only partially convergent. Specifically, the private sector fears that sensitive information on past security incidents might not be treated with the necessary degree of confidentiality by state entities and could damage their reputation. Furthermore, international approaches would be of much greater interest to transnational businesses. Many believe a comprehensive approach is needed, one that provides for appropriate information sharing and mutual assistance obligations governed by international policies and treaties. Considering the struggles of the UN, EU or NATO in developing a common comprehensive approach for conflict and crisis management, and the difficulties in the cyber security domain illustrated above, this will pose a great challenge to the international community.

Nevertheless, addressing these challenges requires public-private collaboration that identifies critical cyber priorities, sets goals and objectives for each, and defines corresponding milestones and metrics so that they can be resourced, tracked, and improved over time. It is also important to systematically collect and share statistically significant data on malicious cyber activity at the national and global scale. Moreover, we need to build the capability to quickly connect the dots among disparate databases to get a true picture of which instances of criminality are connected to each other, to which malicious actors, and to which enablers. Yet few of these points, or many others, are actively and openly debated among government or private industry organizations; nor is the fact that current means of law enforcement have proven insufficient, chiefly because they tend to be reactive instead of proactive, investigating after the fact instead of preventing the criminal attack. We must recognize that more of the same will not change this reality. The complexity of cyber risk must be addressed strategically and proactively by an alliance of business and government stakeholders, including, but not limited to, law enforcement, because no single effort or initiative will eliminate the cyber threat.

References
Archick, K. (2006) "Cybercrime: The Council of Europe Convention", Report, U.S. Congressional Research Service.

Bendrath, R. (2001) "The Cyberwar Debate: Perception and Politics in US Critical Infrastructure Protection", Information & Security, 7, 80-103.

Bendrath, R. (2010) "The American Cyber-Angst and the Real World - Any Link?", 49-72 in: Latham, R. (Ed.) "Bombs and Bandwidth: The Emerging Relationship between IT and Security", The New Press, New York.

Brunner, E.; Michalkova, A.; Suter, M.; Cavelty, M. D. (2009) "Cybersecurity - Recent Strategies and Policies: An analysis", Focal Report 3: Critical Infrastructure Protection, Center for Security Studies, ETH Zurich, Zurich.

Cavelty, M. D.; Suter, M. (2009) "Public-Private Partnerships are no silver bullet: An expanded governance model for critical infrastructure protection", International Journal of Critical Infrastructure Protection, 2, 4, 179-187.

CMCS - Center for Media & Communication Studies (2010) "Cyber Security: Participants' reflection on workshop themes", 7-8/6, Budapest, Hungary.

Clark, D. (2010) "Characterizing cyberspace: past, present and future", v1.2., MIT CSAIL, Cambridge, MA.

Cukier, K. N.; Mayer-Schönberger, V.; Branscomb, L. M. (2005) "Ensuring (and Insuring?) Critical Information Infrastructure Protection", RWP05-055, Harvard Kennedy School, Cambridge, MA.

Deibert, R.; Rohozinski, R. (2010) "Risking Security: The policies and paradoxes of cyberspace security", International Political Sociology, 4, 1, 15-32.

Demchak, C. (2010) "Conflicting Policy Presumptions about Cybersecurity", Atlantic Council of the United States, Washington, D.C.

Denning, D. E. (1999) "Information Warfare and Security", Addison-Wesley, Boston.

Dlamini, M. T.; Eloff, J. H. P.; Eloff, M. M. (2008) "Information security: The moving target", doi:10.1016/j.cose.2008.11.007

Dunn, M.; Suter, M. (2009) "Public-Private Partnerships Are No Silver Bullet", CRN Reports, Center for Security Studies, ETH Zürich, Zürich.

Eggers, W. D. (2005) "Government 2.0", Rowman & Littlefield.

ENISA (2009) "Analysis of Member States' policies and regulations".

Eriksson, J.; Giacomello, G. (2006) "The Information Revolution, Security, and International Relations: (IR)relevant Theory?", International Political Science Review, 27, 221-244.

Eriksson, J.; Giacomello, G. (Ed.) (2007) "International Relations and Security in the Digital Age", Routledge.

Ferwerda, J.; Choucri, N.; Madnick, S. (2010) "Institutional Foundations for Cyber Security: Current Responses and New Challenges", Minerva Working Paper Series, Draft, 2009-03, CISL, MIT Sloan School of Management, Cambridge, MA.

GAO - Government Accountability Office (2005) "Information Security: Emerging Cybersecurity Issues Threaten Federal Information Systems", Report, GAO-05-31.

GAO - Government Accountability Office (2009) "Information Security: Cyber Threats and Vulnerabilities Place Federal Systems at Risk", Report, GAO-09-661T.

German Federal Ministry of the Interior (2009) "CIP Implementation Plan".

Ghose, A.; Gal-Or, E. (2004) "The Economic Incentives for Sharing Security Information", URL http://ssrn.com/abstract=629282 or doi:10.2139/ssrn.629282

Gordon, L. A.; Loeb, M. P.; Lucyshyn, W.; Richardson, R. (2007) "2006 CSI/FBI Computer Crime And Security Survey", Computer Security Institute Publication.

Hansen, L.; Nissenbaum; H. (2009) "Digital Disaster, Cyber Security and the Copenhagen School", International Studies Quarterly, 53, 4, 1155-1175.

Hathaway, M. E. (2009) "Strategic Advantage: Why America Should Care about Cybersecurity", Belfer Center for Science and International Affairs.

Hathaway, M. E. (2010) "Toward a Closer Digital Alliance", SAIS Review, 30, 2, 21-31.

Hawkins, S.; Yen, D. C.; Chou, D. C. (2000) "Awareness and challenges of Internet security", Information Management & Computer Security, 8, 3, 131-143.

Hosein, I. (2008) "Creating Conventions: Technology Policy and International Cooperation in Criminal Matters", in: Drake, W. J.; Wilson III, E. J. (Ed.) "Governing Global Electronic Networks", MIT Press, Cambridge, MA.

Knake, R. K. (2010) "Internet Governance in an Age of Cyber Insecurity", CFR Council Special Reports, 56, Council on Foreign Relations, New York.

Kramer, F. D.; Starr, S. H.; Wentz, L. K. (Ed.) (2010) "Cyberpower and National Security", NDU Press.

Lewis, J. A. (2010) "The Cyber War has not Begun", Center for Strategic & International Studies.

Libicki, M. C. (2009) "Cyberdeterrence and Cyberwar", RAND, Santa Monica, CA.

Nissenbaum, H. (2005) "Where Computer Security meets National Security", Ethics and Information Technology, 7, 2, 61-73.

Nye, J. S. Jr. (2010) "Cyber Power", Belfer Center for Science and International Affairs, Harvard Kennedy School, Cambridge, MA.

Paget, F. (2009) "Cybercrime and Hacktivism", Whitepaper, McAfee.

Peritz, A. J.; Sechrist, M. (2010) "Protecting Cyberspace and the US National Interest", Belfer Center for Science and International Affairs, Harvard Kennedy School, Cambridge, MA.

Rintakoski, K.; Autti, M. (2009) "Comprehensive Approach", Seminar publication, Crisis Management Initiative, Ministry of Defense, Finland.

Roberts, S. (2003) "Critical Infrastructure Protection and Homeland Security", Perspectives on Preparedness Report, 15, Belfer Center for Science and International Affairs, Harvard Kennedy School, Cambridge, MA.

SDA - Security Defense Agenda (2008) "Assessing the Cyber Security Threat", SDA Monthly Roundtable, Brussels.

SDA - Security Defense Agenda (2010) "Cyber Security: A Transatlantic Perspective", SDA Evening Debate Report, Brussels, 3/22.

Shackelford, S. J. (2009) "From Nuclear War to Net War: Analogizing Cyber Attacks in International Law", Berkeley Journal of International Law (BJIL), 25, 3, URL: http://ssrn.com/abstract=1396375

Shackelford, S. J. (2010) "State Responsibility for Cyber Attacks: Competing Standards for a Growing Problem", Proceedings of the NATO CCD COE Conference on Cyber Conflict, Tallinn, Estonia, July 15-18, 2010, URL: http://ssrn.com/abstract=1535351

Sheffi, Y. (2005) "The Resilient Enterprise", MIT Press, Cambridge, MA.

Starr, S. H. (2010) "Toward a Preliminary Theory of Cyberpower", 43-88 in: Kramer, F. D.; Starr, S. H.; Wentz, L. K. (Ed.) "Cyberpower and National Security", NDU Press.

Talib, S.; Clarke, N. L.; Furnell, S. M. (2010) "An Analysis of Information Security Awareness within home and work environment", ARES '10 International Conference, 15-17/2, Krakow, 196-203.

Tikk, E. (2010) "Global Cyber Security - Thinking About The Niche for NATO", SAIS Review, 30, 2, 105-119.

The White House (2009) "Cyberspace Policy Review: Assuring a Trusted and Resilient Information and Communications Infrastructure", retrieved September 23, 2009.

Ottis, R.; Lorents, P. (2010) "Cyberspace: Definition and Implications", 267-270 in: Proceedings of the 5th International Conference on Information Warfare and Security, Dayton, OH, 8-9/4, Academic Publishing Limited, Reading.

Van Eeten, M.; Bauer, J. M. (2009) "Emerging Threats to Internet Security: Incentives, Externalities and Policy Implications", Journal of Contingencies and Crisis Management, 17, 4, 221-232.

Wilson, C. (2010) "Cyber Crime", 415-436 in: Kramer, F. D.; Starr, S. H.; Wentz, L. K. (Ed.) (2010) "Cyberpower and National Security", NDU Press.

Whitman, M. E. (2003) "Enemy at the gate: threats to information security", Communications of the ACM, 46, 8.

November 30, 2010

Sinan Aral on contagion


I am pleased to announce a presentation by Sinan Aral of NYU in the
Boston-Cambridge Colloquium on Complexity and Social Networks. This
research is of particular interest in its application of a field
experimental design to study social contagion.

As usual, a light lunch will be served.


Creating Social Contagion through Viral Product Design: A Randomized
Trial of Peer Influence in Networks

Sinan Aral (NYU)
@ Center for Complex Network Research (fifth floor, Dana Building, 110
Forsyth Street)
12:30-2:00, November 30th


Abstract:
We examine how firms can create word-of-mouth peer influence and social
contagion by designing viral features into their products and marketing
campaigns. Word-of-mouth (WOM) is generally considered to be more effective
at promoting product contagion when it is personalized and active.
Unfortunately, the relative effectiveness of different viral features has
not been quantified, nor has their effectiveness been definitely
established, largely because of difficulties surrounding econometric
identification of endogenous peer effects. We therefore designed a
randomized field experiment on a popular social networking website to test
the effectiveness of a range of viral messaging capabilities in creating
peer influence and social contagion among the 1.4 million friends of 9,687
experimental users. Overall, we find that viral product design features can
indeed generate econometrically identifiable peer influence and social
contagion effects. More surprisingly, we find that passive-broadcast viral
messaging generates a 246% increase in local peer influence and social
contagion effects, while adding active-personalized viral messaging only
generates an additional 98% increase in contagion. Although
active-personalized messaging is more effective in encouraging adoption per
message and is correlated with more user engagement and sustained product
use, passive-broadcast messaging is used more often enough to eclipse those
benefits, generating more total peer adoption in the network. In addition to
estimating the effects of viral product design on social contagion and
product diffusion, our work also provides a model for how randomized trials can
be used to identify peer influence effects in networks.

November 2, 2010

Twitter election mood

What is the shape of political discussion at this very moment? It is, of course, impossible to know exactly what people are talking about in real time over the kitchen table, but Twitter does offer the possibility to eavesdrop on a particular slice of the discussion. What we (Alan Mislove, Sune Lehmann, Yong-Yeol Ahn, Yu-Ru Lin, Jukka-Pekka Onnela, and J. Niels Rosenquist) have done is pick out a few keywords that can be spatially mapped. We have posted an interactive version for various issue areas that allows you to see, up to the minute (or at least the last five minutes), which words are popping up on Twitter and where they are popping up. This needs to be taken with an appropriately large grain of salt, but it is illustrative of the possibilities that today's communication technologies offer for understanding truly systemic-level phenomena, even in real time.
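
For readers curious about the mechanics, here is a minimal sketch (not our actual pipeline) of the kind of keyword counting behind maps like these: scan a batch of tweets, keep those that mention a tracked issue term, and bucket the matches by state so they can be drawn on a map. The tweet record format and the state field are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical issue terms, echoing the maps below.
KEYWORDS = {"jobs", "taxes", "obama"}

def bucket_by_state(tweets):
    """tweets: iterable of dicts like {"text": ..., "state": ...} (assumed format)."""
    counts = defaultdict(Counter)
    for tw in tweets:
        words = set(tw["text"].lower().split())
        for kw in KEYWORDS & words:
            counts[tw["state"]][kw] += 1
    return counts

if __name__ == "__main__":
    sample = [
        {"text": "Still no jobs in this economy", "state": "OH"},
        {"text": "Taxes and jobs are on the ballot", "state": "MA"},
    ]
    print(dict(bucket_by_state(sample)))
```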


[Embedded maps: JOBS, TAXES, OBAMA]

November 1, 2010

The issue meter

Below is a somewhat different take on the same word data. In particular, we look at how often certain issues are brought up on particular websites. What is most striking to me, actually, is that Democrats and Republicans are generally talking about the same issues, with a few exceptions, e.g., Republicans are talking more about taxes, immigration, and Obama.

Click here if you want to go to the full, interactive version with House, Senate, and gubernatorial races.


October 31, 2010

Campaign visualizations: a moving picture of the national conversation

I've been working with a postdoctoral fellow of mine at Northeastern and IQSS, Yu-Ru Lin, on visualizations that capture campaign 2010. Over the next couple of days we will be posting some of the visualizations on the blog. The first visualization is a dynamic word cloud based on daily snapshots of all Democratic and Republican campaign websites in October. So, for example, the words from the home pages of all Democratic candidates for the House were pooled together, and for each day a word cloud was created, with words sized by their frequency (certain functional words were omitted, and word counts were normalized so no one website could dominate the count). This process was repeated for Republicans in the House, and for both parties in Senate and gubernatorial races. Below we show the dynamics for the Republican and Democratic websites. For the full set of 6 graphics, with interactivity, we have set up a dedicated website.
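
For the technically curious, here is a rough sketch, under stated assumptions, of the preparation step described above: pool one day's pages for a party, drop functional words, and normalize per site so that a single verbose website cannot dominate the cloud. The stopword list and the input format (site name mapped to page text) are illustrative, not the scripts actually used.

```python
from collections import Counter

# Illustrative functional words; a real list would be much longer.
STOPWORDS = {"the", "and", "for", "of", "to", "a", "in", "our", "is"}

def daily_word_weights(pages_by_site):
    """pages_by_site: dict mapping site name -> raw page text for one day (assumed format)."""
    pooled = Counter()
    for text in pages_by_site.values():
        words = [w for w in text.lower().split() if w not in STOPWORDS]
        site_counts = Counter(words)
        total = sum(site_counts.values()) or 1
        # Normalize within each site before pooling, so no single site dominates.
        for w, c in site_counts.items():
            pooled[w] += c / total
    return pooled

if __name__ == "__main__":
    day = {
        "candidate-a.example": "jobs jobs taxes spending",
        "candidate-b.example": "education and jobs for our families",
    }
    print(daily_word_weights(day).most_common(3))
```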

A brief perusal suggests some interesting contrasts. You can see jobs in both websites, but more prominently for Democrats, and tax and spending are a lot more visible for Republicans. America is big for Republicans, and education for Democrats. Democrats talk more about veterans and security, and Republicans about business. Republicans use "Republican" a lot, and Democrats "Democrat" very little. Notably missing are: Iraq, Afghanistan, health, and Obama. (For health, there is an interesting contrast with Senate campaign websites, where both parties feature health very prominently.)


July 8, 2010

The arrest of a suspect in the "Grim Sleeper" killings

Four years ago, Frederick Bieber, Charles Brenner and I wrote a paper in Science on the feasibility of "familial searching" of offender DNA databases for leads. Familial searching utilizes the known statistical correlations in the genetic profiles of close relatives to produce investigative leads. I followed this up with a Taubman Center "policy brief" on the ethical and practical implementation issues of a familial search policy. Yesterday familial searching produced a striking breakthrough in LA's notorious "Grim Sleeper" serial murder case: the arrest of Lonnie David Franklin Jr. because the DNA from a piece of pizza he discarded while he was under police surveillance matched the DNA from the Grim Sleeper crime scenes.

The reason why the police had placed Franklin under surveillance? Because DNA from his son, convicted of a qualifying offense, had produced a "familial match" to the crime scene profile. This match, in turn, led to a frantic search through the son's family tree, and ultimately to the surveillance of Franklin.

Without getting too deeply into the broader ethical/policy issues here, the essential policy conundrum is that familial searching is potentially effective, but de facto incorporates millions of individuals who have not even been suspected of any crime into the offender database. Only two states allow familial searching currently (California and Colorado), but the Grim Sleeper case offers a dramatic ethical boundary case in the ongoing policy battle. The non-hypothetical question proponents of familial searching can now offer: can one justify not using familial searching in an investigation of a serial murder case where there is ongoing danger to the public?


March 12, 2010

Cell Phone Data Collection - New Experiments and Existing Resources

I've been asked by many different researchers how they can get their hands on behavioral data logging programs that work on cell phones, such as those used in Nathan Eagle and Sandy Pentland's landmark Reality Mining study. That study was back in 2004, and they were using old Nokia phones with the Symbian OS, which presented a host of problems. Below I'll go through the currently available data logging applications for phones, and I'll describe a new system being built on top of Android that will give social scientists a dramatically more capable platform. All of these applications log Bluetooth proximity information, call logs, and cell tower IDs, but some log additional information such as WiFi access points, SMS messages, and accelerometer data. Here are many of the dominant data logging applications available today:

Nokia
Only Nokia 6600 phones are officially supported, but the Context Group at the University of Helsinki has developed a number of behavior logging applications for these phones, available for download here (use mitv2).

iPhone
The iPhone is nice because a lot of people have them, but it's a poor choice for data logging because it does not allow processes to run in the background. This means you have to have jailbroken iPhones to run these applications, and it also means you can't offer them for download on the official app store. Anmol Madan from our group has made an iPhone app available for download here, and he also wrote a short tutorial on how to get this application running. Your iPhones have to have older versions of the firmware, however, and it doesn't work with the new 3G iPhones.

Windows Mobile
This is still a widely used phone OS, and Anmol has written a fairly robust data logging application that eclipses all of the previous versions in functionality, with WiFi access point logging, a survey launcher, and an automatic updating tool. He hasn't made it available for download quite yet, but it should be appearing in the next few weeks on the Human Dynamics Social Evolution website. Unfortunately, this version will not be useful for new phones in a few months because Microsoft is releasing Windows Mobile 7.0, which is not compatible with the old 6.x versions that this application is written for.

Android
Now the good news: Android phones are becoming increasingly popular and will most likely eclipse all other platforms as the dominant phone OS. Almost every cell phone manufacturer is producing Android phones, and with a unified and unrestricted app store there is an opportunity to easily reach millions of people after a short development period. Android also makes automatic updates easy.

Nadav Aharony from our group is spearheading the project to create an Android data logging application, and he has already deployed it on over 50 phones in a new study of consumption patterns among family groups (rather than the usual college students in dorms). This application logs most of the usual suspects (Bluetooth, WiFi access points, call logs), but it also hashes the contents of text messages, allowing researchers to see not just who texts whom, but also to get an idea of how topics spread (not the actual content, since the words are hashed, just that topic A passed from person 1 to person 2). This application has actually been running on my phone for over a month with no real problems. The platform also comes with a special app store that allows researchers to log what applications people install, allowing you to look at how application usage spreads among friends. Soon Nadav is also planning to allow researchers to deploy their own apps over this new app store so that researchers can push surveys or more sophisticated logging tools to study participants. Instead of paying for apps, though, users will get paid to download apps so that they will participate (sort of like Mechanical Turk).
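
To make the hashing idea concrete, here is an illustrative sketch (not Nadav's actual logger) of how one-way hashing of message tokens lets a researcher see that the same token moved between two people without ever storing the words themselves. The salt and record format are assumptions for the example.

```python
import hashlib

SALT = "study-specific-secret"  # hypothetical per-study salt, kept off the analysis server

def hash_tokens(message):
    """Return one-way hashes of each word in an SMS body; the raw text is never kept."""
    return [
        hashlib.sha256((SALT + w).encode("utf-8")).hexdigest()[:16]
        for w in message.lower().split()
    ]

if __name__ == "__main__":
    a = hash_tokens("see you at the concert tonight")
    b = hash_tokens("what a concert tonight")
    # Overlapping hashes suggest a shared topic passed between two people,
    # even though researchers never see the underlying words.
    print(set(a) & set(b))
```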

The Android Reality Mining platform promises to be extremely powerful, and the results from the current study should further push the boundaries of computational social science.

January 18, 2010

The Obama network in 2010

Last year I posed the question of what would become of the Obama network. Would it become a force for policy and political change, for example? This is the great undercovered political story of the last year. Techpresident-- a wonderful blog on technology and politics-- just issued a fascinating report on Organizing for America, the institutionalization of the organization/network/technology side of the Obama campaign. The story it paints is a mixed one-- successful at keeping a nontrivial number of people engaged in an ongoing fashion (although this number is surely tiny relative to the number mobilized in the 2008 election); unsuccessful at mobilizing people toward policy goals (e.g., around health care reform).

Tomorrow, the special election for Senate in Massachusetts offers an interesting test of OFA. Clearly, OFA is all in-- with their e-mails, the use of volunteer phonebanks, etc. Massachusetts must have among the highest density of Obama contributors and volunteers in the country, many battle hardened (from work in NH in 08, not MA, of course).

But the Internet tide has shifted. Republican Scott Brown is the beneficiary of vastly more Internet-based support in the special election than Democrat Martha Coakley, and the question tomorrow (and beyond) is: to what extent did the 2008 election reflect the marriage of the medium and a man with a particular moment? Does the Internet just enable the bottom up mobilization of the passions of the moment, or does it also enable the institutionalization of mobilization?

We won't get the conclusive answer tomorrow, but the tea (party?) leaf I will be reading most closely is the extent of the mobilization of Obama supporters tomorrow. Obama still has reasonably strong support in Massachusetts-- do they turn out? And the less visible story will be-- how many volunteers did OFA mobilize to make calls, to contribute to Coakley, and so on. (On this last point, at least we can look at FEC filings in a few months to see the overlap between 2008 Obama contributors and Coakley contributors January 10 to 18.)

November 20, 2009

Wall Street Journal article: Science as a Team Sport

Today I was quoted in the Wall Street Journal in an article on scientific collaborations by Robert Lee Hotz. (They also did a short video accompanying the online edition, in which I talk a little bit more about the advantages and problems of collaboration.) Here's an excerpt of the piece:

Once a mostly solitary endeavor, science in the 21st century has become a team sport. Research collaborations are larger, more common, more widely cited and more influential than ever, management studies show. Measured by the number of authors on a published paper, research teams have grown steadily in size and number every year since World War II.

To gauge the rise of team science, management experts at Northwestern University recently analyzed 2.1 million U.S. patents filed since 1975 and all of the 19.9 million research papers archived in the Institute for Scientific Information database. "We looked at the recorded universe of all published papers across all fields, and we found that all fields were moving heavily toward teamwork," says Northwestern business sociologist Brian Uzzi.

As research projects grow more complicated, management becomes a variable in every experiment. "You can't do it alone," says research management analyst Maria Binz-Scharf at City College of New York. "The question is how you put it all together."

The key is bringing the people together in the first place, which has sped technological advancements that often benefited the rest of us. The ease of global business and social networking today owes much to the World Wide Web, which was designed to aid information-sharing between scientists. It was invented at the European Organization for Nuclear Research (CERN), the home of the Large Hadron Collider.

New online science management experiments are underway. Last year, the National Science Foundation started a $50 million project to map all plant biology research, from the level of molecules to organisms to entire ecosystems, so scientists can swoop through shared data as if they were using Google Earth. Last month, U.S. computer experts launched a $12 million federal project to create a national biomedical network called VIVOweb to encourage collaborations.

Scientists are experimenting with the new technology of teamwork even in mathematics, where researchers customarily work alone.

This is such an exciting area of research. Together with Leslie Paik and Avrom Caplan (both from the City College of New York), I will be devoting a good part of the next three years to studying how scientists collaborate, especially how the collaborative production of scientific knowledge changes as collaborations become increasingly virtual. This work is supported by the NSF (see here for the project abstract and here for the CCNY press release).

October 26, 2009

Papers on online deliberative field experiments

There might be some interest in the scholarly papers undergirding some of the research in the aforementioned report. Below we list some of the papers from the online deliberative field experiments that we posted on SSRN.


Who Wants to Deliberate - and Why?

Michael A. Neblo
Ohio State University - Department of Political Science

Kevin M. Esterling
University of California, Riverside - Department of Political Science

Ryan Kennedy
University of Houston - Department of Political Science

David Lazer
Northeastern University - Department of Political Science; Harvard University - John F. Kennedy School of Government

Anand E. Sokhey
University of Colorado at Boulder - Department of Political Science

Interest in deliberative theories of democracy has grown tremendously among political theorists over the last twenty years. Many scholars in political behavior, however, are skeptical that it is a practically viable theory, even on its own terms. They argue (inter alia) that most people dislike politics, and that deliberative initiatives would amount to a paternalistic imposition. Using two large, representative samples investigating people's hypothetical willingness to deliberate and their actual behavior in response to a real invitation to deliberate with their member of Congress, we find: 1) that willingness to deliberate in the U.S. is much more widespread than expected; and 2) that it is precisely people who are less likely to participate in traditional partisan politics who are most interested in deliberative participation. They are attracted to such participation as a partial alternative to "politics as usual."


Means, Motive, & Opportunity in Becoming Informed About Politics: A Deliberative Field Experiment with Members of Congress and Their Constituents

Kevin M. Esterling
University of California, Riverside - Department of Political Science

Michael A. Neblo
Ohio State University - Department of Political Science

David Lazer
Northeastern University - Department of Political Science; Harvard University - John F. Kennedy School of Government

Survey research on political knowledge typically measures citizens' ability to recall political information on the spot, and in these surveys most citizens appear appallingly ignorant. Deliberative theorists emphasize, however, that citizens' capacity to become informed when given a motive and opportunity to participate in politics is equally important for democratic accountability. We assess this capacity among citizens using two deliberative field experiments. In the summer of 2006 we conducted a field experiment in which we recruited twelve current members of the U.S. Congress to discuss immigration policy with randomly drawn small groups of their constituents. In the summer of 2008, we conducted a similar experiment using a large group of constituents interacting with Senator Carl Levin of Michigan on detainee policy. Using an innovative statistical method to identify average treatment effects from field experiments, we find that constituents demonstrate a strong capacity to become informed in response to this opportunity. The primary mechanism for knowledge gains is subjects' increased attention to policy outside the context of the experiment. This capacity to become informed seems to be spread widely throughout the population, in that it is unrelated to prior political knowledge.


Estimating Treatment Effects in the Presence of Noncompliance and Nonresponse: The Generalized Endogenous Treatment Model

Kevin M. Esterling
University of California, Riverside - Department of Political Science

Michael A. Neblo
Ohio State University - Department of Political Science

David Lazer
Northeastern University - Department of Political Science; Harvard University - John F. Kennedy School of Government

If ignored, non-compliance with a treatment and nonresponse on outcome measures can bias estimates of treatment effects in a randomized experiment. To identify treatment effects in the case where compliance and response are conditioned on subjects' unobserved compliance type, we propose the parametric generalized endogenous treatment (GET) model. GET incorporates behavioral responses within an experiment to measure each subject's latent compliance type, and identifies causal effects via principal stratification. We use Monte Carlo methods to show GET has a lower MSE for treatment effect estimates than existing approaches to principal stratification that impute, rather than measure, compliance type for subjects assigned to the control. In an application, we use data from a recent field experiment to assess whether exposure to a deliberative session with their member of Congress changes constituents' levels of internal and external efficacy. Since it conditions on subjects' latent compliance type, GET is able to test whether exposure to the treatment is ignorable after balancing on observed covariates via matching methods. We show that internally efficacious subjects disproportionately select into the deliberative sessions, and that matching does not break the latent dependence between treatment compliance and outcome. The results suggest that exposure to the deliberative sessions improves external, but not internal, efficacy.

October 21, 2009

Responsive Buildings and Social Networks

In a blog entry about a year ago I talked about using sensor data to turn architecture into a changeable force for altering social networks. I recently analyzed the office layout data from 4 of our sociometric badge studies and found that the probability of interaction between two people fell off sharply as the distance between their desks (as well as the number of physical barriers, such as walls, between them) increased.
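
As a rough illustration of that analysis (not the actual badge-processing code), the sketch below bins pairs of co-workers by inter-desk distance and estimates the fraction of pairs in each bin observed interacting. The input format and bin width are assumptions.

```python
from collections import defaultdict

def interaction_rate_by_distance(pairs, bin_width=5.0):
    """pairs: iterable of (distance_in_meters, interacted_bool) for each dyad (assumed format)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for dist, interacted in pairs:
        b = int(dist // bin_width)
        totals[b] += 1
        if interacted:
            hits[b] += 1
    # Empirical interaction probability per distance bin, keyed by the bin's start distance.
    return {b * bin_width: hits[b] / totals[b] for b in sorted(totals)}

if __name__ == "__main__":
    dyads = [(2.0, True), (3.5, True), (8.0, True), (12.0, False), (14.0, True), (30.0, False)]
    print(interaction_rate_by_distance(dyads))
```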

I got fairly intrigued by the idea of dynamically modifying office layout to help deal with this situation, and recently for the Media Lab fall sponsor week my UROP Alex Speltz and I built a prototype of an augmented cubicle wall that changes based on the social context. Here's a picture:

The wall is a little over 2 meters tall and made of two plexiglass sheets with a wood frame. Inside the plexiglass sheets are window blinds that can be raised and lowered by an actuator mounted on the bottom of the wall.

The idea is that by detecting the stage of work for a worker (exploring vs. exploiting) we can determine if they need more face-to-face interaction or less for a certain period of time (probably at least a week). If someone needs to talk more with people around them, at night the actuator will pull down the blinds to create a window, making serendipitous interaction easier. If, on the other hand, the person is more in an exploit mode and needs to sit at their desk and work, the blinds are pulled up at night, and when they come in the next day it will give them more privacy. People can also specify their interaction preferences through a web-based system that my other UROPs Tim Kaler, Ernie Park, and Margaret Ding made, which allows us to further tailor the system output.
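
A minimal sketch of that nightly decision rule follows, assuming we already know a worker's current stage ("explore" vs. "exploit") and any preference they entered through the web interface; the actuator call is a placeholder rather than the real hardware API.

```python
def plan_blinds(work_mode, preference=None):
    """Return 'lower' (open a window for interaction) or 'raise' (more privacy)."""
    # An explicit preference from the web-based system overrides the inferred mode.
    if preference == "more_interaction":
        return "lower"
    if preference == "more_privacy":
        return "raise"
    # Exploring workers benefit from serendipitous face-to-face contact;
    # exploiting workers need heads-down time at their desks.
    return "lower" if work_mode == "explore" else "raise"

if __name__ == "__main__":
    print(plan_blinds("explore"))                      # lower
    print(plan_blinds("exploit"))                      # raise
    print(plan_blinds("exploit", "more_interaction"))  # preference wins: lower
```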

It's important to imagine an entire office outfitted with these, so if you knew that two groups were starting to work on a project together the barriers between those groups would disappear, while if someone was monopolizing the time of another group the barriers between them would increase. In effect, the augmented cubicle would become a social signal for availability. While people can control the blinds manually, in practice people stick with the defaults (I pulled down the blinds in my office two months ago and haven't gotten around to pulling them back up).

We're planning on deploying this in a real organization in the next few months as the design gets finalized to see if we can have a positive effect on the work environment, as well as productivity and job satisfaction. We're also currently making a demonstration video of the wall, which I'll post here as soon as it's ready.

July 14, 2009

The complexity of Government 2.0

In today's post, I would like to address three issues related to Government 2.0: transparency, citizenship and agenda hi-jacking.

First, while we read a lot about transparency, it is easier said than done. For example, transparency levels may be highly dependent on the government context and may have (unintended) effects on the behavior of either the discloser or the public--whether citizens or corporations. Second, when participation is emphasized--whether online or offline--we need to revisit our understanding of citizenship today and in the future. Third, political agendas and policies may be "hi-jacked" by bottom-up, Internet-based approaches that propose alternatives, which also relates back to the questions of citizenship and legitimacy.

Government 2.0 is the flavor of the year. Other terms now being introduced are WikiGovernment, Collaborative Government, Information Government and the U.S. administration's Open Government. While the terms might differ and the authors who introduce them vary slightly in their descriptions and priorities, all of them intend to convey the same ideas: participation, collaboration, transparency and technology jointly allow for a new form of government and governance. Certain things are here to stay; others will pass out of fashion quickly.

The following quotes may illustrate my concerns:

A memo released by the White House called on federal agency heads to commit to "upgrading the capacity of regulatory agencies for using the Internet to become more open, efficient, and responsive". The National Performance Review (NPR) recommended to "[u]se information technology and other techniques to increase opportunities for early, frequent, and interactive public participation during the rulemaking process and to increase program evaluation efforts."

This sounds familiar. However, the White House memo dates back to December 17, 1999, and NPR's recommendation to September 1993. Policies that connect openness and responsiveness to the potential of technology have thus been around in government for well over a decade. Some think that eGovernment is dead. But its ideas are quite alive; thoughts on eDemocracy, in particular, finally seem to be becoming reality. eGovernment (the internal and external use of technology in government) does not contradict Government 2.0 anyway; at the 50,000-foot level, the use of social media in government is simply the use of technology. The envisioned transformation requires patience and long-term support from policy makers, because government is a complex ecosystem that is resistant to change.

Along these lines, I recently read an interesting blog post by Steve Radick which reminded me of a post I contributed to this blog in 2008 on why government is ahead in Web 2.0.

Of course we should not let the past constrain our vision of the future. Yet the past may keep us from being overly optimistic--or, in other words, from being overly disappointed when not everything envisioned becomes reality.

Transparency
The Obama administration's agenda on transparency (the latest move was making information on government IT spending available) is amazing, but these policies, as a form of regulation, are not new to government. For an overview of transparency initiatives and regulations, visit freedominfo.org, wobsite.be or Wikipedia. The European Commission also introduced a directive on the re-use of public sector information in 2003. Unfortunately, it is difficult to get a full overview and understanding of the level of progress of the latter across EU Member States. Consequently, there should be an open discussion of how the level of transparency of a government or any of its agencies can be measured.

While Vivek Kundra agrees in principle that all public government data should be online, he also cautions that, in reality, government data sits in more than 10,000 different systems, many of them written in old programming languages, or is still locked in dusty paper archives. Accordingly, eGovernment is not dead. Without the appropriate infrastructure (interoperability standards, electronic records management, enterprise architecture), projects such as data.gov can only achieve part of their true potential.

In general, transparency involves two primary actors: the discloser and the user. There are many ways for the discloser to provide less than complete information, or to hide important information by providing excessive amounts of it. Placing data in the public domain does not guarantee that it will be used, or used in the intended way. Data may be ignored, approached with indifference, misunderstood or misused. For example, data may make it easier for special interest groups to lobby for their own interests. Transparency activities are complex and need the full commitment of a government body.

Finally, government and politics are based on the type and flow of information. Transparency policies, social media and the influx of "believers of openness" in government have slightly altered the process. That may have two effects.

On the one hand, it has become more difficult to contain information. At the same time, the need to monitor the "global thought stream" is increasing in order to react proactively to emerging "crises". These continue to be defined by traditional media (TV, radio, press) once they declare some Internet trend "news". (Note the change: today, digital collective action can quickly lead to more media coverage; in the past, media coverage led to collective action.)

On the other hand, transparency and social media could lead to even tighter confidentiality protocols and altered behaviour of elected officials. "Negative" media coverage/spin continues to be "sunlight" which government tries to avoid at all costs. A recent episode of "The Daily Show" provides a case in point.

[Embedded video: The Daily Show, "Cheney Predacted", www.thedailyshow.com]

Mainstream media also like to quote Twitter messages from U.S. members of Congress, adding their own spin to 140-character thoughts. Some of the early adopters still offer unique commentary. How long will this be the case?

Citizenship and Participation
Despite all the anti-American sentiment around the globe, the Obama administration has remarkably managed to export its open government policy worldwide. It spread virally through the Internet. Inspired by U.S.- and UK-based initiatives, individuals (early adopters) in other countries have started applying these initiatives to their national context (mostly as exact copies) or supporting calls for government action ("democratization"). Numerous "experts" are presenting (mostly the same) ideas and good-practice cases to government officials. Many of those officials are still struggling with the topic. For example, many are still wondering about the best approach to "eParticipation", which is the current buzz.

However, there is an underlying question we need to answer that is far more complex and fundamental than eParticipation:

How do we define citizenship in an era of Government 2.0?

This requires a return to political theorists such as Aristotle, John Rawls and Jürgen Habermas, as well as multi-disciplinary deliberation about what we would like citizenship to be. In the near future, every established form of decision making--especially at the political level--will face collective action based on the increased expressive capability of the Internet (anyone can call for the democratization of "something", pointing to the potential of social media). In addition, the digital divide between those who are offline, those who are online and those who "live" online ("Netcitizens") continues to exist.

As with transparency, the opportunity to participate may simply be neglected until a true need arises. An average worker might have only 2-3 hours available per day for participatory action, which competes with many other leisure activities. Consequently, there is also the issue of the legitimacy of participatory actions, whether offered by government or started by citizens.

Agenda hi-jacking
To illustrate my last point, I would like to draw on a current example from Europe. In November 2009, the EU Ministerial eGovernment Conference will take place in Malmoe, Sweden. The plan is to present a ministerial declaration on eGovernment in the EU for the next seven years. This declaration will be the result of back-room dealings among EU Member States (MS).

However, this year a group of people led by two companies decided to use a social-media-facilitated, bottom-up approach to create a declaration for eGovernment 2015 alongside the official one in Malmoe. It is also their goal to get official endorsement of their version from the European Commission. As the content of the platform is openly accessible, ideas might even find their way into the official document. The group's motivation is probably a mix of self-marketing, fascination with social media, and the desire to influence policy making.

So far, 75 individuals have participated in the activity. It will be interesting to see how many people will sign the declaration. It will also be interesting to see whether and when the media will pick up the story of the alternative agenda, and how much pressure this will exert on policy makers. Considering the EU's total population of 500 million citizens, the legitimacy of this initiative is questionable.

Nevertheless, the EU is at a crossroads: If it does not open up more, it will further strip itself of legitimacy. Gov 2.0 type activities provide one avenue to strengthen the EU and its institutions.

Finally, with regard to research, I see two issues. First, old and new research from various disciplines relating to Government 2.0 is not connected. Second, researchers can hardly keep pace with the current output of Government 2.0 policies and projects being implemented.

June 19, 2009

Reality Mining Workshop at AOM

I wanted to let everyone know about a workshop at this year's Academy of Management Conference that I'm organizing with Lynn Wu. I've posted the call for participation below. Hope to see you there!

Reality Mining Workshop at the 2009 Academy of Management Annual Meeting
Saturday, August 8, 8 - 10 AM

In the last decade sensors have become cheaper, faster, and more ubiquitous, enabling automatic collection of data at the millisecond-level time scale in a technique called Reality Mining. The Reality Mining workshop will focus on discussing what new management paradigms can be enabled with this technique, as well as how researchers can immediately use sensing tools to augment their research.

To give participants a better "feel" for the technology and its potential usefulness, we will arrange for participants to have the option of wearing Sociometric Badges: name badges with electronics that continuously measure face-to-face interaction parameters (e.g., who is talking, who is nearby).

Reality Mining research was described by the International Conference on Information Systems 2008 awards committee as "opening a new area of Management Information Systems research." This has generated a large volume of interest in Reality Mining techniques, which is only expected to build as the technology behind this methodology matures. Come and learn about this groundbreaking new research methodology.

To express your interest in participating, please e-mail the organizers at reality-workshop@media.mit.edu.

Organizers:
Benjamin N. Waber, MIT Media Laboratory
Lynn Wu, MIT Sloan School of Management

Confirmed Discussants:
Sinan Aral, NYU
Erik Brynjolfsson, MIT
Peter Gloor, MIT
David Lazer, Harvard
Alex "Sandy" Pentland, MIT

Workshop website: http://web.media.mit.edu/~bwaber/aom_workshop/

June 18, 2009

Talk: Impact of Social Media on You

In line with David Gibson's recent post, I would like to recommend watching the following video of a talk by Clay Shirky (NYU) at the U.S. Department of State on June 17th. It's a great summary for government and enterprise executives who want to better understand the impact of the Internet, social media and the emerging networked society on their organizations and work.

There is a follow-up interview with Shirky on the emerging events in Iran and the role of social media.

Source: World Bank

June 16, 2009

When social networking matters more than social networks

Yet again, social networking platforms seem to be playing a critical role in enabling social unrest--now in Iran. Some of us in the network analysis community are probably ambivalent, given all the trouble we go to in reminding people that there were social networks before the internet. Yet it seems that technology is making all the difference. Also troubling--someone tell me if I'm wrong--is the fact that network position, as traditionally conceived, doesn't seem so important when anyone at all can subscribe to an online information source, and when information that fails to reach you through one channel will probably find its way to you through another.

One can study online networking incrementally, by asking how people use the internet to service social ties and perhaps expand their number and reach. But the case of Iran, and before that Moldova, suggests that our baseline assumption, that people are not tied unless we find strong evidence to the contrary (e.g., socializing), might have to be turned on its head. It's not obvious to me that traditional social network analysis will take us very far in understanding such situations, and social movement perspectives might do only slightly better. At the risk of seeming too excitable, we might be witnessing a social discontinuity comparable to the Industrial Revolution, and equally demanding of new theories. That should be exciting to social scientists, but given that we're still puzzling through the French Revolution, one wonders how long it'll take us to get our act together.

May 22, 2009

ISPRAT 1st international government CIO knowledge exchange

I just came back from the three-day event (5/18-20) "ISPRAT 1st International Government CIO Knowledge Exchange" in Washington, D.C. ISPRAT is a non-profit think tank based in Germany. The think tank focuses on technology and innovation trends in government and on bridging the gap between disciplines; its members accordingly come from industry, academia and government. Usually it organizes government CIO summits and government-related studies in Germany. The U.S. event brought its activities to a new level. The underlying idea was to bring German/EU and U.S. government CIOs together to exchange ideas and experiences on current challenges and trends.

The first day was spent at CSC in Falls Church, VA, talking about identity management (linked to a post at Shaping Network Society inspired by the movie "Beyond the shadow of a doubt"), privacy, trust and enterprise architecture (case studies on MITA, IRS and DoE). Many might not be aware of it, but both areas - identity management and enterprise architecture - are fundamental to Government 2.0. A couple of former CIOs joined the discussion and offered their insights on issues such as cross-boundary collaboration: Dan Mintz (DoT), Pat Schambach (DHS) and Mark Kneidinger (NY, VA, DoS).


The morning of the second day was spent at the White House Conference Center. Officially supported by the GSA and the U.S. CIO Council, the session brought together a couple of acting CIOs to offer their insights in a roundtable discussion with German, Austrian and Mexican government executives. Unfortunately, Vivek Kundra couldn't come, as he had to testify on information security on the Hill. (Update 5/29: read the White House Cyberspace Policy Review.) Two take-aways. First, when talking about new collaboration tools, the CIOs admitted that it is quite a challenge to align social media with the existing laws and regulations--some dating back to the 70s--"they can get you fired, put you in jail or burden you with huge fines". Second, data.gov will go live on May 1st--it now is.


Using Cisco's telepresence center in Herndon, VA, the group--including government executives sitting in Germany--exchanged thoughts with Paul Cosgrave (NYC) and Teri Takai (California). It was the first time I had participated in such a "video conference" (Cisco doesn't like that term), and I was amazed. The world really becomes a small place (D.C., L.A., New York, Berlin), and while there are still some minor glitches, you quickly become immersed in a conversation that feels quite real. For dinner, we were joined by Jackie Patillo, the acting CIO of DoT, who was also willing to share her knowledge with the group.


The final day we spent at IBM's Institute for eGovernment, with an introduction by Sherry Amos from SAP on the economic stimulus package and transparency. A lively discussion ensued, and I am curious to see how some of the ideas will be transferred to Germany and Europe. Many were skeptical about the use of Web 2.0 tools in the coming national election in Germany. Unlike the Obama campaign, Angela Merkel and Frank-Walter Steinmeier, the candidates running for chancellor, lack a comparable story and mission. Moreover, a survey among participants conducted on the first day showed that most perceived the level of transparency in government in Germany as rather poor. Other topics included cloud computing (among other things, people wondered how this fits with the security needs of government), government 2020, and "smart cities" (everything connected).

Several people twittered about the event: Ines Mergel (Ines also recently posted some Twitter recommendations), Anke Domscheit, Thomas Langkabel and Philipp Mueller.

We also managed to convince Harald Lemke, the former CIO of the German state of Hesse, to join the Twitter community.

May 3, 2009

Panel Discussion - Machines with eyes and texting spies: The shifting lines of public and private

For those of you in the Boston area, there's an interesting panel next week that I'll be on about privacy and technology. There will be an emphasis on social networks, since this is related to the Media Lab's Sociable Media Group's exhibition called Connections. Here are the specifics:

Wednesday, May 6, 6:00 - 7:30 p.m.

Location: MIT Museum, 265 Massachusetts Ave. Cambridge MA. 10 min. walk from Central Sq. or Kendall Sq. T stop
Free Admission. Light Refreshments will be served.

MODERATOR
Jonathan Zittrain, co-founder of the Berkman Center for Internet & Society at Harvard University.

SPEAKERS
Judith Donath - Director of the Sociable Media Group at the MIT Media Lab,
Shava Nerad - Development Director/ former Executive Director of The Tor Project -- providing anonymity online
Aaron Swartz - Founder of watchdog.net and reddit.com - dedicated to openness on the web.
Benjamin Waber - Researcher, Human Dynamics Group at the MIT Media Lab

DESCRIPTION
The exhibition Connections (through Sept. 13) was conceived from research done by the Sociable Media Group at the MIT Media Lab, and includes the dynamic new installation, Metropath(ologies). In a world overflowing with information and non-stop communication, this modular and interactive exhibition poses the question of how people's notions of privacy have changed in the context of hyper-connected and hyper-aware real and virtual spaces. Where is the line between privacy and transparency in this new age? How have these issues affected the way people live in and think about their communities?

May 2, 2009

Networked governance and the swine flu

Riffing off of Ines' post, there was an interesting piece by David Brooks on governance and the swine flu last Monday. Not quite right, but definitely interesting. And it has many of my favorite words (complex, networks, emergent). Some extended excerpts:

In these post-cold war days, we don't face a single concentrated threat. We face a series of decentralized, transnational threats: jihadi terrorism, a global financial crisis, global warming, energy scarcity, nuclear proliferation and, as we're reminded today, possible health pandemics like swine flu.

These decentralized threats grow out of the widening spread and quickening pace of globalization and are magnified by it. Instant global communication and rapid international travel can sometimes lead to universal, systemic shocks. A bank meltdown or a virus will not stay isolated. They have the potential to hit nearly everywhere at once. They can wreck the key nodes of complex international systems.

So how do we deal with these situations? Do we build centralized global institutions that are strong enough to respond to transnational threats? Or do we rely on diverse and decentralized communities and nation-states?
...
If you apply [the logic of a centralized response] to the swine flu, you could say that the world should beef up the World Health Organization to give it the power to analyze the spread of the disease, decide when and where quarantines are necessary and organize a single global response.
...
Those dangers are all real. Yet, so far, that's not the lesson of this crisis. The response to swine flu suggests that a decentralized approach is best. This crisis is only days old, yet we've already seen a bottom-up, highly aggressive response.

In the first place, the decentralized approach is much faster....

Second, the decentralized approach is more credible. It is a fact of human nature that in times of crisis, people like to feel protected by one of their own....

Finally, the decentralized approach has coped reasonably well with uncertainty....
A single global response would produce a uniform approach. A decentralized response fosters experimentation.

The bottom line is that the swine flu crisis is two emergent problems piled on top of one another. At bottom, there is the dynamic network of the outbreak. It is fueled by complex feedback loops consisting of the virus itself, human mobility to spread it and environmental factors to make it potent. On top, there is the psychology of fear caused by the disease. It emerges from rumors, news reports, Tweets and expert warnings.

The correct response to these dynamic, decentralized, emergent problems is to create dynamic, decentralized, emergent authorities: chains of local officials, state agencies, national governments and international bodies that are as flexible as the problem itself.
Swine flu isn't only a health emergency. It's a test for how we're going to organize the 21st century. Subsidiarity works best.

---

I have seen remarkable evidence of the bottom-up response, just in my little corner of the world. I have been deluged by e-mails from Harvard, my kids' schools, my synagogue. Each of these e-mails has updated me on the status of the potential pandemic vis-a-vis that particular institution, and instructed me on appropriate behavior on my part (stay calm and wash my hands, basically).

The reason why such a response can work well is that there is a reasonably good alignment of individual incentives and global effects. If I wash my hands, it reduces the likelihood that I will get sick, which, in turn, is good for people I know who might get the flu from me were I to get sick. Such an approach would work less well if there were no such alignment (e.g., vis-a-vis CO2 emissions).

However, I would note that these are not distinctively 21st century governance problems-- indeed, human history is littered with pandemics/epidemics that have killed millions (see Spanish Flu, 1918). Of course, epidemics can travel much faster now in the jet age (although jets are not particularly new technologies anymore either). Further, our global authority structures have not changed dramatically in a long time. The idea of a global centralized authority that could respond to the pandemic is nothing that we are going to see any time soon. This is not a governance choice that has ever been on the table. It is inconceivable that there could be an uber World Health Organization that could order my youngest's school shut.

What is different is that we have much more effective tools with which to communicate about these issues. And further, I would guess that we actually have more powerful institutions at the center of the storm (the WHO, the CDC) than we have ever had before. They likely have more resources, and vastly superior mechanisms with which to disseminate information and recommended practices, and to work with local authorities to evaluate whether the flu is present in particular jurisdictions, etc. I would guess that there is actually less variation and experimentation in local practice and more (voluntary) following of systemic authorities than ever before.

Thus, arguably in this case "subsidiarity works." But it is because (1) the incentives of the individual decision makers in the system are reasonably well aligned with global outcomes, (2) there are substantial centralized capacities, and (3) because of current communication technologies, local decision makers (generally) are acting voluntarily in a fashion consistent with the preferences of those central agencies.

May 1, 2009

CDC is fighting the spread of the swine flu with viral technologies

The CDC (Centers for Disease Control and Prevention) is using several different social media channels to inform the public about the swine flu, besides the traditional (Web 1.0) channels such as frequent press briefings, general information in audio and video, etc.:


  1. Updates from the H1N1 page have an RSS feed (a polling sketch follows this list).

  2. Frequent updates are spread using Twitter.

  3. Video updates are posted using podcasts.

  4. Image sharing on the CDC's Flickr site.

  5. Buttons for your website.

  6. Information sharing on MySpace's e-health page and daily strength group.

  7. Updates can be shared using several different services (Google Reader, Bookmarks, Delicious, Facebook, Digg, etc.).

  8. e-Cards to send by email to family members and friends to remind people to wash their hands.

  9. Agencies can embed a flu widget on their page.
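
For readers who want to follow such channels programmatically, here is a minimal sketch of polling an RSS feed for new alerts. It assumes the third-party feedparser package is installed, and the feed URL is a hypothetical placeholder rather than the actual CDC feed address.

    import feedparser  # third-party package: pip install feedparser

    # Hypothetical placeholder URL -- substitute the real feed address.
    FEED_URL = "https://example.gov/h1n1/rss.xml"

    def latest_entries(url, limit=5):
        """Return (published, title, link) for the most recent feed entries."""
        feed = feedparser.parse(url)
        return [(e.get("published", "n/a"), e.get("title", ""), e.get("link", ""))
                for e in feed.entries[:limit]]

    if __name__ == "__main__":
        for published, title, link in latest_entries(FEED_URL):
            print(published, "-", title)
            print("  ", link)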

On the funny side: Do you have swine flu?

March 23, 2009

The social psychology of Facebook, etc.

What is the motivation behind Facebook and other forms of online self-presentation, such as, say, blogging? I posed this question (with respect to Facebook) to my undergraduates. Their answers included a desire for social contact and curiosity about other people (for which, perhaps, self-disclosure is the medium of exchange). Here are some other possibilities:

1. According to Cooley, we see ourselves through the eyes of others, or at least we try hard to. But what others? Whomever we come into contact with, I suppose, for those are the people whose reactions we can gauge. But then online self-presentation poses a challenge, for this is presenting ourselves to people we might not otherwise encounter, and whom we might not ever encounter in person. I conjecture--and perhaps Cooley anticipated this--that we see ourselves through the eyes of whomever we've received responses from in the recent past. Then once a blogger has, perhaps under pressure from a former colleague, presented himself to the blogosphere once and received some responses, he sees himself through the (imagined) eyes of those same people (or at least some typification of that sort of person), and feels answerable to them.

2. Once one has a taste of externalizing one's thoughts and imagining that others care to ponder them, thinking that is not externalized seems kind of pointless, perhaps like singing in the shower after performing in front of a large audience. I've had this experience after reviewing books for journals, of feeling deflated upon then reading a book for no one's benefit but my own. (It passes, unless one feeds the habit by writing Amazon reviews.)

3. Consistent with (2), one acquires the cognitive habit of thinking and experiencing on behalf of an audience, and perhaps of formulating a blog entry as the experience unfolds, so that half the work is done by the time the experience is complete. Whether this diminishes the intensity of the original experience, I won't conjecture. Obviously Twitter takes this to a new extreme.

4. When my students talk about maintaining social contact, I assume they mean contact with high school and college friends, and that a precondition for friendship is, at least in some circles, continuous self-accounting and monitoring of the self-accounts of others. This should probably be distinguished from blogging (or Facebooking) to combat genuine isolation, of the sort that my students are at little risk of but that probably besets folks stranded in the suburbs and beyond. The problem with this formulation is that it portrays online interaction as a last act of desperation, akin to talking to a Wilson soccer ball, whereas it seems that a genuine, if virtual, community readily pops into existence for anyone looking for one. And then who's to say that it's less "real" than a clutch of friends chatting at the coffee shop? As I tell my students: no moral evaluations. No, not even in the footnotes.

March 14, 2009

Sunbelt Update

Many of us have been at the Sunbelt conference for the past two days, and there have been some extremely interesting talks on new research.

Some particularly interesting work is coming out of the United States Military Academy Network Science Center. One of their current projects involves giving BlackBerries to 35 cadets and continuously logging their location via GPS, along with e-mail, phone calls, and other data. They are also collecting all other e-mail data and giving these cadets weekly surveys on their networks, so this is shaping up to be a very interesting data set.

Jamie Olsen from the Carnegie Mellon CASOS group also presented some fascinating work looking at shipping traffic patterns using GPS-like sensors and identifying areas of importance using clustering techniques and network measures.

I'm really encouraged to see this uptick in sensor-related research in the social network community, and the great reception that these and similar presentations received signals the increasing appeal of the reality mining technique.

February 22, 2009

Facebook's Terms of Use and implications for network researchers

The changes to Facebook's Terms of Use were quickly followed by massive protests from thousands of users demanding that those changes be abandoned. The Consumerist blog was one of the first to ask its readers to boycott Facebook and look for alternative ways to connect with friends.

About a week after the change, Facebook decided to revert to its original TOS (from September 2008) and is now working with its lawyers and legal specialists to come up with an improved version.

For researchers the TOS are critical: we need to understand not only how Facebook will use our own data, but also how we can use network data to analyze emergent social structures and the ways users create, maintain, or abandon their online ties. The current TOS leave us in limbo - not knowing what is allowed and to what extent.

To understand this better and to collect the wisdom of the social network analyst crowd, I recently started a discussion on this topic on the SocNet listserver. I am trying to find arguments that will help to explain my research interests to an Institutional Review Board. The discussion is still going on. A few highlights are:

  • Facebook does not allow researchers (or anyone else) to store data for more than 24 hours, which makes it difficult to clean, analyze and, of course, eventually publish the data
  • Data needs to be anonymized (a particular problem in SNA, where network data cannot be fully anonymous - we need to know which kinds of actors are nominating other actors - and longitudinal data analysis seems to be impossible); a pseudonymization sketch follows this list
  • So far I have identified three different ways to collect/use Facebook data, although at this point it is unclear how people can comply with the first two points.
1. Bernie Hogan at the Oxford Internet Institute, University of Oxford, UK, has created a Facebook application, available via iTunes U, to analyze Facebook data (open iTunes -> iTunes U -> Oxford University).

2. Dataverse project at Harvard's Berkman Center has made available Facebook data.

3. Create an application or a group on Facebook where you can find a way to have people give their consent to collect data on their online behavior and contacts.
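
As an illustration of the anonymization point above, here is a minimal sketch of pseudonymizing a friendship edge list before analysis. The salt, identifiers and helper names are hypothetical, and hashing IDs by itself does not satisfy Facebook's TOS or an IRB's requirements; it only shows one way to decouple analysis from raw identities.

    import hashlib
    import networkx as nx  # third-party package: pip install networkx

    # Hypothetical salt, kept separate from the data so that pseudonyms
    # cannot be trivially recomputed from public profile IDs.
    SALT = "replace-with-a-secret-salt"

    def pseudonym(user_id):
        """Map a raw user ID to a stable, hard-to-reverse pseudonym."""
        return hashlib.sha256((SALT + user_id).encode("utf-8")).hexdigest()[:12]

    def build_graph(edges):
        """edges: iterable of (user_a, user_b) friendship pairs with raw IDs."""
        g = nx.Graph()
        g.add_edges_from((pseudonym(a), pseudonym(b)) for a, b in edges)
        return g

    if __name__ == "__main__":
        g = build_graph([("alice", "bob"), ("bob", "carol"), ("alice", "carol")])
        print(nx.density(g), sorted(d for _, d in g.degree()))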

We have set up an informal meeting at the annual INSNA (International Network for Social Network Analysis) conference in San Diego to exchange some of the ideas and information available. In case you are interested in joining us, please email me at ines_mergel(at)yahoo.com. I will post an update after the conference in March.

February 5, 2009

Paper in Science tomorrow on "Computational Social Science"

One of the key themes of this blog has been that social science will/should undergo a transformation over the next generation, driven by the availability of new data sources, as well as the computational power to analyze those data. I, along with many collaborators, address these issues in a paper coming out tomorrow in Science on "Computational social science" (the original title-- Life in the network: the coming age of computational social science-- was more evocative but too wordy). In any case, while I cannot post the final version of the paper, I can post the version we submitted:


Computational social science


David Lazer (Harvard University), Alex (Sandy) Pentland (MIT), Lada Adamic (University of Michigan), Sinan Aral (NYU), Albert Laszlo Barabási (Northeastern University), Devon Brewer (Interdisciplinary Scientific Research), Nicholas Christakis (Harvard University), Noshir Contractor (Northwestern University), James Fowler (UCSD), Myron Gutmann (University of Michigan), Tony Jebara (Columbia University), Gary King (Harvard University), Michael Macy (Cornell University), Deb Roy (MIT), Marshall Van Alstyne
(Boston University)


We live life in the network. When we wake up in the morning, we check our e-mail, make a quick phone call, walk outside (our movements captured by a high definition video camera), get on the bus (swiping our RFID mass transit cards) or drive (using a transponder to zip through the tolls). We arrive at the airport, making sure to purchase a sandwich with a credit card before boarding the plane, and check our BlackBerries shortly before takeoff. Or we visit the doctor or the car mechanic, generating digital records of what our medical or automotive problems are. We post blog entries confiding to the world our thoughts and feelings, or maintain personal social network profiles revealing our friendships and our tastes. Each of these transactions leaves digital breadcrumbs which, when pulled together, offer increasingly comprehensive pictures of both individuals and groups, with the potential of transforming our understanding of our lives, organizations, and societies in a fashion that was barely conceivable just a few years ago.

The capacity to collect and analyze massive amounts of data has unambiguously transformed such fields as biology and physics. The emergence of such a data-driven "computational social science" has been much slower, largely spearheaded by a few intrepid computer scientists, physicists, and social scientists. If one were to look at the leading disciplinary journals in economics, sociology, and political science, there would be minimal evidence of an emerging computational social science engaged in quantitative modeling of these new kinds of digital traces. However, computational social science is occurring, and on a large scale, in places like Google, Yahoo, and the National Security Agency. Computational social science could easily become the almost exclusive domain of private companies and government agencies. Alternatively, there might emerge a "Dead Sea Scrolls" model, with a privileged set of academic researchers sitting on private data from which they produce papers that cannot be critiqued or replicated. Neither scenario will serve the long-term public interest in the accumulation, verification, and dissemination of knowledge.

What potential value might a computational social science, based in an open academic environment, offer society, through an enhanced understanding of individuals and collectives? What are the obstacles that stand in the way of a computational social science?

From individuals to societies

To date the vast majority of existing research on human interactions has relied on one-shot self-reported data on relationships. New technologies, such as video surveillance, e-mail, and 'smart' name badges offer a remarkable, second-by-second picture of interactions over extended periods of time, providing information about both the structure and content of relationships. Consider examples of data collection in this area and of the questions they might address:

Video recording and analysis of the first two years of a child's life (1): Precisely what kind of interactions with others underlies the development of language? What might be early indicators of autism?

Examination of group interactions through e-mail data: What are the temporal dynamics of human communications--that is, do work groups reach a stasis with little change, or do they dramatically change over time (2, 3)? What interaction patterns predict highly productive groups and individuals? Can the diversity of news and content we receive predict our power or performance (4)?

Examination of face-to-face group interactions over time using sociometers: Small electronics packages ('sociometers') worn like a standard ID badge can capture physical proximity, location, movement, and other facets of individual behavior and collective interactions. What are patterns of proximity and communication within an organization, and what flow patterns are associated with high performance at the individual and group levels (5)?

Macro communication patterns: Phone companies have records of call patterns among their customers extending over multiple years, and e-Commerce portals such as Google and Yahoo collect instant messaging data on global communication. Do these data paint a comprehensive picture of societal-level communication patterns? What does the "macro" social network of society look like (6), and how does it evolve over time? In what ways do these interactions affect economic productivity or public health?
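
[Illustrative aside, not part of the submitted paper: a minimal sketch, assuming a hypothetical CSV of call records with "caller" and "callee" columns, of building a weighted communication graph of the kind discussed above and summarizing its macro structure. It is not the method used in any of the cited studies.]

    import csv
    from collections import Counter
    import networkx as nx  # third-party package: pip install networkx

    def call_graph(path="calls.csv"):
        """Build an undirected graph from call records; edge weight = call count."""
        g = nx.Graph()
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                a, b = row["caller"], row["callee"]
                if g.has_edge(a, b):
                    g[a][b]["weight"] += 1
                else:
                    g.add_edge(a, b, weight=1)
        return g

    if __name__ == "__main__":
        g = call_graph()
        giant = max(nx.connected_components(g), key=len)
        degree_counts = Counter(d for _, d in g.degree())
        print("nodes:", g.number_of_nodes(), "giant component:", len(giant))
        print("degree distribution:", dict(sorted(degree_counts.items())))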

Tracking movement: With GPS and related technologies, it is increasingly easy to track the movements of people (7, 8). Mobile phones, in particular, allow the large scale tracing of people's movements and physical proximities over time (9), where it may be possible to infer even cognitive relationships, such as friendship, from observed behavior (10). How might a pathogen, such as influenza, driven by physical proximity, spread through a population (11)?

Internet: The Internet offers an entirely different channel for understanding what people are saying, and how they are connecting (12). Consider, for example, in this political season, tracing the spread of arguments/rumors/positions in the blogosphere (13), as well as the behavior of individuals surfing the Internet (14), where the concerns of an electorate become visible in the searches they conduct. Virtual worlds, by their nature capturing a complete record of individual behavior, offer ample opportunities for research, for example, experimentation that would be impossible or unacceptable (15). Similarly, social network websites offer an unprecedented opportunity to understand the impact of a person's structural position on everything from their tastes to their moods to their health (16), while Natural Language Processing offers increased capacity to organize and analyze the vast amounts of text from the Internet and other sources (17).

In short, a computational social science is emerging that leverages the capacity to collect and analyze data with an unprecedented breadth, depth, and scale. Substantial barriers, however, might limit progress. Existing ways of conceiving human behavior were developed without access to terabytes of data describing the minute-by-minute interactions and locations of entire populations of individuals. For example, what does existing sociological network theory, built mostly on a foundation of one-time 'snapshot' data, typically with only dozens of people, tell us about massively longitudinal datasets of millions of people, including location, financial transactions, and communications? The answer is clearly "something," but, as with the blind men feeling parts of the elephant, limited perspectives provide only limited insights. These emerging data sets surely must offer some qualitatively new perspectives on collective human behavior.

There are significant barriers to the advancement of a computational social science both in approach and in infrastructure. In terms of approach, the subjects of inquiry in physics and biology present different challenges to observation and intervention. Quarks and cells neither mind when we discover their secrets nor protest if we alter their environments during the discovery process (although, as discussed below, biological research involving humans offers some similar concerns regarding privacy). In terms of infrastructure, the leap from social science to a computational social science is larger than from, say, biology to a computational biology, in large part due to the requirements of distributed monitoring, permission seeking, and encryption. The resources available in the social sciences are significantly smaller, and even the physical (and administrative) distance between social science departments and engineering or computer science departments tends to be greater than for the other sciences. The availability of easy-to-use programs and techniques would greatly magnify the presence of a computational social science. Just as mass-market CAD software revolutionized the engineering world decades ago, common computational social science analysis tools and the sharing of data will lead to significant advances. The development of these tools can, in part, piggyback on those developed in biology, physics and other fields, but also requires substantial investments in applications customized to social science needs.

Perhaps the thorniest challenges exist on the data side, with respect to access and privacy. Many, though not all, of these data are proprietary (e.g., mobile phone and financial transactional data). The debacle following AOL's public release of "anonymized" search records of many of its customers highlights the potential risk to individuals and corporations in the sharing of personal data by private companies (18). Robust models of collaboration and data sharing between industry and the academy need to be developed that safeguard the privacy of consumers and provide liability protection for corporations.

More generally, properly managing privacy issues is essential. As the recent NRC report on GIS data highlights, it is often possible to pull individual profiles out of even carefully anonymized data (19). To take a non-social science example: this past Summer NIH and the Wellcome Trust abruptly removed a number of genetic databases from online access (20). These databases were seemingly anonymized, simply reporting the aggregate frequency of particular genetic markers. However, research revealed the potential for de-anonymization, based on the statistical power of the sheer quantity of data collected from each individual in the database (21).

A single dramatic incident involving a breach of privacy could produce a set of statutes, rules, and prohibitions that could strangle the nascent field of computational social science in its crib. What is necessary, now, is to produce a self-regulatory regime of procedures, technologies, and rules that reduce this risk but preserve most of the research potential. As a cornerstone of such a self-regulatory regime, Institutional Review Boards (IRBs) must increase their technical knowledge enormously to understand the potential for intrusion and individual harm because new possibilities do not fit their current paradigms for harm. For example, many IRBs today would be poorly equipped to evaluate the possibility that complex data could be de-anonymized. Further, it may be necessary for IRBs to oversee the creation of a secure, centralized data infrastructure. Certainly, the status quo is a recipe for disaster, where existing data sets are scattered among many different groups, with uneven skills and understanding of data security, with widely varying protocols.

Researchers themselves must tackle the privacy issue head on by developing technologies that protect privacy while preserving data essential for research (22). These systems, in turn, may prove useful for industry in managing privacy of customers and security of their proprietary data.

Finally, the emergence of a computational social science shares with other nascent interdisciplinary fields (e.g., sustainability science) the need to develop a paradigm for training new scholars. A key requirement for the emergence of an interdisciplinary area of study is the development of complementary and synergistic explanations spanning different fields and scales. Tenure committees and editorial boards need to understand and reward the effort to publish across disciplines (23). Certainly, in the short run, computational social science needs to be the work of teams of social and computer scientists. In the longer run, the question will be: should academia be building computational social scientists, or teams of computationally literate social scientists and socially literate computer scientists?

The emergence of cognitive science in the 1960s and 1970s offers a powerful model for the development of a computational social science. Cognitive science emerged out of the power of the computational metaphor of the human mind. It has involved fields ranging from neurobiology to philosophy to computer science. It attracted the investment of substantial resources to establish a common field, and it has created enormous progress for public good in the last generation. We would argue that a computational social science has a similar potential, and is worthy of similar investments.

References:

1. D. Roy, R. Patel, P. DeCamp, R. Kubat, M. Fleischman, B. Roy, N. Mavridis, S. Tellex, A. Salata, J. Guiness, M. Levit, P. Gorniak. 2006. "The Human Speechome Project," Twenty-eighth Annual Meeting of the Cognitive Science Society.
2. J.-P. Eckmann, E. Moses, D. Sergi. 2004. "Entropy of dialogues creates coherent structures in e-mail traffic," Proceedings of the National Academy of Sciences of the United States of America 101: 14333-14337.
3. Kossinets, G. & D. Watts. 2006. "Empirical Analysis of an Evolving Social Network." Science (311:5757): 88-90.
4. S. Aral, M. Van Alstyne. 2007. "Network Structure & Information Advantage" Proceedings of the Academy of Management Conference, Philadelphia, PA.
5. A. Pentland. 2008. Honest Signals: How They Shape Our World, MIT Press, Cambridge, MA.
6. J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, A.-L. Barabási. 2007. "Structure and tie strengths in mobile communication networks," Proceedings of the National Academy of Sciences of the United States of America.
7. B. Shaw, T. Jebara. 2007. "Minimum Volume Embedding," Proceedings of the Conference on Artificial Intelligence and Statistics.
8. T. Jebara, Y. Song, K. Thadani. 2007. "Spectral Clustering and Embedding with Hidden Markov Models," Proceedings of the European Conference on Machine Learning.
9. M. C. González, C. A. Hidalgo, A.-L. Barabási. 2008. "Understanding individual human mobility patterns," Nature 453: 779-782.
10. N. Eagle, A. Pentland, D. Lazer. 2008. "Inferring friendships from behavioral data," HKS working paper.
11. V. Colizza, A.Barrat, M. Barthelemy, and A. Vespignani. 2006. "Prediction and predictability of global epidemics: the role of the airline transportation network," Proceedings of the National Academy of Sciences of the United States of America, 103: 2015-2020.
12. D. Watts. 2007. "A twenty-first century science," Nature 445: 489.
13. L. Adamic, N. Glance. 2005. "The Political Blogosphere and the 2004 U.S. Election: Divided They Blog," LinkKDD-2005, Chicago, IL.
14. J. Teevan. 2008. "How People Recall, Recognize and Re-Use Search Results," To appear in ACM Transactions on Information Systems (TOIS) special issue on Keeping, Re-finding, and Sharing Personal Information.
15. W. Bainbridge. 2007. "The scientific research potential of virtual worlds," Science 317. no. 5837: 472 - 476.
16. K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, and N. Christakis. 2009. "Tastes, Ties, and Time: A New (Cultural, Multiplex, and Longitudinal) Social Network Dataset Using Facebook.com." Social Networks, in press.
17. C. Cardie, J. Wilkerson. 2008. "Text annotation for political science research," Journal of Information Technology and Politics 5: 1-6.
18. M. Barbaro, T. Zeller Jr. 2006. "A Face Is Exposed for AOL Searcher No. 4417749," New York Times (August 9).
19. National Research Council. 2007. Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data. Ed. Myron P. Gutmann and Paul Stern. Washington: National Academy Press.
20. J. Felch. August 29, 2008. DNA databases blocked from the public. LA Times.
21. N Homer, S Szelinger, M Redman, D Duggan, W Tembe. 2008. "Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays," PLoS Genetics 4(8): e1000167. doi:10.1371/journal.pgen.1000167
22. L. Backstrom, C. Dwork, J. Kleinberg. 2007. Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography. Proc. 16th Intl. World Wide Web Conference.
23. M. Van Alstyne, E. Brynjolfsson. 1996. "Could the Internet Balkanize Science?" Science. 274: 1479-1480.

------------------------------------------------
Full reference for this paper: David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-Laszlo Barabasi, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy, and Marshall Van Alstyne, "Computational Social Science," Science 6 February 2009: 721-723.

January 6, 2009

Facebook viruses

Speaking of contagion, there was an interesting piece in the Christian Science Monitor on the spread of viruses in social media, such as Facebook. Interestingly, this problem apparently increased substantially in 2008.

Let me make a short suggestion that there is an opportunity, with social media, to better understand the epidemiology of computer viruses. In particular, environments such as Facebook are self-contained, and have a great deal of information on the strength of relationships among individuals. Further, it should be possible, after the fact, to trace exactly when and where the virus was passed from one individual to another (difficult to do with viruses that affect humans). It should therefore be possible to link topology to spread in a fashion that is generally impossible. There is, in short, an opportunity to greatly advance understanding of contagion with data that companies like Facebook, Bebo, etc., have -- if anyone from these companies is reading, consider this a short research proposal ;-).
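
To make that research-proposal wink slightly more concrete, here is a minimal sketch of simulating contagion on a friendship graph and recording how far it spreads over time. The network, transmission probability and seed node are illustrative assumptions (a small-world stand-in), not Facebook data or any model these companies actually use.

    import random
    import networkx as nx  # third-party package: pip install networkx

    def simulate_spread(g, seed, p_transmit=0.05, steps=20, rng=None):
        """Simple SI process: each step, every infected node infects each
        susceptible neighbor independently with probability p_transmit."""
        rng = rng or random.Random(42)
        infected = {seed}
        history = [len(infected)]
        for _ in range(steps):
            new = {nbr for node in infected for nbr in g.neighbors(node)
                   if nbr not in infected and rng.random() < p_transmit}
            infected |= new
            history.append(len(infected))
        return history

    if __name__ == "__main__":
        # Small-world graph as a stand-in for a friendship network.
        g = nx.watts_strogatz_graph(n=1000, k=10, p=0.1, seed=1)
        print(simulate_spread(g, seed=0))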

January 5, 2009

Google books

There was an interesting article in today's New York Times on Google Books. Google Books is a massive effort to scan, essentially, all print media, going back centuries. (Also see the effort by the Open Content Alliance.) Partially putting aside the important issues around control of the data, the digitization of texts creates the capacity to access, organize, and analyze much of what humanity has "thought" in recent history. From the perspective of a social scientist, the exciting prospect is to view this corpus as perhaps the most extraordinary data set ever assembled (especially when combined with recent developments in natural language processing). Can we see the rise and fall of social movements? Of ways of thinking about the world, linking these constructs to space and time? This is part of a broader movement, as I have written before, toward a "computational social science."
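
As a toy illustration of the "rise and fall" question, here is a minimal sketch, assuming a hypothetical directory of plain-text files named <year>_<title>.txt (not the Google Books corpus or any API of it), that traces how often a term appears per year, normalized by corpus size.

    import os
    import re
    from collections import Counter

    def term_frequency_by_year(directory, term):
        """Relative frequency of `term` per year across <year>_<title>.txt files."""
        hits, totals = Counter(), Counter()
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for name in os.listdir(directory):
            if not name.endswith(".txt"):
                continue
            year = int(name.split("_", 1)[0])
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                text = f.read()
            hits[year] += len(pattern.findall(text))
            totals[year] += len(text.split())
        return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

    if __name__ == "__main__":
        for year, rate in term_frequency_by_year("corpus", "suffrage").items():
            print(year, f"{rate:.6f}")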

The one aspect of control that this does raise is what access researchers will have to the entire Google Books corpus. Indeed, part of the concern that has driven the Open Content Alliance (as I understand it) is the issue of public access to the corpus, where, for example, libraries will need to pay subscription fees for access to what could be a Google monopoly. There are similar concerns, as I see it, regarding access for those who wish to do research on these data. For those readers of the blog who have insight on this, please post comments.

December 5, 2008

Next steps for the Obama network...

As I wrote about earlier, a fascinating question is what happens to the Obama network now that the campaign is past. The Obama campaign mobilized unprecedented amounts of money and numbers of people. The possibility: that politics in this country will be reorganized around the Internet/living room model of the Obama campaign. The unknown: now that Bush (shortly) will be history, and Obama has made history, how many people will remain engaged? Obviously, it is very much in the interest of Obama to keep the network alive, for his re-election campaign in 2012, to help him govern, and to create an enduring structural advantage for the Democratic party.

In any case, part of what is necessary is to maintain those relationships built during the campaign toward the cause of electing Obama. The e-mail below, sent to Obama's e-mail list, provides hints of the initial steps in this direction:


Exactly one month ago, you made history by giving all Americans a real opportunity for change.

Now it's time to start preparing and working for change in our communities.

On December 13th and 14th, supporters are coming together in every part of the country to reflect on what we've accomplished and plan the future of this movement. Your ideas and feedback will be collected and used to guide this movement in the months and years ahead.

Join your friends and neighbors -- sign up to host or attend a Change is Coming house meeting near you.

Since the election, the challenges we face -- and our responsibility to take action -- have only gotten more urgent.

You can connect with fellow supporters, make progress on the issues you care about, and help shape the future of your community and our country.

Learn what you can do now to support President-elect Obama's agenda for change and continue to make a difference in your community.

Take the first important step by hosting or attending a Change is Coming house meeting. Sign up right now:

http://my.barackobama.com/changeiscoming

To get our country back on track, it will take all of us working together.

Barack and Joe have a clear agenda and an unprecedented opportunity for change. But they can't do it alone.

Will you join us at a house meeting and help plan the next steps for this movement?

Thanks,

David

David Plouffe
Campaign Manager
Obama for America

----

A few key ingredients: emphasis on the power at the roots ("you made history"); the assertion that the roots will matter ("guide this movement"); and embedding action in those local relationships forged during the campaign.

The $750m question: how many people will show up on the 13th and 14th? You won't see any newspaper stories about this, but it will be an enormously important signal as to the future of American politics.

October 30, 2008

On data growth and growing concerns

Are you in favour of more efficient and effective government? Of course you are. If one counted the reasons given most often for any type of government reform, these two would score the highest marks.

It is widely recognised that the characteristics of information and communication technologies (ICT) have strong impact on both. Government was thus among the first to utilise ICT. In the early days, punch-card machines were used for the census, and electronic databases replaced large amounts of data stored in non-digital form (for example, in files) throughout government once the technology was available.

Because information drawn from data is at the core of everything government does - analysis, decision-making or verifying eligibility for access to public services, to name just a few - the proliferation of databases, data mining and ICT in general is unsurprising. However, it is this increase in databases, the kind of data being gathered, the way that data is protected (or rather the opposite) and the way it is used internally and externally, that has come under increased scrutiny and been criticised by many civil libertarians.

But the criticisms are not just about civil liberties. When governments implement ICT, outcomes vary. Large-scale projects such as the FBI's Virtual Case File, the UK's C-NOMIS or Germany's FISKUS either failed completely or largely exceeded their estimated budgets, wasting billions of taxpayers' money. There are, of course, also successful projects but, by and large, the expected impact of eGovernment in moving into a brave new world of efficient, effective and citizen-focused government administration has not happened.

The power of ICT
ICT has characteristics that need to be understood before carrying out any impact assessment. These characteristics underline why digital data and databases will continue to grow in the future, and why it is necessary to find balanced governance mechanisms for ICT, for the organisations they are embedded in, and for us - the individuals using them.

ICT allows information processing, coordination and flows to be structured without the common boundaries of roles, organisational relationships and operating procedures found in government. As a consequence, the relationship between information and the physical factors of organisational size, distance, time and costs are altered.

Digital information makes geographical dispersion irrelevant, allowing for new forms of collaboration and networks. Information technologies facilitate the speed of communication and more selectively control access to, and participation in, information exchange.

Interestingly, the standardisation, routinisation and formalisation of information sharing are not only technical requirements for shared databases to be effective; they are also typical traits of bureaucracies.

Organisational memory that was once hidden in non-digital forms or an individual's memory can be stored, managed and analysed in digital form to improve knowledge or facilitate decision-making - helped by the fact that information storage, provision and search costs are virtually zero once information is digitised. Moreover, the human constraints of processing large quantities of information are reduced (for example, through the use of search engines), and software applications make it possible to combine and reconfigure data so as to provide new information.

This has been spurred by the rise of Web 2.0 applications such as social networking sites, mash-ups, tagging, and wikis, with the underlying philosophy that comes with it - i.e. mass collaboration and data sharing - further facilitating the growth of data.

The public has followed this trend on a scale that no one imagined. Younger people in particular store and share data about their activities, location, buying behaviour or personal lives like no other generation before, and periodical incidents of security breaches, identity theft and fraud have not reversed this trend.

Often, this behaviour is based on a conscious decision: millions of users joined corporate loyalty programmes (offered by, for example, airlines, hotels or shops) in return for personalised services, rebates or points that can be used in various ways. People may also just be following an intrinsic desire to share and connect. Wikipedia is one of the prominent examples of the powerful force of collaborative peer production.

Data is also gathered and stored by companies in ways of which customers are unaware. While the public has less control over the activities of companies, there is generally greater concern when government engages in these types of activities.

The rise of government databases
The counter-argument is that governments do not gather more data; they are just gathering and combining data in new ways (for example, databases, biometrics, face-recognition software, remotely readable chips (RFID)).

They do so for good reasons: national security, accountability, to provide better public services and to bridge organisational silos. Yet, since 9/11, more data is being sought indiscriminately rather than selectively, meaning that innocent people's data is included through law-enforcement agencies' screening processes.

Indeed, studies have shown that bigger DNA databases produce better results. This may argue in favour of creating a comprehensive DNA database containing information on all citizens and not just those convicted of crimes, as this may actually help to exclude suspects, save investigative resources and have a deterrent effect overall.

The automatic transfer of data about passengers flying from Europe to the US sheds light on another important aspect of the discussion on databases and data sharing. In a globalised world, should countries grant access to their domestic databases and how can they protect personal data beyond their national borders?

Incidents such as the day in November 2007 when the UK government managed to lose two CDs with unencrypted data of more than 25 million citizens underline four key issues relating to government databases.

First, the government has a mandate to protect the public's data. Second, data security is not only about technology. Thirdly, government needs strategies to manage digital trust. Finally, the characteristics that make ICT so valuable (for example, the ease with which it can be transferred) mean there are increasing vulnerabilities and risks: that data will be shared when it should not be, or that it will be lost, stolen or misused. At the same time, there is a risk that data will not be available when it should be.

Paradoxically, calls for new government databases and better interoperability do arise when the system of government fails. Accordingly, new databases to track and monitor individuals and institutions, or links between formerly separate databases, are built.

Moreover, many ideas for creating pro-active, multi-channel, one-stop and joined-up government simply do not work without databases. The volume of data to be collected will grow constantly in the near future as more government transactions are digitised, and cases of data being cross-referenced ('mined') will also grow as the relevant software improves.

Even if a government body decides to discard data, it faces many difficulties.

First, because data storage costs are continuously decreasing, governmental organisations prefer to keep everything, creating 'data cemeteries'. The expansion in the volume and kinds of data maintained by agencies has made it almost impossible to maintain an inventory of resources.

Second, interim systems sometimes bridge the incompatibilities between the old and the new system, thus keeping the legacy system alive and increasing its overall complexity. For example, the US Internal Revenue Service launched a new software application to support a total quality management initiative, but never shut it down after the initiative ended. The amount of work it would take to resolve issues relating to data exchange with other systems was considered too high.

Moreover, these 'electronic mounds' accumulate massive quantities of rules - for example, to control user access or user behaviour, or to make sure different software applications can work together - that conflict with changes to other systems. The possibilities of storing and searching electronic information may also justify the development of large sets of these rules, so ICT does not always cut red tape. This is why some have proposed a combination of laws and technology to require, and make it easier for, data to be deleted - and thus "revive our society's capacity to forget".

Policy options
The expansion of databases puts greater burdens on the political-administrative-ethical calculus to strike the right balance between innovation and regulatory regimes. The following questions should be considered by policy-makers in the planning stages of initiatives that include setting up databases:

  • What type of data should be collected and why?
  • Who collects, maintains and owns the data?
  • How is the data collected?
  • How long should the data be stored?
  • Should the data be shared and why?
  • Who will be affected by making data more widely available?
  • What impact will the data have on different stakeholder groups?
  • Is the data aggregated?
  • Is there a minimum opt-out provision for those who provide the data directly or indirectly?
  • What security measures and policies are in place to protect the data?
  • How can the data be accessed and changed by those who provide it directly or indirectly?
  • How can accuracy of the data be ensured?
  • How can the data be reviewed and disclosed?
  • Do third parties who provide or use the data have the same security standards and privacy policies?

Policy-makers should also consider educating the public better on issues such as privacy and identity self-management - a process which may need to begin as early as in elementary school. They also need to understand how trust and the perception of security in digital government is created.

In any case, there will be many alternatives for government, businesses and the public to choose from when incorporating ICT into their lives. The perception of what is right and wrong will evolve alongside the values they are measured against, and the databases and techniques they are applied to.

(a longer version will appear in the European Policy Centre's "Challenge Europe - Is Big Brother watching you - and who is watching Big Brother" publication)

References:
D. Lazer (Ed) (2004) DNA and the Criminal Justice System, Cambridge, MA: MIT Press.

V. Mayer-Schoenberger (2007) 'Useful Void: The Art of Forgetting in the Age of Ubiquitous Computing', RWP07-22, Faculty Research Working Paper Series, Harvard University, Cambridge, MA: John F. Kennedy School of Government.

National Research Council (2008) Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment, Washington, D.C: National Academies Press

A. Schellong (2008) Citizen Relationship Management, Brussels: Peter Lang Publishing.

D. Tapscott; A. D. Williams (2007) Wikinomics, New York: Portfolio Hardcover.

The Economist, 'Data mining', 27 September 2008.

The Economist, 'Privacy in Britain', 28 January 2008.

The Economist, 'Big, bigger, biggest', 28 February 2008.

The Economist, 'Identity parade', 14 February 2008.

October 21, 2008

Neighbor to Neighbor vs Voter to Voter

Interesting blog posting from techPresident comparing Obama's "Neighbor to Neighbor" tool to McCain's "Voter to Voter".

Some key observations:

If you do a Google search for the words Obama and "neighbor to neighbor," Google returns 479,000 hits. A search for McCain and "voter to voter" returns 325 hits.

According to Google, the total number of sites linking to my.barackobama.com/n2n is 475, with 396 of them being blogs. The total number of links to www.johnmccain.com/v2v is 18, with none from blogs.

October 19, 2008

Pew study: the connected American family

A just released Pew report, Networked Families (by Tracy Kennedy, Aaron Smith, Amy Tracy Wells, and Barry Wellman), may be of interest to some readers of the blog. To quote from the abstract:

The internet and cell phones have become central components of modern family life. Among all household types, the traditional nuclear family has the highest rate of technology usage and ownership.

A national survey has found that households with a married couple and minor children are more likely than other household types -- such as single adults, homes with unrelated adults, or couples without children -- to have cell phones and use the internet.

The survey shows that these high rates of technology ownership affect family life. In particular, cell phones allow family members to stay more regularly in touch even when they are not physically together. Moreover, many members of married-with-children households view material online together.

Also, go to the Washington Post story.

The report is chock full of interesting data on how modern communication technologies have become integrated into the daily lives of American families. I am a bit concerned, however, about the interpretation that these technologies have actually improved communication within families. For example, respondents are asked whether the Internet has improved connections to family members, friends, etc., and given the choices "a lot," "some," "only a little," or "not at all." It is problematic to interpret these particular results as supporting the proposition that the Internet has improved communication within families, because (1) no choice is given that the Internet has undermined connections, and, more important, (2) respondents are not in a position to construct a counterfactual. (At least, I can no more imagine what communication within my family would be like without the Internet than what it would be like without the automobile.) That is, I would be cautious in interpreting some of these results as supporting the proposition that the Internet, for example, really has improved communication within the family.

An interesting puzzle to me: in a question to adults (from p. 28) about whether the Internet and cell phones have made their families closer than the families they grew up in, 28% of families with cell phones and Internet in their households indicate that these technologies have made their families closer, but, oddly, 17% of families without either also indicate that these technologies have made their families closer. It is striking to me that there is not a bigger difference -- is this because of indirect access to these technologies (at school, work, through friends)?

A neat factoid from the report (p. 24): women generally seem to communicate more with their children than men, but there is a particularly large gender gap in text messaging with children, with 28% of women reporting texting their kids versus only 12% of men. I am also surprised that as many as one in twenty-five parents communicate with their children (ages 7-17, so this includes many parents whose kids are not using Facebook, etc.) via social networking sites at least once a day. My oldest considers it scandalous that I am even on Facebook, and has flatly forbidden me from friending her....

September 22, 2008

Motion Sensors in Laboratories

In the last year, motion sensors have been deployed at a leading academic research laboratory to study how people use space. This study has been using sensors similar to those developed by MERL, mounted on the ceiling, which detect when an object moves beneath them. This enables limited tracking capabilities, although the sensors are placed only in "public" spaces in the lab. Naturally this is a rich area for research, and this study also provides a platform for studying privacy systems.
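
As a hedged illustration of what can be done with such data, here is a minimal sketch, assuming motion-sensor events are logged as (timestamp, sensor_id) rows in a hypothetical CSV, that aggregates activity per sensor per hour of the day -- a first step toward seeing how different spaces are actually used.

    import csv
    from collections import Counter
    from datetime import datetime

    def hourly_activity(path="motion_events.csv"):
        """Count motion events per (sensor_id, hour of day)."""
        counts = Counter()
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                ts = datetime.fromisoformat(row["timestamp"])  # e.g. 2008-09-22T14:03:11
                counts[(row["sensor_id"], ts.hour)] += 1
        return counts

    if __name__ == "__main__":
        for (sensor, hour), n in sorted(hourly_activity().items()):
            print(f"sensor {sensor}  hour {hour:02d}  events {n}")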

While these sensors can be used to study how people use buildings in their current state, to me the most interesting question is whether these sensors can fundamentally change architecture. For example, if certain room types are found to be more effective for fostering interaction, could rooms automatically alter themselves (unfurling walls, moving lights, adding chairs) to try to elicit desired behaviors? Of course this could be done manually by having someone navigate an interface, but allowing the organization itself to specify architectural parameters and have the building change from day to day by itself would be fascinating.

The data in this ongoing study is kept open (online as well as accessible through a public display) to members of the laboratory being studied, although not to the outside world. This is a marked departure from previous studies, which would often only release analyzed behaviors months after the study. Still, it is important for people to know who is looking at the data, since someone at the lab could potentially use it to track someone. Interestingly, when the system was installed there was an initial minor backlash, but now that the system has been in place for so long people mostly ignore it. I have experienced this before with the badges, and I suspect the same is true for e-mail monitoring and similar applications.

Many companies are developing this built-in sensing technology, and NEC appears to be emerging as a leader with IR motion sensors that draw power from fluorescent light bulbs, allowing them to last indefinitely in the environment, versus three years for the MERL sensors. While still experimental, this technology should become commercialized in the next few years, either as part of a consulting package or as a standalone sensing tool.

September 5, 2008

Honest Signals

Sandy Pentland has written a new book on our group's research from the past 5 years: Honest Signals. Here's a summary:

How can you know when someone is bluffing? Paying attention? Genuinely interested? The answer, writes Sandy Pentland in Honest Signals, is that subtle patterns in how we interact with other people reveal our attitudes toward them. These unconscious social signals are not just a back channel or a complement to our conscious language; they form a separate communication network. Biologically based "honest signaling," evolved from ancient primate signaling mechanisms, offers an unmatched window into our intentions, goals, and values. If we understand this ancient channel of communication, Pentland claims, we can accurately predict the outcomes of situations ranging from job interviews to first dates.

Pentland, an MIT professor, has used a specially designed digital sensor worn like an ID badge--a "sociometer"--to monitor and analyze the back-and-forth patterns of signaling among groups of people. He and his researchers found that this second channel of communication, revolving not around words but around social relations, profoundly influences major decisions in our lives--even though we are largely unaware of it. Pentland presents the scientific background necessary for understanding this form of communication, applies it to examples of group behavior in real organizations, and shows how by "reading" our social networks we can become more successful at pitching an idea, getting a job, or closing a deal. Using this "network intelligence" theory of social signaling, Pentland describes how we can harness the intelligence of our social network to become better managers, workers, and communicators.

I've read through an early edition, and in contrast to other pop science books like Freakonomics and Predictably Irrational (both of which are interesting reads), Honest Signals includes the scientific details of the experiments it discusses, in the form of a thorough 50-page appendix. For anyone interested in how sensing technology will change business and the sciences, or who wants to learn how people actually interact with each other, this is a must read.

April 25, 2008

Virtual course and blog: Government 2.0

Technology, societal changes and new management practices influence how we perceive the roles of government. Moreover, they may transform how government does business and creates public value. However, we might as well fall into the trap of technological determinism--moving from eGovernment straight to Government X.0 hype. Many predicted a significant transformation of government thanks to new technologies such as ICT, in particular the Internet, while current research shows that the transformation has not happened (e.g. work by West, Norris, Fountain or Lazer). eDemocracy also remains a rhetorical promise (Mahrer/Krimmer; UN).

In any case, while I am still working on my contribution to the discourse on Web 2.0 & Government, I have two recommendations for any of our readers interested in the matter:

First, Philipp Mueller, who has already contributed some guest entries to this blog, is offering a course on "Government 2.0" for master's students at Erfurt University's School of Public Policy (ESPP) (spring term 2008). The course covers aspects such as Web 2.0, open source, NPM, PPP, citizen-centric governance and performance management. The sessions can be viewed online or downloaded as mp3 files.

Second, I recommend the blog of David Osimo, a researcher at the European Commission's Joint Research Centre IPTS, who is working on the impact of Web 2.0 on public services.

March 4, 2008

Commetrix, a dynamic network visualization tool

While working at CeBIT, the world's largest IT-related fair, I stumbled upon Commetrix, a dynamic network visualization tool developed by researchers at TU Berlin. The software allows you to import data from discussion groups, VoIP, eMail, blogs or social networking sites. Moreover, besides the usual functionality such as centrality, density or zoom, it allows for a time-based observation of network growth and a parallel visualization of the content (e.g. emails). The latter somehow reminded me of tag clouds, although in a much more sophisticated way. Matthias, the project manager, presented a demo of one of their case studies on an Enron email dataset to underline the potential of the tool. The software is only available in English. The possibilities and usability were pretty impressive, though it would be interesting to hear the opinion of an expert on software in that area. (Please comment)

cmx-sreen500.png


Researchers who would be interested in getting a copy of the software or learning more about it should contact Matthias Trier. You should also be able to find some of their work presented at HICSS and Sunbelt online.

February 24, 2008

Tapping on the wisdom of the crowd: Social network analysis software tools on Wikipedia

Together with Jana Diesner, CMU, and Matthias Meyer, WHU, I have started to collect information on social network analysis software packages and libraries.
In order to make a selection from a larger pool of tools, we searched the literature and the Web for archives of tools that are widely accepted. Our goal here was to compile a systematic and (to an extent) exhaustive list of tools along with their main features, application areas and possibilities for interoperability across tools. We failed in this effort.
Clearly, there are plenty of listings of some of these tools out there, organized according to more or less explicitly stated categorization or selection criteria (e.g. INSNA and the chapter by Huisman and Duijn (2005) on Software for Social Network Analysis).
However, none of these lists seemed complete or up-to-date to us. We noticed that compiling our own list led to the exact same problems, and we think we are not the only ones who have gone through this process. We thought this might be a good case for putting the wisdom-of-the-crowd idea into action in the social networks community. Our rationale is that no single Web editor or researcher needs to carry the burden of building and/or maintaining such a collection; collectively this goal can be achieved with very little individual effort.
Wikipedia has an elaborate page on Social networks (the Social network analysis page is automatically redirected there). We started to expand the network-analytic section by adding a table – which was moved by the community within a day to a new page, now called Social Network Analysis Software, that allows everyone to add a tool along with a URL, short description, unique feature, platform it runs on, and price.
We hereby invite social network community members to add their tools and/or to edit and fill some of the cells in the table. Note that the present structure of the table is a suggestion and can be modified by anyone. This table and the references associated with it may well grow; in that case we might move the table to a new page that will be linked from the current page. If you have trouble working with the Wikipedia table you can also send your information to Jana and we will integrate it into Wikipedia. We are looking forward to the collective results!

Ines Mergel
Jana Diesner
Matthias Meyer

January 17, 2008

Social Network Feedback in Real Time

The Media Lab had an event for our corporate sponsors in Tokyo, and we thought it would be a good opportunity for us to demonstrate how sensing technologies afford real-time feedback on behavior, specifically on one's social network. 70 participants (60 from the sponsor companies, 10 from the Media Lab) wore the Sociometric badges during the event, which lasted all day on January 17. Roughly one third of the attendees from each company wore the badges, although when only one person from a company came, that person got a badge.

The badges recorded which other badges they detected over IR (corresponding to a face-to-face interaction) and transmitted this information over a 2.4 GHz radio, through intermediate basestations, to a badge attached via USB to a computer, which wrote it to a database. A social network visualization program (a modification of the GUESS system developed by Eytan Adar) read from that database, added edges to the social network diagram whenever a new interaction was detected, and updated the layout using standard layout algorithms. The visualization itself was projected onto a large screen in the break/lunch room.
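The actual display was a modified GUESS application; purely as an illustration of the kind of polling loop it ran, here is a minimal Python/networkx sketch. The SQLite table name and columns (interactions(badge_a, badge_b, ts)) are hypothetical stand-ins for whatever the basestation bridge actually wrote.

```python
# Minimal sketch of a real-time visualization loop (illustrative only;
# the system at the event was a modified GUESS application).
# Assumes a hypothetical SQLite table: interactions(badge_a, badge_b, ts).
import sqlite3
import time
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
last_ts = 0.0

conn = sqlite3.connect("badges.db")
plt.ion()  # interactive mode so the figure refreshes in place

while True:
    rows = conn.execute(
        "SELECT badge_a, badge_b, ts FROM interactions WHERE ts > ?", (last_ts,)
    ).fetchall()
    for a, b, ts in rows:
        G.add_edge(a, b)          # each new IR detection becomes an edge
        last_ts = max(last_ts, ts)
    if rows:
        pos = nx.spring_layout(G, seed=42)   # force-directed re-layout
        plt.clf()
        nx.draw(G, pos, node_size=80, with_labels=True, font_size=6)
        plt.pause(0.1)            # short delay between detection and rendering
    time.sleep(1.0)
```

The key point is simply that each new detection adds an edge and triggers a re-layout, which is why the projected diagram could update within moments of two people meeting.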

Naturally, all of this was done in real time, with a very small delay from an interaction being detected to it being rendered on the screen. It was really fantastic to see the data rendered that quickly, and crowds of people were gathering around the screen (and in some unfortunate cases blocking the projector) to see where they were on the visualization and how many people they had met. I had many people come to me throughout the day exclaiming how the visualization “Inspired [them] to network more and gave [them] a great appreciation for the value of an event like this.” Unique numbers, not names, were displayed on the visualization, so only the participant could identify themselves. Still, I noticed people pointing each other’s nodes out to colleagues and almost “keeping score.” Participants would check the visualization, go around and meet with some other people, and then check again, comparing themselves with colleagues. It was all great fun.

Initially I had assumed that each company would form its own small cluster, with perhaps a few links interspersed between the groups. You can see a screenshot of the SN diagram before lunch, after lunch, and after the last break (except for these breaks, all of the time was spent listening to lectures from Media Lab students and faculty). I’ll add pictures of the actual projected display and the set up as soon as I get them.

EDIT: Here's a picture of the visualization at the event:

explaining.jpg

Above: Visualization being discussed by myself and Schlumberger managers

first-mod.jpg

Above: SN Diagram before lunch

middle-mod.jpg

Above: SN Diagram after lunch

final-mod.jpg

Above: SN Diagram after the last break

From almost the very beginning there was one giant component with a strong core-periphery structure, although the density of the component increased over the course of the event. It appears that there were two factors that led to this structure:

1. Media Lab participants, who all spoke to each other and spoke with many sponsor companies

2. Research affiliates: members of sponsor companies who had also worked at the Media Lab as visiting researchers. These participants knew research affiliates from other companies who had been at the Media Lab at the same time, as well as the Media Lab participants.

The research affiliates also ended up introducing participants to one another, and I believe this demonstrated extremely well the kind of social capital that is generated through such an exchange program. In fact, Prof. Hiroshi Ishii, who organized the event, felt that this visualization could be presented to potential and current sponsors as an additional way to show the value of Media Lab sponsorship.

We are also going to analyze the data collected with the Sociometric badges to see if we can predict company affiliation, recognize research affiliates, and so on. We will also incorporate additional information into the visualization. I believe that this visualization was a success because of its simplicity, but if we add information such as accelerometer, speech, and proximity data, then participants may gain an even better understanding of what’s happening in their environment, as well as how they can interact with it.

January 10, 2008

Cary Coglianese: Weak Democracy, Strong Information: The Role of Information Technology in the Rulemaking Process

Below is a guest entry from one of the contributors to Governance and Information Technology: From Electronic Government to Information Government, Cary Coglianese, based on his chapter.

Weak Democracy, Strong Information:
The Role of Information Technology in the Rulemaking Process

Cary Coglianese

Policymakers and scholars predict that information technology will foster a "strong democracy" in the process of creating new government regulations, transforming -- indeed, some say "revolutionizing" -- the rulemaking process. Currently, the way government agencies like OSHA, EPA, and the FDA make new regulations remains relatively obscure, but several so-called e-rulemaking projects in the United States -- such as the creation of Regulations.Gov -- specifically aim to tap into the purported transformational potential of the Internet and increase the role citizens play in the regulatory process. For example, according to Peter Shane, one of the nation's leading scholars of law and information, the federal government’s current e-rulemaking initiative “seems to hold out the potential to enlarge significantly a genuine public sphere in which individual citizens participate directly to help make government decisions that are binding on the entire polity.”

Is this faith in the transformative power of information technology justified? Those who believe it is point to cases in which a large number of citizens have used the Internet to submit comments on proposed regulations. For example, hundreds of thousands of comments from the public came in on a U.S. Department of Agriculture rulemaking on organic foods, a Federal Communications Commission decision on the concentration of ownership of media outlets, and a U.S. Forest Service proceeding to ban roads in wilderness areas.

Yet despite the large absolute number of comments filed in a few highly controversial rulemakings, it is far from clear that information technology will, as a general matter, transform rulemaking into anything close to the ideals of strong democracy. For one thing, the rulemakings that generate comments in the hundreds of thousands still constitute only a minute fraction (even a fraction of a fraction) of the several thousand new federal rules issued each year. By far, most rulemakings continue to elicit little attention from the public. Furthermore, for the exceedingly rare rule that may generate a half million or more comments, even this level of participation still represents less than 5 percent of the total voting-age population in the United States. We know that participation by citizens in presidential elections — the most salient avenue for public participation in government — is quite low relative to other wealthy nations, so it would be surprising if the mere existence of information technology led to a consequential increase in participation in rulemaking in the U.S.

Major barriers to citizen participation in rulemaking will remain even with advances in information technology. One of these barriers is the specialized knowledge needed to participate meaningfully in the often highly technical decisions raised by rulemaking. Improving the accessibility of regulatory information on the Internet provides no guarantee that a significantly greater number of citizens will actually be able to process that information well. To imagine that information technology will dramatically increase citizens' involvement in rulemaking is a bit like imagining that making it possible to download technical automobile manuals or order car parts on-line will turn a great number of car owners into do-it-yourself mechanics. A small subset of people like engineers and car buffs will surely find it easier to fix their own cars, but most of us will be none the wiser. As long as most citizens lack more than the most rudimentary knowledge of how government works and of the technical issues underlying most rulemakings, information technology will not effectuate any but the most trivial change in ordinary citizens’ engagement in regulatory policymaking. Rather than inspiring members of the public to participate in the arcane or technical discussions surrounding government regulations, technology is instead being used by citizens to communicate with friends and family, follow sports and games, or engage in other forms of entertainment.

If information technology is not sufficient to engage a broad segment of the public in meaningful deliberation about regulatory policy issues, should e-rulemaking efforts be abandoned? Only if e-rulemaking’s sole or main purpose is to advance strong democracy. But notwithstanding the arguments made by its proponents, strong democracy is not the most realistic and compelling justification for e-rulemaking. A much more pragmatic objective is to expand and solidify the information base underlying regulatory decision-making. Regulators are undoubtedly better informed when they receive input from outside experts and interested parties. These outsiders bring distinct perspectives on regulatory problems based both on their differences in interests and differences in the scale or level at which they interact with a regulatory issue. The local sanitation engineer for the City of Milwaukee, for instance, will probably have useful insights about how new EPA drinking water standards should be implemented that might not be apparent to the American Water Works Association lobbyists in Washington, D.C. If e-rulemaking makes it more feasible for that local sanitation engineer, or other knowledgeable and motivated experts and affected interests across the country, to become aware of and submit comments on relevant regulations, then e-rulemaking can meaningfully expand the information base for regulators’ decisions.

As such, even though e-rulemaking is unlikely to achieve the goals of strong democracy, it is reasonable to expect regulators' decision making can be improved by allowing at least a somewhat broader set of well-organized and sophisticated actors to mobilize their resources, monitor government decision-making, and share potentially valuable information and insights with government officials. Rather than advancing "strong democracy," e-rulemaking seems more likely to achieve a more modest "weak democracy" -- but with the promise of delivering additional "strong information."

November 30, 2007

Options of 311 and a glimpse at Germany's plans for a networked N-1-1 solution (D-115 Buergertelefon) - Part I

The move to establish an easy-to-remember number (311) for non-emergency government services has lately gained attention around the globe. There are now initiatives underway in Germany (D115) and the UK (101). After 10 years, more and more counties and cities are deciding to start 311 projects. Yet 311 is far from being available to the whole U.S. population, as an earlier post of mine (map of U.S./Canadian 311 service center projects) shows. In order to discuss the alternative or future options for 311, I will first take a look at the general options a government can follow to establish the phone as a public service delivery channel. Part I presents the five options. The combination of performance management and service centers is mostly excluded to reduce complexity. The models assume a country with a federal government structure. Part II, which will be added in a couple of days, will discuss the future of 311 and issues such as performance management.

The central approach
At first glance, setting up a central service center is probably the easiest way for any government. This can be a single, big service center or a number of service centers which are virtually connected. In Figure 1 below, a service center that covers more than one jurisdiction (either of the same level, e.g. several cities, or of different levels, e.g. several cities and a county) is called a "Regional Service Center". The core aspect of this concept is its central character: governance, finance (e.g. federal budget) and databases. While centrality makes many things like setting standards or reducing redundancies easy, the databases are the central challenge of this approach - not the technology, but rather the content. Just gathering and maintaining the data from all levels of government sounds like a goal that is either unrealistic (if we consider the principle of subsidiarity in a federal state, which is often protected by the constitution) or never-ending. Moreover, if we think of the way 311 is used as a tool for performance management and for tapping into the local knowledge of citizens, there is the challenge of how this data gets redistributed to the right sources.

central.png
Figure 1: The Central Approach


The 311 approach
I am not going into much detail here. An advantage of 311 is that it avoids the political battles of a central approach or of starting with a multi-jurisdictional approach. Figure 2 shows the current situation in the U.S.: we have mostly 311 centers at the local level. They may have information on higher jurisdictions in their databases, but they are generally not fully integrated into the service value chain. A few Regional Service Centers can be found already. For example, Miami-Dade County has integrated the City of Miami; 34 cities have not been integrated yet. The challenge for administrators in Miami-Dade derives from budget constraints (the property tax issue) and the regulatory environment. An additional challenge is to come up with finance and service level agreements that result in benefits for both sides and a sustainable service to the citizens. As one administrator once pointed out to me: "Setting up the call center and data base is easy. Changing the integrated administrations (departments) and preparing them for the change in citizens' expectation is the real challenge". Finally, Figure 2 also points to two further issues of this approach. First, 311 results in a lot of isolated and often redundant relationships (either data or other forms of agreements). Second, it is difficult to realize country-wide accessibility: in less populated areas the municipalities will lack the financial and HR capacity to realize 311 on their own.

311.png
Figure 2: The 311 Approach


The Central/311 Hybrid approach
This model (Figure 3) is essentially a combination of the central and the 311 approach. Certain information and services that are provided by higher jurisdictions (here: state/federal) are managed and made available from a central unit/access point. This avoids some of the redundancies of the 311 approach. Regional and local service centers may develop at different speeds and provide varying degrees of service. Therefore, political battles are less likely to come up than would be the case in the central model. Service centers do not exchange their local data or services with other service centers.

star.png
Figure 3: The Central/311 Hybrid Approach


The Networked Approach
The networked approach generally builds on most of the components described in the 311 model. The core difference is that all of the service centers build a network. Information is shared widely while each service center integrates government entities based on its needs or plans. Figure 4 shows the complexity of the network and the probability of creating highly redundant activities and relationships. In order for the network to function all members need to establish some form of governance to solve issues of standards and coordination.

networked311.png
Figure 4: The Networked Approach


The Multi-centric Approach
The multi-centric approach combines aspects of the central, 311 and networked approaches. It has characteristics of a central approach because there are central units/databases which provide information, services and coordination for a certain subset of service centers within one "center". The service centers can evolve at different speeds and service depths. They can be local or regional service centers. In that respect, the multi-centric approach starts like the current 311 activities. However, there is a core difference: within one "center" the service centers are supposed to coordinate their efforts. In addition, there is a central unit (see top left of Figure 5) which coordinates and supports (e.g. good practice sharing, etc.) the overall efforts of all the "centers" and the service centers. Finally, the multi-centric approach also adopts the idea of the networked approach: each center shares information and services with other centers.

multi_centric.png
Figure 5: The Multi-centric Network Approach

The multi-centric model is currently the favored approach for the introduction of the project "D115 Behördentelefon / Behördeneinheitliche Rufnummer" in Germany.

September 27, 2007

Overview of U.S. and Canadian 311 city and county service centers

I have created a mash-up of the U.S. and Canadian 311 projects (last update: 9/26/07) which I would like to share with you. There are currently around 70 service centers (311) in the U.S. Most of the 311 projects have been realized at the municipal level and in most of the biggest U.S. cities. While 11 counties have decided to offer 311, not all of them are multi-jurisdictional; that is, information and services from the municipalities within a county are not integrated. Furthermore, 311 services can have various levels of sophistication and may be operated either by the police department or by newly created 311 service units/departments.

The first city to test 311 was Baltimore in 1996; however, it was Chicago which, in 1999, used 311 in a much broader way for public service provision, city management and accountability. The City of Chicago's 311 is still the first place for many elected officials and public managers to visit and learn from. Today, New York City runs the biggest 311 implementation in the U.S. (in terms of the size of the service center and the population served) and probably the best known one, due to the global media coverage it received. With a population of approximately 5,400, Alaska's City of Bethel is the smallest place to use the number.

Except for the City of Somerville, none of the cities in Massachusetts have implemented 311. Given the close proximity of cities in the greater Boston metro area, it's really hard to understand from a citizen's perspective why there couldn't be a single 311 solution for the whole area. After all, there would be around 3.6 million fewer people to serve than in NYC, and there should be many information redundancies.


[Embedded map of U.S. and Canadian 311 service centers]

Blue = Municipal 311 (Realized)
Red = County 311 (Realized)
Yellow = Planning or implementation stage
Green = 311 in Canada

If you know about new 311 projects please email me.

September 19, 2007

Finding those mutual friends

There has been a spate of stories recently (also this story from the Globe) about the use of cell phones to track the locations of your friends. There has also been some talk of linking data from social networking sites to cell phones so that, for example, you could walk into a room and instantly find someone you had never met, but who was a friend of a friend (through Facebook or some other database stored in the phone).

These are clever ideas, but let me throw out another idea: a program for phones that facilitates those “you grew up in Long Island? Did you know John Smith, by any chance?” moments. (A small digression—this exact question (except for the name) was posed to me once. And, surprisingly, I actually did know the person in that case.)

The basic idea is quite simple—if you are talking to someone, and you both have the program, have the phones link via Bluetooth and:

1) find overlapping phone numbers and report them back. A more extensive version of this would also match any incoming or outgoing calls the phones have ever made.

2) collect and match structured data about the owners of the phones—where and when you went to school, where you’ve lived in different years, etc., and report back matches. This is all the information that people typically exchange when they first meet. While no substitute for conversation, this could be a faster and more thorough route to those “Do you know” questions. That is, you would both instantly find out if you both happened to live in Ann Arbor for the same two years in the 1990s.
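To make the idea concrete, here is a rough Python sketch of the matching step that would run once two phones have paired over Bluetooth. Every name and data shape in it is hypothetical; a real implementation would exchange hashed values rather than raw numbers.

```python
# Hypothetical sketch of the "mutual friends" check after two phones pair
# over Bluetooth. Contact lists and profile fields are exchanged as plain
# sets and dictionaries here; a real implementation would hash them for privacy.
def find_overlaps(my_contacts, their_contacts, my_profile, their_profile):
    """Return shared phone numbers and matching profile facts."""
    shared_numbers = set(my_contacts) & set(their_contacts)

    shared_facts = []
    for field in set(my_profile) & set(their_profile):
        # e.g. field = "lived:Ann Arbor" with values like ("1994", "1995")
        overlap = set(my_profile[field]) & set(their_profile[field])
        if overlap:
            shared_facts.append((field, sorted(overlap)))

    return shared_numbers, shared_facts


# Example usage with toy data
mine = {"+1-617-555-0101", "+1-516-555-0199"}
theirs = {"+1-516-555-0199", "+1-212-555-0123"}
my_prof = {"lived:Ann Arbor": ["1994", "1995"], "school:Long Island HS": ["1988"]}
their_prof = {"lived:Ann Arbor": ["1995", "1996"]}

numbers, facts = find_overlaps(mine, theirs, my_prof, their_prof)
print(numbers)  # {'+1-516-555-0199'}
print(facts)    # [('lived:Ann Arbor', ['1995'])]
```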

This would not be a hard program to write, but there is the classic chicken-egg problem of how to get enough people to sign on to make it work. Not my problem—but if you write such a program, let me know.

August 4, 2007

Are you in my network?

Interesting article in the NYT this morning: it seems as if the business strategies of cellphone networks have an impact on social networks. People who are on the same network talk to each other more than people who are on different cellphone networks.

The article explains how, in informal friendship networks, the frequency and duration of cellphone calls drop as soon as one of the participants switches to another network, and how business acquaintances become "friends" through longer and more frequent phone calls when they are on the same cellphone network.

They refer to research on cellphone use being conducted at the Universities of Notre Dame and Michigan. I am wondering if people here at the Media Lab have found a similar connection: a question for our bloggers Ben and David.

I had not thought about my own personal cellphone usage in this way, mainly because I am not checking how many minutes I have left. From a research standpoint, is your cellphone network/provider really powerful enough to influence the duration and frequency of interactions with people you do not consider your friends and only talk to on purely professional topics?

March 27, 2007

DevalPatrick.com

Deval Patrick recently relaunched his campaign website as a Web 2.0 style website, allowing anyone to post issues and have people vote on them -- see the Boston Globe article, which is implicitly critical. It’s an interesting experiment, and notable that it is not an official government website, but still, essentially, a campaign website (e.g., you can donate money to his campaign).

March 14, 2007

Social Networks and Communication Networks

The University of Toronto’s NetLab has been doing some exciting research on how to measure social networks and communication behavior. Their recent conference paper, “Collecting Social Network Data to Study Social Activity-Travel Behavior: An Egocentric Approach,” discusses new methods of collecting data about social networks, travel behavior, and the use of communication technologies. This is exciting research because it shows how to effectively measure two important elements of social life – the cognitive dimension of perceiving the existence of social ties, and the behavioral dimension of the interaction that actually occurs with social ties. Moreover, this research incorporates multiple types of communication, including communication that occurs in person, by telephone, and by email. The advantage of this approach is that, rather than collecting data about only certain kinds of ties or ways of interacting – such as the General Social Survey’s question about “those with whom you discuss important matters” – measuring both the cognitive and behavioral elements of social ties gives a more comprehensive understanding of the extent to which social life exists in America and how it actually occurs.

February 15, 2007

Mobile phones in the developing world - Part III

I recently came across another interesting news article about the adoption of mobile phones in developing countries. This particular article focuses on mobile phone adoption from the perspective of companies selling mobile phones in India. These companies are scrambling to make low-cost phones that will endure dust, heat, and long periods without recharging. What strikes me most about this article are the lengths to which people living in impoverished conditions will go to connect, and stay connected, with their social networks. Having little else, they are still willing to spend a large share of their income on a single piece of social technology. Yet, from the social support perspective, this makes perfect sense. If people in India are anything like those studied in America, they exchange different kinds of support with different kinds of ties, and mobile phones enable them to stay connected to a variety of ties like never before. On the other hand, while Americans often use their networks to get ahead, people in India may need them just to get by.

January 31, 2007

Mobile phones in the developing world - Part II

Inspired by Jeph's entry on mobile phones and the developing world, I would like to provide some additional information on this topic. In Africa, 50% of telephone lines are in major cities and 90% of Africa’s overall telephone network is located in South Africa. Mobile phone penetration is now about 9%, compared to an internet penetration of 2.6%. Morocco’s mobile phone penetration was 24 per 100 inhabitants in 2004, while fixed-line penetration remained unchanged at its 1995 level (4 per 100 inhabitants). Indeed, researchers at London Business School (link to the study sponsored by Vodafone) found that, in a typical developing country, a rise of ten mobile phones per 100 people boosts GDP growth by 0.6 percentage points. Here is a link to a presentation by one of the researchers.
Average US landline/cell phone penetration is around 94%, compared to an average internet penetration of 68% in the US. According to the International Telecommunication Union (ITU), overall landline penetration in Europe was 56% and 88% of the population had mobile phones in 2004. Asian countries like Japan or Korea remain the leaders in 3G and are already working on the next generation. All in all, mobile phones are much more pervasive and better suited to bridging the digital divide (whether infrastructure-, social- or income-related).

Update: Here is a link to a related study in the McKinsey Quarterly that just came out.

January 29, 2007

Mobile Phones in the Developing World

I always enjoy reading about how communication technologies are adopted in different countries. I recently read an article in the Washington Post about the use of mobile phones in the "developing world," which does a good job of describing the many social factors that explain why mobile phones are often heavily adopted in poor countries.

Of the many factors mentioned, I was most struck by the argument that people in poor countries may find mobile phone email particularly useful because it is extremely low cost and non-intrusive. These are the same two factors that helped kick-start the now highly prevalent use of mobile phone email in Japan. Japanese teens were the first in Japan to use mobile phone email, because it was cheap enough for them to use often and because its non-intrusive nature allowed them to stay connected without drawing attention from parents and teachers. Of course, there are also many differences between the uses of mobile phone email among Japanese teens and those in poor countries. Many Japanese teens received phones from their parents, while people in poor countries often adopt them for business purposes. Nevertheless, the interesting thing about these two cases is that it was the congruency between the affordances of the technology and the social situations that ultimately led to its integration into everyday life.

January 17, 2007

New PEW Study on Online Social Networking Websites and Youth

The PEW Internet & American Life Project has just published a new study on Online Social Networking Websites and Youth.

They define online social networking websites as:

A social networking site is an online place where a user can create a profile and build a personal network that connects him or her to other users.

One of the main and most interesting findings is that 55% of teens between 12 and 17 are using social networking platforms to connect with their friends online - girls mainly to reinforce existing relationships, and boys more to connect with new friends or for dating purposes. The findings also show that 82% of the respondents said that they are using online social networking sites to stay in contact with friends whom they rarely see.

This supports the theory in our working paper on the sustainability of online ties: that social networking platforms can support the maintenance of existing ties or help to reconnect with former friends. See my earlier entry on the sustainability of online ties here on the IQ blog and also on my social networking blog.

January 13, 2007

cRANKy.com - first age-relevant search engine/social networking platform

I just discovered the first age-relevant search engine slash social networking platform: cRANKy.com. It is targeted towards the 50+ age group (seniors and baby boomers). They intend to provide information on specific topics such as jobs after retirement, how to live to 100, how to make new friends, etc.

I like the “How to make friends” section - which ties into what Thomas and I are working on: people in specific phases of their lives only add specific types of (new) contacts to their network of friends. Especially when you retire - you won’t see your co-workers on a daily basis anymore, your routines change and you might lose some of your contacts. See my earlier post on the sustainability of online ties.

It’s also great that the most relevant topics are pre-sorted by relevance (to avoid being overwhelmed by too many results), that there are prominent buttons to increase the text size, and that you can build your own top-10 lists so that information can be pushed to you.

December 8, 2006

What makes online ties sustainable?

Recently we have heard more and more that online social networking platforms don’t really work - Alexa teaches us that people tend to sign up for MySpace, Facebook or openBC, but platform providers have the hardest time keeping the network alive: people tend to sign up, but don’t come back to their profile, or do so only infrequently.
This made my co-author Thomas Langenberg, EPFL Lausanne in Switzerland, and me start to think about the question: What makes online ties sustainable? We came up with a research design that looks at four different phases of a life cycle of online ties.

Here is the abstract of our paper:

Recently, the Pew Internet & American Life Project published a study about the number of social relations people maintain online, and the omnipresent question was raised again: are actual face-to-face contacts declining over time, and are they being replaced by online social interactions? Our virtual life is scattered in online profiles across sites such as openBC.com, Friendster.com, Match.com or MySpace.com. There are currently more than 400 different online social networking sites – with new sites popping up every day. Building on existing factors of persistence and sustainability of network ties in general, we address the key research question: which factors lead to the creation, maintenance, decay and reconnection of online network ties? Our research draws on prominent issues in the social network literature which address the gap between research on offline and online social networks. We examine individual, dyadic, structural and content-related characteristics to understand how and why actors in different phases of their life cycle turn to online ties. Within the presented research framework, we derive propositions and develop a research design to collect and analyze qualitative and quantitative network data. The overall goal is to develop recommendations on how online social networks can become sustainable over time, and we develop questions and avenues for further research.

We came up with the following taxonomy of online vs. offline networks in our paper:

sntypology.jpg
You can download the full paper from the Working Paper website of the Program on Networked Governance.

Full citation:

Mergel, I./Langenberg, T. (2006): What makes online ties sustainable? A Research Design Proposal to Analyze Online Social Networks, PNG Working paper No. PNG06-002, Cambridge.

November 9, 2006

Mobile Phone Service Providers and Customer Location Information

I recently finished serving as an expert witness in a court case in which I had to provide my opinion about the possible locations of a mobile phone given cellular tower IDs and base station positions. While this information had to be subpoenaed from Verizon as part of the litigation, it may be disconcerting for some to acknowledge that in databases distributed throughout the world, mobile phone service providers are storing records of location and social network data for one out of three people on Earth.
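To give a flavor of the kind of estimate involved, here is a toy Python sketch that bounds a phone's possible whereabouts from the coordinates of the towers its calls used. The coordinates and the coverage radius are invented; a real analysis would also account for antenna sector azimuths, signal measurements and terrain.

```python
# Toy sketch: bound a phone's possible location from the towers its calls
# used. Tower coordinates and the coverage radius are invented for the
# example; real expert analysis would use sector azimuth, timing data, etc.
import math

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def possible_area(serving_towers, coverage_km=3.0):
    """Centroid of the serving towers plus a crude radius covering them all."""
    lat = sum(t[0] for t in serving_towers) / len(serving_towers)
    lon = sum(t[1] for t in serving_towers) / len(serving_towers)
    radius = max(haversine_km((lat, lon), t) for t in serving_towers) + coverage_km
    return (lat, lon), radius

towers = [(42.3601, -71.0589), (42.3736, -71.1097)]  # invented coordinates
center, radius_km = possible_area(towers)
print(center, round(radius_km, 1))  # "somewhere within roughly X km of here"
```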

Besides the data’s obvious utility in courtroom cases like the one I was testifying in, I’m curious about the long-term consequences of commercial companies recording a time series of locations and communication events for billions of people. Who legally owns this data? Because carriers like T-Mobile & Sprint now publicly disclose the locations of their towers, base station locations are no longer the corporate secret they once were, and consequently can’t be used as a reason to prevent customers from obtaining the location information collected about them. If I ask my T-Mobile representative to provide me with my call log history, they don’t seem to have a problem with disclosing my communication events to me. However, when asked to provide me with an approximation of the locations associated with each of my calls, they still claim this is prohibited. So, empirically at least, it doesn’t appear that customers own the location data collected about them. And if the customers don’t own this information, then I imagine that by default the mobile operators own the records of movement data for all of their customers.

What guidelines do mobile operators have to abide by when using this data? Can it be sold to a 3rd party? How much would a detailed time-series of my locations over the last five years go for on Ebay? Who would be the highest bidder? Urban planning consultants interested in public transportation usage? Companies working on developing the next census? Wall St traders interested in where I’m doing my grocery shopping?

This data clearly has value. Already carriers are selling real-time location information to companies who use this information to extrapolate the location and speed of the individual and use this data to offer road traffic updates and forecasts. As the major carriers’ billion dollar networks turn into a commodity infrastructure, mobile operators are going to be ever more interested in monetizing the location data generated from their customers. (“This speeding ticket has been brought to you courtesy of Cingular Wireless. Raising the bar.”)

So here is an exercise for the interested reader – call up your own service provider and ask for the location information associated with your call logs. Let me know if you’ve had any luck.

March 15, 2006

The Social Affordances of Email in Japan and America

During a recent presentation at the University of Tokyo I discussed the social affordances of email. I defined social affordances as the social opportunities and constraints provided by technology. (If you want to read more about this topic, see my co-authored paper "The Social Affordances of the Internet for Networked Individualism," in Journal of Computer-Mediated Communication, 8, 3.) After I listed a number of email’s social affordances, one of the audience members pointed out that those affordances only apply to PC-based email. By contrast, there exists a substantially different set of affordances for mobile phone based email. Given that my research is only about the use of email in America, my lack of attention to mobile phone email was intentional. There are not enough Americans using this technology for it to be relevant to my current research. Nevertheless, this comment got me thinking about the difficulty of making cross-national generalizations about the social uses of particular technologies. For example, even though the use of PC email is almost as common in Japan as it is in America, the widespread use of mobile phone email in Japan may change how the Japanese use PC email.

Continue reading "The Social Affordances of Email in Japan and America" »

March 8, 2006

Network Visualization Tools

I find that when I’m inundated with network data, the best way to get my head around it is through visualization. The human eye seems to be able to identify important structure and topological dynamics much more easily than an algorithm can. Over the years I’ve spent most of my time using a Windows/Linux application called Pajek. I use a Matlab script to turn an adjacency matrix into files that Pajek can interpret as a network. It supports different shapes, colors, and edges - and can even visualize (more or less) dynamic networks.
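For readers without Matlab, a roughly equivalent conversion in Python might look like the sketch below; the file layout follows Pajek's .net format (a *Vertices section followed by an *Edges section), and the toy matrix is just for illustration.

```python
# Rough Python equivalent of the Matlab conversion step: write an
# undirected, weighted adjacency matrix out in Pajek's .net format.
import numpy as np

def write_pajek(adj, labels, path):
    n = len(labels)
    with open(path, "w") as f:
        f.write(f"*Vertices {n}\n")
        for i, label in enumerate(labels, start=1):   # Pajek vertices are 1-indexed
            f.write(f'{i} "{label}"\n')
        f.write("*Edges\n")
        for i in range(n):
            for j in range(i + 1, n):                 # upper triangle only (undirected)
                if adj[i, j] != 0:
                    f.write(f"{i + 1} {j + 1} {adj[i, j]}\n")

# Toy example: a 3-node weighted network
adj = np.array([[0, 1, 2],
                [1, 0, 0],
                [2, 0, 0]])
write_pajek(adj, ["alice", "bob", "carol"], "toy.net")
```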

pajek_networks_sm.png

One of my Pajek networks from here.

However, things start to break down when the networks go beyond a few hundred nodes. There are several packages for large-scale network visualization - however most come with serious limitations. Walrus creates beautiful networks, but unfortunately they need to be spanning trees.

walrus.png
A 500,000 node Walrus network

There are plenty of other network analysis tools out there - but it’d be great to hear people’s experiences actually using them on real data...

February 14, 2006

What would you do with the telephone call network of an entire country?

I’m beginning a collaboration with British Telecom in an effort to analyze their massive call network dataset. This is a dynamic, directed network that contains ~250 million nodes (ie: distinct phone numbers) and ~2000-5000 edges (ie: calls) generated each second. The phone numbers are of course one-way hashed such that it is impossible to link a node’s identifier to an actual phone number. However we do have information about the country and region to which the node belongs (ie: country code / area code). While it is not inclusive of every call to and from the UK, it is estimated that the dataset includes approximately 80% of landline calls and 30% of mobile calls.

So my question to the complex systems / social network community is this: what are some questions we should attempt to ask of this dataset? Possible examples include calculating the strength of a particular region’s relationships with other regions and countries, analyzing the dynamics involved in “call cascades”, inferring the average size of an individual’s hierarchical social groups (from close friend to possible acquaintance), etc...

duration2.gif

While many metrics may be impossible to calculate for a network of this magnitude, simple sampling can yield interesting results. For example, the plot above represents the duration of outgoing calls from 100,000 randomly sampled nodes during 6-month intervals over the course of October 1995 to March 1998. It is clear that there is an increasing number of very long calls (over 10^4.2 seconds, ~4.5 hours), which could be a good indicator of the uptake of dial-up internet in the UK during this timeframe.
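The sampling itself is straightforward. As a hedged illustration, the Python sketch below builds such a log-scale duration histogram from a call-record file; the CSV layout (caller_hash, timestamp, duration_s) is an assumption, not the actual BT format.

```python
# Sketch of the sampling approach: draw a random subset of callers and
# histogram their outgoing call durations on a log10 scale.
# Assumes a hypothetical CSV with columns: caller_hash, timestamp, duration_s.
import csv
import math
import random
from collections import defaultdict

def duration_histogram(path, sample_size=100_000, bin_width=0.2, seed=0):
    # First pass: collect the set of caller IDs, then sample from it.
    with open(path) as f:
        callers = {row["caller_hash"] for row in csv.DictReader(f)}
    random.seed(seed)
    sampled = set(random.sample(sorted(callers), min(sample_size, len(callers))))

    # Second pass: bin log10(duration) for the sampled callers only.
    bins = defaultdict(int)
    with open(path) as f:
        for row in csv.DictReader(f):
            dur = float(row["duration_s"])
            if row["caller_hash"] in sampled and dur > 0:
                bins[round(math.log10(dur) / bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))

# hist = duration_histogram("calls_1995H2.csv")
# Calls above 10**4.2 seconds (~4.5 hours) would appear in the right-hand tail.
```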

February 12, 2006

Longitudinal Data and the Adoption of Technology

I've spent this last week working on a paper with Kakuko Miyata and Barry Wellman. The paper uses longitudinal survey data collected in Japan to understand the causal relationship between the use of keitai (internet-enabled mobile phones) and the reception of social support. This is one of the first opportunities that I've had to write a paper based on longitudinal data, and I'm thoroughly enjoying the experience. In addition to providing me with an understanding of the causal relationship between the technology and social behavior, the data is also allowing me to chart the adoption of a new technology as it has become integrated into the lives of the general public. This experience has made me wonder about the extent to which the adoption of keitai is the result of a social network structure that is more prevalent in Japan than in other countries. My hope is that more longitudinal studies of this nature will be conducted in different countries, so that I might someday better understand the extent to which adoption patterns are the result of differences in network structure versus other factors, such as culture, marketing, or investment in technological infrastructure.

February 10, 2006

The Strength of Weak Ties Revisited - A Practical Example

Having discussed Granovetter's seminal paper on "The Strength of Weak Ties" in our last class on Network Analysis, I just found a 21st century application of the theory on the website of Ideentower.blogs.com.

There is a relatively new service on the web which allows people to connect to each other when traveling from A to B. The service is called AirTroductions and allows interested individuals to register and subsequently look for other, unknown individuals who might be on the same flight. The purpose of the service is to let people make interesting contacts, which may eventually lead to all types of relationships.

I found this interesting as another example of how easy it is today to build weak ties with modern web technology!

January 31, 2006

Follow up: Google bombs and the autonomy of search engine vendors

In my entry on Google bombs on 11/19/2005 I raised the following question:

"How will governments react to such movements of altering the search results in an unfavorable way in the future as knowledge becomes more important? How will search engine providers react? The easiest way to approach this would be to influence or enforce rules on search engine vendors. Hence, we could ask whether search engine providers need to be kept as autonomous as central banks with respect to knowledge?"

Well, as of 1/25/2006 we got an answer to this when reports on Google's self-censored search engine for China came out. However, as other reports show, censorship also exists in other countries like Germany or France for certain terms. So in fact there is a need to watch developments in this regard carefully... What do you think or propose?

Related articles:
Harvard Law School, Berkman Center for Internet & Society
NY Times on Google and China search engine version
Wired on Google and their geolocations on searches

Continue reading "Follow up: Google bombs and the autonomy of search engine vendors" »

January 22, 2006

Citizen Relationship Management? - Part I

My next entries will discuss the application of Customer Relationship Management in the public sector. Other terms used are citizen or constituent relationship management. As this is a relatively new topic and a less applied concept in the public sector, I hope our visitors are interested in sharing some of their ideas or questions with me.

What is CiRM?
In what ways is CiRM different from CRM?
How is it understood in government?
How is CiRM implemented?
Will it have an impact on customer service in the public sector? What other impacts do you expect?
What other questions should we ask?

I am looking forward to your input. I will provide further information on Citizen Relationship Management on my website.

January 9, 2006

NSA data mining—what patterns to look for: expansive scenario (II)

A more expansive scenario would be that the NSA collects all phone log data from US sources as well as non-US calls that pass through US switches, plus locational information from cell phones where available (+ e-mail traffic, etc).

The expansive scenario offers significant security and logistical advantages to the NSA. The security advantage is that under the more limited scenario, the NSA would have to share critical security information with telecomms by asking them for information about only certain individuals. That delimited information is terribly sensitive intelligence—by telling telecomms whom it wants to monitor, the NSA is essentially telling them who the government has received intelligence about.

Continue reading "NSA data mining—what patterns to look for: expansive scenario (II)" »

January 7, 2006

NSA data mining—what patterns to look for (I)

So, what data mining could one do with the data the NSA has collected from telecomm companies? Obviously, it is still unclear as to what is being collected, so this is quite speculative, which is a little different from my normal role of cautious academic. My hope is that this speculation, in the end, will yield some productive discourse about this important subject. I also want to make clear that I am not endorsing (or condemning) such data mining for now. Later I will discuss some of the privacy and policy issues. For now, I just want to do a thought experiment of how one might analyze these data in a fashion that might detect terrorist activity.

My assumption here is that the objective is to identify candidate nodes (individuals) for surveillance.

I am going to start with what I consider a less expansive scenario. In this particular scenario, one starts out with some phone numbers and e-mails that are designated as “high risk”—e.g., from other intelligence. A simple analysis would simply snowball outwards from these high-risk nodes to their contacts, and to their contacts’ contacts, etc. As one snowballs outwards, one will likely find overlaps, where some nodes are members of multiple circles. In the simplest analysis, the more circles a node is a member of (and the closer it is to the center of those circles), the higher risk it should be considered.
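Purely as a companion to this thought experiment, here is a small Python sketch of the snowball-and-overlap idea. The toy call graph, the two-hop limit and the distance-decay weighting are all made up for illustration; they are not a description of what the NSA actually does.

```python
# Illustrative sketch: expand outward from each "high risk" seed, then score
# nodes by how many seeds' circles they fall into and how close they sit to
# the centers of those circles. All data and weights here are invented.
from collections import defaultdict, deque

def snowball(graph, seed, max_hops=2):
    """Return {node: hop distance} for everything within max_hops of seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def score_nodes(graph, seeds, max_hops=2):
    """Higher score = member of more seeds' circles, and closer to their centers."""
    scores = defaultdict(float)
    for seed in seeds:
        for node, d in snowball(graph, seed, max_hops).items():
            scores[node] += 1.0 / (1 + d)   # made-up weighting; decays with hop distance
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy call graph (adjacency list, treated as undirected for simplicity)
calls = {"a": ["b", "c", "f"], "b": ["a", "d"], "c": ["a", "d"],
         "d": ["b", "c", "e"], "e": ["d"], "f": ["a"]}
print(score_nodes(calls, seeds=["a", "e"]))
# b, c and d sit inside both seeds' circles and outscore f, which is in only one.
```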

Obviously, the analysis should get substantially hairier than that, because of the nature of the sampling from the network. For example, I am guessing that the identifications of high risk nodes are not independent events. Imagine that an Al Qaeda cell is identified and its members apprehended in Jordan, and their computers, address books (or equivalents) acquired. One would then snowball outwards from these contacts. However, to find overlap among the contacts of these cell members presumably conveys different information than if one found overlap among the contacts of different cells from different countries (presumably the latter would be more significant).

One could devise a weighting system that depends on the number of paths that go through a particular node, other information about nodes, etc., to develop a ranking of who should be watched. These weights could be validated by fitting them to part of the network data and then examining whether the technique was effective at identifying the nodes already known to be “high risk.”

Ideally, one would use communication data going back in time as far as possible—thus, while telecomm companies are sharing data, you would want them to go back as far as possible. This would also be useful in case you wanted to do sequence and timing analysis—e.g., it’s not just who you call, but it’s when you call (say after some event), or that you called Anne after Joe called you.

Obviously, there are lots of difficult issues re sampling. Further, one would hypothesize that any terrorist worth their salt would be careful about recording contact information, and, more generally, their use of electronic communication. And I would guess that most of the people that terrorists communicate with are non-terrorists, and their contacts, in turn, are even less likely to be terrorists, so the vast majority of people caught in this net are going to be non-terrorists. So, to mix metaphors, one may have removed from the haystack proportionally more hay than needles, but you are still left with a very large haystack with just a few needles.

Once one has identified some risky nodes, the next step would be to monitor actual communications. Presumably, the NSA has finite capacity to have humans listen to conversations, and thus the key management question is how to allocate this scarce resource. The first level of monitoring would therefore simply be the recording of conversations. Presumably, this is fairly cheap to do, so, putting civil liberties concerns aside, one would adopt a pretty low risk threshold for recording. This would allow going back in time for human monitoring if an individual were subsequently identified as high risk. A second level, if it is technically possible (at some level it surely is), would be to apply voice recognition to those recordings, where the content of conversations would adjust the evaluated risk level of those nodes. Further, such voice recognition could pick out candidate snippets of conversations for human monitoring. Such “snippet-based” monitoring, I think, would explain why the FISA court process was circumvented, since it might result in the brief, human-based monitoring of a very large number of people (conceivably exceeding, very quickly, the number of warrants approved by the FISA court in its entire history), and in the computerized monitoring of a still larger number of people. That is, the oversight process specified by FISA would be unable to cope with the sheer volume of requests. Further, the basis for monitoring these snippets is probably weaker than what has traditionally been brought before the FISA court. It would also explain why some defenders of the policy (who presumably know more than has been publicly released) have stated that having a computer monitor your conversation was not a privacy intrusion (thus suggesting that a major component of the program did involve computerized monitoring).

This is the less expansive scenario that I have come up with (although how expansive it is depends on a number of parameters—how many steps out one goes from the initial sample, what the threshold for monitoring is, etc.—so the actual number of people who are in some fashion caught in the net might be anywhere from thousands to millions). This is a pretty rudimentary analysis compared to how one would actually do it, but I think it has the essential ingredients. My next entry will consider a more expansive scenario.

December 30, 2005

Social network analysis, the NSA, and “pattern analysis”

The story about the NSA eavesdropping program has received a lot of attention over the last week. The follow-up story has received somewhat less attention but may be more important; see the December 24 NYT story, Spy Agency Mined Vast Data Trove, Officials Report (by ERIC LICHTBLAU and JAMES RISEN):

“What has not been publicly acknowledged is that N.S.A. technicians, besides actually eavesdropping on specific conversations, have combed through large volumes of phone and Internet traffic in search of patterns that might point to terrorism suspects. Some officials describe the program as a large data-mining operation.”

“A former technology manager at a major telecommunications company said that since the Sept. 11 attacks, the leading companies in the industry have been storing information on calling patterns and giving it to the federal government to aid in tracking possible terrorists.

"All that data is mined with the cooperation of the government and shared with them, and since 9/11, there's been much more active involvement in that area," said the former manager…

This is a remarkable story, and raises some interesting questions: (1) exactly what data are telecomm companies sharing with the government; (2) what could usefully be gleaned from these data; and (3) what are the privacy implications?

There is a lot more we don’t know about this story than we do know, but it is worth beginning a discussion on the value and the costs of these data under different scenarios of exactly what information is being shared. My next few entries will aim to begin a discussion on these issues, grounded primarily in a social network perspective.

Briefly, what data do telecomm companies have? Focusing on the telephone data, for now, they have (1) phone log data; (2) varying amounts of locational information for cell phones; and (3) varying amounts of information linking individuals to particular phone numbers (e.g., not so much for some pay as you go phones, more for other types of phones). My understanding is that little remains of the bits that flow over the network (i.e., the content).

These are thus a type of social network data, along the lines of my preceding entry on the “behavioral flows” of relationships. That is, for any given dyad one can observe the timing and duration of calls.
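As a minimal sketch of what such dyadic data look like in practice, here the phone log is reduced to per-pair call timings and durations. The record layout and the numbers are assumptions for illustration, not the carriers' actual format.

from collections import defaultdict
from datetime import datetime

# (caller, callee, start time, duration in seconds) -- invented records
log = [
    ("555-0101", "555-0202", datetime(2005, 12, 1, 9, 0), 120),
    ("555-0101", "555-0202", datetime(2005, 12, 2, 21, 30), 600),
    ("555-0303", "555-0101", datetime(2005, 12, 2, 22, 0), 45),
]

# Group the calls by ordered dyad (caller, callee).
dyads = defaultdict(list)
for caller, callee, start, duration in log:
    dyads[(caller, callee)].append((start, duration))

# Per-dyad summaries -- the sort of features (frequency, total talk time)
# one might feed into the ranking discussed in the preceding entry.
for pair, events in dyads.items():
    total_seconds = sum(d for _, d in events)
    print(pair, "calls:", len(events), "total seconds:", total_seconds)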

Whose phone data is being tracked? It is not clear from the article. Clearly, the focus is on international communication (domestic to international, and international to international calls routed through switches that are on US soil). Is purely domestic communication also being tracked? The article suggests not:

“This so-called "pattern analysis" on calls within the United States would, in many circumstances, require a court warrant if the government wanted to trace who calls whom.”

This sentence is ambiguous, however—e.g., given that the sharing of data by the telecomm companies is voluntary, what are the statutory limits on their sharing data with the government? Is there a prohibition on a telecomm company voluntarily handing over information to the government regarding one of their customers’ phone logs? I do not know of such a prohibition, but if a reader does, please do comment.

For the next entry: Given that these are essentially social network data, from what we know from the research on social networks, what insights might they yield?

November 17, 2005

Adapting to different social circles: Are people changing their online personality depending on the social context?

When it comes to social software, a myriad of platforms and websites have sprung up over the last couple of years: the Social Networking Services Meta list shows 380 different social networking platforms, covering interest areas such as business networking, dating, friend networking, pet networking, photo sharing, and face-to-face facilitation.

It seems as if all these content areas target different user groups, and therefore different social circles in which the users are active.

Even though some of the circles may have overlapping neighborhoods of actors, it is more likely that people choose different social networking platforms for different purposes: for example, A would probably want to connect with B for dating on a different platform than the one he uses with C for business contacts.

This leads to my question: Are people changing their personality (or at least (inter)acting differently and displaying different kinds of information, i.e., showing a different face) on different platforms? If so, where are the differences and why are they occurring?

One way of analyzing these differences would be (a) to conduct a self-study or (b) to collect data on people you know who have signed up for different platforms. What would be a robust way to analyze these differences?

Looking forward to your comments :)