
Convener in chief:

David Lazer
(Methodology, Networked Governance)


Stanley Wasserman
(Current Trends, Methodology, Social Networks)

David Gibson
(Social Networks, Interaction, Theory)

Yu-Ru Lin
(Networks, Visualization)

Ines Mergel
(Knowledge Sharing, Social Computing, Social Software, Government 20)

Maria Binz-Scharf
(Qualitative Methodology, Knowledge Sharing, eGovernment)

Alexander Schellong
(Admin, eGovernment, Government 20, Citizen Relationship Management)




2 November 2014


Predictions for Global Executive Elections, 1 November 2014

Sorry, belated by a few hours this month, but everything is reproducible for November 1.

Uruguay - incumbent party win - 83.6%
Namibia - incumbent party win - 54.2%
Romania - incumbent party lose - 75.2%
Tunisia - incumbent party lose - 100%
Nigeria - incumbent party win - 95.0%

We again place Tunisia at 100% confidence of the incumbent party losing because of the dissolution of the incumbent party. We have had difficulty finding polling data on the Namibian election, which is the reason for the low confidence in the expected result.

Our confidence in the Nigerian election has increased due to the current office holder's announcement that he is running and early polling in his favor. We have started tracking some additional elections that will occur in the next six months, but are holding off predictions until candidates are named.

Ref: C1, DM2, and DP17

By David Lazer | 7:08 AM | Comments (0)

31 October 2014

Citizen science

Landmark month for Volunteer Science!

As we get to the last few hours of October, I am pleased to announce that we hit a new record for the number of subjects in a month, at 11,000+! I would wager that is more than all other behavioral research labs in Boston put together in October. Please do help us continue recruiting subjects, and in the not too distant future we will be recruiting more researchers to conduct experiments as well.

By David Lazer | 10:12 PM | Comments (0)

Big data

The new Google Flu Trends: no sale

Google just released a new version of Google Flu Trends (GFT). GFT, as readers of the blog likely know, is an effort by Google, launched with a paper in Nature, to track the flu based on search terms. The idea is that when lots of people are sick with the flu, there are more searches for things like "cures for the flu." This project has come to represent the possibilities and foibles of "big data", and along with coauthors I critiqued GFT in a paper this past March. GFT had been missing by large margins for a number of years, and we identified a number of major statistical problems (most importantly, that GFT added only incrementally to the lagged CDC data, and should have integrated those data into the projections), but the most critical issue is that the methodology and data are opaque.

So: what about the new version of GFT? Well, there's good news, and there's bad news. The good news is that the new method claims to take "official CDC flu data into account". The (really) bad news is that the methodology is now much more opaque. As of today, there is no accounting whatsoever of how these numbers are generated, and because those numbers are now an unknown and perhaps dynamic mix of search and CDC data, third parties can no longer mash up the GFT data with other types of signals of flu prevalence.

Why not share the underlying data streams of the 50 or so GFT search terms? Surely the community of data scientists/researchers and the like could do something valuable with these data. The answer from the project lead at Google, Christian Stefansen, quoted in the Wall Street Journal, states "We would love to, but if we were to do that, it would be easy for someone to game the model.... We're at this intersection between providing a service for free and making it researchable, so we're trying to strike the best of both worlds."

Here is my proposal to Google: don't give the research community the GFT search terms. But give us counts for the next 100, at the state level. Make it a contest to see which teams do the best, by some reasonable criterion. Then take some of the top methodologies used to aggregate those terms and apply them to the core GFT search terms. This harnesses the crowd while allowing the core set of terms to remain hidden. What is the downside to this? Because the data would be aggregated at the state level, there would be no privacy issues implicated. And the incremental impact on any leakage of proprietary information regarding how the Google algorithm works would, arguably, be quite tiny. The upside is clear: hundreds of able minds competing to improve GFT. Further, releasing information at the state level would allow finer-grained projections of flu prevalence, which would be much more valuable for policy makers and for modeling efforts in projecting into the future.
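Here is a minimal sketch of how a contest entry under this proposal might be scored, assuming weekly counts for a handful of search terms and a lagged official flu series for a single state. Everything in it is illustrative: the data are synthetic, the function names (`fit_ols`, `score_entry`) are hypothetical, and ordinary least squares is only a stand-in for whatever methods teams would actually submit. It compares a baseline that uses only the lagged official data against a model that adds the search terms, which is the comparison that the "only incrementally" critique turns on.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def rmse(y_true, y_pred):
    """Root mean squared error as a plain float."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def score_entry(cdc_lagged, term_counts, target, n_train):
    """Return (baseline_rmse, combined_rmse) on the held-out weeks.

    baseline: lagged official data only
    combined: lagged official data plus search-term counts
    """
    base_X = cdc_lagged.reshape(-1, 1)
    full_X = np.column_stack([cdc_lagged, term_counts])
    scores = []
    for X in (base_X, full_X):
        coef = fit_ols(X[:n_train], target[:n_train])
        scores.append(rmse(target[n_train:], predict(coef, X[n_train:])))
    return tuple(scores)


# Synthetic stand-in data (an assumption, not real flu or search data).
rng = np.random.default_rng(0)
weeks, lag = 120, 2
flu = 2.0 + np.sin(np.arange(weeks) * 2 * np.pi / 52) + 0.1 * rng.normal(size=weeks)
target = flu[lag:]          # this week's true flu level
cdc_lagged = flu[:-lag]     # official figure, available two weeks late
term_counts = np.column_stack(
    [scale * target + rng.normal(scale=0.3, size=len(target)) for scale in (5.0, 3.0, 1.0)]
)

baseline, combined = score_entry(cdc_lagged, term_counts, target, n_train=80)
print(f"lagged CDC only:      RMSE = {baseline:.3f}")
print(f"CDC + search terms:   RMSE = {combined:.3f}")
```

A real contest would presumably repeat this for every state and use a rolling forecast origin rather than a single train/test split; the point of the sketch is only that the scoring rule can be stated without revealing which terms Google itself uses.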

So, Google: how about it?

By David Lazer | 7:51 PM | Comments (0)

1 October 2014


Predictions for global executive elections, 1 October, 2014

Here are our October predictions:

Brazil - incumbent party win - 91.4%
Bosnia and Herzegovina - incumbent party lose - 59.2%
Bosniak Election - incumbent party lose - 53.0%
Croat Election - incumbent party win - 90.5%
Serbian Election - incumbent party win - 83.7%
Bolivia - incumbent party win - 93.5%
Mozambique - incumbent party win - 89.5%
Uruguay - incumbent party win - 82.3%
Namibia - incumbent party win - 53.9%
Romania - incumbent party lose - 75.3%
Tunisia - incumbent party lose - 100%
Nigeria - incumbent party win - 59.1%
Yemen - incumbent party win - 80.4%

Here are the notes for this month's predictions. First, we are still splitting up Bosnia and Herzegovina. While we were able to access some polling data last month that disaggregates the complex structure of the election, it should be noted that the polls indicate very close races (all within the margin of error) and a high level of undecided voters. This means our predictions are probably overconfident.

Second, we again place Tunisia at 100% confidence of the incumbent party losing because of the dissolution of the incumbent party.

Finally, the Nigeria and Yemen election predictions should be considered very tentative, since they are being made so far out from the election.

Ref: C1, DM2, and DP16

By David Lazer | 8:22 PM | Comments (0)

1 September 2014


Predictions for Global Executive Elections, 1 September 2014

This month, we are only posting one set of predictions. The beta version 2.0 has been doing well enough that we think it is time we move over to it.

Brazil - incumbent party win - 51.8%
Bosnia and Herzegovina - incumbent party lose - 59.3%
Bosniak Election - incumbent party lose - 50.7%
Croat Election - incumbent party win - 90.7%
Serbian Election - incumbent party win - 84.0%
Bolivia - incumbent party win - 93.6%
Mozambique - incumbent party win - 89.8%
Uruguay - incumbent party win - 83.3%
Namibia - incumbent party win - 53.3%
Romania - incumbent party lose - 75.6%
Tunisia - incumbent party lose - 100%
Nigeria - incumbent party win - 57.1%
Yemen - incumbent party win - 78.8%

Note, we again place Tunisia at 100% confidence of the incumbent party losing because of the dissolution of the incumbent party.

Finally, the Nigeria and Yemen elections have some uncertainty in their coding because it is still uncertain who will run. In Nigeria, for example, the incumbent has stated that he will not run, but is facing political pressure to change his mind.

Ref: C1, DM2, and DP15

By David Lazer | 4:02 PM | Comments (0)

12 August 2014


Brian Granger on "Open, Reproducible and Exploratory Data Science" August 21

Please come by Northeastern next week for this talk by Brian Granger, especially if you are interested in IPython/Jupyter.

Open, Reproducible and Exploratory Data Science
Professor Brian Granger
Physics, Cal Poly State University
Lead Developer and Co-Founder, IPython and Jupyter Projects

Center for Complex Network Research (CCNR), Northeastern University
3:30 - 5:00pm, Thursday, August 21

Data Science involves the application of scientific methodologies to data driven computations across a wide range of fields. As Drew Conway has clarified, it sits at the intersection of hacking/programming, math/statistics and domain specific expertise. Because data science is data- and computing-centric it requires powerful software tools. In this talk I will describe open source software tools for data science that i) are built with open languages, architectures and standards, ii) promote reproducibility and iii) are optimized for exploratory data analysis and visualization.

In particular, I will describe the Jupyter Notebook (formerly named IPython), an open-source, web-based interactive computing environment for Python, R, Julia and other programming languages. The Notebook enables users to create documents that combine live code, narrative text, equations, images, video and other content. These notebook documents provide a complete and reproducible record of a computation, its results and accompanying material and can be shared over email, Dropbox, GitHub or converted to static PDF/LaTeX, HTML, Markdown, etc. Most importantly, the Jupyter Notebook is built on top of an open architecture for interactive computing that is completely language neutral, allowing it to serve as a foundation for other data science projects and products.

One of the most important aspects of data science is interacting with data. This involves iterative cycles of visualization, computation and human computer interaction to extract understanding and make predictions. Jupyter now provides an architecture for interactive JavaScript/HTML/CSS widgets that allows users to interact with their data in a direct and simple way by automatically creating appropriate user interfaces for Python objects and functions. This allows the power of modern JavaScript libraries (d3.js, leaflet.js, backbone.js, etc.) to be leveraged in Python/Julia/R driven computations.

Throughout the talk, I will provide examples of how IPython is being used across a wide range of fields including science, engineering, social sciences, finance, computer science, industry, publishing and journalism. Jupyter/IPython is funded through the Alfred P. Sloan Foundation, the Simons Foundation, the National Science Foundation, Microsoft and Rackspace.

Brian Granger is an Associate Professor of Physics at Cal Poly State University in San Luis Obispo, CA. He has a background in theoretical atomic, molecular and optical physics, with a Ph.D. from the University of Colorado. His current research interests include quantum computing, symbolic computer algebra, parallel and distributed computing and interactive computing environments for scientific computing and data science. He is a lead developer on the IPython project, a co-founder of Project Jupyter, creator of PyZMQ and an active contributor to a number of other open source projects focused on scientific computing in Python. He is @ellisonbg on Twitter and GitHub.

By David Lazer | 9:53 AM | Comments (0)

1 August 2014


August 1 Executive Election Predictions

As with last month, there are two sets of predictions presented this month. The first uses our version 1.0 model, the same model we have used for previous predictions, which will serve as a reference point for version 2.0 (which also uses slightly updated data as input, hence DM1 vs. DM2):

Model Version 1.0 Predictions
Turkey - incumbent party win - 98.6%
Brazil - incumbent party win - 98.8%
Bosnia and Herzegovina - incumbent party win - 82.9%
Bosniak Election - incumbent party win - 96.8%
Croat Election - incumbent party win - 74.9%
Serbian Election - incumbent party win - 74.9%
Bolivia - incumbent party win - 99.7%
Mozambique - incumbent party win - 99.6%
Uruguay - incumbent party win - 98.7%
Namibia - incumbent party win - 56.0%
Romania - incumbent party lose - 80.9%
Tunisia - incumbent party lose - 100%
Ref: C1, DM1, and DP14

Model Version 2.0 Predictions
Turkey - incumbent party win - 83.4%
Brazil - incumbent party win - 91.5%
Bosnia and Herzegovina - incumbent party lose - 58.2%
Bosniak Election - incumbent party win - 66.6%
Croat Election - incumbent party lose - 60.8%
Serbian Election - incumbent party lose - 60.8%
Bolivia - incumbent party win - 93.0%
Mozambique - incumbent party win - 89.4%
Uruguay - incumbent party win - 81.3%
Namibia - incumbent party win - 53.7%
Romania - incumbent party lose - 75.8%
Tunisia - incumbent party lose - 100%
Ref: C1, DM2, and DP14

Here are the notes for this month's predictions. First, Tunisia is set to 100% incumbent party loss probability because the incumbent's party was disbanded. Second, results for Bosnia and Herzegovina are difficult to interpret because there are three presidents being elected. We have attempted to break it down by individual presidential election, but all of these results should be taken as tentative until we have public opinion data for the elections. We expect to receive better data in the next few weeks that should help us better discriminate these elections in both models. Third, the prediction for Turkey has shifted because of the release of public opinion polls showing the incumbent party's candidate with a strong lead.
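To make the two lists easier to compare, here is a small sketch that puts both on a common scale, the probability that the incumbent party wins, and reports where the two model versions disagree on the call. The numbers are copied from the lists above; the helper name `parse_predictions` and the 50% threshold for a "call" are assumptions made for illustration and are not part of either model.

```python
V1_AUGUST = """\
Turkey - incumbent party win - 98.6%
Brazil - incumbent party win - 98.8%
Bosnia and Herzegovina - incumbent party win - 82.9%
Bosniak Election - incumbent party win - 96.8%
Croat Election - incumbent party win - 74.9%
Serbian Election - incumbent party win - 74.9%
Bolivia - incumbent party win - 99.7%
Mozambique - incumbent party win - 99.6%
Uruguay - incumbent party win - 98.7%
Namibia - incumbent party win - 56.0%
Romania - incumbent party lose - 80.9%
Tunisia - incumbent party lose - 100%
"""

V2_AUGUST = """\
Turkey - incumbent party win - 83.4%
Brazil - incumbent party win - 91.5%
Bosnia and Herzegovina - incumbent party lose - 58.2%
Bosniak Election - incumbent party win - 66.6%
Croat Election - incumbent party lose - 60.8%
Serbian Election - incumbent party lose - 60.8%
Bolivia - incumbent party win - 93.0%
Mozambique - incumbent party win - 89.4%
Uruguay - incumbent party win - 81.3%
Namibia - incumbent party win - 53.7%
Romania - incumbent party lose - 75.8%
Tunisia - incumbent party lose - 100%
"""


def parse_predictions(block):
    """Map each election to P(incumbent party wins), as a value in [0, 1]."""
    probs = {}
    for line in block.strip().splitlines():
        # Split from the right so names containing " - " would stay intact.
        name, outcome, pct = [part.strip() for part in line.rsplit(" - ", 2)]
        p = float(pct.rstrip("%")) / 100.0
        probs[name] = p if outcome.endswith("win") else 1.0 - p
    return probs


v1 = parse_predictions(V1_AUGUST)
v2 = parse_predictions(V2_AUGUST)

# Elections where the two versions land on opposite sides of 50%.
flipped = [name for name in v1 if (v1[name] >= 0.5) != (v2[name] >= 0.5)]

for name in v1:
    shift = (v2[name] - v1[name]) * 100.0
    marker = "  <- call flips" if name in flipped else ""
    print(f"{name:24s} v1={v1[name]:.3f}  v2={v2[name]:.3f}  shift={shift:+.1f} pts{marker}")
```

On these numbers the call flips only for the Bosnia and Herzegovina entries (the overall entry and the Croat and Serbian elections), which is consistent with the note above that those results should be treated as tentative.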

By David Lazer | 9:00 AM | Comments (0)

2 July 2014

Big data

A modest proposal to Facebook

Many/most/all of the readers of this blog have now heard of the Facebook emotion contagion study published in PNAS last week. Briefly: Facebook researchers, in collaboration with scholars at Cornell and UCSF, experimentally manipulated the algorithm that determines the subset of posts you see on Facebook, such that some people saw more positive posts, and others more negative posts. Their finding was, roughly, that negativity begets negativity, and positivity positivity. This paper has gone through a remarkably fast cycle of "isn't that interesting but a bit creepy" to methodological critiques ("they're not really measuring emotion") to a vast "Facebook is unethically manipulating our emotions!"

This post is not a commentary on the science, the ethics of this study, when consent is required, the structure of ethical self regulation (via IRB) in the US vs other countries (usually with no equivalents of IRBs), or the generally important question of the implications of our increasingly algorithmically organized societies. These will be subjects for future posts, and of many future classroom discussions I will have with doctoral students about research ethics. Rather, my concern right now is that this event has the potential to damage our collective capacity to create knowledge regarding human society because of the potential for public relations fiascos for companies. Of course, knowledge production will continue regardless, but perhaps it will all be safely proprietary, within the research departments of companies. Such an outcome would be terrible, not only for our collective understanding of human society, but also for these companies, because, paradoxically, participation in vigorous public intellectual debates is important for the capacity to develop proprietary knowledge. Knowledge does not grow in hermetically sealed silos, and it is no coincidence that our creative industries have grown up in close proximity to universities, which at their best are highly permeable intellectual hot houses.

I'd therefore like to make a modest proposal about academic-industry cooperation, which is that companies like Facebook should create opt-in experimental panels, with an initial clear, short, transparent and in your face, fairly general and flexible consent about the types of ways their sociotechnical environment would be (modestly) experimentally varied. (And if certain experiments exceeded those parameters, there could be an additional consent required for specific experiments.) Subsequent to the completion of a study, study participants would be informed of the study, with a plain English explanation of the findings, as well as access to subsequent publications. Indeed, I'd note that my team has created a platform along this model, Volunteer Science, which is partially built on top of the Facebook API. Our challenge is building a user base. Facebook would not have a problem building a volunteer army to help out science--they could have a million recruits tomorrow.

I don't claim this is a cure all, but it would cure a lot--indeed, I think the entire current mess would have been avoided if the research had been done on such a volunteer base.

I'd note that Facebook and the like would (and will) continue to do A/B testing, and generally to tweak their algorithms experimentally in ways that (1) create variations in individual experience, and (2) have potentially important consequences, individually and collectively. This should be vigorously studied by scholars, and debated and scrutinized in the broader society. But the issue of whether and how a company like Facebook can participate in academic research, and in particular conduct field experiments, is actually solvable.

By David Lazer | 10:25 PM | Comments (0)

1 July 2014


July 1 Executive election predictions

As with last month, there are two sets of predictions presented this month. The first uses our version 1.0 model, the same model we have used for previous predictions, which will serve as a reference point for our updated version. The second uses the beta version 2.0, which also uses updated data.

Model Version 1.0 Predictions
Indonesia - incumbent party lose - 75.5%
Turkey - incumbent party lose - 64.3%
Brazil - incumbent party win - 98.8%
Bosnia and Herzegovina - incumbent party win - 82.9%
Bolivia - incumbent party win - 99.7%
Mozambique - incumbent party win - 99.6%
Uruguay - incumbent party win - 98.7%
Namibia - incumbent party win - 56.0%
Romania - incumbent party lose - 80.9%
Tunisia - incumbent party lose - 69.0%
Ref: C1, DM1, and DP13

Model Version 2.0 Predictions
Indonesia - incumbent party lose - 70.7%
Turkey - incumbent party lose - 55.1%
Brazil - incumbent party win - 91.9%
Bosnia and Herzegovina - incumbent party lose - 59.1%
Bolivia - incumbent party win - 92.9%
Mozambique - incumbent party win - 89.8%
Uruguay - incumbent party win - 83.7%
Namibia - incumbent party win - 54.9%
Romania - incumbent party lose - 75.0%
Tunisia - incumbent party lose - 55.1%
Ref: C1, DM2, and DP13

Here are the caveats for this month's predictions. The results for Turkey are still early, since campaigning will not begin until July 11 and candidates are still not certain. Bosnia and Herzegovina has a three president system that we are still figuring out how to model. Currently we are counting a loss by any incumbent party as a loss. In Bolivia there is no official candidate list yet. Similarly, in Romania, most candidates have not yet been announced. Finally, Tunisia is a transition state and, while there are plenty of polls, the election has been delayed several times already and there are a lot of potential candidates.

By David Lazer | 9:00 AM | Comments (0)

13 June 2014

WIRE Workshop @ Harvard on June 17: Working with Internet Archives for Research

Many of the readers of this blog will be interested in this event:

WIRE Workshop: Working with Internet Archives for Research
Institute for Quantitative Social Science (IQSS)
Harvard University
1737 Cambridge Street
Cambridge, MA

Tuesday June 17, 2014
Room S010
1pm - 5pm

Please join us on Tuesday for a series of public presentations highlighting ongoing research at the intersection of network analysis, large-scale data and archival Internet studies. This workshop is hosted by a team of scholars from Rutgers University, Northeastern University, and the Internet Archive. The aim of the workshop is twofold. The workshop will provide a forum for presentations and discussions of ongoing research involving community development and historical Internet data. Presentation sessions will focus on a variety of themes, derived from ongoing research about online community emergence and evolution. A closing session will be devoted to discussing future research needs and unanswered research questions with regard to data and access to historical Internet records. The workshop will provide a mechanism for discussing the functions that should be incorporated into a prototype historical Web extractor, and for outlining potential research questions to be addressed with a prototype tool and databases. In addition, key questions gathered during the workshop will serve as initial discussion points for the online community that will support ongoing interaction between researchers.

This event is cosponsored by the NetSCI Lab at Rutgers, NULab for texts, maps, and networks at Northeastern, IQSS at Harvard, and the Internet Archive.

For more information see:


1:00pm - 1:45pm: Opening Session
Welcoming Remarks
David Lazer, Northeastern University
Overview of the Archive Hub project and Internet Archive Research
Matthew Weber, Rutgers University

1:45pm - 2:30pm: Internet Archives and Research Potential
Insight into the Internet Archives
Kris Carpenter, Director, Web Archive, Internet Archive
Web Wide Crawls
Vinay Goel, Senior Data Engineer, Internet Archive

2:30pm - 3:00pm: Research Highlights
Ancient History of the UK Web
Eric Meyer and Scott Hale, Oxford Internet Institute

3:00pm - 3:30pm: Coffee Break

3:30pm - 5:00pm: Research Highlights
Research Infrastructure for the Study of Archived Web Materials
Niels Brügger, Associate Professor, Head of the Centre for Internet Studies and of NetLab, Aarhus University
ALEXANDRIA: Temporal Retrieval, Exploration and Analytics in Web Archives
Wolfgang Nejdl, Director, L3S Research Center
WebScience and Archival Internet Research
Thanassis Tiropanis, Senior Lecturer, University of Southampton

5:00pm - 5:30pm: Challenges for Future Research

By David Lazer | 11:19 AM | Comments (0)