Authors' Committee

Chair:

Matt Blackwell (Gov)

Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

27 September 2007

How do you get 7,000,000 cell phone records?

Not to take anything away from David Lazer's presentation today at the Applied Stats workshop, but the star of his talk was the data. The crowd favorite appeared to be a dataset of all cell phone transactions over a several-week period for 7,000,000 subscribers somewhere in Europe (David wouldn't say where). He and his colleagues have built a graph of interpersonal connections based on the call data, and are trying to answer questions like, "How many degrees of separation are there between two randomly selected people in the network?" (Answer: 13.) But to me an even more compelling question came up in the Q&A session: where do you get data like this?

David's answer was basically that you need to know the right people; it sounded as if he or one of his colleagues knew key executives at the phone company who were able to provide the call records. Lee Fleming offered that grad students might find their way to data like this by getting to know scholars like David who have access to it. (How many degrees of separation are there between you and your dream dataset?)

But the importance of knowing cell phone execs would be the wrong takeaway from David's talk, which after all was basically about how we are all awash in data these days. Yes, to get data on cell phone calls you may need to have friends at the phone company, and yes, to get information on where a group of MIT students spends every hour of the day over a few weeks you will have to launch your own experiment (as described in David's talk today), but for those of us with fewer connections and smaller research budgets there is still an enormous amount of data out there to collect, much of it from the web. I've actually spent a fair amount of time in the past year learning how to collect data from the web, and I look forward to blogging here about web scraping and other data collection approaches in the next few months. But right now I'm going to go check whether David left any tracking devices in my bag.
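
For anyone curious what the degrees-of-separation question from the talk looks like in code, here is a minimal sketch in Python. It is not David's method or data: the call records are made up, the names `build_graph` and `degrees_of_separation` are hypothetical, and I am assuming that any call between two people counts as an undirected tie. The distance is found with a plain breadth-first search.

```python
from collections import defaultdict, deque


def build_graph(call_records):
    """Turn (caller, callee) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        if caller != callee:  # ignore self-calls
            graph[caller].add(callee)
            graph[callee].add(caller)
    return graph


def degrees_of_separation(graph, source, target):
    """Return the fewest ties linking source to target, or None if unconnected."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for neighbor in graph[person]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Made-up records, purely for illustration.
toy_calls = [("ana", "ben"), ("ben", "cy"), ("cy", "dee"), ("eve", "fay")]
toy_graph = build_graph(toy_calls)
print(degrees_of_separation(toy_graph, "ana", "dee"))
print(degrees_of_separation(toy_graph, "ana", "eve"))
```

On a network of 7,000,000 subscribers one would presumably sample random pairs and average the distances rather than search every pair, but that is my guess, not something described in the talk.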

Posted by Andy Eggers at 12:44 AM