
15 March 2014

Might the CDC data be wrong?

Judging from the number of tweets with "big data hubris" (I'll admit the irony of my using this metric), our paper on Google Flu Trends has gotten a bit of a buzz. There is one small point I would like to elaborate on. A few people have suggested that perhaps GFT is right and the CDC data are wrong. In our analysis/discussion we are not assuming that the CDC data are "right" (indeed, in a trivial sense they must be wrong, and the statistical question is, generally, how wrong they are). However, GFT is built on top of CDC data. Technically, it's not a predictive model of flu prevalence; it's a predictive model of future CDC reports about the present. If the CDC data have warts and GFT is working well, it will fit those warts. If the CDC data underestimate flu prevalence in certain regions, say, then GFT will as well.
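A toy simulation can make this concrete. It is only a sketch under invented assumptions: the 30% undercount, the noise levels, the linear relationship between searches and prevalence, and all function names are made up for illustration, and none of it reflects how GFT or the CDC surveillance system actually works. A line is fit from a search signal to the undercounted "CDC" series on the first half of the data, then used to predict the second half.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def simulate_inherited_bias(weeks=200, undercount=0.7, seed=0):
    """Return held-out means of (true prevalence, CDC reports, model predictions).

    All quantities are synthetic; `undercount` is a hypothetical bias factor.
    """
    rng = random.Random(seed)
    truth = [rng.uniform(1.0, 5.0) for _ in range(weeks)]
    # Hypothetical CDC reports: a systematic undercount plus small noise.
    cdc = [undercount * t + rng.gauss(0, 0.1) for t in truth]
    # Hypothetical search signal: proportional to true prevalence plus noise.
    searches = [10.0 * t + rng.gauss(0, 1.0) for t in truth]

    half = weeks // 2
    # The model only ever sees CDC reports as its target, never the truth.
    a, b = fit_line(searches[:half], cdc[:half])
    predicted = [a + b * s for s in searches[half:]]

    def mean(values):
        return sum(values) / len(values)

    return mean(truth[half:]), mean(cdc[half:]), mean(predicted)


if __name__ == "__main__":
    truth_mean, cdc_mean, model_mean = simulate_inherited_bias()
    print(f"true prevalence: {truth_mean:.2f}")
    print(f"CDC reports:     {cdc_mean:.2f}")
    print(f"model output:    {model_mean:.2f}")
```

In this setup the model's held-out predictions land close to the undercounted reports rather than to the true prevalence, because the reports were the only target it was fit to.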

As we note in the paper, the interpretation would be different for a methodology that directly aimed to measure flu prevalence. In that case, if there were a deviation, one would have to make an assessment as to which method was more likely to be accurate.

Of course, there are a few minor caveats to this. For example, if for some exogenous reason the CDC data were to dramatically drop in quality at a certain point in time (say, if funding for data collection were slashed) then there could be an argument that an approach such as GFT would, for a period at least, be more accurate than CDC data (since GFT would have been fit to the previously higher quality CDC data). But I don't think we have any reason to believe that this is the case currently.
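The caveat can be sketched the same way, again with entirely invented numbers: synthetic reports that are accurate up to a hypothetical `drop_week` and much noisier afterwards, and a line fit only to the earlier, higher-quality reports. This is an illustration of the argument, not a claim about any real data.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def compare_after_quality_drop(weeks=200, drop_week=100, seed=1):
    """Return (model error, report error) against the truth after the drop.

    Errors are mean absolute errors; every series here is synthetic.
    """
    rng = random.Random(seed)
    truth = [rng.uniform(1.0, 5.0) for _ in range(weeks)]
    searches = [10.0 * t + rng.gauss(0, 1.0) for t in truth]
    # Hypothetical reports: low noise before drop_week, high noise after it.
    reports = [
        t + rng.gauss(0, 0.1 if week < drop_week else 1.0)
        for week, t in enumerate(truth)
    ]

    # Fit only on the period when the reports were still high quality.
    a, b = fit_line(searches[:drop_week], reports[:drop_week])

    after = range(drop_week, weeks)
    model_err = sum(abs(a + b * searches[w] - truth[w]) for w in after) / len(after)
    report_err = sum(abs(reports[w] - truth[w]) for w in after) / len(after)
    return model_err, report_err


if __name__ == "__main__":
    model_err, report_err = compare_after_quality_drop()
    print(f"model error after drop:  {model_err:.2f}")
    print(f"report error after drop: {report_err:.2f}")
```

Under these assumptions the frozen model stays closer to the truth than the degraded reports do, but only because it was calibrated against the earlier reports, and only for as long as the relationship between searches and prevalence holds.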

Posted by David Lazer at March 15, 2014 8:11 AM