11 December 2008
Amanda Cox from the NYT graphics department gave a fun talk yesterday about challenges she and her colleagues face.
One of the challenges she discussed is statistical uncertainty -- how to represent confidence intervals on polling results, for example, while not sacrificing too much clarity. Amanda provided a couple of examples where the team had done a pretty poor job of reporting the uncertainty behind the numbers; in some cases doing it properly would have made the graphic too confusing for the audience and in others there may have been a better way.
She also talked about "abstraction," by which I think she meant the issue of how to graphically represent multivariate data. She showed some multivariate graphics the NYT had produced (the history of oil price vs. demand, growth in the CPI by categorized component) that I thought were quite successful, although some in the audience disagreed about the latter figure.
Amanda also showed the figure that I reproduced and discussed in an earlier post, in which I reported that the NYT graphics people think that the public can't understand scatterplots. Amanda disagrees with this (she said it annoys her how often people mention that point to her) and showed some scatterplots the NYT has produced. (She did say she thinks people understand scatterplots better when there is an upward slope to the data, which was interesting.)
The audience at the talk, much of which studies the media in some capacity and nearly all of which reads the NYT, seemed hungry for some analysis of the economics behind the paper's decision to invest so much in graphics. (Amanda said the paper spends $500,000 a month on the department.) Amanda wasn't really able to shed too much light on this, but said she felt very fortunate to be at a paper that lets her publish regression trees when, at many papers, the graphics team is four people who have their hands full producing "fun facts" sidebars and illustrations of car crash sites.
8 December 2008
Please join us this Wednesday, December 10th, when Amanda Cox of the New York Times will present "Open Problems in NYT Graphics". Amanda provided the following abstract:
The New York Times graphics department is a group of about 30 journalists who make the charts, maps and diagrams for the print and online versions of the paper. This talk is a (completely unofficial) guide to some of the problems the department faces on an ongoing basis, including how to represent uncertainty in an accessible way, and how to move beyond something I call "Here is some data:" toward something closer to inference.
The applied statistics workshop meets at 12 noon in room K-354, CGIS-Knafel (1737 Cambridge St) with a light lunch. Presentations start at 12:15 pm and usually end around 1:30 pm. As always, all are welcome, and please email me with any questions.
5 December 2008
From my point of view, an applied quantitative social science study is usually a process containing three parts. The first part is about theoretical/formal modeling (with either explicit or implicit assumptions), the second about deriving empirical implications from the model and the last about applying (or inventing in some cases) appropriate statistical methods to collect evidence and evaluate the derived empirical implications.
Professor Lieberson and his coauthor, in the recent article noted below, call this entire process implication analysis, whereas I had previously thought of implication analysis as only the second part of the process: something like comparative statics and dynamic analysis. But given that more of us are increasingly interested in producing work that integrates the whole process, it seems natural to give this integrated approach a name of its own, to set it alongside formal analysis and empirical/statistical analysis.
Certainly, the integrated approach increases the complexity of research, as many things can go wrong between theory and data. A symposium on implication analysis in the latest Sociological Methodology (Volume 38, Issue 1, December 2008), opening with Stanley Lieberson and Joel Horwich's paper, "Implication Analysis: A Pragmatic Proposal for Linking Theory and Data in the Social Sciences," and followed by five response papers, tries to address some of these issues, including the specification of testable hypotheses, assessment of data quality, validation of estimates in different contexts, and dealing with inconsistent evidence.
Just FYI, Washington University's Weidenbaum Center and Department of Political Science will sponsor a new summer institute on Empirical Implications of Theoretical Models in politics in 2009.
Here is the institute's website: http://wc.wustl.edu/eitm.html
1 December 2008
Please join us this Wednesday, December 3rd, when Michael Peress, Department of Political Science, University of Rochester, will present "Estimating Proposal and Status Quo Locations Using Voting and Cosponsorship Data". Michael provided the following abstract:
Theories of lawmaking generate predictions for the policy outcome as a function of the status quo. These theories are difficult to test because existing ideal point estimation techniques do not recover the locations of proposals or status quos. Instead, such techniques only recover cutpoints. This limitation has meant that existing tests of theories of lawmaking have been indirect in nature. I propose a method of directly measuring ideal points, proposal locations, and status quo locations on the same multidimensional scale, by employing a combination of voting data, bill and amendment cosponsorship data, and the congressional record. My approach works as follows. First, we can identify the locations of legislative proposals (bills and amendments) on the same scale as voter ideal points by jointly scaling voting and cosponsorship data. Next, we can identify the location of the final form of the bill using the location of the last successful amendment (which we already know). If the bill was not amended, then the final form is simply the original bill location. Finally, we can identify the status quo point by employing the cutpoint we get from scaling the final passage vote. To implement this procedure, I automatically coded data on the congressional record available from www.thomas.gov. I apply this approach to recent sessions of the U.S. Senate, and use it to test the implications of competing theories of lawmaking.
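To make the last step of the abstract concrete: in a one-dimensional spatial model with symmetric voter utilities, a voter is indifferent between the proposal and the status quo exactly at their midpoint, so the cutpoint of the final-passage vote pins down the status quo once the proposal's location is known. Here is a minimal sketch of that step; the function name and the numbers are illustrative, not from the paper:

```python
def recover_status_quo(proposal, cutpoint):
    """With symmetric (e.g., quadratic) utilities on one dimension,
    the cutpoint lies midway between proposal p and status quo q:
    c = (p + q) / 2, which rearranges to q = 2c - p."""
    return 2 * cutpoint - proposal

# Illustrative example: a bill scaled at 0.4 whose final-passage
# vote yields a cutpoint at -0.1 implies a status quo near -0.6
# on the same scale.
q = recover_status_quo(proposal=0.4, cutpoint=-0.1)
```

The multidimensional case in the paper is more involved (the cutpoint becomes a cutting hyperplane), but the same midpoint logic identifies the status quo along the direction separating yea from nay voters.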
A copy of the paper is available here.
The applied statistics workshop meets at 12 noon in room K-354, CGIS-Knafel (1737 Cambridge St) with a light lunch. Presentations start at 12:15 pm and usually end around 1:30 pm. As always, all are welcome, and please email me with any questions.