25 February 2008
This Wednesday the Applied Statistics Workshop will welcome Matthew Harding, Dept. of Economics, Stanford University. Matthew will be presenting his research, "A Bayesian Mixed Logit-Probit Model for Multinomial Choice", a project that is joint with Jerry Hausman and Martin Burda. Here is an abstract for the presentation:
In this paper we introduce a new flexible mixed model for multinomial discrete choice where the key individual- and alternative-specific parameters of interest are allowed to follow an assumption-free nonparametric density specification while other alternative-specific coefficients are assumed to be drawn from a multivariate normal distribution. A hierarchical specification of our model allows us to break down a complex data structure into a set of submodels with the desired features that are naturally assembled in the original system. We estimate the model using a Bayesian Markov Chain Monte Carlo technique with a multivariate Dirichlet Process (DP) prior on the coefficients with nonparametrically estimated density. We bypass a problem of prior non-conjugacy by employing a "latent class" sampling algorithm for the DP prior. The model is applied to supermarket choices of a panel of Houston households whose shopping behavior was observed over a 24-month period in years 2004-2005. We estimate the nonparametric density of two key variables of interest: the price of a basket of goods based on scanner data, and driving distance to the supermarket based on their respective locations, calculated using GPS software. Supermarket dummies form the parametric part of our model.
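The Dirichlet Process prior mentioned in the abstract can be illustrated with the standard stick-breaking construction. This is a generic sketch of a single DP weight draw, not the authors' latent-class sampler, and the concentration parameter here is made up:

```python
import random

random.seed(7)

def stick_breaking(alpha, n_atoms=1000):
    """Draw mixture weights from a DP via truncated stick-breaking:
    repeatedly break off a Beta(1, alpha) fraction of the remaining stick."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = random.betavariate(1, alpha)
        weights.append(remaining * v)
        remaining *= 1 - v
    return weights

w = stick_breaking(alpha=1.0)
print(f"total mass of first 1000 atoms: {sum(w):.4f}")
print(f"largest single weight: {max(w):.3f}")
```

Small alpha concentrates mass on a few atoms (few latent classes); large alpha spreads it out, which is why the DP can act as an assumption-free density on the coefficients.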
The workshop meets at 12 noon with a light lunch; presentations usually begin at 12:15. Our workshop is located at 1737 Cambridge St, CGIS-Knafel, room N-354.
23 February 2008
A study published in the New England Journal of Medicine last month showed that widely prescribed antidepressants may not be as effective as the published research indicates. After reading about the study in the NYT, I read the article itself and was struck by how well the authors were able to document the somewhat elusive phenomenon of publication bias.
Researchers in most fields can document publication bias only by pointing out patterns in published results. A jump in the density of t-stats around 2 is one strong sign that null reports are not being published; an inverse relationship between average reported effect size and sample size in studies of the same phenomenon is another strong sign (because the only small studies that could be published are the ones with large estimated effects). These meta-analysis procedures are clever because they infer something about unpublished studies from what we see in published studies.
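A toy simulation (mine, not from the article) makes the second pattern concrete: if journals only accept results with |t| > 2, then among published studies the small ones report much larger average effects than the large ones, even though both estimate the same true effect:

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.1   # small true effect, in SD units
SIGMA = 1.0

def run_study(n):
    """Simulate one two-arm study of size n per arm; return (estimate, t-stat)."""
    se = SIGMA * (2 / n) ** 0.5          # standard error of the mean difference
    est = random.gauss(TRUE_EFFECT, se)
    return est, est / se

published = {"small": [], "large": []}
for _ in range(20000):
    n = random.choice([25, 400])          # small vs large studies
    est, t = run_study(n)
    if abs(t) > 2:                        # the publication filter
        published["small" if n == 25 else "large"].append(est)

avg_small = statistics.mean(published["small"])
avg_large = statistics.mean(published["large"])
print(f"mean published effect, n=25:  {avg_small:.2f}")
print(f"mean published effect, n=400: {avg_large:.2f}")
# Only small studies with inflated estimates clear the t > 2 bar,
# producing the inverse size-effect relationship in the published record.
```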
As the NEJM article makes clear, publication bias is more directly observable in drug trials because we have very good information about unpublished trials. When a pharmaceutical company initiates clinical trials for a new drug, the studies are registered with the FDA; in order to get FDA approval to bring the drug to market, the company must submit the results of all of those trials (including the raw data) for FDA review. All trials conducted on a particular drug are therefore reviewed by the FDA, but only a subset of those trials is published in medical journals.
The NEJM article uses this information to determine which antidepressant trials made it into the journals:
Among 74 FDA-registered studies, 31%, accounting for 3449 study participants, were not published. Whether and how the studies were published were associated with the study outcome. A total of 37 studies viewed by the FDA as having positive results were published; 1 study viewed as positive was not published. Studies viewed by the FDA as having negative or questionable results were, with 3 exceptions, either not published (22 studies) or published in a way that, in our opinion, conveyed a positive outcome (11 studies). According to the published literature, it appeared that 94% of the trials conducted were positive. By contrast, the FDA analysis showed that 51% were positive. Separate meta-analyses of the FDA and journal data sets showed that the increase in effect size ranged from 11 to 69% for individual drugs and was 32% overall.
One complaint -- I thought it was too bad that the authors did not determine whether the 22 studies that were "negative or questionable" and went unpublished were never submitted ("the file drawer problem") or were rejected by the journals. But otherwise very thorough and interesting.
22 February 2008
A major item of interest in applied health economics is understanding the impact of health shocks on household income, investments, and consumption. This relationship is particularly important in developing countries, which lack programs like universal health insurance or social insurance along the lines of Medicaid. Alas, it's also a major challenge to establish causal effects and the mechanisms through which the shocks might operate. A main culprit is endogeneity, since health affects wealth and vice versa. As a result there is a huge and truly interdisciplinary literature on the topic, much of it with suspicious identification strategies.
The main struggle is to find a plausibly exogenous exposure to health shocks that has real-life relevance. A new paper by Manoj Mohanan takes this challenge seriously and looks at the effect of health shocks from bus accidents on households' consumption, and examines which mechanisms households rely on to smooth consumption. (Full disclosure: Manoj is a classmate of mine, and I really like his work!)
To address the endogeneity problem, the paper focuses on people who have been in bus accidents as recorded by the state-run bus company in Karnataka, India. Clearly, finding a good control group is critical: people who travel on public buses may be different from those who don’t. For starters, they actually took the risk of getting on a bus – if you have ever been on the road in a developing country you’ll know what this means. Manoj’s approach is to select unexposed individuals among travelers on the same bus route, after matching on age, sex and geographic area of residence. Hence, conditional on these factors, the bus accident can be treated as exogenous.
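The matching step can be sketched in a few lines. All fields and values below are hypothetical, and this toy version allows a control to be reused across cases, which real matching would avoid:

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical rider records; the actual study matches within bus route
# on age, sex, and geographic area of residence.
def make_rider(i, injured):
    return {
        "id": i,
        "route": random.choice(["R1", "R2", "R3"]),
        "age": random.randint(18, 70),
        "sex": random.choice(["M", "F"]),
        "area": random.choice(["A", "B", "C"]),
        "injured": injured,
    }

cases = [make_rider(i, True) for i in range(50)]           # accident victims
pool = [make_rider(1000 + i, False) for i in range(5000)]  # other riders

def match_key(r, age_band=5):
    # Coarsen age into 5-year bands so exact matching is feasible
    return (r["route"], r["sex"], r["area"], r["age"] // age_band)

# Index the unexposed pool by matching cell
cells = defaultdict(list)
for r in pool:
    cells[match_key(r)].append(r)

# Take the first available control in each case's cell (toy rule)
matched = [(c, cells[match_key(c)][0]) for c in cases if cells[match_key(c)]]
print(f"matched {len(matched)} of {len(cases)} cases")
```

Conditional on the matching cell, the accident is then treated as good as random, which is the identification assumption the paper defends.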
He then compares the two groups on various dimensions. He finds that households reduce educational and festival spending by a large amount, but appear to be able to smooth food and housing consumption. He is unable to find effects on assets or labor supply. The principal coping mechanism is debt accumulation. Overall this suggests that not all is well: debt traps aside, reducing investments in education could be very costly in the long run (on this point see also Chetty and Looney, 2006).
18 February 2008
This Wednesday, 2/20, the applied statistics workshop welcomes Jim Snyder, Arthur and Ruth Sloan Professor of Economics and Political Science at MIT. He will be presenting "The Wealth of Political Office in the US, 1840-1870" work that is joint with Pablo Querubin, Department of Economics, MIT. Jim provided the following abstract and the attached article:
The second half of the 19th century was known as a corrupt era in U.S. politics. Using the censuses of 1850, 1860 and 1870, we find the wealth of all candidates running for the U.S. House of Representatives during the period 1840-1870. We use this data to estimate several quantities of interest, including: How wealthy were these candidates compared to others in the population at the time? How did the wealth accumulation of these candidates compare to others in the population? How did the wealth levels and accumulation vary by party? How did those candidates who won a congressional race by a close margin compare with those who lost by a close margin? This last quantity, which exploits a regression-discontinuity approach, provides a good estimate of the monetary ``rents'' to a congressional seat at that time.
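The close-margin comparison in the abstract's last sentence can be illustrated with a toy calculation; the numbers below are invented, not from the paper:

```python
import statistics

# Hypothetical (vote margin, later wealth) pairs; margin > 0 means the
# candidate won the congressional race.
data = [(-0.08, 40), (-0.04, 55), (-0.02, 50), (-0.01, 60),
        (0.01, 95), (0.02, 105), (0.04, 90), (0.08, 120)]

BANDWIDTH = 0.05  # compare only races decided within 5 points

winners = [w for m, w in data if 0 < m <= BANDWIDTH]
losers = [w for m, w in data if -BANDWIDTH <= m < 0]

# Near the 50% cutoff, winning is close to random, so the wealth gap
# estimates the monetary "rent" of holding the seat.
rd_estimate = statistics.mean(winners) - statistics.mean(losers)
print(f"estimated wealth rent of the seat: {rd_estimate:.1f}")
```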
As always, the workshop will convene at 12 noon with a light lunch, and the presentation will begin at 12:15. We are located in CGIS-Knafel, 1737 Cambridge St, room N-354.
14 February 2008
Consider this scenario. I write a paper. I put it on my web site. The site now gets about 5 million hits a year. (Even if most of them are looking for directions to Gary, Indiana, that's a fair amount of distribution.) But if I get lucky and the paper is published in the leading journal in some academic field, the journal prints around 15,000 copies and I'm supposed to take it off my web site. In what universe does this make sense?
The Faculty of Arts and Sciences at Harvard has now taken action to avoid this situation and adopted this policy:
The Faculty of Arts and Sciences of Harvard University is committed to disseminating the fruits of its research and scholarship as widely as possible. In keeping with that commitment, the Faculty adopts the following policy: Each Faculty member grants to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles. In legal terms, the permission granted by each Faculty member is a nonexclusive, irrevocable, paid-up, worldwide license to exercise any and all rights under copyright relating to each of his or her scholarly articles, in any medium, and to authorize others to do the same, provided that the articles are not sold for a profit. The policy will apply to all scholarly articles written while the person is a member of the Faculty except for any articles completed before the adoption of this policy and any articles for which the Faculty member entered into an incompatible licensing or assignment agreement before the adoption of this policy. The Dean or the Dean's designate will waive application of the policy for a particular article upon written request by a Faculty member explaining the need.
To assist the University in distributing the articles, each Faculty member will provide an electronic copy of the final version of the article at no charge to the appropriate representative of the Provost's Office in an appropriate format (such as PDF) specified by the Provost's Office. The Provost's Office may make the article available to the public in an open-access repository.
What do you think? Do you think your university could (or should) adopt this? (For more information, see this site.)
11 February 2008
This Wednesday the applied statistics workshop presents Donald Rubin -- Department of Statistics, Harvard University -- who will present "Direct and Indirect Causal Effects: An Unhelpful Distinction?" Don has suggested that the following papers provide helpful background for his talk:
2003. "Assumptions Allowing the Estimation of Direct Causal Effects: Discussion of 'Healthy, Wealthy, and Wise? Tests for Direct Causal Paths Between Health and Socioeconomic Status' by Adams et al." Journal of Econometrics, 112, pp. 79-87. (With F. Mealli.)

2004. "Direct and Indirect Causal Effects Via Potential Outcomes." Scandinavian Journal of Statistics, 31, pp. 161-170; 196-198, with discussion and reply.

2005. "Causal Inference Using Potential Outcomes: Design, Modeling, Decisions." Journal of the American Statistical Association, 100, 469, pp. 322-331.
The workshop meets at 12 noon with a light lunch in room N-354, CGIS-Knafel (1737 Cambridge St); presentations usually begin at 12:15.
5 February 2008
Another Tuesday, another primary election or twenty, and another opportunity for things to go wrong with pre-election polls. The Super Tuesday states, which had not seen much attention from pollsters earlier in January, have seen a deluge of polls released in the last week, nicely summarized at the Mystery Pollster blog. As Mark Blumenthal's recent post points out, "Somebody's gonna be wrong". The dispersion the recent polls exhibit, on both sides of the race, is far more than sampling variability can account for. There are always house effects present in any polling context, but this borders on the ridiculous. Unlike New Hampshire, no matter how the results turn out (barring a possible McCain collapse), we probably won't see as great a hue and cry about the pollsters this time, because their pre-election predictions are all over the map.
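A back-of-the-envelope calculation shows how little dispersion sampling variability alone allows; the sample size of 600 is my assumption about a typical primary poll:

```python
import math

def moe(p, n, z=1.96):
    """95% margin of error for a poll proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A candidate polling near 50% with n = 600 likely voters
m = moe(0.5, 600)
print(f"margin of error: +/-{100 * m:.1f} points")

# Two independent polls of this size should rarely differ by more than
# about sqrt(2) times the margin of error; spreads of 15+ points
# therefore point to house effects, not sampling noise.
print(f"expected poll-to-poll spread bound: ~{100 * math.sqrt(2) * m:.1f} points")
```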
Speaking of New Hampshire, Adam Berinsky from MIT e-mailed a few weeks ago to point to several studies that do look at errors in polling when a black candidate is on the ballot. These include Voting Hopes Or Fears?: White Voters, Black Candidates & Racial Politics by Keith Reeves, "Race-of-Interviewer Effects in a Preelection Poll: Virginia 1989" by Finkel, Guterbock, and Borg, and last but not least, Berinsky's Silent Voices: Public Opinion and Political Participation in America. This just goes to show me that a quick Google Scholar search on "Bradley effect" misses a lot of good stuff. We'll see if there is more evidence of such an effect tonight, but it is worth noting that in South Carolina things went the other way; Obama did much better than projected by the pre-election polls. Since they didn't get the outcome wrong, however, the pollsters didn't get nearly as much grief as they did after New Hampshire.
4 February 2008
Apologies for the late post -- we've experienced some last-minute scheduling changes. This week Kevin Quinn, Department of Government, will present "Assessing Political Positions of Media," a project that is joint with Daniel Ho, Stanford Law School. Kevin provided the following abstract:
Although central to understanding the role of the media, few quantitative measures of the political positions of media exist. We amass a new, large-scale dataset to shed light on this question. Collecting and classifying over 1500 editorials adopted by 25 major U.S. newspapers on 495 Supreme Court cases from 1994-2004, we apply an item response theoretic approach to place newspapers on a substantively meaningful and long-validated scale of political preferences. Our results provide significant insights into the study of the media. We show that 18 of the 25 papers are more likely to be to the left of the median Justice for this period, but we also find considerable evidence that this may be an artifact of the liberalness of urban, elite, high-circulation papers.
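For readers unfamiliar with the method, here is a minimal one-parameter item response (Rasch-style) sketch with made-up papers and cases, treating the case parameters as known for simplicity; the paper's actual model is richer:

```python
import math
import random

random.seed(1)

# Toy setup: paper i takes the "liberal" editorial position on case j
# with probability sigmoid(theta_i - b_j), where theta is the paper's
# position and b is the case's cutpoint.
true_theta = [-1.0, 0.0, 1.0]                       # 3 hypothetical papers
true_b = [random.gauss(0, 1) for _ in range(200)]   # 200 hypothetical cases

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Simulate the observed editorial positions
y = [[1 if random.random() < sigmoid(t - b) else 0 for b in true_b]
     for t in true_theta]

# Estimate each theta by gradient ascent on the log-likelihood,
# treating the case parameters b as known.
theta = [0.0] * 3
for _ in range(500):
    for i in range(3):
        grad = sum(y[i][j] - sigmoid(theta[i] - true_b[j]) for j in range(200))
        theta[i] += 0.01 * grad

print([round(t, 2) for t in theta])  # should recover the ordering of true_theta
```

With enough cases per paper, the recovered positions fall on a common scale, which is what lets the authors compare newspapers to the median Justice.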
Kevin also provided a link to the paper, which is available here.
Our workshop will convene this Wednesday at 12 noon with a light lunch, with the presentation to start at 12:15. We are located in CGIS-Knafel (1737 Cambridge St), Room N-354.
2 February 2008
This year's Spring Conference of the Harvard Program on Survey Research is on "New Technologies and Survey Research." It will be held on May 9, 2008, 9:00 am to 5:00 pm at IQSS, and is open to the public.
See here for details.
1 February 2008
Abstracts are now being accepted for the 2008 useR! conference in Dortmund, Germany. This conference is designed to bring R users and developers together to trade ideas and find out what is new in the sprawling world of R. Several of us went to the Vienna conference a few years ago, and found it very useful. Previous editions have had a good mix of academic and private sector participants, and I learned more than I have at some of the more traditional academic conferences. The announcement from the useR webpage is below; the website is at http://www.statistik.uni-dortmund.de/useR-2008/
useR! 2008, the R user conference, takes place at the Fakultät Statistik, Technische Universität Dortmund, Germany from 2008-08-12 to 2008-08-14. Pre-conference tutorials will take place on August 11.
The conference is organized by the Fakultät Statistik, Technische Universität Dortmund and the Austrian Association for Statistical Computing (AASC). It is funded by the R Foundation for Statistical Computing.
Following the successful useR! 2004, useR! 2006, and useR! 2007 conferences, the conference is focused on
- R as the `lingua franca' of data analysis and statistical computing,
- providing a platform for R users to discuss and exchange ideas about how R can be used for statistical computation, data analysis, visualization, and exciting applications in various fields,
- giving an overview of the new features of the rapidly evolving R project.
As with the predecessor conferences, the program consists of two parts:
- invited lectures discussing new R developments and exciting applications of R,
- user-contributed presentations reflecting the wide range of fields in which R is used to analyze data.
A major goal of the useR! conference is to bring users from various fields together and provide a platform for discussion and exchange of ideas: both in the formal framework of presentations as well as in the informal part of the conference in Dortmund's famous beer pubs and restaurants.
Prior to the conference, on 2008-08-11, there are tutorials offered at the conference site. Each tutorial has a length of 3 hours and takes place either in the morning or afternoon.
Call for Papers
We invite all R users to submit abstracts presenting innovations or exciting applications of R on topics such as:
Applied Statistics & Biostatistics
Chemometrics and Computational Physics
Econometrics & Finance
Environmetrics & Ecological Modeling
High Performance Computing
Marketing & Business Analytics
Statistics in the Social and Political Sciences
Visualization & Graphics
and many more.
We recommend a length of about one page in PDF format. The program committee will decide on the presentation format. There is no proceedings volume, but the abstracts will be available in an online collection linked from the conference program and in a single PDF file.
Deadline for submission of abstracts: 2008-03-31.