Authors' Committee

Chair:

Matt Blackwell (Gov)

Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Weekly Research Workshop Sponsors

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship


April 28, 2013

App Stats: Roberts, Stewart, and Tingley on "Topic models for open ended survey responses with applications to experiments"

We hope you can join us this Wednesday, May 1, 2013 for the Applied Statistics Workshop. Molly Roberts, Brandon Stewart, and Dustin Tingley, all from the Department of Government at Harvard University, will give a presentation entitled "Topic models for open ended survey responses with applications to experiments". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Topic models for open ended survey responses with applications to experiments"
Molly Roberts, Brandon Stewart, and Dustin Tingley
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, May 1st, 2013 12.00 pm

Abstract:

Despite broad use of surveys and survey experiments in political science, the vast majority of survey analysis deals with responses to options along a scale or from pre-established categories. Yet in most areas of life individuals communicate by writing or speaking, a fact reflected in earlier debates about open- and closed-ended survey questions. Despite good reasons to collect and analyze open-ended data, doing so is relatively rare in the discipline and almost always relies on human coding of survey responses. We present an alternative, semi-automated approach, the Structural Topic Model (STM) (Roberts et al. 2013), that draws on recent developments in machine-learning-based analysis of textual data. A crucial contribution of the method is that it incorporates information about each document, such as the author's gender, country of origin, treatment status, or when the text was written. This paper focuses on how the STM is extremely helpful to survey researchers and experimentalists for descriptive, exploratory, and inferential purposes. The STM makes analyzing open-ended responses easier and more revealing, and makes it possible to use them to estimate treatment effects. We illustrate these innovations with several experiments.

Posted by Konstantin Kashin at 11:25 PM

April 22, 2013

App Stats: Vadhan on "Privacy Tools for Sharing Research Data"

We hope you can join us this Wednesday, April 24, 2013 for the Applied Statistics Workshop. Salil Vadhan, Professor of Computer Science and Applied Mathematics from the School of Engineering & Applied Sciences at Harvard University, will give a presentation entitled "Privacy Tools for Sharing Research Data". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Privacy Tools for Sharing Research Data"
Salil Vadhan
School of Engineering & Applied Sciences, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, April 24th, 2013 12.00 pm

Abstract:

I will give an overview of a large, new multidisciplinary project at Harvard on "Privacy Tools for Sharing Research Data." The project is a collaborative effort between the Center for Research on Computation and Society, the Institute for Quantitative Social Science, and the Berkman Center for Internet and Society, and is funded as a Frontier grant in the NSF Secure and Trustworthy Cyberspace Program, building on seed funding from Google. The goal of the project is to help enable the collection, analysis, and sharing of personal data for research in social science and other fields while providing privacy for individual subjects. Bringing together computer science, social science, statistics, and law, we seek to refine and develop definitions and measures of privacy and data utility, and design an array of technological, legal, and policy tools for social scientists to use when dealing with sensitive data. These tools will be tested and deployed at the Harvard Institute for Quantitative Social Science's Dataverse Network, an open-source digital repository that offers the largest catalogue of social science datasets in the world. In addition to contributing to research infrastructure for social scientists around the world, the ideas developed in the project may benefit society more broadly as it grapples with data privacy issues in many other domains, including public health and electronic commerce.

Posted by Konstantin Kashin at 12:14 AM

April 15, 2013

Guest Post by Patrick Lam on "Estimating Individual Causal Effects"

Last week, I gave the applied statistics talk at IQSS on some of my research on estimating individual causal effects. Since there was some interest from folks who could not attend, I thought I would give a brief overview of my argument and research.

In the majority of empirical research, the quantity of interest is likely to be some type of average treatment effect, estimated either through a regression model or through some other clever research design. For example, we often regress an outcome Y on a treatment W and covariates X and interpret the coefficient on W as the "effect" of W on Y, given the assumptions of ignorable treatment assignment and no interference across units. While this average treatment effect (ATE), along with its fancier cousins ATT, ATC, CATE, LATE, etc., is the easiest causal quantity to estimate, I argue that an ATE is not a very useful or interpretable quantity. Define an individual causal effect (ICE) as Y_i(1) - Y_i(0) for any individual i. An ATE is simply the average of all the individual causal effects in the data or in some larger population: E[Y(1) - Y(0)]. An ATE is not the effect for any specific individual or group of individuals. It is not even the effect for the average individual. Yet we often implicitly treat the ATE as THE EFFECT for every individual, which is only true under the usually unreasonable assumption of constant treatment effects. In short, the ATE is a one-number summary that applies to exactly no individual of interest.
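To make the definitions concrete, here is a toy Python sketch with made-up potential outcomes for four hypothetical units. In real data only one potential outcome per unit is ever observed; both are listed here purely to show that the ATE is the mean of the ICEs and can differ from every individual effect.

```python
# Hypothetical potential outcomes for four units (illustrative numbers only)
y1 = [5.0, 3.0, 8.0, 2.0]  # Y_i(1): outcome if treated
y0 = [1.0, 3.0, 2.0, 4.0]  # Y_i(0): outcome if untreated

# Individual causal effects: ICE_i = Y_i(1) - Y_i(0)
ice = [a - b for a, b in zip(y1, y0)]

# The ATE is just the average of the ICEs: E[Y(1) - Y(0)]
ate = sum(ice) / len(ice)

print("ICEs:", ice)  # [4.0, 0.0, 6.0, -2.0]
print("ATE:", ate)   # 2.0
print("Any unit whose ICE equals the ATE?", ate in ice)  # False
```

Here the ATE of 2 summarizes effects ranging from -2 to 6, and no unit actually experiences an effect of 2.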

To see this in a trivially simple example, suppose we have a female birth control pill that in reality prevents pregnancy for every woman who takes it. Now suppose that we didn't know that, but we wanted to test how effective the pill was. So we randomly assign the pills to a sample split evenly between men and women. Our results would suggest that the pill prevented pregnancy approximately 50% of the time. We would then conclude from the data that the pill is effective only half the time and is therefore nearly useless as a contraceptive. However, it is obvious that the 50% result comes from a 100% success rate for women and a 0% success rate for men. The 50% result is not the success rate for any individual, and estimating the ATE masked important treatment effect heterogeneity.

One way to account for this heterogeneity is to estimate the conditional average treatment effect (CATE). In this example, we would condition on gender and estimate one average treatment effect for men and another for women. This requires additional information and a choice of variable to condition on. It is a top-down approach in which we subset the data in some way and then estimate an ATE within each subset. The example here is trivial, but in most empirical research it may not be obvious which variables to condition on. Furthermore, the CATE still assumes a constant treatment effect for all individuals within the same covariate stratum.
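The pill example can be written out with potential outcomes in a few lines of Python. This is a sketch under a deliberately simplifying assumption: both potential outcomes are known for every hypothetical person, and every woman would become pregnant without the pill. It shows the pooled ATE on pregnancy of -0.5 hiding CATEs of -1 for women and 0 for men.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical sample: 50 women and 50 men, with potential outcomes
# (Y(0), Y(1)) for pregnancy (1 = pregnant). Simplifying assumption:
# every woman becomes pregnant without the pill and never with it;
# men never become pregnant either way.
units = [("F", 1, 0)] * 50 + [("M", 0, 0)] * 50  # (sex, Y(0), Y(1))

# ICE on pregnancy for each unit: Y(1) - Y(0)
ice = [(sex, y1 - y0) for sex, y0, y1 in units]

# Pooled ATE versus CATEs conditional on sex
ate = mean([e for _, e in ice])
cate = {s: mean([e for sex, e in ice if sex == s]) for s in ("F", "M")}

print("ATE:", ate)    # -0.5
print("CATE:", cate)  # {'F': -1.0, 'M': 0.0}
```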

I argue for a different bottom-up approach in which we try to estimate each of the individual causal effects directly. The benefits of directly estimating the ICEs are that

1) they target the actual quantities of interest, such as the effect for a certain individual or group of individuals.
2) they allow for discovery of treatment effect heterogeneity through graphical and exploratory approaches.
3) they bridge the gap between quantitative and qualitative research by allowing for small-n estimands in a large-n framework.
4) any other causal quantity, such as any ATE, can be calculated directly from the ICEs, so estimating ICEs is a more flexible approach.
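Point 4 can be illustrated directly: once a vector of (estimated) ICEs is available, the familiar average estimands are just averages over different subsets of units. The helper `summarize_ices` and the numbers below are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def summarize_ices(ice, treated):
    """Aggregate individual causal effects into common average estimands.

    ice:     list of (estimated) ICEs, one per unit
    treated: list of 0/1 treatment indicators aligned with `ice`
    """
    ate = mean(ice)                                             # all units
    att = mean([e for e, w in zip(ice, treated) if w == 1])     # treated only
    atc = mean([e for e, w in zip(ice, treated) if w == 0])     # controls only
    return {"ATE": ate, "ATT": att, "ATC": atc}


# Hypothetical ICEs and treatment assignments for six units
ice = [4.0, 0.0, 6.0, -2.0, 1.0, 3.0]
treated = [1, 0, 1, 0, 1, 0]
summary = summarize_ices(ice, treated)
print(summary)
```

A CATE follows the same pattern, averaging over units that share a covariate value instead of a treatment status.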

Of course, the main problem with estimating ICEs is that they are not identified in the data, so the data strictly speaking gives no information about the likelihood of any particular value for any ICE.

To estimate the ICEs, I introduce a broad framework that leverages the usual causal inference assumptions of ignorable treatment assignment and SUTVA, and that uses existing matching methods coupled with a Bayesian framework to give hints and uncertainty intervals for the ICEs. The Bayesian approach allows prior qualitative information to be incorporated and also sidesteps the identification issue by defining a posterior over the ICEs. None of the methods used are new, and many date back several decades. But I argue that we can put these existing methods together in a novel way to estimate quantities that are much more important and relevant to researchers.

The basic idea of the estimation process is to impute the missing potential outcome for each individual. Once the outcomes are imputed, the ICEs can be calculated in a straightforward manner. The matching algorithms define pools of observations that we can use for the imputation, and the Bayesian framework gives us uncertainty for the imputations that incorporates both the uncertainty in the matching algorithms and the usual estimation uncertainty. The idea of Bayesian imputation of missing potential outcomes dates back at least to Rubin (1978), and Don has actually told me a few times that the imputation idea in general dates back much further than that, at least to Neyman. The matching idea and the algorithms used also date back at least to Don's work in the 1970s.
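The impute-then-difference logic can be sketched in Python. This is a simplified stand-in, not the model from the talk: it matches on a single covariate `x`, takes the `k` nearest units in the opposite arm as the matched pool, and resamples from that pool to get a rough interval for each ICE instead of computing a Bayesian posterior. The function `impute_ices` and its parameters are hypothetical.

```python
import random
import statistics


def impute_ices(x, w, y, k=3, draws=1000, seed=0):
    """Simplified matching-based imputation of individual causal effects.

    For each unit, the k nearest units on covariate x in the opposite
    treatment arm form a matched pool. The unit's missing potential outcome
    is imputed by repeated draws from that pool, giving a point estimate and
    a rough 95% interval for its ICE. The resampling is a crude stand-in
    for a Bayesian posterior.
    """
    rng = random.Random(seed)
    n = len(y)
    results = []
    for i in range(n):
        # Opposite-arm units, nearest on the covariate first
        donors = sorted((j for j in range(n) if w[j] != w[i]),
                        key=lambda j: abs(x[j] - x[i]))
        pool = [y[j] for j in donors[:k]]
        sims = []
        for _ in range(draws):
            y_missing = rng.choice(pool)
            # ICE = Y(1) - Y(0); the observed outcome fills the unit's own arm
            sims.append(y[i] - y_missing if w[i] == 1 else y_missing - y[i])
        sims.sort()
        low, high = sims[int(0.025 * draws)], sims[int(0.975 * draws) - 1]
        results.append((statistics.mean(sims), low, high))
    return results


# Hypothetical data: one covariate, alternating treatment assignment
x = [1, 2, 3, 4, 5, 6, 7, 8]
w = [1, 0, 1, 0, 1, 0, 1, 0]
y = [3.0, 1.0, 5.0, 2.0, 7.0, 3.0, 9.0, 4.0]
for unit, (est, low, high) in enumerate(impute_ices(x, w, y)):
    print(f"unit {unit}: ICE ~ {est:.2f} [{low:.2f}, {high:.2f}]")
```

A wider pool (larger `k`) trades closer matches for more draws to average over, which is one way the choice of matching specification feeds into the width of the intervals.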

In my talk, I introduced a (hopefully coherent) framework that laid out the assumptions and a model for estimating the ICEs. I also conducted many simulations to test the ability of the model to recover ICEs, and I tested several matching specifications. The results suggest that the model does a fairly good job of recovering the ICEs, although the uncertainty intervals can be quite wide. Nevertheless, they give us hints about plausible ranges of values for the ICEs, and aggregating the ICEs to estimate average effects produces results nearly identical to those of traditional methods. One noteworthy conclusion from the simulations is that regression imputation, in which we impute with the predicted values from a regular linear regression, generally produces good average results but very poor calibration for individual results. Therefore, one takeaway is that ICEs let us estimate both individual and average estimands, whereas ATEs let us estimate only average estimands with any accuracy; attempts to get at individual-level estimates through ATEs are likely to be incorrect. The last part of my talk uses an existing example from economics and politics on monitoring corruption to demonstrate the flexibility of the approach. I adapt ICE estimation to both binary and continuous treatments and to one-stage and two-stage IV-type approaches.

For more information, copies of the presentation slides, and a rough draft of a paper describing the general model and framework, please see http://www.patricklam.org/research.html.

Posted by Konstantin Kashin at 10:54 PM

App Stats: Pakes on "Moment Inequalities for Semiparametric Multinomial Choice with Fixed Effects"

We hope you can join us this Wednesday, April 17, 2013 for the Applied Statistics Workshop. Ariel Pakes, Professor of Economics from the Department of Economics at Harvard University, will give a presentation entitled "Moment Inequalities for Semiparametric Multinomial Choice with Fixed Effects". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Moment Inequalities for Semiparametric Multinomial Choice with Fixed Effects"
Ariel Pakes
Department of Economics, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, April 17th, 2013 12.00 pm

Abstract:

We propose a new approach to identification for multinomial choice models with a group (or panel) structure. We take a standard random utility model of choice, where the utility for each choice is additively separable in a choice-specific fixed effect, a disturbance, and an index function of covariates and parameters. Observations in the same group are assumed to share the same fixed effects. Examples of this structure include: (i) Chamberlain's (1980) conditional likelihood estimator for panel data problems with choice-specific fixed effects and i.i.d. logistic disturbances, and (ii) models of product demand where markets are the grouping device, the within-group observations are consumers, and the choice-specific fixed effects represent product-level unobservables.

We place no restriction on the variance-covariance matrix of the disturbance vector across choices. The only restriction on the disturbances is a group homogeneity assumption. The main cost of the semiparametric flexibility in our model is that the conditional moment inequalities will, in general, only partially identify the index function parameters. The advantages are that it: (i) is nonparametric in the joint distribution of the disturbance vector across choices, (ii) allows for incidental choice-specific effects whose cardinality can grow with sample size, and (iii) can be extended to allow for certain types of endogeneity.

Posted by Konstantin Kashin at 10:10 AM

April 8, 2013

App Stats: Lam on "Estimating Individual Causal Effects"

We hope you can join us this Wednesday, April 10, 2013 for the Applied Statistics Workshop. Patrick Lam, a Ph.D. candidate from the Department of Government at Harvard University, will give a presentation entitled "Estimating Individual Causal Effects". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Estimating Individual Causal Effects"
Patrick Lam
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, April 10th, 2013 12.00 pm

Abstract:

The literature on causal inference has focused primarily on estimating average treatment effects, which aggregate over many individual effects. However, this aggregation often misses treatment effect heterogeneity, which may be extremely important. In addition, researchers often estimate average effects when their real quantity of interest is individual effects. In this paper, I develop methods to estimate individual causal effects based on commonly used matching procedures. I show that predictive mean matching performs best at imputing missing potential outcomes to estimate the individual effects. I then demonstrate the flexibility of estimating individual causal effects and how they can be used to explore questions of interest, recover any other causal quantity, and be adapted to more complicated data structures. I conclude with empirical examples from political science.
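Predictive mean matching can be sketched in plain Python as follows. This is a minimal single-covariate, single-nearest-donor version written for illustration; the names `ols_1d` and `pmm_ices` are hypothetical, and the paper's procedure may use more covariates, several donors, and a Bayesian layer for uncertainty.

```python
def ols_1d(xs, ys):
    """Least-squares intercept and slope for one covariate."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxx = sum((a - xbar) ** 2 for a in xs)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def pmm_ices(x, w, y):
    """Estimate ICEs by imputing each unit's missing potential outcome
    with predictive mean matching (one covariate, one nearest donor)."""
    n = len(y)
    imputed = [0.0] * n
    for donor_arm in (0, 1):
        donors = [j for j in range(n) if w[j] == donor_arm]
        recipients = [i for i in range(n) if w[i] != donor_arm]
        # Regress y on x using only the donor arm's observed outcomes
        a, b = ols_1d([x[j] for j in donors], [y[j] for j in donors])
        pred = [a + b * xi for xi in x]  # predicted means for all units
        for i in recipients:
            # Borrow the observed outcome of the donor with the closest predicted mean
            best = min(donors, key=lambda j: abs(pred[j] - pred[i]))
            imputed[i] = y[best]
    # ICE_i = Y_i(1) - Y_i(0): observed outcome for own arm, imputed for the other
    return [y[i] - imputed[i] if w[i] == 1 else imputed[i] - y[i]
            for i in range(n)]


# Hypothetical data: treated outcomes are x + 3, control outcomes are x
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
w = [1, 0, 1, 0, 1, 0]
y = [4.0, 1.0, 5.0, 2.0, 6.0, 3.0]
ices = pmm_ices(x, w, y)
print(ices)
```

Because the toy data have exact covariate overlap, every unit finds a donor with an identical predicted mean and the constant effect of 3 is recovered. Unlike plain regression imputation, the imputed value is always an outcome that was actually observed, so its spread reflects real variation among donors.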

Posted by Konstantin Kashin at 12:41 AM

April 1, 2013

App Stats: Killewald on "His Gain, Her Pain? The Motherhood Penalty and the Fatherhood Premium within Coresidential Couples"

We hope you can join us this Wednesday, April 3, 2013 for the Applied Statistics Workshop. Sasha Killewald, Assistant Professor of Sociology at Harvard University, will give a presentation entitled "His Gain, Her Pain? The Motherhood Penalty and the Fatherhood Premium within Coresidential Couples". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"His Gain, Her Pain? The Motherhood Penalty and the Fatherhood Premium within Coresidential Couples"
Sasha Killewald
Department of Sociology, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, April 3rd, 2013 12.00 pm

Abstract:

Prior research on the association between parenthood and wages has focused at the individual level, documenting a substantial motherhood wage penalty and a smaller fatherhood premium. Yet the majority of births occur to coresidential couples, and we know little about the within-couple association between the motherhood penalty and the fatherhood premium. Specialization suggests that women who experience the largest motherhood penalty will tend to be partnered with fathers who receive the largest premium. However, it is also possible that some couples are better able to defray the wage costs of parenthood for both parents. We bring a dyadic perspective to the study of the interaction between parenthood and wages and use random-coefficients models to answer the following questions: 1) What is the average association between the motherhood penalty and the fatherhood premium within couples? 2) How do assortative mating on the basis of race and education, and post-parenthood specialization in paid and unpaid labor time, contribute to this association?

Posted by Konstantin Kashin at 10:59 AM

March 25, 2013

App Stats: Fowler and Hall on "Do Legislators Cater to the Priorities of Their Constituents?"

We hope you can join us this Wednesday, March 27, 2013 for the Applied Statistics Workshop. Anthony Fowler and Andrew B. Hall, Ph.D. candidates from the Department of Government at Harvard University, will give a presentation entitled "Do Legislators Cater to the Priorities of Their Constituents?". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Do Legislators Cater to the Priorities of Their Constituents?"
Anthony Fowler and Andy Hall
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, March 27th, 2013 12.00 pm

Abstract:

Republican and Democratic legislators vote differently on a large number of bills even when representing constituents with identical preferences. Because constituencies care about some issues more than others, representatives may give short shrift to the district's preferences on some topics while carefully mirroring them on others. The more a district cares about an issue, the more loyally we should see its legislators voting. As a consequence, we should expect the partisan gap in representation -- the difference in voting behavior between a Democrat and a Republican representing the same constituents -- to shrink on issues of greater concern to the district. We test this hypothesis in eight issue areas: agriculture, civil rights, defense, education, energy, public transportation, senior citizens' issues, and welfare. Contrary to expectation, we find little evidence that representational quality improves when constituents have strong personal interests. Across all issues examined, the representational gap between the parties is massive and does not shrink meaningfully in especially interested districts.

Posted by Konstantin Kashin at 10:34 AM

March 11, 2013

App Stats: Chamberlain on "Predictive Effects of Teachers and Schools on Test Scores, College Attendance, and Earnings"

We hope you can join us this Wednesday, March 13, 2013 for the Applied Statistics Workshop. Gary Chamberlain, Louis Berkman Professor of Economics from the Department of Economics at Harvard University, will give a presentation entitled "Predictive Effects of Teachers and Schools on Test Scores, College Attendance, and Earnings". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Predictive Effects of Teachers and Schools on Test Scores, College Attendance, and Earnings"
Gary Chamberlain
Department of Economics, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, March 13, 2013 12.00 pm

Abstract:

I study predictive effects of teachers and schools on test scores in fourth through eighth grade and on outcomes later in life such as college attendance and earnings. The predictive effects have the following form: predict the fraction of a classroom attending college at age 20 given the test score for a different classroom in the same school with the same teacher, and given the test score for a classroom in the same school with a different teacher. I would like to have predictive effects that condition on averages over many classrooms, with and without the same teacher. I set up a factor model which, under certain assumptions, makes this feasible. Administrative school district data in combination with tax data were used to calculate estimates and conduct inference.

Posted by Konstantin Kashin at 4:14 AM

March 4, 2013

App Stats: Goodman on "Seeing More in Data"

We hope you can join us this Wednesday, March 6, 2013 for the Applied Statistics Workshop. Alyssa Goodman, a Professor of Astronomy from the Harvard-Smithsonian Center for Astrophysics at Harvard University, will give a presentation entitled "Seeing More in Data". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Seeing More in Data"
Alyssa A. Goodman
Harvard-Smithsonian Center for Astrophysics
CGIS K354 (1737 Cambridge St.)
Wednesday, March 6th, 2013 12.00 pm

Abstract:

Some scientists still think that good data visualization is only necessary when presenting work to "the public." In truth, thinking hard about how to learn the most from any data set should always involve some form of graph, map, chart, or other visual statistical display. This talk will demonstrate how visualization techniques that include so-called "linked views" offer new insights to researchers visualizing large and/or diverse data sets. In particular, the talk will highlight a few high-dimensional visualization examples where ideas about linked views first put forth by John Tukey are extended beyond two-dimensional displays and point clouds. Examples will be principally drawn from astronomy and medical imaging, and software highlighted will include the Universe Information System known as "WorldWide Telescope" (worldwidetelescope.org) and a new Python-based linked-view system called "Glue" (glueviz.org).

Posted by Konstantin Kashin at 1:43 AM

February 26, 2013

App Stats: Mozaffarian on "Estimating the Global Impact of Poor Dietary Habits on Chronic Diseases"

We hope you can join us this Wednesday, February 27, 2013 for the Applied Statistics Workshop. Dariush Mozaffarian, Associate Professor in the Department of Epidemiology at the Harvard School of Public Health, will give a presentation entitled "Estimating the Global Impact of Poor Dietary Habits on Chronic Diseases". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Estimating the Global Impact of Poor Dietary Habits on Chronic Diseases"
Dariush Mozaffarian
Department of Epidemiology, Harvard School of Public Health
CGIS K354 (1737 Cambridge St.)
Wednesday, February 27, 2013, 12.00pm

Abstract:

Nearly every nation in the world is undergoing rapid epidemiologic transition toward noncommunicable chronic diseases (NCDs) including cardiovascular disease (CVD), obesity, diabetes, and cancers. Numerous organizations including the United Nations, World Health Organization, US Centers for Disease Control and Prevention, and other national and international organizations have emphasized the importance of dietary habits as a key risk factor for NCDs. Yet, the burdens of suboptimal dietary habits on NCDs globally, as well as heterogeneity in these burdens by region, country, age, and sex, are not established. Quantification of these burdens has been limited by inadequate or absent data on dietary habits in many nations, not only for each country as a whole, but also for age- and sex-specific strata. As part of our work in the 2010 Global Burden of Diseases Nutrition and Chronic Diseases Group, we systematically identified and obtained data on national and subnational individual-level surveys of dietary consumption worldwide; and used a Bayesian hierarchical model to evaluate and account for differences in comparability, assessment methods, representativeness, and missingness. We also quantified effects of dietary habits on NCDs, including differences by age, in new meta-analyses. We compiled additional data to quantify the alternative optimal distribution of key dietary risk factors, and the numbers of cause-specific deaths by country, age, and sex. Using this compilation of global data, we used comparative risk assessment to quantify the impacts of current dietary habits on NCDs in each nation around the world. The case of sugar-sweetened beverages (SSBs) and CVD, adiposity-related cancers, and diabetes will be presented as an example of our newest findings.
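Comparative risk assessment is commonly summarized with a potential impact fraction: the proportional reduction in cause-specific deaths if the current exposure distribution P were shifted to a counterfactual distribution P' (here, the "optimal" one), given relative risks RR at each exposure level, PIF = (Σ RR·P − Σ RR·P') / Σ RR·P. Below is a minimal discrete version in Python. The relative risks, exposure shares, and death count are hypothetical, and this is not the group's implementation, which works with far richer inputs.

```python
def potential_impact_fraction(rr, p_current, p_optimal):
    """Discrete comparative-risk-assessment formula.

    rr:         relative risk at each exposure level (vs. a reference level)
    p_current:  current share of the population at each exposure level
    p_optimal:  counterfactual ("optimal") share at each exposure level
    """
    current = sum(r * p for r, p in zip(rr, p_current))
    optimal = sum(r * p for r, p in zip(rr, p_optimal))
    return (current - optimal) / current


# Hypothetical numbers for one country-age-sex stratum
rr = [1.0, 1.2, 1.5]           # e.g. 0, 1, 2+ daily servings of a food or drink
p_current = [0.5, 0.3, 0.2]    # current exposure distribution
p_optimal = [1.0, 0.0, 0.0]    # everyone at the lowest-risk level
deaths = 10_000                # cause-specific deaths in the stratum

pif = potential_impact_fraction(rr, p_current, p_optimal)
attributable = pif * deaths    # deaths attributable to the current distribution
print(f"PIF = {pif:.3f}, attributable deaths = {attributable:.0f}")
```

Repeating this calculation for every country, age, sex, dietary factor, and disease, and summing, gives the kind of global burden totals the abstract describes.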

Posted by Konstantin Kashin at 12:43 AM

February 18, 2013

App Stats: Garcia on "When and Why is Attrition a Problem in Randomized Controlled Experiments and How to Diagnose It"

We hope you can join us this Wednesday, February 20, 2013 for the Applied Statistics Workshop. Fernando Martel Garcia, a Research Fellow at the Harvard School of Public Health, will give a presentation entitled "When and Why is Attrition a Problem in Randomized Controlled Experiments and How to Diagnose It". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"When and Why is Attrition a Problem in Randomized Controlled Experiments and How to Diagnose It"
Fernando Martel Garcia
Harvard School of Public Health
CGIS K354 (1737 Cambridge St.)
Wednesday, February 20th, 2013 12.00 pm

Abstract:

Attrition is the Achilles' heel of the randomized experiment: it is fairly common, and it can unravel the benefits of randomization. This study considers when and why attrition is a problem, and how it can be diagnosed. The extant literature remains ambiguous because it relies on the language of probability, whereas problematic attrition depends on the underlying causal relations. This ambiguity arises because causation implies correlation but not vice versa. Using the structural causal language of directed acyclic graphs, I show that attrition is a problem when it is an active collider between the treatment and the outcome, or when the latent outcome is a mediator between the treatment and the attrition. Moreover, whether observed outcomes are representative of all outcomes, or only comparable across experimental arms, depends on two d-separation conditions. One of these is directly testable from the data.
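A small Monte Carlo simulation illustrates one of the cases named above. This is a toy sketch, not the paper's analysis: treatment W is randomized, the latent outcome is Y = tau·W + noise, and units stay in the study with a probability set by the hypothetical function `stay_prob`. When staying depends on Y (so W → Y → R, with the latent outcome mediating between treatment and attrition), the complete-case difference in means drifts away from the true effect; when attrition is unrelated to both, it does not.

```python
import random


def complete_case_estimate(n, stay_prob, tau=1.0, seed=1):
    """Difference in means among units whose outcomes are observed.

    stay_prob(w, y) gives the probability that a unit with treatment w and
    latent outcome y remains in the study (is not lost to attrition).
    """
    rng = random.Random(seed)
    observed = {0: [], 1: []}
    for _ in range(n):
        w = rng.randint(0, 1)               # randomized treatment
        y = tau * w + rng.gauss(0.0, 1.0)   # latent outcome, true effect tau
        if rng.random() < stay_prob(w, y):  # attrition step
            observed[w].append(y)
    treated, control = observed[1], observed[0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def random_attrition(w, y):
    # Dropout unrelated to treatment or outcome
    return 0.7


def outcome_attrition(w, y):
    # Dropout driven by the latent outcome (W -> Y -> R): low outcomes drop out
    return 0.9 if y > 0.5 else 0.3


est_random = complete_case_estimate(200_000, random_attrition)
est_outcome = complete_case_estimate(200_000, outcome_attrition)
print(f"true effect 1.00 | random attrition {est_random:.3f} "
      f"| outcome-driven attrition {est_outcome:.3f}")
```

Making `stay_prob` depend on `w` alone would, in this setup, shrink the two arms by different amounts but leave the comparison unbiased, because within each arm the units who remain are still a random subset.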

Posted by Konstantin Kashin at 12:30 AM

February 11, 2013

App Stats: Carpenter on "R&D Abandonment in Regulatory Equilibrium: Evidence from Asset Price Shocks Induced by FDA Decisions"

We hope you can join us this Wednesday, February 13, 2013 for the Applied Statistics Workshop. Dan Carpenter, the Allie S. Freed Professor of Government from the Department of Government at Harvard University, will give a presentation entitled "R&D Abandonment in Regulatory Equilibrium: Evidence from Asset Price Shocks Induced by FDA Decisions". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"R&D Abandonment in Regulatory Equilibrium: Evidence from Asset Price Shocks Induced by FDA Decisions"
Dan Carpenter
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, February 13th, 2013 12.00 pm

This is joint work with Jessica Blankshain (Harvard University) and Susan Moffitt (Brown University).

Abstract:

Observers of approval regulation regimes such as FDA drug review have long proposed that they cause private companies to avoid developing new products that would otherwise have been marketed. The welfare conclusions and policy recommendations vary, but the causal claim is common. Yet most such claims suffer from the problem of endogeneity and non-random assignment, such that the necessary counterfactual cannot be sustained. If a regulatory decision occurs and drug projects are discontinued or delayed, the analyst cannot usually infer whether it was a change in regulation or something else that caused the project abandonment. Using a rich dataset on the development of over 15,000 pharmaceutical investment projects from 1989 to 2003, we examine responses in development projects to "bad news" regulatory announcements weighted by the asset price shocks in a general equilibrium financial market. Using a Lévy process model of asset price evolution, we demonstrate that the abrupt changes in sponsor asset prices upon the announcement of adverse regulatory news are plausibly non-anticipable for all participants but the regulator. Specifically, for the development projects of companies other than the sponsor affected, they are quasi-random, conditional on all information known on the day before the announcement. This assumption is supported by analysis of data, and then used to identify a model of regulatory effects upon drug development. The results suggest robust effects of induced project abandonment by regulatory decisions; a ten percent (negative) shock to the sponsor's asset price in response to adverse FDA news is sufficient to induce a three to four percent increase in the hazard rate of drug project discontinuation for all other firms' projects in the months following the news. While some immediate responses to adverse regulatory news are witnessed, most response takes place in a six month period following the event. 
Effects are larger for bad news from advisory committee decisions and FDA requests for additional data, and are negative (development-facilitating) for surprise other-company abandonments where FDA factors are implicit. The results are generally supportive of dominant theoretical models of endogenous approval regulation (Carpenter and Ting 2007), but policy implications are unclear and depend upon the potential health and welfare effects of the therapies forgone.

Posted by Konstantin Kashin at 1:57 AM

February 4, 2013

App Stats: Hatfield on "Statistical properties and health policy applications of microsimulation"

We hope you can join us this Wednesday, February 6, 2013 for the Applied Statistics Workshop. Laura Hatfield, an Assistant Professor from the Department of Health Care Policy at the Harvard Medical School, will give a presentation entitled "Statistical properties and health policy applications of microsimulation". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Statistical properties and health policy applications of microsimulation"
Laura Hatfield
Department of Health Care Policy, Harvard Medical School
CGIS K354 (1737 Cambridge St.)
Wednesday, February 6th, 2013 12.00 pm

Abstract:

When forecasting the impact of novel policy interventions, simulations are standard. If the behavior of the entire system is complex and not well identified by existing data, simulations that focus on the behavior of smaller units, such as individuals, may be preferred. So-called microsimulation models can incorporate complications such as clustering, nonlinearity, non-standard distributions, and time-dependence. This talk will present an overview of microsimulation techniques, with a focus on statistical features and dynamic (vs static) simulation, especially in health policy settings. I will also describe the current development of a model of health insurance coverage and health care spending of Medicare beneficiaries.
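To make the idea of a dynamic microsimulation concrete, here is a toy sketch in Python. Everything in it is hypothetical: two coverage states, made-up annual transition probabilities and mean spending figures, and exponential spending draws standing in for a skewed distribution. It is not the Medicare model described in the talk; it only shows the pattern of moving individuals forward in time one at a time and then aggregating.

```python
import random

# Hypothetical annual transition probabilities between coverage states.
TRANSITIONS = {
    "insured": {"insured": 0.95, "uninsured": 0.05},
    "uninsured": {"insured": 0.30, "uninsured": 0.70},
}
# Hypothetical mean annual spending by coverage state, in dollars.
MEAN_SPENDING = {"insured": 9000.0, "uninsured": 4000.0}

def step(state, rng):
    """Draw next year's coverage state for one person (first-order Markov)."""
    return "insured" if rng.random() < TRANSITIONS[state]["insured"] else "uninsured"

def simulate(n_people, n_years, seed=0):
    """Evolve each person year by year, draw skewed individual spending,
    and return (insured share, mean spending) for every simulated year."""
    rng = random.Random(seed)
    states = ["insured"] * n_people
    totals = []
    for _ in range(n_years):
        states = [step(s, rng) for s in states]
        # Exponential draws give a right-skewed, non-normal spending distribution
        spending = [rng.expovariate(1.0 / MEAN_SPENDING[s]) for s in states]
        share = sum(s == "insured" for s in states) / n_people
        totals.append((share, sum(spending) / n_people))
    return totals

for year, (share, spend) in enumerate(simulate(10_000, 5), start=1):
    print(f"year {year}: insured share {share:.3f}, mean spending ${spend:,.0f}")
```

Because the state in one year depends on the state the year before, the aggregate path is driven by individual histories rather than by a single population-level equation, which is the dynamic (vs static) distinction the abstract draws.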

Posted by Konstantin Kashin at 1:42 AM

January 27, 2013

App Stats: Zajonc on "Sense - A Fully Bayesian Data Analysis Environment for the Cloud Era"

We hope you can join us this Wednesday, January 30, 2013 for the Applied Statistics Workshop. Tristan Zajonc, a Visiting Fellow at IQSS, will give a presentation entitled "Sense - A Fully Bayesian Data Analysis Environment for the Cloud Era". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Sense - A Fully Bayesian Data Analysis Environment for the Cloud Era"
Tristan Zajonc
IQSS, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, January 30th, 2013 12.00 pm

Abstract:

Over the last 20 years, probabilistic modeling has become the preeminent paradigm for complex statistical analysis in science and social science. Yet many probabilistic models remain difficult to represent, compose, estimate, and validate using traditional tools. This talk introduces Sense, a new data analysis environment that embeds probabilistic modeling into the core experience. The talk will demonstrate the power of this approach through a whirlwind tour of probabilistic model representation, composition, estimation, and validation. The talk will be targeted at both applied researchers in the sciences and social sciences and those interested in the future of statistical computing.

Posted by Konstantin Kashin at 5:18 PM

November 26, 2012

App Stats: Hainmueller and Yamamoto on "Causal Inference in Conjoint Analysis: Understanding Multi-Dimensional Choices via Stated Preference Experiments"

We hope you can join us this Wednesday, November 28, 2012 for the Applied Statistics Workshop. Jens Hainmueller and Teppei Yamamoto, Associate Professor and Assistant Professor, respectively, from the Department of Political Science at MIT, will give a presentation entitled "Causal Inference in Conjoint Analysis: Understanding Multi-Dimensional Choices via Stated Preference Experiments". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Causal Inference in Conjoint Analysis: Understanding Multi-Dimensional Choices via Stated Preference Experiments"
Jens Hainmueller and Teppei Yamamoto
Department of Political Science, MIT
CGIS K354 (1737 Cambridge St.)
Wednesday, November 28th, 2012 12.00 pm

Abstract:

For decades, market researchers have used conjoint analysis to understand how consumers make decisions when faced with multi-dimensional choices. In such analyses, respondents are asked to score or rank a set of alternatives, where each alternative is defined by multiple attributes which are varied randomly or intentionally. Political scientists are frequently interested in parallel questions about decision-making, yet to date conjoint analysis has seen little use within the field. In this manuscript, we demonstrate the potential value of conjoint analysis in political science, using examples about vote choice and immigrant admission to the United States. In doing so, we develop a set of statistical tools for drawing causal conclusions from stated preference data based on the potential outcomes framework of causal inference. We discuss the causal estimands of interest and provide a formal analysis of the assumptions required for identifying those quantities. Prior conjoint analyses have typically used designs which limit the number of unique conjoint profiles. We employ a survey experiment to compare this approach to a fully randomized approach. Both our formal analysis of the causal estimands and our empirical results highlight the potential biases of common approaches to conjoint analysis which restrict the number of profiles.
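As a rough illustration of why full randomization helps, here is a minimal Python sketch that estimates an effect along the lines of the average marginal component effect (AMCE) as a simple difference in mean choice rates between two levels of one attribute. The attributes, levels, and choice probabilities are hypothetical and only loosely echo the immigration example; real conjoint designs usually present paired profiles and need clustered standard errors, which this sketch ignores.

```python
import random

def simulate_profiles(n, seed=0):
    """Hypothetical data: each profile's two attributes are randomized
    independently and uniformly, and a binary choice is simulated."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        education = rng.choice(["high school", "college"])
        language = rng.choice(["fluent", "broken"])
        # Made-up choice probability: college adds 0.2, fluent English adds 0.1
        p = 0.3 + 0.2 * (education == "college") + 0.1 * (language == "fluent")
        rows.append({"education": education, "language": language,
                     "chosen": 1 if rng.random() < p else 0})
    return rows

def amce(rows, attribute, level, baseline):
    """Difference in mean choice rates between two levels of one attribute.
    With full, independent randomization this averages the effect over the
    distribution of the other attributes."""
    treated = [r["chosen"] for r in rows if r[attribute] == level]
    control = [r["chosen"] for r in rows if r[attribute] == baseline]
    return sum(treated) / len(treated) - sum(control) / len(control)

rows = simulate_profiles(20_000)
print(round(amce(rows, "education", "college", "high school"), 3))  # should be near 0.2
print(round(amce(rows, "language", "fluent", "broken"), 3))         # should be near 0.1
```

If the design instead restricted which attribute combinations could appear, the simple difference in means would average over a different, possibly unrepresentative, distribution of the other attributes, which is the kind of bias the abstract warns about.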

Posted by Konstantin Kashin at 2:24 AM

November 11, 2012

App Stats: Pattanayak on "A Potential Outcomes, and Typically More Powerful, Alternative to 'Cochran-Mantel-Haenszel'"

We hope you can join us this Wednesday, November 14, 2012 for the Applied Statistics Workshop. Cassandra Wolos Pattanayak, a College Fellow from the Department of Statistics at Harvard University, will give a presentation entitled "A Potential Outcomes, and Typically More Powerful, Alternative to 'Cochran-Mantel-Haenszel'". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"A Potential Outcomes, and Typically More Powerful, Alternative to 'Cochran-Mantel-Haenszel'"
Cassandra Wolos Pattanayak
Statistics Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, November 14th, 2012 12.00 pm

Abstract:

In studies of public health, outcome measures such as the odds ratio, rate ratio, or efficacy are often estimated across strata to assess the overall effect of active treatment versus control treatment. Patients may be partitioned into such strata or blocks by experimental design, or, in non-randomized studies, patients may be partitioned into subclasses based on key covariates or estimated propensity scores to improve observed covariate balance across treatment groups. In finite samples, there exist tests and intervals for these estimands that can be more powerful than tests and intervals created with Cochran-Mantel-Haenszel or analogous procedures. The proposed methods multiply impute missing potential outcomes within the Rubin Causal Model so that estimands can be directly estimated. The assumptions underlying these typically more powerful methods are appropriate in many circumstances, especially when the strata are based on covariates highly predictive of treatment decisions and outcomes. When used to draw inferences about a population from which the patients in the study are considered a random sample, and the sample is large, these methods are extremely similar to the classical methods. The proposed approach is particularly relevant when assessing the safety of a new treatment relative to a standard one because, under typical conditions, the tests are more powerful and the intervals are shorter, thereby detecting smaller differences.
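For reference, here is a minimal Python sketch of the classical baseline the talk compares against: the Mantel-Haenszel pooled odds ratio and the CMH chi-square statistic (1 degree of freedom, no continuity correction) across 2x2 strata. The stratum counts are hypothetical, and this is not the imputation-based method proposed in the talk.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    a: int  # treated, event
    b: int  # treated, no event
    c: int  # control, event
    d: int  # control, no event

    @property
    def n(self):
        return self.a + self.b + self.c + self.d

def mantel_haenszel_or(strata):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(s.a * s.d / s.n for s in strata)
    den = sum(s.b * s.c / s.n for s in strata)
    return num / den

def cmh_statistic(strata):
    """CMH chi-square: (sum of a - E[a])^2 / sum of Var(a), using the
    hypergeometric mean and variance of a given each stratum's margins."""
    diff, var = 0.0, 0.0
    for s in strata:
        treated, control = s.a + s.b, s.c + s.d
        events, non_events = s.a + s.c, s.b + s.d
        diff += s.a - treated * events / s.n
        var += treated * control * events * non_events / (s.n ** 2 * (s.n - 1))
    return diff ** 2 / var

strata = [Stratum(a=10, b=20, c=5, d=25), Stratum(a=8, b=12, c=4, d=16)]
print(round(mantel_haenszel_or(strata), 3))  # 2.57 (exactly 221/86)
print(round(cmh_statistic(strata), 2))
```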

Posted by Konstantin Kashin at 9:53 PM

November 5, 2012

App Stats: Bischof on "Summarizing Topical Content in Document Collections with Word Frequency and Exclusivity"

We hope you can join us this Wednesday, November 7, 2012 for the Applied Statistics Workshop. Jon Bischof, a Ph.D. candidate from the Department of Statistics at Harvard University, will give a presentation entitled "Summarizing Topical Content in Document Collections with Word Frequency and Exclusivity". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Summarizing Topical Content in Document Collections with Word Frequency and Exclusivity"
Jon Bischof
Department of Statistics, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, November 7th, 2012 12.00 pm

Abstract:

An ongoing challenge in the analysis of document collections is how to summarize content in terms of a set of inferred themes that can be interpreted substantively in terms of topics. However, the current practice of summarizing themes in terms of most frequent words limits interpretability by ignoring the differential use of words across topics. We argue that words that are both frequent and exclusive to a theme are more effective at characterizing topical content. We consider a setting where professional editors have annotated documents to a collection of topic categories, organized into a tree, in which leaf-nodes correspond to the most specific topics. Each document is annotated to multiple categories, at different levels of the tree. We introduce Hierarchical Poisson Convolution (HPC) as a model to analyze annotated documents in this setting. The model leverages the structure among categories defined by professional editors to infer a clear semantic description for each topic in terms of words that are both frequent and exclusive. We develop a parallelized Hamiltonian Monte Carlo sampler that allows the inference to scale to millions of documents.
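To see how frequency and exclusivity can disagree, here is a minimal Python sketch in the spirit of a FREX-style score. It assumes a topic-word probability matrix `beta` (rows are topics), takes a word's exclusivity in a topic as its share of that word's probability across topics, and combines the empirical-CDF ranks of frequency and exclusivity with a weighted harmonic mean. The weight, vocabulary, and matrix are hypothetical, and this is not the HPC model itself.

```python
def ecdf_ranks(values):
    """Empirical CDF value of each entry within its own list."""
    n = len(values)
    return [sum(v <= x for v in values) / n for x in values]

def frex(beta, topic, weight=0.5):
    """Weighted harmonic mean of the frequency and exclusivity ECDF values
    for every word in one topic. beta[k][w] = P(word w | topic k)."""
    n_words = len(beta[topic])
    freq = beta[topic]
    excl = [beta[topic][w] / sum(row[w] for row in beta) for w in range(n_words)]
    f_ecdf, e_ecdf = ecdf_ranks(freq), ecdf_ranks(excl)
    return [1.0 / (weight / e + (1 - weight) / f) for e, f in zip(e_ecdf, f_ecdf)]

vocab = ["the", "tax", "budget", "vote"]
beta = [
    [0.50, 0.30, 0.15, 0.05],  # hypothetical "fiscal" topic
    [0.50, 0.02, 0.03, 0.45],  # hypothetical "elections" topic
]
scores = frex(beta, topic=0)
by_freq = max(range(len(vocab)), key=lambda w: beta[0][w])
by_frex = max(range(len(vocab)), key=lambda w: scores[w])
print(vocab[by_freq], vocab[by_frex])  # the tax
```

Ranking by frequency alone puts the shared word "the" first, while the combined score promotes "tax", which is frequent in the first topic and nearly absent from the second.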

Posted by Konstantin Kashin at 11:29 AM

October 22, 2012

App Stats: Hazlett and Hainmueller on "Kernel Regularized Least Squares: Moving Beyond Linearity and Additivity Without Sacrificing Interpretability"

We hope you can join us this Wednesday, October 24, 2012 for the Applied Statistics Workshop. Chad Hazlett, a Ph.D. student from the Department of Political Science at MIT, will give a presentation entitled "Kernel Regularized Least Squares: Moving Beyond Linearity and Additivity Without Sacrificing Interpretability" (this is joint work with Jens Hainmueller from MIT). A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Kernel Regularized Least Squares: Moving Beyond Linearity and Additivity Without Sacrificing Interpretability"
Chad Hazlett and Jens Hainmueller
Department of Political Science, MIT
CGIS K354 (1737 Cambridge St.)
Wednesday, October 24th, 2012 12.00 pm

Abstract:

We propose the use of Kernel Regularized Least Squares (KRLS) for social science modeling and inference problems. KRLS borrows from machine learning methods designed to solve regression and classification problems without relying on linearity or additivity assumptions. The method constructs a flexible hypothesis space that uses kernels as radial basis functions and finds the best fitting surface in this space by minimizing a complexity-penalized least squares problem. We provide an accessible explanation of the method and argue that it is well suited for social science inquiry because it avoids strong parametric assumptions and still allows for simple interpretation in ways analogous to OLS or other members of the GLM family. We also extend the method in several directions to make it more effective for social inquiry. In particular, we (1) derive new estimators for the pointwise marginal effects and their variances, (2) establish unbiasedness, consistency, and asymptotic normality of the KRLS estimator under fairly general conditions, (3) develop an automated approach to choose smoothing parameters, and (4) provide companion software. We illustrate the use of the methods through several simulations and a real-data example.
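A minimal NumPy sketch of the core KRLS fit, assuming a Gaussian kernel k(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2) with fixed sigma^2 and lambda. The coefficients solve (K + lambda I) c = y, fitted values are K c, and pointwise marginal effects come from differentiating the fitted surface analytically. The paper's automated choice of lambda, its variance estimators, and the usual standardization of inputs are all omitted; the data are simulated and this is not the authors' companion software.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma2):
    """Kernel matrix with entries exp(-||x_i - z_j||^2 / sigma2)."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma2)

def krls_fit(X, y, sigma2, lam):
    """Solve the penalized least squares problem: (K + lam*I) c = y."""
    K = gaussian_kernel(X, X, sigma2)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krls_predict(X_train, c, X_new, sigma2):
    return gaussian_kernel(X_new, X_train, sigma2) @ c

def krls_marginal_effects(X, c, sigma2):
    """Derivative of the fitted surface at each training point:
    dy/dx_d at x_j = sum_i c_i * (-2/sigma2) * (x_jd - x_id) * k(x_j, x_i)."""
    K = gaussian_kernel(X, X, sigma2)
    diffs = X[:, None, :] - X[None, :, :]  # diffs[j, i, d] = x_jd - x_id
    return (-2.0 / sigma2) * np.einsum("ji,jid,i->jd", K, diffs, c)

# Simulated nonlinear data: y = x^2 + noise, so the true marginal effect is 2x.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=200)
c = krls_fit(X, y, sigma2=1.0, lam=0.1)
fitted = krls_predict(X, c, X, sigma2=1.0)
me = krls_marginal_effects(X, c, sigma2=1.0)
print("mean marginal effect:", me[:, 0].mean())
```

OLS on this data would report a single slope near zero; the pointwise marginal effects instead recover negative effects for negative x and positive effects for positive x, while their average still gives an OLS-like summary.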

Posted by Konstantin Kashin at 1:17 AM

October 15, 2012

App Stats: Scanlan on "The Mismeasure of Group Differences in the Law and the Social and Medical Sciences"

We hope you can join us this Wednesday, October 17, 2012 for the Applied Statistics Workshop. James Scanlan, an Attorney-at-Law, will give a presentation entitled "The Mismeasure of Group Differences in the Law and the Social and Medical Sciences". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"The Mismeasure of Group Differences in the Law and the Social and Medical Sciences"
James Scanlan
Attorney-at-Law
CGIS K354 (1737 Cambridge St.)
Wednesday, October 17th, 2012 12.00 pm

Abstract:

This paper addresses the problematic nature of efforts in the law and the social and medical sciences to appraise the comparative circumstances of advantaged and disadvantaged groups on the basis of standard measures of differences in outcome rates, given that such measures tend to be systematically affected by the prevalence of an outcome. The rarer an outcome, the greater tends to be the relative difference in experiencing it and the smaller tends to be the relative difference in avoiding it. Thus, for example, as mortality declines, relative differences in mortality of advantaged and disadvantaged groups tend to increase while relative differences in survival tend to decrease; as procedures like immunization and cancer screening become more common, relative differences in rates of receipt of those procedures tend to decrease while relative differences in rates of failing to receive them tend to increase; relaxing mortgage lending criteria tends to increase relative differences in mortgage rejection rates while reducing relative differences in mortgage approval rates. Similarly, among subpopulations where adverse outcomes are comparatively rare (e.g., persons with high education or high income, British civil servants), relative differences in adverse outcomes tend to be larger, while relative differences in favorable outcomes tend to be smaller, than among subpopulations where adverse outcomes are more common. Absolute differences between outcome rates and differences measured by odds ratios are unaffected by whether one examines the favorable or the adverse outcome. But such measures also tend to be affected by the overall prevalence of an outcome, though in a more complicated way than the two relative differences. Broadly, as uncommon outcomes become more common, absolute differences tend to increase; as already common outcomes become even more common, absolute differences tend to decrease.
Differences measured by odds ratios tend to change in the opposite direction of absolute differences as the prevalence of an outcome changes. The paper will explain these patterns and the misinterpretations of data on group differences arising from the failure to understand them. It will also describe a method for appraising the size of the difference in circumstances reflected by outcome rates of advantaged and disadvantaged groups that is theoretically unaffected by the prevalence of the outcome.
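A small worked illustration of the prevalence pattern described above, under a hypothetical setup: an underlying risk score is standard normal in the advantaged group and shifted up by half a standard deviation in the disadvantaged group, and the adverse outcome occurs when the score exceeds a cutoff. Raising the cutoff makes the adverse outcome rarer while the underlying gap between the groups stays fixed.

```python
from math import erf, sqrt

def normal_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * (1 - erf(z / sqrt(2)))

def relative_differences(cutoff, shift=0.5):
    """Return (adverse-outcome ratio, favorable-outcome ratio) for one cutoff.
    shift is the hypothetical gap between group means, in SD units."""
    adv = normal_sf(cutoff)          # advantaged group's adverse-outcome rate
    dis = normal_sf(cutoff - shift)  # disadvantaged group's adverse-outcome rate
    adverse_ratio = dis / adv                  # disadvantaged vs advantaged
    favorable_ratio = (1 - adv) / (1 - dis)    # advantaged vs disadvantaged
    return adverse_ratio, favorable_ratio

for cutoff in (0.0, 1.0, 2.0):  # higher cutoff means a rarer adverse outcome
    adverse_ratio, favorable_ratio = relative_differences(cutoff)
    print(f"cutoff {cutoff}: adverse ratio {adverse_ratio:.2f}, "
          f"favorable ratio {favorable_ratio:.2f}")
```

As the cutoff rises, the relative difference in the adverse outcome grows while the relative difference in the favorable outcome shrinks, even though nothing about the two groups' underlying distributions has changed.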

Posted by Konstantin Kashin at 12:41 AM

October 8, 2012

App Stats: Teixeira on "Viral Video Advertising"

We hope you can join us this Wednesday, October 10, 2012 for the Applied Statistics Workshop. Thales Teixeira, Assistant Professor of Business Administration at the Harvard Business School, will give a presentation entitled "Viral Video Advertising". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Viral Video Advertising"
Thales Teixeira
Harvard Business School
CGIS K354 (1737 Cambridge St.)
Wednesday, October 10th, 2012 12.00 pm

Abstract:

To become viral, online video ads need to be viewed and then shared. Yet, what works for one decision may not work for the other. In this research we propose a novel consumer-centric model of viral advertising consisting of viewing and sharing decisions. We apply the model to assess the role of humor, present in 91% of viral ads, by teasing out the differential impact that type of humor (pure or shocking) has on each decision. In the lab, we record the facial expressions of consumers as they watch online ads containing either pure (i.e., smile, laughter) or shocking humor (e.g., shock from profanity), and examine its impact on their decisions. The video data is processed using face tracking software and used to calibrate a dynamic sequential model that accounts for both within and cross-decision dynamics. We find that shocking humor increases viewing but reduces sharing compared to no humor at all. Yet, content isn't the only factor of viral ad success; individual traits also matter. We also find that highly extraverted and self-directed consumers share humor ads more often and to a broader group of people each time. The magnitude of the effects of these two novel findings is then measured in a viral field study in which we selectively sent ads to participants and tracked views derived from sharing. We find that extraverted people garnered 300% more total views by sharing non-shocking humor ads than introverted people sharing ads low in humor.

Posted by Konstantin Kashin at 2:32 AM

September 30, 2012

App Stats: Kasy on "Identification in General Triangular Systems"

We hope you can join us this Wednesday, October 3, 2012 for the Applied Statistics Workshop. Maximilian Kasy, Assistant Professor of Economics from the Department of Economics at Harvard University, will give a presentation entitled "Identification in General Triangular Systems". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Identification in General Triangular Systems"
Maximilian Kasy
Department of Economics, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, October 3rd, 2012 12.00 pm

Abstract:

This paper discusses identification in continuous triangular systems without restrictions on heterogeneity or functional form. In particular, we do not assume separability of structural functions, restrictions on the dimensionality of unobservables, or monotonicity in unobservables. We do maintain monotonicity of the first stage relationship in the instrument. We show that under this condition alone, and given rich enough support of the data, we can achieve point identification of potential outcome distributions, and in particular of the average structural function. If the support of the continuous instrument is not large enough, potential outcome distributions are partially identified. If the instrument is discrete, identification fails completely. The setup discussed in this paper covers important cases not covered by existing approaches such as conditional moment restrictions (cf. Newey and Powell, 2003) and control variables (cf. Imbens and Newey, 2009). It covers, in particular, random coefficient models, as well as models arising as the reduced form of a system of structural equations.

Posted by Konstantin Kashin at 11:28 PM

September 24, 2012

App Stats: Miratrix on "Random Weight Estimators: Adjusting Randomized Trials Without Using Observed Outcomes"

We hope you can join us this Wednesday, September 26, 2012 for the Applied Statistics Workshop. Luke Miratrix, Assistant Professor of Statistics in the Department of Statistics at Harvard University, will give a presentation entitled "Random Weight Estimators: Adjusting Randomized Trials Without Using Observed Outcomes". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Random Weight Estimators: Adjusting Randomized Trials Without Using Observed Outcomes"
Luke Miratrix
Department of Statistics, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, September 26th, 2012 12.00 pm

Abstract:

To increase the precision of a randomized trial, experimenters often adjust estimates of treatment effects using baseline covariates thought to predict the outcome of interest. In a previous paper, we proved that even under the Neyman-Rubin model, if the covariates and the method for adjustment are determined before randomization, this process can increase precision in a manner quite similar to a comparable blocked experiment. Typically, however, experimenters wish to adjust for the covariates that are most imbalanced between treatment and control, given the realized randomization. This leads to a much-vexed variable selection problem that depends on the observed treatment assignment. To understand the issues behind this process, we examine a class of estimators we call "Random Weight Estimators" that adjust treatment effect estimates by weighting units with weights depending on a function of treatment assignment and covariates. While similar in spirit to blocking, these estimators can be applied "after the fact," i.e., after randomization has occurred, allowing them to naturally adapt to the observed treatment assignment. They can also adjust for many different covariates at once, including continuous ones. This class is quite general, and it includes traditional methods such as ordinary linear regression. Using our framework, we show, under the Neyman-Rubin model, how one can easily introduce potential bias using what would seem to be legitimate and simple approaches, especially in small and midsize experiments. Care must be taken with many forms of adjustment, even if an approach is selected without regard to any actual outcomes. We also extend this methodology to survey experiments, giving an appropriate and near-unbiased estimator for the treatment effect of a parent population. Throughout the talk, we illustrate this overall framework.
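A minimal Python sketch of one simple estimator with this weighting form, assuming a single discrete covariate: post-stratification, which weights each stratum's treated-minus-control difference by the stratum's share of units. Written as per-unit weights, the estimate is sum(w_i * y_i), and the weights depend on the realized assignment (through within-stratum group sizes) and the covariate, but never on outcomes. The data are hypothetical, and this is an illustration of the general idea rather than the authors' estimator class.

```python
from collections import Counter

def post_stratification_weights(treat, strata):
    """Per-unit weights w_i so that the estimate is sum_i w_i * y_i.
    Treated units get +share/n_treated in their stratum, controls get
    -share/n_control. Assumes each stratum has treated and control units."""
    n = len(treat)
    size = Counter(strata)
    n_treated = Counter(s for z, s in zip(treat, strata) if z)
    n_control = Counter(s for z, s in zip(treat, strata) if not z)
    weights = []
    for z, s in zip(treat, strata):
        share = size[s] / n
        weights.append(share / n_treated[s] if z else -share / n_control[s])
    return weights

def weighted_estimate(weights, y):
    return sum(w * yi for w, yi in zip(weights, y))

# Hypothetical data: stratum "b" has higher outcomes and, by chance, more treated units.
treat = [1, 0, 1, 0, 1, 1, 1, 0]
strata = ["a", "a", "a", "a", "b", "b", "b", "b"]
y = [5, 3, 6, 2, 14, 13, 12, 9]

w = post_stratification_weights(treat, strata)
naive = (sum(yi for z, yi in zip(treat, y) if z) / sum(treat)
         - sum(yi for z, yi in zip(treat, y) if not z) / (len(y) - sum(treat)))
print(weighted_estimate(w, y), round(naive, 3))  # 3.5 5.333
```

Here the realized imbalance inflates the simple difference in means, and the outcome-free weights correct for it; the abstract's warning is that other seemingly innocent choices of such weights can instead introduce bias in small experiments.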

Posted by Konstantin Kashin at 11:40 AM

September 16, 2012

App Stats: Nielsen on "Jihadi Radicalization of Muslim Clerics"

We hope you can join us this Wednesday, September 19, 2012 for the Applied Statistics Workshop. Rich Nielsen, a Ph.D. candidate from the Department of Government at Harvard University, will give a practice job talk entitled "Jihadi Radicalization of Muslim Clerics". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Jihadi Radicalization of Muslim Clerics"
Rich Nielsen
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, September 19th, 2012 12.00 pm

Abstract:

This paper explains why some Muslim clerics adopt the ideology of militant Jihad while others do not. I argue that clerics strategically adopt or reject Jihadi ideology because of career incentives generated by the structure of cleric educational networks. Well-connected clerics enjoy substantial success at pursuing comfortable careers within state-run religious institutions and they reject Jihadi ideology in exchange for continued material support from the state. Clerics with poor educational networks cannot rely on connections to advance through the state-run institutions, so many pursue careers outside of the system by appealing directly to lay audiences for support. These clerics are more likely to adopt Jihadi ideology because it helps them demonstrate to potential supporters that they have not been theologically coopted by political elites. I provide evidence of these dynamics by collecting and analyzing 29,430 fatwas, articles, and books written by 91 contemporary clerics. Using statistical natural language processing, I measure the extent to which each cleric adopts Jihadi ideology in their writing. I combine this with biographical and network information about each cleric to trace the process by which poorly-connected clerics become more likely to adopt Jihadi ideology.

Posted by Konstantin Kashin at 10:37 PM

September 9, 2012

App Stats: Robins on "A Simple Unification of the Potential Outcome and Causal Graph Approaches to Causal Inference"

We hope you can join us this Wednesday, September 12, 2012 for the Applied Statistics Workshop. Jamie Robins, Professor of Epidemiology from the Harvard School of Public Health, will give a presentation entitled "A Simple Unification of the Potential Outcome and Causal Graph Approaches to Causal Inference". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"A Simple Unification of the Potential Outcome and Causal Graph Approaches to Causal Inference"
Jamie Robins
Harvard School of Public Health
CGIS K354 (1737 Cambridge St.)
Wednesday, September 12th, 2012 12.00 pm

Abstract:

Potential outcomes are extensively used within statistics, epidemiology, and political science for reasoning about causation. Directed acyclic graphs are another formalism used to represent causal systems. They are extensively used in computer science, bioinformatics, sociology and epidemiology. It is natural to wish to unify them.

We present a simple approach to this unification. The approach is based on the idea of splitting nodes to construct graphs whose nodes are potential outcomes. The resulting graph can be used to read off counterfactual independencies. These independencies are satisfied by all previously proposed graphical and nongraphical causal models. We review many examples to illustrate the power of this approach.

This is joint work with Thomas Richardson at the University of Washington.

Posted by Konstantin Kashin at 5:53 PM

September 3, 2012

App Stats: Grubb on "Cellular Service Demand: Biased Beliefs, Learning, and Bill Shock"

We hope you can join us this Wednesday, September 5, 2012 for the first Applied Statistics Workshop of the Fall 2012 semester. Michael Grubb, an Assistant Professor of Applied Economics from the MIT Sloan School of Management, will give a presentation entitled "Cellular Service Demand: Biased Beliefs, Learning, and Bill Shock". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Cellular Service Demand: Biased Beliefs, Learning, and Bill Shock"
Michael Grubb
MIT Sloan School of Management
CGIS K354 (1737 Cambridge St.)
Wednesday, September 5th, 2012 12.00 pm

Abstract:

By April 2013, the FCC's recent bill-shock agreement with cellular carriers requires consumers be notified when exceeding usage allowances. Will the agreement help or hurt consumers? To answer this question, we estimate a model of consumer plan choice, usage, and learning using a panel of cellular bills. Our model predicts that the agreement will lower average consumer welfare by $2 per year because firms will respond by raising monthly fees. Our approach is based on novel evidence that consumers are inattentive to past usage (meaning that bill-shock alerts are informative) and advances structural modeling of demand in situations where multi-part tariffs induce marginal-price uncertainty. Additionally, our model estimates show that an average consumer underestimates both the mean and variance of future calling. These biases cost consumers $42 per year at existing prices. Moreover, absent bias, the bill-shock agreement would have little to no effect.

Posted by Konstantin Kashin at 3:02 PM

April 23, 2012

App Stats: Elwert on "Endogenous Selection"

We hope you can join us this Wednesday, April 25, 2012 for the final session of the Applied Statistics Workshop this semester. Felix Elwert, Assistant Professor from the Department of Sociology at the University of Wisconsin-Madison, will give a presentation entitled "Endogenous Selection". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Endogenous Selection"
Felix Elwert
Department of Sociology, University of Wisconsin-Madison
CGIS K354 (1737 Cambridge St.)
Wednesday, April 25th, 2012 12.00 pm

Abstract:

Selection bias is a central problem for causal inference in the social sciences. Quite how central a problem it is, however, is often obscured by ambiguous terminology, needlessly technical presentations, and narrow rules of thumb. This paper uses directed acyclic graphs (DAGs) to advance a precise yet intuitive global definition of endogenous selection bias and to argue for its theoretical and practical centrality for causal inference. The paper clarifies the fundamental structural difference between confounding and endogenous selection, shows that nearly all non-parametric identification problems relate to either confounding or endogenous selection, and argues that the problem of endogenous selection is indifferent to timing. Perhaps most importantly, we illustrate the importance of endogenous selection bias with numerous and varied examples from empirical social research.

This is joint work with Chris Winship.
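A minimal simulation of the mechanism behind endogenous selection, under a hypothetical setup: two independent standard-normal variables X and Y, and a selection rule that keeps a unit only when X + Y exceeds zero, so selection is a common effect (a collider) of both. In the full sample X and Y are uncorrelated; within the selected sample a clear negative association appears even though neither causes the other.

```python
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation computed from centered sums."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)
n = 50_000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x

# Condition on a common effect of x and y (hypothetical selection rule).
selected = [(xi, yi) for xi, yi in zip(x, y) if xi + yi > 0]
sel_x, sel_y = zip(*selected)

full_r = correlation(x, y)
sel_r = correlation(sel_x, sel_y)
print(f"full sample r = {full_r:.3f}, selected sample r = {sel_r:.3f}")
```

In DAG terms, X -> S <- Y, and restricting the analysis to S = 1 opens a non-causal path between X and Y; nothing about timing is involved, which fits the paper's point that endogenous selection is indifferent to timing.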

Posted by Konstantin Kashin at 12:43 PM

April 16, 2012

App Stats: Wasow on "Violence and Voting: Did the 1960s Urban Riots Reshape American Politics?"

We hope you can join us this Wednesday, April 18, 2012 for the Applied Statistics Workshop. Omar Wasow, a Ph.D. candidate from the Department of Government and the Department of African and African American Studies at Harvard University, will give a presentation entitled "Violence and Voting: Did the 1960s Urban Riots Reshape American Politics?" A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Violence and Voting: Did the 1960s Urban Riots Reshape American Politics?"
Omar Wasow
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, April 18th, 2012 12.00 pm

Abstract:

Between 1964 and 1971, more than 750 riots flared up in black neighborhoods across the United States. Scholarship on how the American polity responded to these violent protests is contested. Some scholars argue that urban riots produced a conservative "backlash" among white voters, while other scholars find little or no effect. Using a measure that incorporates the location, timing and severity of urban riots between 1964 and 1971, I examine whether increased exposure to urban riots is associated with decreased support for the Democratic party. In the 1964, 1968 and 1972 presidential elections, I find a strong negative relationship between exposure to civil unrest and the county-level Democratic vote share. I find a similar negative relationship between exposure to riots and Democratic vote share in congressional elections between 1968 and 1972. Finally, I find that in counterfactual scenarios of fewer riots the Democratic presidential nominee, Hubert Humphrey, would have beaten the Republican nominee, Richard Nixon, in the 1968 election. As African Americans were strongly identified with the Democratic party in this time period, my results suggest that, in at least some contexts, political violence by a minority group may contribute to a backlash among segments of the mass electorate and encourage outcomes directly at odds with the preferences of the protestors.

Posted by Konstantin Kashin at 12:53 AM

April 9, 2012

App Stats: Glynn on "Using Post-Treatment Variables to Establish Upper Bounds on Causal Effects: Assessing Executive Selection Procedures in New Democracies"

We hope you can join us this Wednesday, April 11, 2012 for the Applied Statistics Workshop. Adam Glynn, Associate Professor from the Department of Government at Harvard University, will give a presentation entitled "Using Post-Treatment Variables to Establish Upper Bounds on Causal Effects: Assessing Executive Selection Procedures in New Democracies". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Using Post-Treatment Variables to Establish Upper Bounds on Causal Effects: Assessing Executive Selection Procedures in New Democracies"
Adam Glynn
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, April 11th, 2012 12.00 pm

Abstract:

In this paper we propose an adjustment based on post-treatment variables for some standard estimators of the average treatment effect on the treated. Under relatively weak conditions, this adjusted estimator will provide an upper bound for the effect and in some cases lower bounds on p-values. Additionally, this approach does not place a restriction on the outcome variable and allows for multiple mechanisms by which the treatment has an effect on the outcome. We also demonstrate that this adjustment will reduce the estimated effect in a wide variety of circumstances, and therefore, when the assumptions for the adjusted estimator are preferable to the assumptions for the unadjusted estimator, the adjustment can be used as a robustness check. This method is illustrated with an assessment of the effects of using plurality rules for the first multi-party presidential elections in the third wave of democracy in sub-Saharan Africa.

This is joint work with Nahomi Ichino.

Posted by Konstantin Kashin at 11:20 AM

April 1, 2012

App Stats: Bahar on "International Knowledge Diffusion and the Comparative Advantage of Nations"

We hope you can join us this Wednesday, April 4, 2012 for the Applied Statistics Workshop. Dany Bahar, a Ph.D. Candidate in Public Policy at the Harvard Kennedy School, will give a presentation entitled "International Knowledge Diffusion and the Comparative Advantage of Nations". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"International Knowledge Diffusion and the Comparative Advantage of Nations"
Dany Bahar
Harvard Kennedy School
CGIS K354 (1737 Cambridge St.)
Wednesday, April 4th, 2012 12.00 pm

Abstract:

In this paper we document that the probability that a product is added to a country's export basket is, on average, 65% larger if a neighboring country is a successful exporter of that same product. We interpret our result as evidence of international intra-industry knowledge diffusion. Our results are consistent with the overall consensus in the literature on technology spillovers: diffusion is stronger at shorter distances; is weaker for more knowledge-intensive products; and has become faster over time.

This is joint work with Ricardo Hausmann and Cesar Hidalgo.
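The headline figure is a comparison of two conditional probabilities. As a minimal sketch (the data layout and the function name `percent_increase` are hypothetical, and the paper's estimate presumably comes from a regression with controls rather than this raw comparison), a "65% larger" probability corresponds to:

```python
def percent_increase(added, neighbor_exports):
    """Percent difference in P(product added) with vs. without a neighboring exporter.

    added[i] and neighbor_exports[i] are 0/1 indicators for country-product pair i.
    """
    with_nb = [a for a, nb in zip(added, neighbor_exports) if nb]
    without_nb = [a for a, nb in zip(added, neighbor_exports) if not nb]
    p_with = sum(with_nb) / len(with_nb)
    p_without = sum(without_nb) / len(without_nb)
    return 100 * (p_with / p_without - 1)


# Hypothetical counts: 33 of 100 pairs with a neighboring exporter add the product,
# versus 20 of 100 pairs without one.
added = [1] * 33 + [0] * 67 + [1] * 20 + [0] * 80
neighbor = [1] * 100 + [0] * 100
print(round(percent_increase(added, neighbor), 1))  # 65.0
```

Here 0.33 / 0.20 = 1.65, i.e. a 65% higher probability of adding the product.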

Posted by Konstantin Kashin at 11:44 PM

March 26, 2012

App Stats: Yamamoto on "A Multinomial Response Model for Varying Choice Sets, with Application to Partially Contested Multiparty Elections"

We hope you can join us this Wednesday, March 28, 2012 for the Applied Statistics Workshop. Teppei Yamamoto, Assistant Professor from the Department of Political Science at MIT, will give a presentation entitled "A Multinomial Response Model for Varying Choice Sets, with Application to Partially Contested Multiparty Elections". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"A Multinomial Response Model for Varying Choice Sets, with Application to Partially Contested Multiparty Elections"
Teppei Yamamoto
Department of Political Science, MIT
CGIS K354 (1737 Cambridge St.)
Wednesday, March 28th, 2012 12.00 pm

Abstract:

This paper proposes a new multinomial choice model which explicitly takes into account variation in choice sets across observations. The proposed varying choice set logit model relaxes the independence of irrelevant alternatives assumption by allowing the individual random utility function to directly depend on choice set types, and can be applied to a variety of data in which some individuals can only choose from a subset of the theoretically possible responses. Both frequentist and Bayesian simulation-based estimation procedures are developed using the Monte Carlo expectation-maximization algorithm and Markov chain Monte Carlo, respectively. The proposed model can be used to analyze survey data in partially contested multiparty elections in which some political parties do not run their candidates in every district. For illustration, I apply the proposed method to the 1996 Japanese general election, where none of the districts was contested by all of the six major parties.
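For reference, the standard multinomial logit that the proposed model generalizes computes choice probabilities only over the alternatives actually on a voter's ballot. The sketch below (party labels, utilities, and the function name are hypothetical) shows the independence of irrelevant alternatives property that the varying choice set logit relaxes: removing a party from the ballot leaves the odds between the remaining parties unchanged.

```python
import math


def logit_choice_probs(utilities, available):
    """Multinomial logit probabilities restricted to the alternatives on the ballot."""
    top = max(utilities[j] for j in available)  # subtract the max for numerical stability
    weights = {j: math.exp(utilities[j] - top) for j in available}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical systematic utilities for one voter.
u = {"A": 1.0, "B": 0.5, "C": 0.0}
full = logit_choice_probs(u, ["A", "B", "C"])  # fully contested district
partial = logit_choice_probs(u, ["A", "B"])    # party C does not run here

# Under IIA the A:B odds are exp(0.5) in both districts.
print(full["A"] / full["B"], partial["A"] / partial["B"])
```

Letting utility depend on the choice set type, as the abstract describes, is what breaks this equality.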

Posted by Konstantin Kashin at 1:20 AM

March 19, 2012

App Stats: Reshef on "Detecting Novel Bivariate Associations in Large Data Sets"

We hope you can join us this Wednesday, March 21, 2012 for the Applied Statistics Workshop. David Reshef, an MD/PhD student at the Harvard-MIT Division of Health Sciences and Technology (HST), will give a presentation entitled "Detecting Novel Bivariate Associations in Large Data Sets". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Detecting Novel Bivariate Associations in Large Data Sets"
David Reshef
Harvard-MIT Division of Health Sciences and Technology
CGIS K354 (1737 Cambridge St.)
Wednesday, March 21st, 2012 12.00 pm

Abstract:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. One way of doing so is to search such data sets for pairs of variables that are closely associated. This can be done by calculating some measure of dependence for each pair, ranking the pairs by their scores, and examining the top-scoring pairs. We outline two heuristic properties, generality and equitability, that the statistic we use to measure dependence should have in order for such a strategy to be effective. We present a measure of dependence for two-variable relationships, the maximal information coefficient (MIC), that has these properties. MIC captures a wide range of associations both functional and not (generality), and assigns similar scores to relationships with similar noise levels, regardless of relationship type (equitability). Finally, we show that MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships.
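To make the idea concrete, here is a deliberately crude approximation, not the authors' algorithm. It scans grids with at most n^0.6 cells (the grid bound as we understand the published definition; treat it as an assumption), places equal-frequency bins on both variables (real MIC optimizes the bin boundaries, which this sketch does not), and reports the largest mutual information normalized by log min(nx, ny). The function names are hypothetical.

```python
import math

import numpy as np


def equal_frequency_bins(v, k):
    """Assign each value to one of k bins holding (nearly) equal numbers of points."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return (ranks * k) // len(v)


def mutual_information(a, b):
    """Plug-in mutual information (in nats) between two integer bin labelings."""
    n = len(a)
    pa = np.bincount(a) / n
    pb = np.bincount(b) / n
    joint = {}
    for i, j in zip(a.tolist(), b.tolist()):
        joint[(i, j)] = joint.get((i, j), 0) + 1
    return sum((c / n) * math.log((c / n) / (pa[i] * pb[j])) for (i, j), c in joint.items())


def crude_mic(x, y, alpha=0.6):
    """Max over grids with nx * ny <= n**alpha of normalized mutual information."""
    budget = len(x) ** alpha
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            mi = mutual_information(equal_frequency_bins(x, nx), equal_frequency_bins(y, ny))
            best = max(best, mi / math.log(min(nx, ny)))
    return best


x = np.linspace(-1, 1, 100)
print(crude_mic(x, x ** 2))          # close to 1 for a noiseless parabola
print(np.corrcoef(x, x ** 2)[0, 1])  # essentially 0: Pearson correlation misses it
```

The parabola illustrates "generality": a non-monotone functional relationship that a correlation coefficient scores near zero still receives a high score.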

Posted by Konstantin Kashin at 12:35 AM

March 5, 2012

App Stats: Goodman on "Flaking Out: Snowfall, Disruptions of Instructional Time, and Student Achievement"

We hope you can join us this Wednesday, March 7, 2012 for the Applied Statistics Workshop. Joshua Goodman, Assistant Professor of Public Policy at the Harvard Kennedy School, will give a presentation entitled "Flaking Out: Snowfall, Disruptions of Instructional Time, and Student Achievement". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Flaking Out: Snowfall, Disruptions of Instructional Time, and Student Achievement"
Joshua Goodman
Harvard Kennedy School
CGIS K354 (1737 Cambridge St.)
Wednesday, March 7th, 2012 12.00 pm

Abstract:

Recent research on charter schools, summer learning loss, and international achievement suggests that instructional time is a critical input to the education production function. Using student and school-grade fixed effects models with data from Massachusetts, I find no relation between school closures and achievement but a strong relation between student absences and achievement. I then confirm these results using temporal and spatial variation in snowfall to provide better identification. Extreme snowfall induces school closures but does not affect achievement. Moderate snowfall induces student absences and does reduce achievement. Instrumental variables estimates suggest that each absence induced by bad weather reduces math achievement by 0.05 standard deviations. These results are consistent with a model of instruction in which coordination of students is the central challenge. Teachers deal well with coordinated disruptions of instructional time like school closures, but deal poorly with absences that affect different students at different times. These estimates suggest that absences are responsible for up to 20% of the achievement gap between poor and nonpoor students. They also suggest that policies designed solely to increase instructional time may not be effective.
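With a single instrument, the IV estimate reduces to the ratio cov(Z, Y) / cov(Z, X). The simulation below is purely hypothetical: made-up snowfall, absence, and score processes (absences are continuous stand-ins) with a true effect of -0.05 SD per absence, and none of the paper's fixed effects. It only shows why OLS is biased when unobserved ability drives both absences and scores, and how an instrument that shifts absences but is unrelated to ability removes that bias.

```python
import numpy as np


def iv_estimate(z, x, y):
    """Single-instrument IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return np.dot(zc, y - y.mean()) / np.dot(zc, x - x.mean())


def ols_slope(x, y):
    """Simple regression slope of y on x."""
    xc = x - x.mean()
    return np.dot(xc, y - y.mean()) / np.dot(xc, xc)


rng = np.random.default_rng(0)
n = 200_000
snow = rng.gamma(shape=2.0, scale=1.0, size=n)  # instrument (hypothetical units)
ability = rng.normal(size=n)                     # unobserved confounder
absences = 0.5 * snow - 0.8 * ability + rng.normal(size=n)
score = -0.05 * absences + 0.3 * ability + rng.normal(size=n)  # true effect: -0.05 SD

print(ols_slope(absences, score))          # biased: low-ability students miss more school
print(iv_estimate(snow, absences, score))  # close to -0.05
```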

Posted by Konstantin Kashin at 3:58 AM

February 27, 2012

App Stats: Pfister on "Visual Computing in Biology"

We hope you can join us this Wednesday, February 29, 2012 for the Applied Statistics Workshop. Hanspeter Pfister, Gordon McKay Professor of Computer Science at the School of Engineering and Applied Sciences at Harvard University, will give a presentation entitled "Visual Computing in Biology". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Visual Computing in Biology"
Hanspeter Pfister
School of Engineering and Applied Sciences, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, February 29th, 2012 12.00 pm

Abstract:

Many areas in science are experiencing a flood of data arising in part from the development of instruments that acquire information on an unprecedented scale. This is particularly true in biology, where huge amounts of heterogeneous data are acquired from microarrays, scanners, microscopes, and various other instruments. Visual computing tools are essential to gain insights into this data by combining computational analysis with the power of the human perceptual and cognitive system and enabling data exploration through interactive visualizations. In this talk I will present some of my group's work in visual computing and give an overview of several successful visualization projects in the areas of genomics and systems biology. I will then focus on our work on visual computing in Connectomics, a new field in neuroscience that aims to apply biology and computer science to the grand challenge of determining the detailed neural circuitry of the brain.

Posted by Konstantin Kashin at 1:19 AM

February 20, 2012

App Stats: Dominici on "Bayesian Effect Estimation Accounting for Adjustment Uncertainty"

We hope you can join us this Wednesday, February 22, 2012 for the Applied Statistics Workshop. Francesca Dominici, Professor of Biostatistics from the Department of Biostatistics at the Harvard School of Public Health, will give a presentation entitled "Bayesian Effect Estimation Accounting for Adjustment Uncertainty". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Bayesian Effect Estimation Accounting for Adjustment Uncertainty"
Francesca Dominici
Department of Biostatistics, Harvard School of Public Health
CGIS K354 (1737 Cambridge St.)
Wednesday, February 22nd, 2012 12.00 pm

Abstract:

Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian Adjustment for Confounding (BAC), to estimate the effect on the outcome associated with an exposure of interest while accounting for the uncertainty in the confounding adjustment. Our approach is based on specifying two models: 1) the outcome as a function of the exposure and the potential confounders (the outcome model); and 2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter ω denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence (ω = 1), BAC reduces to traditional Bayesian Model Averaging (BMA). In simulation studies we show that BAC with ω > 1 estimates the exposure effect with smaller bias than traditional BMA, and improved coverage. We then compare BAC, a recent approach of Crainiceanu et al. (2008), and traditional BMA in a time series data set of hospital admissions, air pollution levels and weather variables in Nassau, NY for the period 1999-2005. Using each approach, we estimate the short-term effects of PM2.5 on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty.
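Written out, with α^X_j and α^Y_j denoting indicators that predictor j is included in the exposure and outcome models (notation chosen here for illustration), the dependence parameter described above fixes the following prior odds, which imply a conditional prior inclusion probability:

```latex
\frac{P(\alpha^Y_j = 1 \mid \alpha^X_j = 1)}{P(\alpha^Y_j = 0 \mid \alpha^X_j = 1)} = \omega
\quad\Longrightarrow\quad
P(\alpha^Y_j = 1 \mid \alpha^X_j = 1) = \frac{\omega}{1 + \omega}
```

For instance, ω = 19 gives a prior probability of 0.95 that a predictor of the exposure also enters the outcome model. With ω = 1 the probability is 1/2, and, assuming the prior odds are also 1 for predictors outside the exposure model (an assumption of this illustration), inclusion in the two models is a priori independent, matching the reduction to BMA described above.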

Posted by Konstantin Kashin at 2:03 AM

February 13, 2012

App Stats: Sofer on "Sparse Joint Estimation of Covariates-Dependent Covariance Matrices"

We hope you can join us this Wednesday, February 15, 2012 for the Applied Statistics Workshop. Tamar Sofer, a Ph.D. student from the Department of Biostatistics at Harvard University, will give a presentation entitled "Sparse Joint Estimation of Covariates-Dependent Covariance Matrices". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Sparse Joint Estimation of Covariates-Dependent Covariance Matrices"
Tamar Sofer
Department of Biostatistics, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, February 15th, 2012 12.00 pm

Abstract:

We propose an estimation method for the principal components/covariance structures of a set of outcomes, while modeling the effect of covariates. We assume a linear mixed model formulation on the outcomes as response to covariates, a model corresponding to spiked covariance matrices. Since the subject-specific covariance matrices and the effects of covariates are believed to be sparse, we penalize coefficients using an oracle penalty function. Under some assumptions on the parameters and the likelihood, we show that the maximum likelihood estimator of the parameters is asymptotically consistent and is uniformly sparse ("sparsistent"), even when the number of parameters is small. We propose using the Bayesian Information Criterion (BIC) for tuning parameter selection and show that it is consistent for model selection. Using a simple iterated least squares procedure we are able to recover the model parameters with high accuracy. The method is implemented to study the effect of smoking on the covariances of gene methylations in the asthma pathway in smoking and non-smoking US veterans from the Normative Aging Study (NAS).

Posted by Konstantin Kashin at 2:15 AM

February 6, 2012

App Stats: Titiunik on "Using Regression Discontinuity to Uncover the Personal Incumbency Advantage"

We hope you can join us this Wednesday, February 8, 2012 for the Applied Statistics Workshop. Rocio Titiunik, Assistant Professor from the Department of Political Science at the University of Michigan, will give a presentation entitled "Using Regression Discontinuity to Uncover the Personal Incumbency Advantage". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Using Regression Discontinuity to Uncover the Personal Incumbency Advantage"
Rocio Titiunik
Department of Political Science, University of Michigan
CGIS K354 (1737 Cambridge St.)
Wednesday, February 8th, 2012 12.00 pm

Abstract:

We study the conditions under which estimating the incumbency advantage using a regression discontinuity (RD) design recovers the personal incumbency advantage in a two-party system. Lee (2008) has introduced RD as a method for estimating what is generally considered the "partisan" incumbency advantage. We present a causal model with some simple but plausible assumptions that allows RD to be used to estimate the "personal" incumbency advantage, as an alternative to sophomore surge, retirement slump, and other commonly used measures. We estimate the incumbency advantage using our model with data from U.S. House elections between 1952 and 2008. Using the assumptions of our model, we also explore the estimation of the incumbency advantage beyond the limited RD conditions where knife-edge electoral shifts create the leverage for causal inference.
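The basic RD estimate behind this literature compares fitted outcomes just above and just below the 50% vote-share threshold. Here is a minimal sketch using separate local linear fits with a uniform kernel; the bandwidth h = 0.1, the function name, and the simulated data (a true jump of 0.08 in next-election vote share) are hypothetical. It estimates the RD jump studied by Lee (2008), not the personal incumbency advantage that the paper's additional assumptions identify.

```python
import numpy as np


def rd_estimate(x, y, cutoff=0.5, h=0.1):
    """Sharp RD: difference between local linear intercepts at the cutoff (uniform kernel)."""
    def intercept(mask):
        xc = x[mask] - cutoff
        design = np.column_stack([np.ones(xc.size), xc])
        beta, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        return beta[0]

    above = (x >= cutoff) & (x < cutoff + h)
    below = (x < cutoff) & (x >= cutoff - h)
    return intercept(above) - intercept(below)


rng = np.random.default_rng(1)
vote_t = rng.uniform(0.2, 0.8, 20_000)  # party's vote share at election t
wins = vote_t >= 0.5
vote_next = 0.2 + 0.6 * vote_t + 0.08 * wins + rng.normal(0, 0.02, vote_t.size)
print(rd_estimate(vote_t, vote_next))   # close to 0.08
```

Fitting a line on each side, rather than comparing raw means within the window, keeps the slope of the vote-share relationship from masquerading as a jump at the threshold.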

Posted by Konstantin Kashin at 1:21 AM

January 30, 2012

App Stats: Quackenbush on "Moving Beyond the Mean: The Role of Variation in Determining Phenotype"

We hope you can join us this Wednesday, February 1, 2012 for the Applied Statistics Workshop. John Quackenbush, Professor of Biostatistics and Computational Biology and Director of the Center for Cancer Computational Biology at the Dana-Farber Cancer Institute, will give a presentation entitled "Moving Beyond the Mean: The Role of Variation in Determining Phenotype". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Moving Beyond the Mean: The Role of Variation in Determining Phenotype"
John Quackenbush
Dana-Farber Cancer Institute and Harvard School of Public Health
CGIS K354 (1737 Cambridge St.)
Wednesday, February 1st, 2012 12.00 pm

Abstract:

Two trends are driving innovation and discovery in biological sciences: technologies that allow holistic surveys of genes, proteins, and metabolites and a realization that biological processes are driven by complex networks of interacting biological molecules. However, there is a gap between the gene lists emerging from genome sequencing projects and the network diagrams that are essential if we are to understand the link between genotype and phenotype. 'Omic technologies were once heralded as providing a window into those networks, but so far their success has been limited, in large part because the high-dimensional data they produce cannot be fully constrained by the limited number of measurements and in part because the data themselves represent only a small part of the complete story. To circumvent these limitations, we have developed methods that combine 'omic data with other sources of information in an effort to leverage, more completely, the compendium of information that we have been able to amass. Here we will present a number of approaches we have developed, with an emphasis on how those methods have provided insight into the role that particular cellular pathways play in driving differentiation and into how variation in gene expression patterns influences the development of disease states. In particular, we will challenge the basic analytical approaches that have been used in biomedical research and argue that one should move beyond a simple comparison of the means relative to variance (the t-test) and also consider how variance itself changes between phenotypes. Looking forward, we will examine more abstract state-space models that may have potential to lead us to a more general predictive, theoretical biology.
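A toy illustration of the point about variance: two hypothetical groups of expression values for one gene, with identical means but different spreads. A t-statistic sees nothing, while a simple variance ratio flags the difference. This is only a descriptive sketch (the function names and data are made up); a real analysis would use a formal test such as Levene's or an F-test, and we do not claim it reflects the speakers' methods.

```python
import numpy as np


def welch_t(a, b):
    """Welch two-sample t-statistic for a difference in means."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (b.mean() - a.mean()) / se


def variance_ratio(a, b):
    """Ratio of sample variances, group b over group a."""
    return b.var(ddof=1) / a.var(ddof=1)


# Hypothetical expression levels for one gene in two phenotypes.
healthy = np.array([-1.0, 1.0] * 10)
disease = np.array([-3.0, 3.0] * 10)

print(welch_t(healthy, disease))         # zero: the means are identical
print(variance_ratio(healthy, disease))  # about 9: disease values are far more variable
```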

Posted by Konstantin Kashin at 2:26 AM

January 22, 2012

App Stats: Alan Zaslavsky on "The Consumer Assessments of Healthcare Providers and Systems (CAHPS) Survey for Medicare"

We hope you can join us this Wednesday, January 25 for the first Applied Statistics Workshop of 2012! Alan Zaslavsky, a professor of health care policy in the Department of Health Care Policy at Harvard Medical School, will give a presentation entitled "The Consumer Assessments of Healthcare Providers and Systems (CAHPS) Survey for Medicare: A Review and New Findings from a Mode Experiment". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"The Consumer Assessments of Healthcare Providers and Systems (CAHPS) Survey for Medicare: A Review and New Findings from a Mode Experiment"
Alan Zaslavsky
Department of Health Care Policy, Harvard Medical School
CGIS K354 (1737 Cambridge St.)
Wednesday, January 25th, 2012 12.00 pm

Abstract:

We assess health care quality and access in the Medicare system to inform consumer choice, foster quality improvement, monitor health plan quality, and reward high-performing plans. The CAHPS survey has since 1997 been one of the main tools for this assessment. In this talk, I will review some of the more interesting analyses of system quality made possible by using the CAHPS survey and some of the challenging issues in system monitoring in the coming years. I will then describe our analyses of an experiment on the effects of survey mode on CAHPS responses, using a principal stratification framework.

Posted by Konstantin Kashin at 5:43 PM

November 28, 2011

App Stats: Friedman on "The Long-Term Impacts of Teachers: Teacher Value-Added and Students' Outcomes in Adulthood"

We hope you can join us this Wednesday, November 30, 2011 for the final Applied Statistics Workshop of the semester. John Friedman, Assistant Professor of Public Policy at the Harvard Kennedy School, will give a presentation entitled "The Long-Term Impacts of Teachers: Teacher Value-Added and Students' Outcomes in Adulthood". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"The Long-Term Impacts of Teachers: Teacher Value-Added and Students' Outcomes in Adulthood"
John Friedman
Harvard Kennedy School
CGIS K354 (1737 Cambridge St.)
Wednesday, November 30th, 2011 12.00 pm

Abstract:

The use of test-score-based "value-added" (VA) measures to evaluate teachers is controversial, among other reasons, because (1) there is little evidence on whether high VA teachers improve student outcomes in adulthood and (2) there is debate about whether VA measures provide unbiased estimates of teacher quality. We address these issues by analyzing school district data from grades 3-8 for 2.5 million children linked to data on parents and adult outcomes from tax records. We find that the degree of bias due to selection is small using tests based on previously unobserved parent characteristics and a new quasi-experimental research design based on changes in teaching staff. We then show that high VA teachers increase their students' probability of college attendance, raise earnings, reduce teenage birth rates, and improve the quality of the neighborhood in which their students live in adulthood. The impacts of teacher VA are roughly constant across grades 4-8. A one standard deviation improvement in teacher VA in a single grade raises earnings by 1% at age 28. Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase students' lifetime income by approximately $300,000 for the average classroom in our sample.

Posted by Konstantin Kashin at 12:24 AM

November 14, 2011

App Stats: Beltrán-Sánchez on "New Evidence Linking Early and Late-life Mortality in European Cohorts"

We hope you can join us this Wednesday, November 16, 2011 for the Applied Statistics Workshop. Hiram Beltrán-Sánchez, a postdoctoral research fellow at the USC Davis School of Gerontology and at the Harvard Center for Population and Development Studies, will give a talk entitled "New Evidence Linking Early and Late-life Mortality in European Cohorts". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"New Evidence Linking Early and Late-life Mortality in European Cohorts"
Hiram Beltrán-Sánchez
USC Davis School of Gerontology
CGIS K354 (1737 Cambridge St.)
Wednesday, November 16th, 2011 12.00 pm

Abstract:

Early environmental influences on later life health and mortality are well recognized. Using mortality data from 630 cohorts born throughout the 19th and early 20th centuries in nine European countries, we fitted a multilevel model to further explore the association of early life mortality with both the estimated mortality level at age 40 and the exponential (Gompertz) acceleration in mortality rates with age. Our findings strongly link early life mortality to both the cohort mortality level in mid-adulthood and the Gompertz rate of mortality acceleration during aging. Recent cohorts exposed to lower mortality environments early in life also showed lower mortality levels in adulthood. However, these gains were diminished by faster mortality acceleration at older ages. Thus recent increases in adult survival are mainly due to declines in adult mortality levels rather than changes in the rate of aging. This analysis defines new links in the developmental origins of adult health and disease in which effects of early exposure to infections persist to adulthood and remain evident in the cohort rates of mortality at later ages.
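The two cohort-level quantities here come from the Gompertz model, in which the death rate at age x is m(x) = m(40) exp(b (x - 40)). So log m(x) is linear in age, with intercept log m(40) (the mortality level at 40) and slope b (the rate of acceleration). A minimal sketch of estimating both for a single cohort by least squares on log rates follows; the function name, age range, and rates are hypothetical, and the paper instead fits a multilevel model across cohorts.

```python
import numpy as np


def fit_gompertz(ages, death_rates, ref_age=40):
    """Least-squares fit of log m(x) = log m(ref_age) + b * (x - ref_age).

    Returns (mortality level at ref_age, Gompertz acceleration rate b).
    """
    ages = np.asarray(ages, dtype=float)
    design = np.column_stack([np.ones(ages.size), ages - ref_age])
    coef, *_ = np.linalg.lstsq(design, np.log(death_rates), rcond=None)
    return float(np.exp(coef[0])), float(coef[1])


# Hypothetical cohort: level 0.002 at age 40, rates rising about 9% per year of age.
ages = np.arange(40, 91)
rates = 0.002 * np.exp(0.09 * (ages - 40))
level_40, b = fit_gompertz(ages, rates)
print(level_40, b)  # recovers roughly 0.002 and 0.09
```

Centering age at 40 makes the intercept directly interpretable as the mid-adulthood mortality level that the abstract links to early life conditions.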

Posted by Konstantin Kashin at 12:04 AM

November 6, 2011

App Stats: VanderWeele on "Sensitivity Analysis for Contagion Effects in Social Networks"

We hope you can join us this Wednesday, November 9, 2011 for the Applied Statistics Workshop. Tyler VanderWeele, Associate Professor of Epidemiology at the Harvard School of Public Health, will give a presentation entitled "Sensitivity Analysis for Contagion Effects in Social Networks". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Sensitivity Analysis for Contagion Effects in Social Networks"
Tyler VanderWeele
Harvard School of Public Health
CGIS K354 (1737 Cambridge St.)
Wednesday, November 9th, 2011 12.00 pm

The paper is available here.

Abstract:

Analyses of social network data have suggested that obesity, smoking, happiness, and loneliness all travel through social networks. Individuals exert "contagion effects" on one another through social ties and association. These analyses have come under critique because of the possibility that homophily from unmeasured factors may explain these statistical associations and because similar findings can be obtained when the same methodology is applied to height, acne, and headaches, for which the conclusion of contagion effects seems somewhat less plausible. The author uses sensitivity analysis techniques to assess the extent to which supposed contagion effects for obesity, smoking, happiness, and loneliness might be explained away by homophily or confounding and the extent to which the critique using analysis of data on height, acne, and headaches is relevant. Sensitivity analyses suggest that contagion effects for obesity and smoking cessation are reasonably robust to possible latent homophily or environmental confounding; those for happiness and loneliness are somewhat less so. Supposed effects for height, acne, and headaches are all easily explained away by latent homophily and confounding. The methodology that has been used in past studies for contagion effects in social networks, when used in conjunction with sensitivity analysis, may prove useful in establishing social influence for various behaviors and states. The sensitivity analysis approach can be used to address the critique of latent homophily as a possible explanation of associations interpreted as contagion effects.
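To give a flavor of how such a sensitivity analysis works, here is one simple, widely used bias formula, offered as an illustration rather than as the exact procedure in the paper. Assume a binary unmeasured confounder U (for example, a latent trait behind homophily) that shifts the outcome by gamma on the additive scale, by the same amount in exposed and unexposed groups, and whose prevalence differs by delta between those groups after conditioning on measured covariates. The bias in the observed difference is then gamma * delta. The function names and numbers are hypothetical.

```python
def corrected_effect(observed, gamma, delta):
    """Observed effect minus the bias gamma * delta from a binary unmeasured confounder U.

    gamma: effect of U on the outcome (additive scale, assumed constant across exposure)
    delta: P(U=1 | exposed) - P(U=1 | unexposed), conditional on measured covariates
    """
    return observed - gamma * delta


def gamma_to_explain_away(observed, delta):
    """Value of gamma at which the corrected effect would be exactly zero."""
    return observed / delta


# Hypothetical: an observed contagion effect of 0.05 on the probability of obesity.
print(corrected_effect(0.05, gamma=0.10, delta=0.20))  # about 0.03
print(gamma_to_explain_away(0.05, delta=0.20))         # U would need gamma of about 0.25
```

An effect is "robust" in this sense when only implausibly large values of gamma and delta would explain it away.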

Posted by Konstantin Kashin at 2:59 AM

October 31, 2011

App Stats: Brown and Marks on "Empirical Social Science at Disney Research"

We hope you can join us this Wednesday, November 2, 2011 for the Applied Statistics Workshop. Amber Brown, Senior Research Scientist at Disney Research, and Joe Marks, Vice President and Fellow of Disney Research, will give a talk entitled "Empirical Social Science at Disney Research". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Empirical Social Science at Disney Research"
Amber Brown and Joe Marks
Disney Research, Boston
CGIS K354 (1737 Cambridge St.)
Wednesday, November 2nd, 2011 12.00 pm

Abstract:

At Disney Research we mostly work on technologies that are relevant to our various businesses: computer graphics, computer vision, robotics, human-computer interaction, materials, displays, etc. But we also have projects in the social sciences, with a heavy emphasis on rigorous empirical testing. We will describe four recent projects:

  • Novel pay-what-you-want pricing mechanisms.
  • Load balancing of park guests via pushed incentives on mobile devices.
  • Guest participation in environmental programs.
  • Introduction of a cinema culture to the developing world.

Posted by Konstantin Kashin at 1:08 AM

October 23, 2011

App Stats: Nielsen on "Comparative Effectiveness of Matching Methods for Causal Inference"

We hope you can join us this Wednesday, October 26, 2011 for the Applied Statistics Workshop. Rich Nielsen, a Ph.D. candidate from the Department of Government at Harvard University, will present a paper entitled "Comparative Effectiveness of Matching Methods for Causal Inference". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Comparative Effectiveness of Matching Methods for Causal Inference"
Rich Nielsen
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, October 26th, 2011 12.00 pm

Abstract:

Matching is an increasingly popular method of causal inference in observational data, but following methodological best practices has proven difficult for applied researchers. We address this problem by providing a simple graphical approach for choosing among the numerous possible matching solutions generated by three methods: the venerable "Mahalanobis Distance Matching" (MDM), the commonly used "Propensity Score Matching" (PSM), and a newer approach called "Coarsened Exact Matching" (CEM). In the process of using our approach, we also discover that PSM often approximates random matching, both in many real applications and in data simulated by the processes that fit PSM theory. Moreover, contrary to conventional wisdom, random matching is not benign: it (and thus PSM) can often degrade inferences relative to not matching at all. We find that MDM and CEM do not have this problem, and in practice CEM usually outperforms the other two approaches. However, with our comparative graphical approach and easy-to-follow procedures, focus can be on choosing a matching solution for a particular application, which is what may improve inferences, rather than the particular method used to generate it.

The paper is joint work with Gary King, Carter Coberley, James E. Pope, and Aaron Wells.
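Of the three methods, CEM is the simplest to sketch: coarsen each covariate into bins, treat each combination of bins as a stratum, and keep only strata that contain both treated and control units. The cutpoints, data, and function name below are hypothetical, and a full CEM implementation would also compute weights to adjust for differing numbers of treated and control units within strata.

```python
import numpy as np


def coarsened_exact_match(X, treated, cutpoints):
    """Return a boolean mask of units kept by coarsened exact matching.

    X: (n, k) covariate array; treated: length-n 0/1 array;
    cutpoints: list of k increasing cutpoint sequences, one per covariate.
    """
    # Coarsen: replace each covariate value by the index of its bin.
    bins = np.column_stack([np.digitize(X[:, j], cutpoints[j]) for j in range(X.shape[1])])
    strata = [tuple(row) for row in bins.tolist()]
    treated_strata = {s for s, t in zip(strata, treated) if t}
    control_strata = {s for s, t in zip(strata, treated) if not t}
    matched = treated_strata & control_strata
    return np.array([s in matched for s in strata])


# Hypothetical data: columns are age and an indicator for a prior condition.
X = np.array([[25, 1], [27, 1], [60, 0], [33, 1], [62, 1]], dtype=float)
treated = np.array([1, 0, 1, 0, 0])
keep = coarsened_exact_match(X, treated, cutpoints=[[30, 50], [0.5]])
print(keep)  # only the two young units with the condition share a matched stratum
```

Because the analyst chooses the cutpoints, the degree of imbalance that survives matching is set in advance, which is part of what makes the resulting matched samples easy to compare.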

Posted by Konstantin Kashin at 10:54 PM

October 17, 2011

App Stats: Weihua An on "Peer Effects on Adolescent Smoking and Social Network-Based Interventions"

We hope you can join us this Wednesday, October 19, 2011 for the Applied Statistics Workshop. Weihua An, a Lecturer in the Department of Sociology at Harvard University, will present his dissertation entitled "Peer Effects on Adolescent Smoking and Social Network-Based Interventions". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Peer Effects on Adolescent Smoking and Social Network-Based Interventions"
Weihua An
Sociology Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, October 19th, 2011 12.00 pm

Abstract:

This study addresses a fundamental question in social network analysis: whether and to what extent peers affect a person's wellbeing. More specifically, it attempts to identify and quantify peer effects on smoking among adolescents.

Drawing on the terminology of causal inference, a systematic framework for studying causal peer effects was developed to distinguish several types of peer effects, including peer effects under control and peer effects under treatment. To overcome the difficulties of identifying peer effects with observational data, a novel field experiment was conducted with a partial treatment group design specifically tuned to estimate peer effects.

More specifically, a smoking prevention intervention, consisting of distributing smoking prevention brochures and hosting health education workshops, was assigned to a randomly chosen subset of students in a number of classes in six middle schools in China where the experiment was fielded. The goal was to study how the information contained in the intervention spread across students and how it affected their information, knowledge, intention, and behavior regarding smoking. To accelerate or reinforce the diffusion, in other treated classes the intervention was given to central students, or to students together with their close friends, as identified from the students' social network information.

Descriptive analysis provided strong support for peer effects on the initiation and maintenance of adolescent smoking. Further statistical analysis showed that, compared with students in the control classes, students whose classmates were randomly chosen to receive the intervention but who did not receive it themselves were more likely to exchange information about the intervention with other students and to remain non-smokers or become non-smokers over time. It was also found that the social network-based interventions did not consistently add significant value across all outcomes of interest; their benefits were concentrated mainly in lowering students' intention to smoke and decreasing smokers' popularity.

Special attention will be paid in the presentation to elaborating how to choose central students and student groups in a social network.
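The abstract does not say which centrality measure was used. As one hypothetical possibility, the sketch below ranks students by in-degree, the number of classmates who name them as friends, and picks the top k for treatment; the function name and nomination data are made up.

```python
from collections import Counter


def most_nominated(nominations, k):
    """Pick the k students named as friends most often (ties broken by ID).

    nominations maps each student ID to the list of classmates they named.
    """
    indegree = Counter({student: 0 for student in nominations})
    for friends in nominations.values():
        indegree.update(friends)
    return sorted(indegree, key=lambda s: (-indegree[s], s))[:k]


# Hypothetical class of four students.
nominations = {"a": ["b", "c"], "b": ["c"], "c": ["b"], "d": ["c"]}
print(most_nominated(nominations, 2))  # ['c', 'b']
```

In-degree is only the simplest choice; betweenness or eigenvector centrality would target students who bridge or sit at the core of friendship groups, and could select different students.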

Posted by Konstantin Kashin at 12:06 AM

October 9, 2011

App Stats: Weissman on "From Fourier to Forensics"

We hope you can join us this Wednesday, October 12, 2011 for the Applied Statistics Workshop. Michael Weissman, a Professor Emeritus from the Physics Department at the University of Illinois, will give a presentation entitled "From Fourier to Forensics". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"From Fourier to Forensics"
Michael Weissman
Physics Department, University of Illinois
CGIS K354 (1737 Cambridge St.)
Wednesday, October 12th, 2011 12.00 pm

Abstract:

Although the statistical and systematic problems of public opinion polls are fairly widely recognized, we tend to assume that published polling results reflect some sort of actual poll. In 2009 a prominent blog suggested that the pollster Strategic Vision might be fabricating data, based in part on surprising deviations from uniformity in the distribution of trailing digits of the results (http://www.fivethirtyeight.com/search/label/strategic%20vision). Objections were raised to the assumed uniform distribution, but we were able to use Fourier analysis together with known polling statistics to show that the results were weird even if that assumption were dropped (http://query.nytimes.com/gst/fullpage.html?res=9C03E1DA123AF930A25751C1A96F9C8B63).

In 2010 we were contacted by a political consultant who had noticed anomalies in Research 2000 poll reports. Using a variety of elementary statistical techniques, we showed that those results could not have accurately represented real polls (http://en.wikipedia.org/wiki/Research_2000). Unfortunately, we do not know whether other bogus pollsters are disguising their results with a random binary generator (cost: $0.01).
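The abstract does not list the "elementary statistical techniques" used. One check of this kind, shown here purely as an illustration under stated assumptions, is a parity test: if a poll reports two subgroup percentages for each question (say, men and women) and their last digits are effectively independent, the two numbers should share parity (both even or both odd) about half the time. An exact binomial tail probability then measures how surprising an observed number of matches is. The function names and numbers below are hypothetical.

```python
from math import comb

def parity_matches(pairs):
    """Count hypothetical (a, b) subgroup pairs whose integers are both even or both odd."""
    return sum(1 for a, b in pairs if a % 2 == b % 2)

def binomial_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical (men, women) percentages for six questions.
pairs = [(52, 44), (47, 39), (60, 58), (35, 41), (48, 50), (53, 61)]
k = parity_matches(pairs)
print(k, binomial_upper_tail(k, len(pairs)))  # 6 0.015625
```

With real data one would use many more pairs; the independence-of-parity assumption is the key modeling choice and should be argued for before the tail probability is taken at face value.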

Posted by Konstantin Kashin at 10:20 PM

October 2, 2011

App Stats: Liublinska on "Addressing missing data issues in a study with rare binary outcomes constrained by a small sample size"

We hope you can join us this Wednesday, October 5, 2011 for the Applied Statistics Workshop. Victoria Liublinska, a Ph.D. candidate from the Statistics Department at Harvard University, will present a paper entitled "Addressing missing data issues in a study with rare binary outcomes constrained by a small sample size". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Addressing missing data issues in a study with rare binary outcomes constrained by a small sample size"
Victoria Liublinska (with D. Rubin and R. Gutman)
Statistics Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, October 5th, 2011 12.00 pm

Abstract:

We (re)analyze the data from a recent study conducted to evaluate the safety and efficacy of a new device designed for vertebroplasty. Among the issues that had to be addressed were missing data in some covariates, an incorrect analysis initially applied to the primary endpoint, and missing data in the secondary endpoints. The secondary endpoints posed additional challenges: they formed panel data (responses were collected twice over time, with a non-monotone missingness pattern), and they were rare binary events. The analysis was further complicated by a relatively small sample size. Our work demonstrates how a complex missing-data problem can be broken down into a set of small tasks that are solved individually. Some tasks involved multivariate missing-data imputation using chained equations (van Buuren and Oudshoorn 2000; Raghunathan et al. 2001) with carefully chosen conditional models. Other tasks called for new, state-of-the-art solutions, such as a z-transformation procedure for combining repeated p-values (D. Rubin et al. 2011, to be submitted; C. Licht 2009, Ph.D. thesis) and enhanced tipping-point graphs that assess sensitivity to various deviations from the assumptions made about the missing-data mechanism (Yan et al. 2009; Campbell et al. 2011).
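Multiple imputation by chained equations produces several completed datasets, and estimates computed on them are usually pooled with Rubin's combining rules. The abstract does not describe the pooling step, so the sketch below is a generic illustration of those standard rules, not the authors' z-transformation procedure for p-values. Given m point estimates and their within-imputation variances, it returns the pooled estimate (the mean of the estimates) and the total variance T = U_bar + (1 + 1/m) * B, where U_bar is the average within-imputation variance and B is the between-imputation variance. The function name is hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, within_variances):
    """Combine m multiply-imputed estimates with Rubin's rules (hypothetical helper).

    Returns (pooled estimate, total variance). Requires m >= 2.
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)             # pooled point estimate
    u_bar = mean(within_variances)      # average within-imputation variance
    b = variance(estimates)             # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b     # total variance of the pooled estimate
    return q_bar, total

est, var = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Here the pooled estimate is 2.0 and the total variance is 1 + (4/3) * 1, roughly 2.333; the (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations.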

Posted by Konstantin Kashin at 10:05 PM

September 25, 2011

App Stats: Spirling on "Partisan Convergence in Executive-Legislative Interactions: Modeling Debates in the House of Commons, 1832-1915"

We hope you can join us this Wednesday, September 28, 2011 for the Applied Statistics Workshop. Arthur Spirling, an Assistant Professor in the Department of Government at Harvard University, will present a paper entitled "Partisan Convergence in Executive-Legislative Interactions: Modeling Debates in the House of Commons, 1832-1915". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Partisan Convergence in Executive-Legislative Interactions: Modeling Debates in the House of Commons, 1832-1915"
Arthur Spirling
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, September 28th, 2011 12.00 pm

Abstract:

We consider the interaction between members of the executive and backbenchers in the House of Commons between the Great Reform Act and the Great War, a period of radical internal reform that birthed the Westminster system in its current form. We gather new data on over a million speeches in seventeen thousand debates to model how the cabinet-legislative relationship changed over time. In particular, we conceptualize debates as Markov chains moving between speaker states and focus on estimating the transition probabilities between those states. We take a Bayesian mixed-model approach, allowing for debate-level and ministry-level variation. We show a remarkable "convergence" in the behavior of ministers from different parties, beginning between the mid-1870s and the late 1880s and coinciding with a series of important standing orders relating to the ability to ask questions in the Commons. While Tory ministers generally become more responsive, Liberal ministers become less involved in debate.
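The abstract models each debate as a Markov chain over speaker states and fits a Bayesian mixed model. As a much simpler sketch of the core quantity, assuming speaker states are coded as hypothetical labels such as "minister" and "backbencher" and ignoring the debate- and ministry-level variation, transition probabilities can be estimated from pooled transition counts. Adding a pseudo-count `alpha` to every cell gives the posterior mean under a symmetric Dirichlet prior on each row; `alpha = 0` gives the maximum-likelihood estimate.

```python
from collections import defaultdict

def transition_matrix(debates, states, alpha=0.0):
    """Estimate P(next speaker state | current state) from speaker sequences.

    `debates` is a list of speaker-state sequences, one per debate; every label
    must appear in `states`. With alpha > 0 each row is the posterior mean under
    a symmetric Dirichlet(alpha) prior. Rows with no data and alpha = 0 are all zero.
    """
    counts = {s: defaultdict(float) for s in states}
    for seq in debates:
        for current, nxt in zip(seq, seq[1:]):   # consecutive speaker pairs
            counts[current][nxt] += 1
    probs = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        probs[s] = {t: (counts[s][t] + alpha) / total if total > 0 else 0.0
                    for t in states}
    return probs

# Hypothetical coded debates.
debates = [["minister", "backbencher", "minister", "backbencher"],
           ["backbencher", "backbencher", "minister"]]
states = ["minister", "backbencher"]
mle = transition_matrix(debates, states)
smoothed = transition_matrix(debates, states, alpha=1.0)
```

A mixed model like the one in the talk would instead let these probabilities vary by debate and by ministry while sharing information across them, which pooled counts cannot do.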

Posted by Konstantin Kashin at 9:44 PM

September 19, 2011

App Stats: Sen on "Natural Experiments, Judicial Quality, and Racial Bias in Federal Appellate Review"

We hope you can join us this Wednesday, September 21, 2011 for the Applied Statistics Workshop. Maya Sen, a Ph.D. candidate from the Department of Government at Harvard University, will give a practice job talk entitled "Natural Experiments, Judicial Quality, and Racial Bias in Federal Appellate Review". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Natural Experiments, Judicial Quality, and Racial Bias in Federal Appellate Review"
Maya Sen
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, September 21st, 2011 12.00 pm

Abstract:

In this paper, I find that cases decided by black federal lower-court judges are consistently overturned more often than cases authored by similar white judges. I estimate this effect by leveraging the fact that incoming cases in the U.S. federal courts are randomly assigned to judges, which ensures that black and white judges hear similar sorts of cases. The effect is robust and persists after exact matching on measures of judicial quality (including the quality ratings assigned by the American Bar Association (ABA)), previous professional and judicial experience, and partisanship. Moreover, by looking more closely at the ABA ratings awarded to judicial nominees, I demonstrate that this effect is unlikely to be attributable solely to differences in quality between black and white judges. This study is the first to explore how higher-court judges evaluate opinions written by judges of color, and it has clear normative implications: attempts to make the judiciary more reflective of the general population may actually have resulted in aggregate inequality, both for litigants and for judicial actors.
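The abstract reports that the effect persists after exact matching on judicial quality, experience, and partisanship. A minimal sketch of that kind of comparison, assuming each case record carries the authoring judge's race, a tuple of matching covariates, and a 0/1 reversal indicator, computes the difference in reversal rates within each covariate stratum that contains both groups and averages those differences weighted by stratum size. The record layout, labels, and function name are hypothetical, and this is not presented as the author's estimator.

```python
from collections import defaultdict

def exact_match_difference(cases):
    """Weighted average of within-stratum differences in reversal rates.

    Hypothetical input: each case is (group, strata_key, reversed_flag) with
    group in {"black", "white"} and reversed_flag in {0, 1}. Strata lacking
    either group are dropped, as in exact matching.
    """
    strata = defaultdict(lambda: {"black": [], "white": []})
    for group, key, reversed_flag in cases:
        strata[key][group].append(reversed_flag)
    weighted_sum, total_n = 0.0, 0
    for groups in strata.values():
        b, w = groups["black"], groups["white"]
        if not b or not w:
            continue  # no within-stratum comparison possible
        diff = sum(b) / len(b) - sum(w) / len(w)
        n = len(b) + len(w)
        weighted_sum += diff * n
        total_n += n
    if total_n == 0:
        raise ValueError("no strata contain both groups")
    return weighted_sum / total_n

# Hypothetical strata keyed by (ABA rating, party of appointing president).
cases = [("black", ("WQ", "D"), 1), ("black", ("WQ", "D"), 0),
         ("white", ("WQ", "D"), 0), ("white", ("WQ", "D"), 0),
         ("black", ("Q", "R"), 1),
         ("white", ("Q", "R"), 0), ("white", ("Q", "R"), 1),
         ("white", ("NQ", "R"), 1)]  # unmatched stratum, dropped
print(exact_match_difference(cases))  # 0.5
```

Weighting by total stratum size is one choice; weighting by the number of cases from black judges in each stratum would instead target the average difference among those judges' cases.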

Posted by Konstantin Kashin at 2:20 AM

September 12, 2011

App Stats: Blackwell on "A Dynamic Causal Inference Approach for Estimating the Effectiveness of Negative Campaigning"

We hope you can join us this Wednesday, September 14, 2011 for the Applied Statistics Workshop. Matt Blackwell, a Ph.D. candidate from the Department of Government at Harvard University, will give a practice job talk entitled "A Dynamic Causal Inference Approach for Estimating the Effectiveness of Negative Campaigning". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"A Dynamic Causal Inference Approach for Estimating the Effectiveness of Negative Campaigning"
Matt Blackwell
Government Department, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, September 14th, 2011 12.00 pm

Abstract:

Traditional single-shot causal inference models investigate the effect of a single action at a single point in time and are an invaluable tool for political scientists. Often, however, actions unfold over time, with political entities reacting to a shifting environment. Single-shot methods therefore leave researchers unable to draw meaningful causal inferences about these dynamic processes. This stems from a fundamental tension: in dynamic settings, regression and matching force a choice between omitted variable bias on the one hand and post-treatment bias on the other, and they cannot correct for both simultaneously. To avoid these problems, I introduce a framework for dynamic causal inference and use marginal structural models to estimate dynamic causal effects. The effectiveness of "going negative" serves as a motivating example; it is an apt illustration because candidates change their strategy as the campaign unfolds. I also introduce novel diagnostics and a sensitivity analysis for the model.
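Marginal structural models are commonly fit by weighting each unit by the inverse probability of its observed treatment history. As a minimal sketch, assuming the per-period probabilities of going negative have already been estimated (for example, from models of treatment given past treatment, with and without the time-varying covariates), the stabilized weight for one campaign is the product over periods of the numerator-model probability divided by the denominator-model probability of the treatment actually received. The function name and inputs are hypothetical, and the sketch omits the diagnostics and sensitivity analysis the talk introduces.

```python
def stabilized_weight(treatments, p_numerator, p_denominator):
    """Stabilized inverse-probability-of-treatment weight for one unit.

    Hypothetical inputs, one entry per period t:
      treatments[t]    = 1 if the campaign went negative in period t, else 0
      p_numerator[t]   = P(A_t = 1 | past treatment)
      p_denominator[t] = P(A_t = 1 | past treatment, past covariates)
    """
    if not (len(treatments) == len(p_numerator) == len(p_denominator)):
        raise ValueError("inputs must have one entry per period")
    weight = 1.0
    for a, pn, pd in zip(treatments, p_numerator, p_denominator):
        num = pn if a == 1 else 1 - pn   # prob. of observed treatment, numerator model
        den = pd if a == 1 else 1 - pd   # prob. of observed treatment, denominator model
        weight *= num / den
    return weight

# Hypothetical two-period campaign: negative in period 1, positive in period 2.
w = stabilized_weight([1, 0], [0.5, 0.5], [0.25, 0.75])
```

Here each period contributes a factor of 2, so w is 4.0. The MSM itself would then be fit as a weighted regression of the outcome on the treatment history, using these weights.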

Posted by Konstantin Kashin at 2:42 AM

September 5, 2011

App Stats: Robins on "Parametrizations, Likelihoods, Semiparametrics, Causal Graphs, Model Selection and Discovery for Complex Causal Models"

We hope you can join us this Wednesday, September 7, 2011 for the first Applied Statistics Workshop this semester. Jamie Robins, the Mitchell L. and Robin LaFoley Dong Professor of Epidemiology at the Harvard School of Public Health, will present his paper entitled "Parametrizations, Likelihoods, Semiparametrics, Causal Graphs, Model Selection and Discovery for Complex Causal Models". A light lunch will be served at 12 pm and the talk will begin at 12.15.

"Parametrizations, Likelihoods, Semiparametrics, Causal Graphs, Model Selection and Discovery for Complex Causal Models"
Jamie Robins
Harvard School of Public Health
CGIS K354 (1737 Cambridge St.)
Wednesday, September 7th, 2011 12.00 pm

Abstract:

I will discuss recent results on novel factorizations of the likelihood for both (i) semiparametric causal models (marginal structural models and structural nested models) and (ii) nonparametric causal graphical models with unmeasured confounders (hidden variables). I will show how the causal question of substantive interest dictates the choice of factorization: for example, a causally complete MSM factorization is appropriate for inference on direct effects, whereas an R (recursive) factorization is appropriate for constructing algorithms that perform model selection and causal discovery in nonparametric causal graphical models. Associated with each factorization is a parametrization. The parametrization dictates both the form of doubly robust estimators and the likelihood to be maximized in score-based algorithms, such as those using BIC, for model selection. I will derive the relationships (i.e., mappings or diffeomorphisms) between these alternative parametrizations. This is joint work with Thomas Richardson, Ilya Shpitser, Robin Evans, Eric Tchetgen and Andrea Rotnitzky.
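The abstract notes that the chosen parametrization dictates the form of doubly robust estimators. For readers unfamiliar with the term, a standard point-treatment example (not the longitudinal or hidden-variable settings the talk covers) is the augmented inverse-probability-weighted (AIPW) estimator of an average treatment effect: it combines a propensity score e(x) with outcome regressions m1(x) and m0(x) and remains consistent if either the propensity model or the outcome model is correctly specified. The sketch assumes those fitted values are already available; the function name is hypothetical.

```python
def aipw_ate(y, a, e, m1, m0):
    """Augmented IPW estimate of E[Y(1)] - E[Y(0)] (hypothetical helper).

    y: outcomes; a: 0/1 treatment indicators; e: fitted P(A=1 | X) per unit;
    m1, m0: fitted E[Y | A=1, X] and E[Y | A=0, X] per unit.
    """
    n = len(y)
    if not all(len(v) == n for v in (a, e, m1, m0)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for yi, ai, ei, m1i, m0i in zip(y, a, e, m1, m0):
        # Outcome-model prediction plus an inverse-probability-weighted residual correction.
        treated_term = m1i + ai * (yi - m1i) / ei
        control_term = m0i + (1 - ai) * (yi - m0i) / (1 - ei)
        total += treated_term - control_term
    return total / n

y, a = [3, 1, 4, 2], [1, 0, 1, 0]
ate = aipw_ate(y, a, [0.5] * 4, m1=[3, 2, 4, 3], m0=[2, 1, 3, 2])
```

When the outcome models fit the observed outcomes exactly, the residual corrections vanish and the estimate is the average of m1 - m0 (here 1.0); setting both outcome models to zero reduces it to the plain inverse-probability-weighted estimator.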

Posted by Konstantin Kashin at 12:47 AM