November 2011

28 November 2011

App Stats: Friedman on "The Long-Term Impacts of Teachers: Teacher Value-Added and Students' Outcomes in Adulthood"

We hope you can join us this Wednesday, November 30, 2011 for the final Applied Statistics Workshop of the semester. John Friedman, Assistant Professor of Public Policy at the Harvard Kennedy School, will give a presentation entitled "The Long-Term Impacts of Teachers: Teacher Value-Added and Students' Outcomes in Adulthood". A light lunch will be served at 12:00 pm and the talk will begin at 12:15 pm.

"The Long-Term Impacts of Teachers: Teacher Value-Added and Students' Outcomes in Adulthood"
John Friedman
Harvard Kennedy School
CGIS K354 (1737 Cambridge St.)
Wednesday, November 30th, 2011, 12:00 pm


The use of test-score-based "value-added" (VA) measures to evaluate teachers is controversial, among other reasons, because (1) there is little evidence on whether high VA teachers improve student outcomes in adulthood and (2) there is debate about whether VA measures provide unbiased estimates of teacher quality. We address these issues by analyzing school district data from grades 3-8 for 2.5 million children linked to data on parents and adult outcomes from tax records. We find that the degree of bias due to selection is small using tests based on previously unobserved parent characteristics and a new quasi-experimental research design based on changes in teaching staff. We then show that high VA teachers increase their students' probability of college attendance, raise earnings, reduce teenage birth rates, and improve the quality of the neighborhood in which their students live in adulthood. The impacts of teacher VA are roughly constant across grades 4-8. A one standard deviation improvement in teacher VA in a single grade raises earnings by 1% at age 28. Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase students' lifetime income by approximately $300,000 for the average classroom in our sample.

Posted by Konstantin Kashin at 12:24 AM

26 November 2011

Measuring racism with Google search

Using racially charged Google searches as a proxy for racism, this paper by Seth Stephens-Davidowitz shows that Barack Obama lost 3-5 percentage points of the popular vote in 2008 because he is black. I found it very interesting, and the empirical strategy invites imitation and application to other areas. Worth a look.

Posted by Richard Nielsen at 10:46 AM

14 November 2011

App Stats: Beltrán-Sánchez on "New Evidence Linking Early and Late-life Mortality in European Cohorts"

We hope you can join us this Wednesday, November 16, 2011 for the Applied Statistics Workshop. Hiram Beltrán-Sánchez, a postdoctoral research fellow at the USC Davis School of Gerontology and at the Harvard Center for Population and Development Studies, will give a talk entitled "New Evidence Linking Early and Late-life Mortality in European Cohorts". A light lunch will be served at 12:00 pm and the talk will begin at 12:15 pm.

"New Evidence Linking Early and Late-life Mortality in European Cohorts"
Hiram Beltrán-Sánchez
USC Davis School of Gerontology
CGIS K354 (1737 Cambridge St.)
Wednesday, November 16th, 2011, 12:00 pm


Early environmental influences on later life health and mortality are well recognized. Using mortality data from 630 cohorts born throughout the 19th and early 20th century in nine European countries, we fitted a multilevel model to further explore the association between early life mortality with both the estimated mortality level at age 40 and the exponential (Gompertz) acceleration in mortality rates with age. Our findings strongly link early life mortality to both the cohort mortality level in mid-adulthood and the Gompertz rate of mortality acceleration during aging. Recent cohorts exposed to lower mortality environments early in life also showed lower mortality levels in adulthood. However, these gains were diminished by faster mortality accelerations at older age. Thus recent increases in adult survival are mainly due to declines in adult mortality levels rather than changes in the rate of aging. This analysis defines new links in the developmental origins of adult health and disease in which effects of early exposure to infections persist to adulthood and remain evident in the cohort rates of mortality at later ages.
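The Gompertz specification the abstract refers to, an exponential rise in mortality rates with age, mu(x) = a * exp(b * x), can be estimated as a log-linear least-squares fit. Here is a minimal sketch; the ages, death rates, and variable names below are invented for illustration and are not from the study:

```python
import math

# Hypothetical cohort death rates by age (deaths per person-year).
# These numbers are made up for illustration, not taken from the paper.
ages = [40, 50, 60, 70, 80, 90]
rates = [0.004, 0.008, 0.017, 0.036, 0.077, 0.160]

# Gompertz model: mu(x) = a * exp(b * x), so log mu(x) is linear in age.
xs = ages
ys = [math.log(r) for r in rates]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Closed-form simple OLS slope and intercept on the log scale.
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
log_a = ybar - b * xbar

# b is the rate of mortality acceleration: mortality doubles roughly
# every ln(2) / b years under the fitted model.
print(f"b = {b:.3f}, mortality doubling time = {math.log(2) / b:.1f} years")
```

In the paper's multilevel setting, the level at age 40 and the slope b would vary by cohort; this sketch shows only the single-cohort building block.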

Posted by Konstantin Kashin at 12:04 AM

9 November 2011

Studies that withhold replication data are more likely to have errors

We already knew that scholars who provide replication data get cited more. Now we know that they are also more likely to be right! Paper by Wicherts, Bakker, and Molenaar here. Blog post by Gelman here.

The authors requested replication data for 49 psychology studies. Amazingly, many of the studies' authors did not comply even though they were explicitly under contract with the journals to provide the data.

1) Papers whose authors withheld data had more reporting errors, meaning that the reported p-value differed from the correct p-value as calculated from the coefficient and standard error reported in the paper. I'd really like to think that these were all just innocent typos, but in seven papers the typos reversed the findings. None of those seven authors shared their data.

2) Papers whose authors withheld data tended to have larger p-values, meaning that their results were not as "strong" in some sense. This interpretation tortures the idea of the p-value a little bit, but it certainly represents how many researchers think about p-values. It's striking that researchers who think their results are "weaker" were less likely to provide data. It also suggests that researchers who are getting a range of p-values from different, plausible models tend to pick the p-value just below 0.05 rather than the one just above. But then, we already knew that.
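The check in point 1, recomputing a p-value from the reported coefficient and standard error and comparing it to the reported p-value, can be sketched as follows. This is a minimal illustration assuming a normal (large-sample) reference distribution; the function names and the tolerance are mine, not the paper's.

```python
from math import erfc, sqrt

def two_sided_p(coef, se):
    """Two-sided p-value for the z-statistic coef/se under a standard normal."""
    z = abs(coef / se)
    return erfc(z / sqrt(2.0))  # P(|Z| > z)

def flags_reporting_error(coef, se, reported_p, tol=0.005):
    """True when the reported p-value is farther than tol from the recomputed one."""
    return abs(two_sided_p(coef, se) - reported_p) > tol

# A coefficient of 2.0 with SE 1.0 gives p ~ 0.0455, so reporting "p = 0.20"
# would be flagged as an error, while reporting "p = 0.046" would not.
```

A t reference distribution would be more appropriate for small samples, but the idea is the same: the reported triple (coefficient, standard error, p-value) is internally checkable.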

This is frightening, not least because most of these were lab experiments, where we tend to think that the results are less sensitive to analyst manipulation because of strong design. Also, these are only the problems that were obvious without access to the replication data.

Most responses to this study include appeals for better data-sharing standards, but I don't think that's necessary. As long as we know which authors provide replication data and which don't, we can all update accordingly.

Posted by Richard Nielsen at 7:48 PM

6 November 2011

App Stats: VanderWeele on "Sensitivity Analysis for Contagion Effects in Social Networks"

We hope you can join us this Wednesday, November 9, 2011 for the Applied Statistics Workshop. Tyler VanderWeele, Associate Professor of Epidemiology at the Harvard School of Public Health, will give a presentation entitled "Sensitivity Analysis for Contagion Effects in Social Networks". A light lunch will be served at 12:00 pm and the talk will begin at 12:15 pm.

"Sensitivity Analysis for Contagion Effects in Social Networks"
Tyler VanderWeele
Harvard School of Public Health
CGIS K354 (1737 Cambridge St.)
Wednesday, November 9th, 2011, 12:00 pm

The paper is available here.


Analyses of social network data have suggested that obesity, smoking, happiness, and loneliness all travel through social networks. Individuals exert "contagion effects" on one another through social ties and association. These analyses have come under critique because of the possibility that homophily from unmeasured factors may explain these statistical associations and because similar findings can be obtained when the same methodology is applied to height, acne, and headaches, for which the conclusion of contagion effects seems somewhat less plausible. The author uses sensitivity analysis techniques to assess the extent to which supposed contagion effects for obesity, smoking, happiness, and loneliness might be explained away by homophily or confounding and the extent to which the critique using analysis of data on height, acne, and headaches is relevant. Sensitivity analyses suggest that contagion effects for obesity and smoking cessation are reasonably robust to possible latent homophily or environmental confounding; those for happiness and loneliness are somewhat less so. Supposed effects for height, acne, and headaches are all easily explained away by latent homophily and confounding. The methodology that has been used in past studies for contagion effects in social networks, when used in conjunction with sensitivity analysis, may prove useful in establishing social influence for various behaviors and states. The sensitivity analysis approach can be used to address the critique of latent homophily as a possible explanation of associations interpreted as contagion effects.

Posted by Konstantin Kashin at 2:59 AM

2 November 2011

Privacy, Statistics, and the Debate over the Regulation of Social Science Research

(This is a guest post by Dr. Micah Altman, who is a Senior Research Scientist and Director of Data Archiving and Acquisitions at IQSS.)

The U.S. Office for Human Research Protections (OHRP) proposed a set of sweeping changes to the federal regulations that govern research involving human subjects (the “Common Rule”), in the form of an Advance Notice of Proposed Rule Making (ANPRM), and solicited comments from investigators, Institutional Review Boards (IRBs), and any other interested parties by October 26, 2011. The ANPRM posed 75 questions, many of which implicated the collection, storage, de-identification, and distribution of information about individual research subjects, as well as major questions about the types and nature of exempt and minimal-risk research. Together these proposed changes could have a huge effect on the conduct of social science research and on the sharing of research results.

Over 1100 comments have been submitted on the proposed rules. The Data Privacy Lab, which is run by Latanya Sweeney and which has now joined IQSS, organized a series of seminars at Harvard on the proposed changes, resulting in two responses being submitted. One response was drafted by the lab, joined by about 50 data privacy researchers, and supported by two national privacy groups. A second, complementary response was led by Salil Vadhan, Joseph Professor of Computer Science and Applied Mathematics and former Faculty Director of the Center for Research on Computation and Society at Harvard, and joined by academics and researchers.

Among other issues, these responses draw attention to the key role of data sharing in social science research and to the statistical and computational advances that have fundamentally changed both the analysis of informational risks and the opportunities to use statistical methods to disclose data safely.

For example, the proposed HIPAA privacy rule is implicitly tailored to traditional microdata, as Vadhan et al.'s response points out: “[T]here is increased interest in collecting and analyzing data sets that are not in the traditional microdata form. For example, social network data involves relationships between individuals. A “friendship” relationship or contact between two individuals on a social network does not entirely “belong” to either individual’s record; the relationship can have privacy implications for both parties. While this change from data about individuals to data about pairs may seem innocuous, it makes the task of anonymization much more difficult and one cannot expect standards developed for traditional microdata, like HIPAA, to apply.”

The response then goes on to highlight how advances in statistical and computational methods can provide safe access to confidential data through dynamic interactive mechanisms for tabulation, visualization, and general statistical analysis; multiparty computation; and synthetic data generation. In many circumstances these techniques can yield both better privacy protections and better research utility than traditional “de-identification” techniques such as removing and generalizing fields.
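To see what the traditional approach amounts to, "removing and generalizing fields" means coarsening quasi-identifiers record by record. Here is a toy sketch; the record layout and the generalization rules are invented for illustration and are not HIPAA's actual Safe Harbor requirements:

```python
def generalize(record):
    """Traditional de-identification: drop direct identifiers and
    coarsen quasi-identifiers (bucket age, truncate ZIP code)."""
    out = dict(record)
    out.pop("name", None)                      # remove the direct identifier
    decade = (record["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"      # generalize age to a decade band
    out["zip"] = record["zip"][:3] + "**"      # generalize ZIP to a 3-digit prefix
    return out

row = {"name": "Jane Doe", "age": 34, "zip": "02138", "diagnosis": "flu"}
print(generalize(row))  # the sensitive field survives, identifiers are coarsened
```

The responses' point is that this kind of field-level coarsening both degrades research utility and, for non-microdata such as network ties, fails to deliver the privacy it promises.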

Sweeney et al.'s response goes on to comment on the systemic issues, incentive problems, and policy issues associated with the proposed changes, most importantly:

First, that “The proposed ban on re-identification would drive re-identification methods further into hidden, commercial activities and deprive the public, the research community and policy makers of knowledge about re-identification risks and potential harms to the public.”

And second, that the proposed policy provides no incentive to develop or use statistical and computational methods that would improve both the privacy and research utility of data sharing: “[T]here needs to be a channel for NCHS, NIST or a professional data privacy body to operationalize research results so that real-world data sharing decisions rely on the latest guidelines and best practices.”

The DPL has collected these responses, along with the related responses from Harvard University, major privacy groups, and social science research associations.

Posted by Matt Blackwell at 5:26 PM