30 September 2005
In my paper with S. Khagram entitled "A comparative study of inequality and corruption" (ASR 2005, vol.70:136-157), we demonstrated that data averaged for a long period (say, 1971-1996) instead of single-year data can be useful for both reducing measurement error and capturing a long-term effect.
In previous empirical studies of causes of corruption, income inequality was found insignificant. We suspected this lack of significance might be due to attenuation bias because income inequality was poorly measured. We found that using averaged data for inequality and other control variables increased the coefficient for inequality and made it significant.
Another result from this paper used "mature cohort size" (ratio of population 40 to 59 years old to the population 15 to 69 years old) as an instrument for inequality in IV regressions; again, inequality was found significant. Higgins and Williamson (1999) had previously studied the effect of cohort size on inequality. Because fat cohorts tend to get low rewards, when these fat cohorts lie at the top of the age-earnings curve, earnings inequality is reduced. When the fat cohorts are old or young adults, earnings inequality is augmented. Indeed, the mature cohort size is a powerful predictor of inequality.
Note that by "fat cohorts" and "slim cohorts" I mean the relative size of the cohorts. When the mature cohort is fat, or the relative size of the mature cohort is large, the earnings differential (the earnings gap between the mature cohort and the others) is reduced and hence earnings inequality is reduced.
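For readers unfamiliar with instrumental variables, here is a minimal two-stage least squares sketch on simulated data. The variable names, coefficients, and data-generating process are invented for illustration and are not the paper's actual specification; the point is only that a valid instrument recovers the true effect where OLS, contaminated by a confounder, does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data; the true model is made up for illustration.
cohort = rng.normal(size=n)        # instrument: relative mature cohort size
u = rng.normal(size=n)             # unobserved confounder
inequality = -0.8 * cohort + u + rng.normal(scale=0.5, size=n)
corruption = 0.6 * inequality + u + rng.normal(scale=0.5, size=n)

def slope(y, x):
    """OLS slope of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def iv_slope(y, x, z):
    """Two-stage least squares with one instrument."""
    Z = np.column_stack([np.ones_like(z), z])
    # First stage: fitted values of the endogenous regressor.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Second stage: slope of y on the fitted values.
    return slope(y, x_hat)

b_ols = slope(corruption, inequality)
b_iv = iv_slope(corruption, inequality, cohort)
print(f"OLS:  {b_ols:.2f}  (pushed above the true 0.6 by the confounder)")
print(f"2SLS: {b_iv:.2f}  (consistent for the true 0.6)")
```

The instrument works here because cohort size moves inequality but is unrelated to the confounder, which is exactly the exclusion restriction the paper must defend for mature cohort size.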
You can view my paper here.
29 September 2005
Tobler's First Law of Geography states that "everything is related to everything else, but near things are more related than distant things." Obviously there are many examples -- an infection is more likely to spread to a nearby person than to a far away one, a new highway might depress house prices for people living right next to it, and so on. The point is that there can be important dependencies and heterogeneities that vary with space, among other associations. And in those cases the usual assumptions that observations or errors are independently distributed don't hold. Urgh. Welcome to the world of spatial statistics.
As an estimation problem this is often addressed through clustering methods. Households in a village with some infected persons are at higher risk than households in neighboring villages. Or are they really? Clustering works when the locations are relatively homogeneous and separated. What if there is no good way to classify observations into clusters, for example, if an area is evenly populated? Or if the infected household lives right at the end of the village road, and some neighbors are in the other village? The administrative boundaries commonly used for clustering (village name) might not properly account for the actual proximity or whatever defines the space between the observations. If a transmitting mosquito wouldn't care much about the village name when deciding who to bite next, why should an analyst rely on it?
Using clustering may often be a good approximation but in some cases it's not good enough and there can be substantial spatial lags (observations are spatially dependent), spatial errors (error terms are related) and spatial heterogeneity (model parameters vary across space). Those can lead to biased estimates, inefficient ones, or both. The bad news is that those effects can matter a lot. The good news is that there are methods to test for spatial dependence and correlation, and estimation techniques to deal with them.
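As a concrete example of such a test, Moran's I is perhaps the most common diagnostic for spatial dependence. Here is a minimal sketch, assuming a simple binary contiguity weights matrix (real applications typically use distance-based or row-standardized weights):

```python
import numpy as np

def morans_i(y, W):
    """Moran's I statistic for spatial autocorrelation.
    y: values at n locations; W: n x n spatial weights matrix (zero diagonal)."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Toy example: 5 locations on a line; neighbors share an edge.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0

smooth = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # spatially trended values
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # neighbors disagree

print(morans_i(smooth, W))        # positive: neighbors are similar
print(morans_i(alternating, W))   # negative: neighbors differ
```

Values near the statistic's expectation under independence (roughly -1/(n-1)) suggest no spatial dependence; strongly positive or negative values suggest the usual independence assumptions are in trouble.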
Of course the underlying interactions we are trying to better capture can be anything from linear to more complicated relations. It is unlikely that they are perfectly well described by any abstract spatial model, so we will still need to make assumptions. But at least there are some methods that can handle cases where the usual assumptions fail, and they can make an important difference to the analysis. I will write more about them in later blog entries. Meanwhile you might be interested in the following texts:
-- James LeSage's Econometrics Toolbox (www.spatial-econometrics.com) has an excellent workbook discussing spatial econometrics and examples for the MATLAB functions provided on the same site
-- Anselin (2002) "Under the Hood: Issues in the Specification and Interpretation of Spatial Regression Models" Agricultural Economics 27: 247-267 provides a quick overview of the issues
-- Anselin (1988) Spatial Econometrics: Methods and Models is the classic and widely quoted reference for spatial statistics
28 September 2005
Every year, the host university of the Political Methodology conference invites a local scholar from some other discipline to share his or her research with the political science methods community. This year's special presentation, by James Elsner of the Florida State University Department of Geography, was sadly prescient. Professor Elsner's talk, "Bayesian Inference of Extremes: An Application in Modeling Coastal Hurricane Winds," applied extreme value theory in a Bayesian context to estimate the frequency with which hurricanes above a given strength make landfall in the United States. The devastating impact of Hurricane Katrina amply illustrates the importance of estimating maximum intensities; news reports suggest that as little as a foot or two of water overtopping the levees and eroding them from below may have caused the breaches that flooded New Orleans.
Extreme value theory provides a way to estimate the distribution of the maximum or minimum of a set of independent events. While this could be done directly if the distribution of the underlying events were known, in practice it is preferable to use the extremal types theorem to estimate the distribution of the maximum or minimum directly from data. The theorem states that, with appropriate transformations, the distribution of extreme values converges in the limit to one of three classes of distribution - Gumbel, Fréchet, or Weibull - regardless of the shape of the underlying distribution.
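As a rough sketch of this idea (Elsner's analysis is Bayesian; this is only maximum likelihood on simulated data), one can fit a generalized extreme value distribution, which nests all three classes, to a series of annual maxima and read off a return level:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(42)
# 100 "years" of annual maxima, each the max of 365 daily draws.  By the
# extremal types theorem their distribution is approximately GEV whatever
# the daily distribution is (standard normal here, purely for illustration).
annual_max = rng.normal(size=(100, 365)).max(axis=1)

# Maximum-likelihood GEV fit; the sign of the shape parameter picks out
# the Gumbel, Fréchet, or Weibull class.
shape, loc, scale = genextreme.fit(annual_max)

# 100-year return level: the level exceeded with probability 1/100 per year.
rl = genextreme.ppf(1 - 1 / 100, shape, loc=loc, scale=scale)
print(f"estimated 100-year return level: {rl:.2f}")
```

The same recipe applied to annual maximum wind speeds at landfall is, in spirit, what the hurricane application does, with the Bayesian machinery added to propagate data and model uncertainty.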
There are several challenges in estimating the distribution of extreme values. The three classes of limit distributions for extreme values have different behavior in the extreme tail: one family has a finite limit, while the other two have no limit but decay at different rates. To the extent that we are interested in "extreme" extremes, these differences could have substantive implications. Compounding this problem, observations in the extreme tail are likely to be sparse. Finally, one might expect that the quality of data is lower when extreme maxima or minima are occurring. Consider Katrina: most of the instrumentation for recording wind speeds, storm surge, and rainfall rates was knocked out well before the height of the storm. (Nor is this just a problem with weather phenomena; imagine trying to measure precisely daily price changes during a period of hyperinflation). The Bayesian approach pursued in this work seems promising, as it allows the uncertainty in both the data itself and in the functional form to be modeled explicitly.
In talking with other grad students after the presentation, I think the consensus was that, while interesting methodologically and sobering substantively, it was hard to see how we would apply these methods in our own work. A quick Google search suggests that this approach is (not surprisingly) well established in financial economics, but not much else from the social sciences. With a little more time to reflect, however, I think that this may be more due to a lack of theoretical creativity on our part. Coming from the formal side of political science, I could see how thinking about extreme values might provide some insight into how political systems are knocked out of equilibrium, much like the levees in New Orleans.
27 September 2005
Proceedings in the Harvard Dept. of Statistics seminar series started early this year, as Hui Jin eloquently delivered her doctoral thesis defense on Wednesday, September 14, entitled "Principal Stratification for Causal Inference with Extended Partial Compliance." Jin applied her ideas both to drug trials and to school choice (voucher) programs. She spoke in particular about the second application, focusing on a study of vouchers as offered to students from low-income families in the New York City public school system. In this study, 1000 students were offered a subsidy to help pay tuition for a private school of their choice, and were matched with students with similar conditions who were not offered the grant. Both groups were tracked for three years, and a set of tests at the beginning and end were used to measure achievement. The compliance factor was whether grant recipients would always take advantage of the offer, and whether unlucky ones would never make their own way to private school. While the compliance rate after three years remained high - roughly 80% - it was the compliance factor that proved to be the most instructive on the achievement pattern of students, a result found by stratifying the outcomes according to compliance patterns.
Those students expected to comply perfectly - attend private school with the grant and public school without it, in all three years - made the least improvement as compared to their colleagues in the other strata. Comparative performance improved with non-compliance; the biggest non-conformers, those who attended private or public school regardless of whether the grant was offered, showed the most improvement over their previous scores.
Notably, the reasons for this performance haven't been completely explained, though Prof. Rubin (Jin's advisor and collaborator on the project) suggests that perhaps using the voucher as a threat to remove a student from his friends may compel a higher performance at public school. Whatever the underlying mechanism, the results give strong and compelling reason to fully consider the effect of vouchers in the school system.
26 September 2005
This week's Applied Statistics Workshop presentation will be given by Professor Xihong Lin of the Department of Biostatistics at the Harvard School of Public Health. Professor Lin received her Ph.D. in Biostatistics from the University of Washington. She is one of the newest members of the Harvard statistical community, having just moved to Harvard from the University of Michigan School of Public Health. She has published widely in journals including the American Journal of Epidemiology, Biometrika, and the Journal of the American Statistical Association. She currently serves as the co-ordinating editor of Biometrics. Among her other awards, she has been recognized as an outstanding young scholar by both the American Statistical Association and the American Public Health Association.
Professor Lin's presentation, "Causal Inference in Hybrid Intervention Trials Involving Treatment Choice," considers the problem of causal inference from experiments in which some subjects are allowed to choose the treatment that they receive. Allowing treatment choice may increase compliance levels, but creates inferential challenges not present in a fully randomized experiment. Professor Lin will discuss her approach to this problem on Wednesday, September 28 at noon in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
In my draft paper on the "correlates of social trust" (presented at the ASA conference, August 2005), I argued that the fairness of a society, both freedom from corruption (fair administration of rules) and distributive fairness (relatively equal and unskewed distributions), affects the society's level of social trust more than its homogeneity does. Based on a multilevel analysis of data from the World Values Surveys (WVS, 1995-97, 2000-01) and the European Values Study (EVS, 1999), I found that corruption and inequality are significantly negatively associated with social trust controlling for individual-level factors and other country-level factors, while ethnic diversity loses significance once corruption or inequality is accounted for. Also, I found that the inequality effect is primarily due to the skewness of income rather than its simple heterogeneity, and that the negative effect of minority status is greater in more unequal and undemocratic societies.
The WVS and the EVS have been conducted in close cooperation with (almost) identical questions. The WVS (1995-97) covers 50 countries, and the WVS/EVS (1999-2001) covers 66 countries in all continents of the world. By pooling the 1995-97 data and the 1999-2001 data, I was able to increase the number of countries to 80. My literature review has unearthed few articles employing multilevel modeling in the comparative politics or sociology literatures. I suspect the scarcity of adequate multilevel data is one reason for this. Schofer and Fourcade-Gourinchas (2001) used the 1991 WVS in a multilevel analysis of the "structural contexts of civic engagement," but the country coverage was just 32. Although they had a lot of observations at the individual level, the relatively small N at the country level prevented them from including many explanatory variables at the country level. Now, with a relatively large number of countries, the WVS/EVS data seems to be an ideal dataset with which many interesting multilevel analyses can be conducted.
Since my draft is rough, I will welcome any comments, either methodological or substantive. You can find a draft here.
23 September 2005
The annual meeting of the Conference of the Cognitive Science Society took place in late July. Amid a slew of interesting debates and symposia, one paper stood out as having particularly interesting implications from the methodological perspective. The paper, by Navarro et al., is called "Modeling individual differences with Dirichlet processes" (pdf found here).
The basic idea is that many questions in cognitive and social science hinge on identifying which items (subjects, features, datapoints) belong to which groups. The individual difference literature is replete with famous psychological theories along these lines: the factors contributing to IQ, the different "personality types", the styles of thought on this or that problem. In cognitive science specifically, the process of classification and categorization - arguably one of the more fundamental of the mind's capabilities - is basically equivalent to figuring out which items belong to which groups. Many existing approaches can capture different ways to assign subjects to groups, but in almost all of them the number of groups must be prespecified - an obvious (and large) limitation.
A Dirichlet process is a "rich-get-richer" process: as new items are seen, they are assigned to groups proportional to the size of the group, with some nonzero probability alpha of forming a new group. This naturally results in a power-law (Zipfian) distribution of items, which parallels the natural distribution of many things in the world. It also often seems to form groups that match human intuitions about the "best" way to split things up. Dirichlet process models, often used in Bayesian statistics, have been around in machine learning and some niches of cognitive science for at least a few years. However, the Navarro article is one of the first I'm aware of that (i) examines their potential in modeling individual differences, and (ii) attempts to make them more widely known to a general cognitive science audience.
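The sequential form of the Dirichlet process, the so-called Chinese restaurant process, makes the "rich-get-richer" dynamic easy to simulate. Here is a minimal sketch (the function name and parameters are mine, not Navarro et al.'s):

```python
import random
from collections import Counter

def crp(n_items, alpha, seed=0):
    """Chinese restaurant process: the sequential form of a Dirichlet process.
    Each item joins an existing group with probability proportional to the
    group's size, or starts a new group with probability proportional to alpha."""
    rng = random.Random(seed)
    groups = []   # groups[k] = number of items currently in group k
    labels = []
    for i in range(n_items):
        # Total weight is i (items seen so far) plus alpha for a new group.
        r = rng.uniform(0, i + alpha)
        cum = 0.0
        for k, size in enumerate(groups):
            cum += size
            if r < cum:
                groups[k] += 1
                labels.append(k)
                break
        else:
            labels.append(len(groups))   # open a new group
            groups.append(1)
    return labels

labels = crp(1000, alpha=1.0)
sizes = sorted(Counter(labels).values(), reverse=True)
print(len(sizes), sizes[:5])   # a few groups dominate: "rich get richer"
```

Note how the number of groups is not fixed in advance: it grows (roughly logarithmically) with the data, governed only by alpha.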
It's exciting to see more advanced Bayesian statistical models of this sort poke their way into cognitive science. As I think about how useful these can be, I have some questions. For instance, Navarro et al.'s model gives a more principled mechanism for figuring out how many groups best fit a set of data, but the exact number of groups identified is still dependent on the alpha parameter. Is this a severe limitation? Also, the "rich-get-richer" process is intuitive and natural in many cases, but not all groups follow power-law distributions. How might we use models with other processes (e.g., Gaussian process models) to assign items to an unspecified number of groups in ways that don't yield power-law distributions? I think we've only started to scratch the surface of the uses of this type of model, and I'm eager to see what happens next.
22 September 2005
Our job as social scientists is to learn how to take data that reflects various aspects of how people and societies work, and then use that data to form abstract theories or models about the world. Different fields in social science look at different data, but we all share common methods and (I imagine) some common general questions. This blog is set up to allow our different disciplines to discuss our commonalities of method and approach, sharing insights from our respective fields.
Cognitive science is a bit unusual because the questions of method and approach are simultaneously relevant on two levels rather than one. In cognitive science, the object of study (the brain) must solve the same questions as the scientists themselves. In other words, just as the job of the cognitive scientist is to figure out how best to take data in the world and form models about the world, the job of the brain is to figure out how to take data in the world and form a model about the world. As a result, the issues that crop up again and again for scientists—which quantitative approaches "compress" data most effectively and fastest, when statistical or symbolic models capture the world best, and how much needs to be built into our models from the beginning—are the very issues the brain needs to solve as it is learning about the world. They are thus issues that the cognitive science world continually debates about on both levels: not only what works for us as scientists (and when), but what works for the brain itself (and when).
When I post here, therefore, I'll be constantly playing with these levels: I'll be talking about quantitative methods in social science not just from the perspective of the scientist (as will everyone else here), but also from the perspective of the mind (which I'm guessing most other people won't). In short, the questions we all struggle with in terms of methodology are the same questions cognitive scientists struggle with in terms of content. It's my hope that playing with these questions on two levels at once will be edifying, entertaining, and lots of fun. I think it will be.
21 September 2005
It's well known that African American college students on average (repeat: on average) have lower SAT scores than white students (see Bowen and Bok's book The Shape of the River). Now here's something that annoys me: Every now and then, I run into somebody who takes this observation as evidence that affirmative action dilutes academic standards. Hello? Differences in mean SATs among accepted students have little or nothing to do with affirmative action!!
Consider this: SAT scores are roughly normally distributed among both blacks and whites but the distribution for blacks is shifted a bit to the left (lower mean). Now consider a college that will admit every candidate above a certain cut-off point (same cut-off for everybody). Under these circumstances the average SAT score of accepted black students would be lower than the average SAT score among accepted white students, even though the college has applied a uniform, race-blind admission standard. Why? Because the tail area of the white SAT distribution extends farther to the right of the cut-off point than the tail area of the distribution for blacks, whatever the reason. Upshot: racial differences in test scores in a student body don't reveal whether a school practices affirmative action and by themselves certainly don't betray "diluted standards." In addition, more or less the only way to create a student body where black and white students have the same average SAT score, given these race specific SAT distributions, would be to set drastically higher admissions standards for blacks than for whites - i.e. to discriminate against blacks. Surely, that wasn't the point?
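This is easy to verify by simulation. The means, standard deviation, and cutoff below are invented for illustration, not actual SAT figures:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical score distributions: same spread, one mean shifted left.
white = rng.normal(1050, 200, size=200_000)
black = rng.normal(950, 200, size=200_000)

cutoff = 1200   # identical, race-blind admission cutoff
adm_w = white[white >= cutoff]
adm_b = black[black >= cutoff]

# Even under a uniform standard, mean scores of admitted students differ,
# because one distribution's tail extends farther beyond the cutoff.
print(f"mean admitted white: {adm_w.mean():.0f}")
print(f"mean admitted black: {adm_b.mean():.0f}")
```

With these (made-up) numbers the 100-point population gap shrinks to roughly a 20-point gap among admitted students, but it does not vanish, which is exactly the point: a race-blind cutoff still produces different group means among admits.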
(This observation comes to me via friends of UCLA's Thomas Kane. Kane is now moving to Harvard - thus moving this blog closer to the source.)
20 September 2005
We'd like to welcome the new Political Behavior Blog at the Institute run by Prof. Barry Burden and his team of graduate students from Harvard and MIT. They've only just started but they have some interesting material on the way. If you're interested in political behavior, we encourage you to check it out. See http://iq.harvard.edu/blog/pb.
It's fascinating how far you can get by taking a second look at the simplest statistics - in this case percentages and ratios. Case in point, James Scanlan's clever and unjustly ignored observation that African Americans will necessarily appear to be losing ground relative to whites even as their standing improves in absolute terms. (Actually, the argument holds for any inter-group comparisons, not just race.) Scanlan shows that this is an artifact of measuring progress by focusing exclusively on ratios of percentages from dissimilar distributions. This insight raises the question of how best to measure progress. Here are some of Scanlan's examples.
Black-white differences in infant mortality: In 1983, 19.2 black infants but only 9.7 white infants died per 1000 births in each group. The resulting black-white ratio was 1.98. In 1997, infant mortality had decreased quite a bit, to 14.2 for blacks and 6.0 for whites. Note that in raw percentage terms, infant mortality had improved more for blacks than for whites. That should be good news, no? But, lo, now look at the black-white ratio in 1997 - it increased from 1.98 to 2.4. How can infant mortality have improved more for blacks than for whites in absolute terms at the same time as the relative position of blacks to whites has worsened?
Here's another example for the same underlying statistical phenomenon: Moving the income distributions of blacks and whites up by the same dollar amount relative to the poverty threshold would increase the racial disparity in poverty (because relatively more blacks suffer extreme poverty than whites)! Except for extreme circumstances, this will be true even if we boost black real incomes more than white real incomes. How can it be that helping blacks more than whites in absolute terms would worsen blacks' relative economic position?
Here's my favorite example - racial disparities in college acceptance rates. Suppose that college admissions are solely a function of SAT scores (as I'm told they essentially are for some large, selective state schools) and that the SAT distribution of black test takers equals that of whites except it's shifted to the left (as it is). Let the cut-off point for college acceptance be the same for blacks and whites (i.e. no affirmative action). Lowering the admission standard (for everybody) would then reduce the racial disparity in admission rates. That's good, no? But at the same time - and necessarily so - the lowering of admission standards would increase the racial disparity in rejection rates. That's bad, no? Huh?
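This example can be checked directly with normal CDFs. The numbers below are hypothetical, but the reversal they show is exactly Scanlan's point: lowering the cutoff shrinks the acceptance-rate disparity while widening the rejection-rate disparity.

```python
from scipy.stats import norm

mean_w, mean_b, sd = 1050, 950, 200   # hypothetical score distributions

def ratios(cutoff):
    """White/black acceptance-rate ratio and black/white rejection-rate ratio
    for a single race-blind cutoff applied to both distributions."""
    acc_w = norm.sf(cutoff, mean_w, sd)   # survival function = acceptance rate
    acc_b = norm.sf(cutoff, mean_b, sd)
    return acc_w / acc_b, (1 - acc_b) / (1 - acc_w)

high, low = ratios(1300), ratios(1100)
print(f"high cutoff: acceptance ratio {high[0]:.2f}, rejection ratio {high[1]:.2f}")
print(f"low  cutoff: acceptance ratio {low[0]:.2f},  rejection ratio {low[1]:.2f}")
```

Both ratios are computed from the same two fixed distributions; only the cutoff moves, yet the two disparity measures move in opposite directions.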
It turns out that seemingly straightforward comparisons of ratios of percentages may hide more than they tell (in these examples, with important policy implications). Interestingly, all three examples draw on the same statistical phenomenon. The secret lies in the funny shape of cdf-ratios from density functions that are shifted against each other. I plan to provide an intuitive explanation for this point once we've figured out how to post graphics on this blog. Until then, read James P. Scanlan's "Race and Mortality" in the Jan/Feb 2000 issue of Society.
19 September 2005
Boston will host the 2005 International Conference on Health Policy Research from October 28-30. This year's theme is "Methodological Issues in Health Services and Outcomes Research" and presentations are meant to convey both content and methodology.
The conference includes a slightly eclectic selection of workshops on methods and the use of well-known health datasets -- two workshops on the latter are free, others cost $60 or $30 (students). Registration is not free either, but students pay only $80. Looks interesting and useful overall, though you might want to attend selectively.
For more info check the conference website.
The Research Workshop in Applied Statistics brings together the statistical community at Harvard for a lively exchange of ideas. It is a forum for graduate students, faculty, and visiting scholars to present and discuss their work. We advertise the workshop as "a tour of Harvard's statistical innovations and applications," with weekly stops in different disciplines such as economics, epidemiology, medicine, political science, psychology, public policy, public health, sociology and statistics. The topics of papers presented in recent years include matching estimators, missing data, Bayesian simulation, sample selection, detecting biological attacks, imaging the Earth's interior, incumbency in primary elections, the effects of marriage on crime, and revealed preference rankings of universities.
One of the strengths of the workshop is its diverse group of faculty sponsors. This year's sponsors include Alberto Abadie (Kennedy School), Garrett Fitzmaurice (School of Public Health), Lee Fleming (Business School), Guido Imbens (Economics), Gary King (Government), Kevin Quinn (Government), James Robins (School of Public Health), Donald Rubin (Statistics), and Christopher Winship (Sociology). The workshop provides an excellent opportunity for informal interaction between graduate students and faculty.
The workshop meets Wednesdays during the academic year; lunch is provided. If you are interested, come to our organizational meeting on Wednesday, September 21 at noon in Room N354 at the Institute for Quantitative Social Science (IQSS is located on the 3rd Floor of CGIS North, 1737 Cambridge St., located behind the Design School). Course credit is available for students as an upper-level class in either Government or Sociology.
For more information, check out our website here. There you will find contact information, the schedule of presentations, and links to papers from previous presentations. We'll also be using this blog to announce speakers and to post reports from the workshop, so check back here often. We hope to see many of you there. If you have any questions, feel free to e-mail me at firstname.lastname@example.org.
John F. Friedman
Continuing from the most recent post, for the economist, perhaps a more interesting incidence of this statistical problem is not researchers making this error within the literature but consumers making misjudgments in the marketplace. (Since most people approach problems in their lives with less rigor than a statistician, perhaps this is not surprising). In particular, once consumers make these inference mistakes, economic theory suggests that firms will take advantage. Edward Glaeser wrote at length on this phenomenon in 2003 in "Psychology and the Market."
One classic example of this phenomenon - as specifically related to censoring by death - is the mutual fund industry. Most brochures for management companies aggressively tout the high past returns that have accumulated in their funds. Consumers then extrapolate these historical earnings into the future, usually choosing managers based on past performance. Of course, their reasoning is tainted by the same statistical problem; companies will shut down those mutual funds which have poor past performance, leaving only their winners for customers to admire. (Another problem with this line of reasoning is that there is virtually no evidence that strong past performance predicts strong future performance. In this sense, perhaps the greater error is to pay attention to past returns at all!) This problem is compounded in the market by the fact that any firm which attempts to educate consumers about their mistakes is unlikely to capture the value-added from that effort. The now-savvy consumers have no reason to invest at the firm that provided the information, and, even if they did, these firms make the most money from naive consumers rather than the smart ones, who would now make up the clientele. See David Laibson and Xavier Gabaix (2004) for more on this phenomenon. Since no firm has an incentive to educate the public, the entire industry becomes geared towards taking advantage of naive consumers, obfuscating costs, and selectively presenting information.
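Survivorship bias of this sort is easy to demonstrate by simulation. In the sketch below every fund has the same true expected return, yet the surviving funds' track records look systematically better (all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(11)
n_funds, n_years = 1000, 10
# Every fund draws annual returns from the same distribution: 6% mean,
# 15% annual volatility.  No fund has any genuine skill.
returns = rng.normal(0.06, 0.15, size=(n_funds, n_years))

# The company shuts down any fund whose cumulative return ever dips
# below its starting value; only the rest appear in the brochure.
cum = np.cumprod(1 + returns, axis=1)
alive = (cum > 1).all(axis=1)

print(f"surviving funds: {alive.sum()} of {n_funds}")
print(f"true mean annual return:     {returns.mean():.3f}")
print(f"mean return, survivors only: {returns[alive].mean():.3f}")
```

The survivors' average return exceeds the true 6% purely because of the shutdown rule, not because any fund was better managed.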
A Visit To Harvard
Anton Westveld (Visiting from University of Washington Statistics Department)
This past week I had the opportunity to visit with Kevin Quinn, one of my main Ph.D. advisors, at the Center for Government and International Studies at Harvard. Kevin and Gary King asked if I would provide a brief description of my recent visit.
I was fortunate enough to arrive in time to work in the new buildings for the Center. The new space has a modern design that is quite beautiful and utilitarian.
Currently we are working on developing statistical methodology for longitudinal social network data. Social network data consist of measured relations occurring from interactions within a set of actors. This type of data allows for the empirical investigation of the interconnectivity of the actors, which is a cornerstone of social science theory. The methodology focuses on data generated from the repeated interaction of pairs of actors, including temporal dyadic data resulting in an outcome for each actor at each time point (e.g. the level of exports from Canada to Japan in a given year). The methodology incorporates structure to account for correlation resulting from interactions as well as the repeated nature of the data. In particular, a random effects model is employed which accounts for five different types of network dependencies. These five dependencies are then correlated over time through the assumption that the random effects follow a weakly stationary process.
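As a toy version of one such dependency, the sketch below generates dyadic data with sender effects, receiver effects, and within-dyad ("reciprocity") correlation. It is only an illustration of the data structure, not the model Kevin and I are developing:

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 40, 0.6   # number of actors; within-dyad error correlation

# Hypothetical additive random effects: each actor has a sender effect
# (e.g. propensity to export) and a receiver effect (propensity to import).
a = rng.normal(size=n)
b = rng.normal(size=n)

y = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # The two directed errors within a dyad are drawn jointly, so
        # y[i, j] and y[j, i] remain correlated even after the sender
        # and receiver effects are accounted for.
        e_ij, e_ji = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]])
        y[i, j] = 2.0 + a[i] + b[j] + e_ij
        y[j, i] = 2.0 + a[j] + b[i] + e_ji

# Check the reciprocal dependence using the known residuals.
resid = y - 2.0 - a[:, None] - b[None, :]
iu = np.triu_indices(n, k=1)
r = np.corrcoef(resid[iu], resid.T[iu])[0, 1]
print(f"reciprocal residual correlation: {r:.2f}")
```

Ignoring this within-dyad correlation (and its analogues across time) is precisely what produces misleading standard errors in naive analyses of network data.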
Kevin and I spent the last few days discussing appropriate methodology and writing C++ code. We also spent some time discussing the relationship between social network models and statistical game theory models, both of which seek to gain an understanding of social phenomena by examining social interaction data. Due to the Center’s collegial environment, I also had opportunities to discuss my work with Gary King and Jake Bowers.
18 September 2005
The Harvard Dept. of Statistics kicks off its 2005-2006 seminar series on Monday, September 19 with a talk by the father of the Rubin Causal Model himself, Prof. Donald Rubin. An entertaining speaker if there ever was one, Prof. Rubin will give a firsthand account of his research to all who are interested.
The talk will be held in Science Center 705 at 4:00; a reception will follow. Looking forward to seeing all interested parties in attendance.
16 September 2005
John F. Friedman
The problem of "censoring by death" also surfaces in a number of economic contexts. For instance, firms that go bankrupt as a result of poor corporate policies will not appear in many datasets, making any analysis of the impact of other financial events biased upwards. This problem has particularly plagued the literature on the impacts of corporate restructuring and leveraged buyouts (LBOs) of distressed firms. Since these firms are at high risk of failure by nature of their inclusion in the study in the first place, such firms exit the sample at high frequency, and the benefits of restructuring and LBOs may be overstated.
One can theoretically correct for this problem by modeling the ways in which the sample selection occurs, but these approaches have performed poorly in many economic settings due to the sensitivity of the results to the parametric assumptions of the econometric model. For instance, the "Heckman selection correction" - brought into Economics by Nobel laureate James Heckman in 1979 - models the death process as a first stage Probit based on observable characteristics. By estimating this first stage, one can correct for the lost observations. Bob LaLonde (1986) later tested this model by comparing the results from a job training study with random assignment to the results one would have gotten had one used Heckman's method on the treated group. Though the selection correction performed better than many alternative methods, such as matching or differences-in-differences, the estimates were rather imprecise and the confidence intervals mismeasured. In this case, the problem is the joint assumption of normality and selection entirely on observables. Though more flexible models have come into Economics in recent years - the Propensity Score, for instance - these too have proven sensitive to the particular model properties in many applications.
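Here is a simulated sketch of the two-step logic (not Heckman's original application; for brevity the true selection index is plugged in where the real procedure would estimate it with a first-stage probit):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 50_000

x = rng.normal(size=n)                    # regressor of interest
z = rng.normal(size=n)                    # variable affecting selection only
e = rng.normal(size=n)                    # outcome error
v = 0.8 * e + 0.6 * rng.normal(size=n)    # selection error, correlated with e
y = 1.0 + 0.5 * x + e                     # outcome; true slope is 0.5

index = 0.5 * x + z
observed = (index + v) > 0                # "death": low draws leave the sample

def coefs(y, cols):
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS on the surviving sample: the slope is biased.
naive = coefs(y[observed], [x[observed]])[1]

# Heckman-style second stage: add the inverse Mills ratio of the
# selection index as a regressor to soak up E[e | selected].
mills = norm.pdf(index) / norm.cdf(index)
corrected = coefs(y[observed], [x[observed], mills[observed]])[1]

print(f"naive slope:     {naive:.3f}")
print(f"corrected slope: {corrected:.3f}  (true value 0.5)")
```

The correction works here exactly because the simulated errors are jointly normal; as noted above, when that joint normality fails, the same machinery can perform quite poorly.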
Though perhaps an old-fashioned solution, the studies in economics that best avoid this problem have simply endeavored to correct for the sample selection problem by collecting otherwise unavailable data on firm deaths in the sample. These samples are often smaller, permitting less broad analysis, but effectively mitigate the selection by death problem.
15 September 2005
D. James Greiner
I'm interested in the problem of "censoring due to death" within the framework of the Rubin Causal Model ("RCM").
As readers will know, the RCM is a framework for studying the effects of causes in which the science is represented via a set of potential outcomes for each unit. (A potential outcome is the value the dependent variable would take on if the treatment variable had a certain value, whether or not the treatment variable actually had that value). An assignment mechanism decides what treatment (e.g., active treatment or control) a unit receives and thus which potential outcome will be observed. Unit-level causal effects are defined as the difference in the potential outcomes of some quantity of interest. The fundamental problem of causal inference is that we can observe at most one potential outcome for each unit. Unobserved potential outcomes are treated as missing data. Observational studies are analyzed as "broken" randomized experiments, broken in the sense that the assignment mechanism was not recorded and therefore must be reconstructed in some approximate way. For a more complete discussion, see Holland, P.W. (1986). Statistics and Causal Inference. Journal of the American Statistical Association 81: 945--960.
Censoring or truncation due to death occurs when some units' failure to comply with a post-treatment condition renders their values of the quantity of interest undefined. Consider for example a medical study designed to assess the effect of a new cancer treatment on the percentage of patients who survive cancer-free for ten years. Suppose some individuals die from car accidents or drug overdoses or other causes clearly unrelated to cancer before the ten-year time period has elapsed. Such individuals do not have a value for ten-year cancer-free survival, so their values of the quantity of interest are undefined. (The problem here is not that these individuals' values for cancer-free survival are missing data; rather, the problem is that they have no such values.) Under such circumstances, some quantitative analysts simply remove such individuals from the study and analyze the remainder. This course of action can bias results in several different ways. To illustrate one such way, it could be that individuals who die from non-cancer related causes might smoke, have less healthy diets, refuse to wear seat belts, or otherwise engage in more risky behavior than many of the other individuals in the study. If the treatment is effective in warding off cancer, there could be more deaths unrelated to cancer in the treated group than in the control group, because some treated group members who would otherwise have been killed by cancer survive long enough to be felled by, for example, car accidents before the ten years are up. This difference could render comparison of the units remaining in the treated and control groups an inappropriate method of assessing the effect of the treatment.
The key is to realize that a comparison of ten-year cancer-free survival rates only makes sense for units who would not die from causes unrelated to cancer if assigned treatment AND who would not die from causes unrelated to cancer if assigned control. Thus, removing individuals who died from causes unrelated to cancer is not enough.
The remaining group actually assigned control may include some units who would have died from non-cancer causes if they had been assigned treatment, and the remaining group actually assigned treatment may have some units who would have died from non-cancer causes had they been assigned control. The researcher must take appropriate steps to remove both sets of people from the study, so as to isolate the set of individuals who would not die from causes unrelated to cancer regardless of treatment assignment. Junni Zhang (Peking University) and Don Rubin (Harvard University) discuss these issues in "Estimation of Causal Effects Via Principal Stratification when Some Outcomes Are Truncated by 'Death,'" (2003). Journal of Educational and Behavioral Statistics 28:353-368. They extend them in a forthcoming paper with Fabrizia Mealli (University of Florence) currently entitled "Evaluating Causal Effects in the Presence of 'Truncation by Death' -Likelihood-based Analysis via Principal Stratification."
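A small simulation can make the point vivid. The code below is my own toy illustration, not Zhang and Rubin's estimator: it builds in a constant true treatment effect on ten-year cancer-free survival, lets the treatment cause extra non-cancer deaths among risky types, and shows that comparing observed survivors across arms overstates the effect, while the contrast within the "always-survivor" stratum (computed here with oracle knowledge of both potential survival indicators, which real data would not provide) recovers it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
risky = rng.random(n) < 0.3                 # risky-behavior types
# Potential survival from NON-cancer causes under control / treatment:
# treatment saves risky types from cancer long enough for accidents to claim more of them
s0 = np.where(risky, rng.random(n) < 0.8, rng.random(n) < 0.95)
s1 = np.where(risky, rng.random(n) < 0.3, rng.random(n) < 0.95)
# Potential ten-year cancer-free survival; the true effect is 0.2 for everyone
p0 = np.where(risky, 0.1, 0.6)
y0 = rng.random(n) < p0
y1 = rng.random(n) < p0 + 0.2

treat = rng.random(n) < 0.5                 # randomized assignment
alive = np.where(treat, s1, s0)             # observed non-cancer survival
y_obs = np.where(treat, y1, y0)             # meaningful only where alive is True

# Naive: drop the dead, compare survivors across arms -- biased upward here
naive = y_obs[treat & alive].mean() - y_obs[~treat & alive].mean()
# Principal stratum: units who would survive under EITHER assignment
always = s0 & s1
stratum_effect = y1[always].mean() - y0[always].mean()
print(naive, stratum_effect)                # naive overstates the true 0.2 effect
```

In the simulation we can read off both potential survival indicators directly; the contribution of the principal stratification literature is precisely how to estimate the always-survivor contrast when only one of them is observed per unit.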
14 September 2005
When I was an undergrad, the first political science class that I took was taught by the late A.F.K. Organski. At one point, someone asked him what advice he would give to freshmen interested in political science as a major. "Take as many math courses as you can," he said with his inimitable accent. I'm pretty sure that this was not the advice that most people wanted to hear, and that it was honored more in the breach than the observance, but it was sound advice nonetheless.
In keeping with this idea, several Harvard programs offer short math refresher courses for incoming graduate students, including Government, Economics, and the Kennedy School. The Gov Department's "math (p)re-fresher" is held during the first two weeks of September. We cover calculus, probability, linear algebra, and a bit of optimization theory, along with an introduction to some of the software (R, Xemacs, and LaTeX) that we use in the department's methods courses. All told, it is a quick review of about five semesters' worth of undergraduate math courses in the span of ten days. As you might imagine, there is considerable variation in the amount of "pre-freshing" versus "re-freshing" that goes on in the course.
I'm curious about the prevalence of these kinds of "math camp" courses in the social sciences. I only know of a few others in political science, but I get the sense that they are more common in economics. Are there any sociology math camps out there? Psychology? Public health? If you have a math camp, I'd be interested in taking a look at your syllabus. Comments should be enabled.
13 September 2005
By popular demand, we've shortened the URL for this blog. The old one still works, but the URL is now: http://iq.harvard.edu/blog/sss/
Dan Hopkins, G4, Government (guest author)
Continuing with the discussion of papers presented at the recent Political Methodology Conference, Kevin Quinn and Arthur Spirling's paper begins with the problem of identifying legislators' preferences in conditions of strict party discipline. To tackle this challenge, they applied a Dirichlet process mixture model and presented some interesting results about the intra-party groups observed in the British House of Commons. They backed up the groupings recovered from the model with significant qualitative work, and showed how qualitative and quantitative work of this kind can go hand in hand. At the same time, the discussant, Andrew Martin, raised a valuable question: how does this method relate to other analyses of grouping/clustering? I am curious about this question as well.
James Honaker's paper tackled a question of substantive importance: what is the role of economic conditions in triggering sectarian violence? Honaker analyzed all available data, far more than anyone previously had, and used a creative combination of ecological inference and multiple imputation to estimate the impact of the Protestant-Catholic unemployment ratio on a monthly basis. His substantive result was that this ratio matters: as the gap between Protestant and Catholic employment grows, so too does the risk of violence. One questioner suggested that we might want to instrument for unemployment, since unemployment could be endogenous to violence. Honaker responded that unemployment in Northern Ireland tracks unemployment in comparable cities elsewhere. This paper struck me as, among other things, a powerful (if implicit) rebuttal to those who argue that one should never attempt ecological inferences. The question Honaker addressed is one scholars have already tried to answer -- sometimes with counter-intuitive results -- suggesting that we may not be able to simply wait for perfect, individual-level data.
Kosuke Imai presented co-authored work on an Internet experiment in Japan. As with the Jackman et al. paper, this work presented a single Bayesian model that dealt with 1) the problem of non-compliance; 2) the problem of non-response; and 3) the estimation of causal effects. The methods were compelling, although the data were less cooperative: almost no statistically significant treatment effects emerged. That result seems to fit with our priors: the experiment directed Japanese Internet users, presumably a relatively well-informed group, to click on a webpage containing party manifestos during the Upper House election. The fact that we are selecting our sample based on a set of covariates might help explain why the covariates are (at least individually) relatively helpless in predicting compliance. As with the Bowers and Hansen paper, I hope that the authors make their statistical code public and easily adaptable to other applications, as these tools are well suited to analyzing a wide range of randomized experiments.
David Epstein presented a joint paper with Sharyn O'Halloran that argued for using higher-dimension Markov models-that is, Markov models with more than two states-to model transitions to and from autocracy/democracy. The substantive argument: adding a third category of "partial democracy" helps us see that economic growth matters both for transitioning to democracy and for staying there. Discussant Jeff Gill and others questioned the appropriateness of the basic Markovian assumption (that the probability of transition conditional on the current state is equal to the probability of transition conditional on all previous states) and suggested exploring a higher-order Markov model (that is, models that allow previous states to influence present transition probabilities). I agree with their suggestion, but my question is more basic: if we have polity scores that are continuous on an interval, how much information is thrown away by transforming these scores into three discrete states? I have not seen the data, so I also wonder if these three states emerge naturally from it. In other words, how much would this analysis change if we redefined autocracy or democracy by a few polity points?
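For readers unfamiliar with the setup, the first-order model amounts to estimating a row-stochastic transition matrix from observed state sequences, and its maximum-likelihood estimate is just the row-normalized matrix of transition counts. The sketch below uses invented regime trajectories, coded 0 (autocracy), 1 (partial democracy), 2 (democracy), purely to fix ideas:

```python
import numpy as np

states = ["autocracy", "partial democracy", "democracy"]
# One invented regime trajectory per "country", coded 0/1/2 by year:
sequences = [
    [0, 0, 0, 1, 1, 2, 2, 2],
    [0, 1, 1, 1, 2, 2, 1, 2],
    [2, 2, 2, 2, 1, 2, 2, 2],
]

# Count observed one-step transitions across all sequences
counts = np.zeros((3, 3))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        counts[a, b] += 1

# MLE of the transition matrix: normalize each row of counts
P = counts / counts.sum(axis=1, keepdims=True)
print(np.round(P, 2))  # P[i, j] = Pr(next state j | current state i)
```

A higher-order model of the kind Gill suggested would instead condition on pairs (or longer histories) of past states, which multiplies the number of rows and hence the data demands -- one reason the first-order assumption is so commonly retained.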
12 September 2005
The Department of Government invites applications for a position in quantitative political methodology at the rank of Assistant or untenured Associate Professor to begin July 1, 2006. Candidates should expect to have completed the requirements for the Ph.D. prior to appointment. Teaching duties will include offering courses at undergraduate and graduate levels. Candidates are expected to demonstrate a promise of excellence both in research and teaching in political methodology. Harvard is an Affirmative Action/Equal Opportunity Employer; applications from women and minority candidates are strongly encouraged. We will begin to review applications on Monday, October 10, 2005 and will continue until the position is filled. Send application, including cv, transcripts, at least 3 letters of recommendation, samples of written material, teaching evaluation materials if available, and a one-page summary of a proposed job talk to: Faculty Recruitment Committee, Political Methodology, Government Department, Harvard University, 1737 Cambridge Street, Cambridge, MA 02138.
Pol Meth Conf III
Dan Hopkins, G4, Government (guest author)
Continuing the discussion of the recent Political Methodology Conference, throughout its first two days the notion of the conference as the "Second Annual Conference on Matching" was a running joke, and definitely a fair joke, although the two matching papers were, well, matched by two ideal point papers. So on to ideal points. Michael Bailey's paper tackled an important problem: because major figures across the different institutions of the federal government are faced with different policy decisions, it is hard to make statements about how their preferences relate. Is the Supreme Court to the left of Congress? How would today's court rule on famous decisions from the past? Bailey's paper sought to extend ideal points across institutions, using such things as public statements and the court briefs of the Solicitor General to compare the ideal points of not just justices but of members of all three branches of the federal government. Bailey argued, for example, that if the first Bush administration filed a brief in support of a certain side in a court case, we could use that filing to put Bush in the same space as Chief Justice Rehnquist. Bailey used the same sort of logic to extend ideal points back in time, focusing on statements about preferences-for instance, Clarence Thomas's statement that Roe was wrongly decided-to allow figures from different time periods to be placed on the same scale. Especially impressive was the data collection effort this project entails, as the author tracked down public statements from a wide range of figures.
One of the challenges of making these kinds of cross-institutional inferences, though, is that we need to implicitly assume non-strategic behavior. Needing to build a majority of five, justices in the Supreme Court face a task distinct from that of the President—or from that of the average member of the House. These strategic contexts will undoubtedly affect politicians' decisions: Presidents have little incentive to make public statements that put them at odds with the majority of Americans, even if those statements reflect their preferences. Also, if Presidents (or others in the system) are selective about the subjects of their commentary, we might wind up with a biased idea of where they actually stand. Still, Bailey provided quite a neat paper, one that provides useful tools for tracking inter-institutional dynamics. The substantive results were also very interesting, with the median ideal point of the Court almost always between that of the House and the Senate.
The next ideal point paper came from Simon Jackman, Matthew Levendusky, and Jeremy Pope. Here, the goal was to estimate the baseline propensity of a Congressional district to support Democratic or Republican candidates—although much of the Q&A was taken up by questions about whether this was best thought of as the "natural vote" or something else. The authors emphasized that measurement and structural modeling go hand-in-hand because inaccurate measurement may well bias the structural estimate of quantities like the incumbency advantage. They also pointed out that in this field we are content with rough proxies of district tendencies despite the fact that in other areas we demand much more precision in our measurements. Jackman, Levendusky, and Pope's model was a Bayesian hierarchical ideal point model that draws on information about both Congressional and Presidential results to make inferences about districts' underlying partisan preferences.
For me, one provocative result from this paper was that the discrimination parameter -- that is, the impact of the covariates on the estimated vote share -- increased over the decades. In other words, demographic characteristics are becoming increasingly effective predictors of districts' preferences. I would love to see the authors try to get at exactly why that is. One possibility, which Levendusky mentioned in making his presentation, is redistricting: politicians get better at picking their constituents, districts become more homogeneous, and so district-level demographics become better predictors of aggregate vote choices. To test this theory, one might re-estimate the model without the least populous states (because such states have less potential for gerrymandering. Consider Wyoming: no gerrymandering there). Another possibility is that the electorate is sorting itself into more politically homogeneous groups, something one might test in a preliminary way by running the model separately for high-mobility and low-mobility districts. The Census gives data on how many people have lived in the same house for their entire lives, data that could help with these questions.
This fall I am teaching GOV 2000 Quantitative Methods for Political Science I. This course is also offered for credit through Harvard's distance learning program as GOVT E-2000. GOV 2000 is the first course in the Department of Government's methodology sequence, and it is designed to introduce students to statistical modeling with an emphasis on least squares linear regression. Although we will not ignore the theory underlying the linear model, much of the course will focus on practical issues that arise when working with regression models. Topics covered in the course include: data visualization, statistical inference for the linear model, assessing model adequacy, when a regression model is a causal model, dealing with leverage points and outliers, robust regression, and methods for capturing nonlinearities. We will also be working with real social science datasets throughout the course. For more information, please visit the course website here.
9 September 2005
Dan Hopkins, G4, Government (guest author)
Continuing with the matching theme on which I ended the post of two days ago, Alexis Diamond and Jas Sekhon presented a paper on genetic matching that claimed to be a significant improvement on past approaches. One of the challenges of matching is to weight each of the covariates so as to produce the optimal set of matches. Genetic matching uses a genetic algorithm to search across the set of possible weight matrices to find the weight matrix that minimizes some loss function. Of course, what exactly that loss function should be is debatable. In Rawlsian fashion, Diamond and Sekhon argued that the criterion should be to maximize the p-value of the most unbalanced covariate, and Sekhon's software (link here) does exactly that. In some applications, one could certainly imagine other loss functions; seeking the best possible balance on the most unbalanced covariate could jeopardize the overall balance, a libertarian sort of rebuttal. The discussion of the paper also raised the question of whether using a p-value is the right criterion. If the algorithm is comparing p-values from samples with different sizes, for instance, it could disproportionately favor a smaller sample.
Despite the questions, I buy Diamond and Sekhon's argument. Genetic matching makes effective use of computing power to search across a high-dimensional space for the most balanced sample that the data can provide. In cases where there is insufficient overlap on covariates, data analysts will know this quickly rather than devoting weeks to Holy Grail-style quests for optimal matches. And in cases where there is sufficient overlap on the covariates to make causal inferences, data analysts will be far more certain that they have attained the best possible balance—again, subject to the constraints about the loss function.
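To show the shape of the loss function in miniature, here is a stripped-down sketch on simulated data: candidate covariate weights are scored by the smallest balance p-value across covariates in the matched sample, and we keep the weights that maximize that minimum. A crude random search stands in for the genetic algorithm, and nothing here reflects the actual implementation details of Sekhon's software:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(4)
Xt = rng.normal([1.0, 0.2], 1.0, size=(100, 2))   # treated covariates
Xc = rng.normal([0.0, 0.0], 1.0, size=(400, 2))   # control covariates

def match_and_score(w):
    # 1-nearest-neighbor matching, with replacement, under weighted distance
    d = (((Xt[:, None, :] - Xc[None, :, :]) ** 2) * w).sum(-1)
    matched = Xc[d.argmin(axis=1)]
    # loss: balance p-value of the MOST unbalanced covariate after matching
    return min(ttest_ind(Xt[:, j], matched[:, j]).pvalue for j in range(2))

best_w, best_p = None, -1.0
for _ in range(200):                              # stand-in for the genetic search
    w = rng.uniform(0.01, 1.0, size=2)
    p = match_and_score(w)
    if p > best_p:
        best_w, best_p = w, p
print(best_w, best_p)
```

The maximin structure is visible in the last line of `match_and_score`: a weight vector is only as good as balance on its worst covariate, which is exactly the Rawlsian criterion described above.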
8 September 2005
This article in World Politics on forecasting state failure that Langche Zeng (who by the way is moving this week from GW to UCSD) and I wrote a few years ago seems relevant to what is presently happening in New Orleans. Here are the opening sentences of the article: "`State failure' refers to the complete or partial collapse of state authority, such as occurred in Somalia and Bosnia. Failed states have little political authority or ability to impose the rule of law [on its citizens]." We normally associate state failure with foreign countries you would not want to visit, but with a third of the New Orleans police force not showing up for work, with the two-thirds that remained barricaded in their homes or police stations, with corpses strewn around the streets from the hurricane and some murders, and where a policeman today "joked that if you wanted to kill someone here, this was a good time" (see today's NY Times Article), it is hard to see how New Orleans this past week was anything but the definition of state failure.
Our article was about some methodological errors we found in the U.S. State Failure Task Force's forecasts and methods of forecasting. They had selected data via a case-control design (i.e., selecting on their dependent variable all examples of state failure and a random sample of nonfailures), which can save an enormous amount of work in data collection, but it is only valid if you properly correct for the sampling design. The Task Force didn't correct and so, for example, their forecast for Brazil failing was reported at 0.72 but their model, correctly interpreted, indicated that it was only 0.11; their reported forecast for Somalia failing was 0.45, but the model actually indicated that it was only 0.04. We also improved their methods, and thus their forecasting success over their corrected models, via neural network methods and some other approaches. They also collected one of the best data sets on the subject, which you might want to use.
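The needed correction is, fortunately, simple in the logit case: under case-control sampling the slope coefficients remain consistent and only the intercept is off, by ln[((1-tau)/tau)(ybar/(1-ybar))], where tau is the population fraction of failures and ybar the sample fraction. Here is a sketch on simulated data -- my own toy logit fit, with tau treated as known:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
N = 300_000
x = rng.normal(size=N)
p = 1 / (1 + np.exp(-(-5.0 + 1.0 * x)))       # rare event: true intercept -5, slope 1
y = rng.random(N) < p
tau = y.mean()                                 # population fraction of failures

# Case-control sample: all failures plus an equal-sized random draw of nonfailures
ones = np.flatnonzero(y)
zeros = rng.choice(np.flatnonzero(~y), size=ones.size, replace=False)
idx = np.concatenate([ones, zeros])
xs, ys = x[idx], y[idx]

# Logit fit on the case-control sample (numerically stable log-likelihood)
def nll(b):
    eta = b[0] + b[1] * xs
    return np.sum(np.logaddexp(0.0, eta) - ys * eta)
b = minimize(nll, np.zeros(2)).x

ybar = ys.mean()                               # = 0.5 by construction
b0_corrected = b[0] - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
print(b[1], b[0], b0_corrected)                # slope near 1; corrected intercept near -5
```

The uncorrected intercept is several log-odds units too high, which is exactly how forecasts like 0.72 can shrink to 0.11 once the model is interpreted correctly.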
The charter of the U.S. State Failure Task Force prohibits it from discussing state failure in the U.S. or making forecasts of U.S. state failure, but by their definitions, there is little doubt that, for a time anyway, all relevant governmental authorities in the U.S. suffered a "complete or partial collapse of state authority," and so the U.S. would seem to fit that definition. I haven't checked, but I doubt their model or ours had any ability to forecast these events.
7 September 2005
The 14 papers presented at the 2005 Conference of the Society for Political Methodology, held July 21st-July 23rd in heavily air-conditioned rooms at Florida State, provided plenty of good fodder for discussion. I will focus on several I found especially provocative--and on which I could reasonably comment--in blog posts over the next few days.
Starting off the conference, Gary King's presentation of his paper "Death by Survey: Estimating Adult Mortality without Selection Bias" with Emmanuela Gakidou argued that we need to take a new approach to estimating death rates in the many countries that do not have vital registration systems. The dominant approach at present assumes that larger families do not have differing mortality rates, but given the uneven pace of development in so many countries, that seems a heroic assumption, and their paper shows it is completely wrong empirically. King and Gakidou's approach involves two fixes: first, weighting to deal with the over-sampling of families with more surviving children during the observation period (since samples are drawn in proportion to survivors rather than those alive at the start of the period), and second, extrapolation to deal with the fact that families with no surviving children are entirely excluded from the sample. The first problem is fixed exactly by weighting; the second requires assumptions beyond the range of the data. Some discussion focused on one of the main challenges to this second fix—that it involves extrapolation, extrapolation based on a small number of data points, and extrapolation based on a quadratic term. The paper deals with the danger of extrapolation through repetition in different data: By showing that the relationship between mortality and the number of siblings is constant in its shape across a wide range of countries, King and Gakidou argue that we can be reasonably confident about the fit of the curve from which we are extrapolating. The authors are now gathering survey data to replicate this approach in cases where we know the answer--that is, where we also have accurate, non-survey data on mortality rates.
That is especially critical since families without any surviving children might be disproportionately the victims of wars or other violence, for instance, making it challenging to use data about families with surviving children to make inferences about families without any surviving children.
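The weighting fix can be illustrated with a toy simulation -- my own simplification, not King and Gakidou's actual estimator. When families enter the sample in proportion to their number of surviving siblings, weighting each sampled family by one over its number of survivors undoes the over-representation of high-survival families; families with no survivors at all still require the extrapolation step:

```python
import numpy as np

rng = np.random.default_rng(3)
F = 100_000
size = rng.integers(2, 7, size=F)            # siblings per family
death_p = rng.uniform(0.05, 0.6, size=F)     # family-level mortality risk
deaths = rng.binomial(size, death_p)
survivors = size - deaths
true_rate = deaths.sum() / size.sum()        # population sibling mortality rate

# Survey draw: each SURVIVING sibling is equally likely to be interviewed,
# so family i enters with probability proportional to survivors[i]
sample = rng.choice(F, size=20_000, p=survivors / survivors.sum())

naive = deaths[sample].sum() / size[sample].sum()
w = 1.0 / survivors[sample]                  # undo the size-biased sampling
weighted = (w * deaths[sample]).sum() / (w * size[sample]).sum()

# Weighting recovers the rate among families with >= 1 survivor; the gap to
# true_rate is the part only extrapolation can supply
rate_ge1 = deaths[survivors > 0].sum() / size[survivors > 0].sum()
print(true_rate, naive, weighted, rate_ge1)
```

The naive estimate understates mortality because low-mortality families are drawn most often; the weighted estimate closes that gap exactly, but still stops short of the true rate by the contribution of zero-survivor families.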
Another early paper came from Kevin Clarke, who argued that political science as a discipline has become too worried about omitted variable bias. Clarke took another look at the familiar theoretical omitted variable bias result and pointed out that contrary to conventional wisdom, including additional variables can, under certain circumstances, exacerbate problems of omitted variable bias. The circumstances are that something else is wrong: that is, omitting a variable that is causally prior to and correlated with the treatment variable and affects the outcome variable will bias inferences in a predictable direction, and including it will reduce bias -- but only when other modeling assumptions are correct. If you have five things wrong with your model and you fix four, it is at least possible that you can make things worse.
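One concrete mechanism -- my illustration, not Clarke's specific derivation -- is conditioning on a variable caused by both the treatment and an unobserved factor. Leaving it out yields an unbiased estimate, while "controlling" for it opens a biasing path:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
u = rng.normal(size=n)                 # unobserved factor
t = rng.normal(size=n)                 # treatment; its true effect on y is 1.0
y = 1.0 * t + u + rng.normal(size=n)
c = t + u + rng.normal(size=n)         # caused by BOTH t and u (a collider)

def ols_slope(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

short = ols_slope(np.column_stack([np.ones(n), t]), y)     # near the true 1.0
long_ = ols_slope(np.column_stack([np.ones(n), t, c]), y)  # biased (near 0.5 here)
print(short, long_)
```

In this setup t is independent of u, so the short regression is fine; adding the extra "control" c is the mistake, which is exactly the spirit of Clarke's warning that throwing in more variables is not automatically conservative.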
In my view, the sociology of the discipline embedded in Clarke's presentation was right on. In substantive presentations, it is incredibly common for presenters to be barraged with questions that are of this form: "did you account for [insert favorite variable]? What about [insert second favorite variable]? Or maybe [insert random variable that no one before or since has ever heard of]?" Reviewers, too, seem to find this an easy way to respond to articles. One way to deal with this problem--sensitivity tests--was highlighted during the discussion. Our models are almost never perfectly specified, so there will always be omitted variables, and knowing how those variables would need to look to overturn a result is a good (if incomplete) start to deal with this problem. One example of how these kinds of sensitivity analyses might work, by the way, is David Harding's 2003 "Counterfactual Models of Neighborhood Effects: The Effect of Neighborhood Poverty on Dropping Out and Teenage Pregnancy." (American Journal of Sociology 109(3): 676-719). Another is Paul Rosenbaum and Don Rubin's 1983 "Assessing Sensitivity to an Unobserved Binary Covariate in an Observational Study with Binary Outcome." (Journal of the Royal Statistical Society, Series B 45: 212-218).
One other case against overly-saturated models, one that did not come up in the discussion but that is probably familiar to many, is the challenge of thinking in terms of conditional effects as the number of variables increases. For instance, if we think about vote choices as our dependent variable, I understand what it means to talk about the impact of income conditional on race, but it is much harder to know what it means to say the impact of income conditional on ten other, inter-correlated variables. This problem becomes all the more difficult when we remember that we are conditioning not just on the inclusion of certain variables but also on the functional form specified for them.
Because I am a Harvard graduate student, I should also play to type and say something briefly about how matching (which these days is well-represented in hallway conversations at IQSS) relates to omitted variables. Obviously, it is no panacea, as unobserved confounders can be just as troublesome as in the case of more conventional models. But there is one way in which matching adds value here. In cases where we are matching observations of units for which we have information not quantified in our dataset, looking at the list of matched pairs can help identify the omitted variable. If, say, we are studying countries, and see that our observed variables wind up pairing Ethiopia and Greenland, we can use that pairing to think through what kinds of unobserved variables might be potential confounders.
Dan Hopkins, G4, Government (guest author)
6 September 2005
Welcome to the Social Science Statistics Blog, hosted by the Institute for Quantitative Social Science at Harvard University. We are starting this blog today in order to make public some of the hallway conversations about social science statistical methods and analysis that are regular features at the Institute and related research groups. Perhaps you have also found that while formally published research emphasizes one topic or approach, conversations with scholars at conferences reveal a strong trend about to head in a new direction. We similarly find correlated trends in the work-in-progress of many of Harvard's methodologists and in the informal speculations and plans that visitors reveal while making their rounds at our seminars. We find familiarity with these trends to be valuable for our own research, and so we hope to record some of this information here for ourselves, our students, and anyone else who may wish to listen in.
This blog may be especially useful at Harvard given the high level of decentralization here -- referred to by Harvard insiders as "every tub [or School] on its own bottom" although, since this decentralization often goes right down to the individual faculty member, I sometimes think a better phrase might be "everyone with a bottom has their own tub." To prevent this formal structure from having negative intellectual consequences, faculty here often invent structures to span our formal structures. This blog is one of those structures. Another is a weekly Research Workshop on Applied Statistics we started a few years ago, billed as "a tour of Harvard's statistical innovations and applications with weekly stops in different disciplines". Every week during the academic year, a differing subset of the almost 300 faculty and students from across the university who have signed themselves up for our mailing list appear at the Institute for a talk on some aspect of social science methods or their application. Most of us find this regular exchange with such a diverse group of scholars to be highly productive, although we sometimes have to figure out how to translate the jargon describing the same statistical models from one discipline to another (most are familiar with either "Malmquist Bias" in Astronomy or "Selection Bias" in Economics but rarely both, despite the fact that they are almost identical mathematically.)
Although most of our blog posts will involve other subjects, one post each week during the academic year will include summaries of (and when available links to) papers presented at our weekly seminar, along with a sense of the discussion that takes place afterwards. Some of the other topics we plan will include posts on trends in methodological thought, questions and comments, paper and conference announcements, applied problems needing methodological solutions, methodological techniques seeking applied problems, and whatever else may be of interest and occurs to someone around here. Comments on posts are welcome from others too.
The main responsibility for the daily posts on this blog has been taken by an extremely talented group of graduate students representing six different academic disciplines. Our authoring team is chaired by Jim Greiner, who has a law degree, practical experience with the Justice Department and a law firm in Washington D.C., and is now a Ph.D. candidate in Harvard's Statistics Department. Members of our committee include Sebastian Bauhoff, in the Economics track of the Health Policy Ph.D. Program; Felix Elwert, a Ph.D. candidate in the Department of Sociology and an A.M. candidate in the Department of Statistics; John Friedman, a Ph.D. student in the Economics department; Jens Hainmueller and Mike Kellermann, graduate students in the Department of Government; Amy Perfors, a graduate student in the Brain and Cognitive Sciences department at MIT; Andrew (Drew) C. Thomas, who after getting a B.A. in physics from MIT has joined the Department of Statistics Ph.D. program; and Jong-Sung You, a Ph.D. candidate in Public Policy at the Kennedy School and a Doctoral Fellow in the Inequality and Social Policy Program. Please read more about our team here.