31 October 2006
Jacob Eisenstein at MIT has developed a smart election predictor for the US Senate elections using a Kalman filter. The filter helps to decide how much extra weight to attach to more recent polls. Check it out here; he also has some details on the method here.
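To give a feel for how a Kalman filter ends up weighting recent polls more heavily, here is a minimal sketch of a scalar random-walk filter. It is not Eisenstein's model: the function name `kalman_poll_filter`, the poll shares, and both variance settings are assumptions made up for illustration.

```python
def kalman_poll_filter(polls, poll_var, drift_var, init_mean=0.5, init_var=1.0):
    """Scalar random-walk Kalman filter over poll results (oldest first).

    poll_var:  assumed sampling variance of a single poll
    drift_var: assumed variance of the change in true support between polls
    All values are illustrative assumptions, not taken from any real model.
    """
    mean, var = init_mean, init_var
    history = []
    for y in polls:
        # Predict: true support may have drifted, so uncertainty grows.
        var += drift_var
        # Update: the gain is the weight given to the newest poll.
        gain = var / (var + poll_var)
        mean += gain * (y - mean)
        var *= (1.0 - gain)
        history.append((mean, var, gain))
    return history


# Hypothetical poll shares for one candidate, oldest first.
polls = [0.48, 0.50, 0.47, 0.52, 0.53]
with_drift = kalman_poll_filter(polls, poll_var=0.03 ** 2, drift_var=0.01 ** 2)
no_drift = kalman_poll_filter(polls, poll_var=0.03 ** 2, drift_var=0.0)

for label, hist in (("drift", with_drift), ("no drift", no_drift)):
    mean, var, gain = hist[-1]
    print(f"{label}: estimate={mean:.3f}, weight on latest poll={gain:.2f}")
```

With `drift_var` set to zero the filter behaves much like a running average, so each new poll gets a shrinking weight. Allowing drift keeps the weight on the latest poll from shrinking toward zero, which is the sense in which recent polls count for more.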
In a previous post about the Gerber & Malhotra paper about publication bias in political science, I rather optimistically opined that the findings -- that there were more significant results than would be predicted by chance, and that many of those were suspiciously close to 0.05 -- were probably not deeply worrisome, at least for those fields in which experimenters could vary the number of subjects run based on the significance level achieved thus far.
Well, I now disagree with myself.
This change of mind comes as a result of reading about the Jeffreys-Lindley paradox (Lindley, 1957), a Bayes-inspired critique of significance testing in classical statistics. It says, roughly, that with a large enough sample size, a p-value can be arbitrarily close to zero even though the posterior probability of the null hypothesis is very close to one. In other words, a classical statistical test might reject the null hypothesis at an arbitrarily low p-value, even though the evidence that it should be accepted is overwhelming. [A discussion of the paradox can be found here].
When I learned about this result a few years ago, it astonished me, and I still haven't fully figured out how to deal with all of the implications. (This is obvious, since I forgot about it when writing the previous post!) As I understand the paradox, the intuitive idea is that, with a larger sample size, you will naturally get some data that appear unlikely (and, the more data you collect, the more likely you are to see some really unlikely data). If you forget to compare the probability of those data under the null hypothesis with the probability of the data under the alternative hypotheses, then you might get an arbitrarily low p-value (indicating that the data are unlikely under the null hypothesis) even if the data are even more unlikely under any of the alternatives. Thus, if you just look at the p-value, without taking into account effect size, sample size, or the comparative posterior probability of each hypothesis under consideration, you are likely to wrongly reject the null hypothesis on the basis of the p-value, even if it is the most likely of all possibilities.
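A small numerical sketch may make the paradox concrete. It uses one common textbook setup that is my assumption rather than anything from Lindley's paper: a point null (mean zero) against an alternative whose mean has a normal prior with standard deviation `tau`, known noise `sigma`, and equal prior probability on the two hypotheses. Holding the z-statistic fixed at 2.5 keeps the p-value fixed, while the posterior probability of the null climbs with n.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def posterior_null(z, n, tau=1.0, sigma=1.0, prior_null=0.5):
    """Posterior probability of H0: mean = 0 versus H1: mean ~ N(0, tau^2).

    tau, sigma and prior_null are illustrative assumptions.
    """
    r = n * tau ** 2 / sigma ** 2
    # Bayes factor in favour of the null, written in terms of z and n.
    bf01 = math.sqrt(1.0 + r) * math.exp(-0.5 * z ** 2 * r / (1.0 + r))
    post_odds = bf01 * prior_null / (1.0 - prior_null)
    return post_odds / (1.0 + post_odds)


z = 2.5
print(f"p-value at z={z}: {two_sided_p(z):.4f}")
for n in (10, 1_000, 100_000, 10_000_000):
    print(f"n={n:>10,}: P(H0 | data) = {posterior_null(z, n):.3f}")
```

The same "significant" z-statistic corresponds to a smaller and smaller observed effect as n grows, and a tiny effect is better explained by the null than by a diffuse alternative. The result does depend on the prior chosen for the alternative, which is part of why the paradox remains contested.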
The tie-in with my previous post, of course, is that the paradox implies that it isn't necessarily "okay" practice to keep increasing sample size until you achieve statistical significance. Of course, in practice, sample sizes rarely get larger than 24 or 32 -- at the absolute outside, 50 to 100 -- which is much smaller than infinity. Does this practical consideration, then, mean that the practice is okay? As far as I can tell, it is fairly standard (but then, so is the reliance on p-values to the exclusion of effect sizes, confidence intervals, etc., so "common" doesn't mean "okay"). Is this practice a bad idea only if your sample gets extremely large?
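One way to probe that last question is to simulate it. The sketch below assumes a deliberately simple setup of my own choosing: the null is true, the test is a z-test with known variance, and the experimenter checks for significance at 8, 16, 24 and 32 subjects, stopping at the first p < 0.05. The look schedule, the number of simulations and the seed are all arbitrary.

```python
import math
import random


def p_value_two_sided(z):
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejects_with_peeking(rng, looks, alpha=0.05):
    """Draw null data one subject at a time; test at each sample size in looks."""
    total, n = 0.0, 0
    for target in looks:
        while n < target:
            total += rng.gauss(0.0, 1.0)  # true effect is zero
            n += 1
        z = total / math.sqrt(n)  # z-test with known unit variance
        if p_value_two_sided(z) < alpha:
            return True  # stop as soon as the result is "significant"
    return False


def false_positive_rate(looks, n_sims=4000, seed=0):
    rng = random.Random(seed)
    hits = sum(rejects_with_peeking(rng, looks) for _ in range(n_sims))
    return hits / n_sims


peeking = false_positive_rate([8, 16, 24, 32])
fixed = false_positive_rate([32])
print(f"false-positive rate, testing once at n=32: {fixed:.3f}")
print(f"false-positive rate, peeking four times:   {peeking:.3f}")
```

Under these assumptions the false-positive rate with peeking comes out well above the nominal 5% even though the sample never exceeds 32, which suggests the problem is not confined to very large samples.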
Lindley, D.V. (1957). A statistical paradox. Biometrika, 44, 187-192.
30 October 2006
This week the Applied Statistics Workshop will present a talk by Nan Laird, Professor of Biostatistics in the Harvard School of Public Health, and Christoph Lange, Assistant Professor of Biostatistics in the Harvard School of Public Health.
Before joining the Department of Biostatistics, Professor Laird received her Ph.D. in Statistics from Harvard and was an Assistant Prof. of Statistics at Harvard. She has published extensively in Statistics in Medicine, Biostatistics, American Journal of Human Genetics and the American Journal of Epidemiology among others. Her research interest is the development of statistical methodology in four primary areas: statistical genetics, longitudinal studies, missing or incomplete data, and analysis of multiple informant data.
Professor Lange earned his Ph.D. in Applied Statistics from the University of Reading, and has been a member of the Department of Biostatistics since then. His publications have appeared in Biostatistics, the American Journal of Human Genetics, Genetic Epidemiology, and Genetics. Prof. Lange's current research interests fall into the broad areas of statistical genetics and generalized linear models. Recent topics in statistical genetics include family-based association tests, meta-analysis of linkage studies, GEE-methods in linkage analysis and marker-assisted selection.
Prof. Laird and Prof. Lange will present a talk entitled “Statistical Challenges and Innovations for Gene Discovery”. An abstract for the talk and associated background papers are available from the course website. The presentation will be at noon on Wednesday, November 1st, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
29 October 2006
Reading the Data Mining blog, I just learned about this cool visualization of the US population density presented by Time magazine.
Take a closer look here. Cute, isn't it?
27 October 2006
Here's a question (alright, a bleg) for any economist-types out there: can you recommend any articles or books that integrate the potential outcomes framework for causal inference with the type of equilibrium analysis that is usually used in microeconomic modeling? I'm not exactly looking for cases where someone says "my comparative statics say the effect should be positive and, voila, it is!", but rather an applied article in which the potential outcomes arise naturally from the structure of the model. Or, even better, something more philosophical that attempts to integrate the potential outcomes approach with equilibriumist models of behavior. A quick Google Scholar search on "potential outcomes", "causal inference", and "equilibrium" only brings up about 80 hits, many of which appear to be by James Heckman, so any pointers to more sympathetic treatments would be appreciated!
26 October 2006
Newcomb’s paradox is a classic problem in philosophy and also an entertaining puzzle to consider. Here is one version of the paradox. Suppose you are presented with two boxes, A and B. You are allowed to take just box A, just box B, or both A and B. There will always be $1000 in box A, and there will either be $0 or $1,000,000 in box B.
A ‘predictor’ determines the contents of box B before you have arrived, using the following plan. If the predictor believes you will pick both box A and B, then she places nothing in box B, but if she believes that you will only take box B, then she places the $1,000,000 in box B.
What makes this predictor special is her amazing accuracy. In the previous billion plays of the game she has never been wrong.
So, you have the two boxes in front of you. What should you do? Keep in mind, the predictor has already made her decision when you arrive at the boxes, so by our normal rules of causality (events in the future cannot cause past events), our actions cannot change what the predictor has decided.
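For readers who want numbers, here is a sketch of one side of the argument only: the expected-payoff calculation that treats your choice as evidence about what the predictor did. The `accuracy` parameter is an assumption introduced for illustration (the story above has a predictor who has never been wrong), and the calculation says nothing about the opposing dominance argument that the money is already in the boxes.

```python
BOX_A = 1_000        # always in box A
BOX_B = 1_000_000    # in box B only if the predictor expected "B only"


def expected_one_box(accuracy):
    """Expected payoff from taking only B, if the predictor is right with this probability."""
    return accuracy * BOX_B


def expected_two_box(accuracy):
    """Expected payoff from taking both boxes under the same assumption."""
    return BOX_A + (1.0 - accuracy) * BOX_B


# Accuracy at which the two strategies have equal expected payoff.
break_even = (BOX_A + BOX_B) / (2 * BOX_B)

for accuracy in (0.5, break_even, 0.9, 1.0):
    print(f"accuracy={accuracy:.4f}: "
          f"one box={expected_one_box(accuracy):>12,.0f}  "
          f"two boxes={expected_two_box(accuracy):>12,.0f}")
```

On this way of counting, taking only box B comes out ahead as soon as the predictor is even slightly better than a coin flip. The paradox is that the causal argument in the paragraph above points the other way.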
25 October 2006
Quantitative expert witnesses are essential to modern litigation. But why do they disagree so often?
An excerpt from an article by Professor Franklin Fisher appears below. It’s a tad long, but it’s really worth reading. Does it ring a familiar bell with anyone out there?
“It is not, however, always easy to avoid becoming a ‘hired gun’ . . . The danger is sometimes a subtle one, stemming from a growing involvement in the case and friendship with the attorneys. For the serious professional, concerned about preserving his or her standards, the problem is not that one is always being asked to step across a well-defined line by unscrupulous lawyers. Rather, it is that one becomes caught up in the adversary proceeding itself and acquires the desire to win. . . . Particularly because lawyers play by rules that go beyond those of academic fair play, it becomes insidiously easy to see only the apparent unfairness of the other side while overlooking that of one’s own.”
Franklin M. Fisher, Statisticians, Econometricians, and Adversary Proceedings, 81 J. AM. STAT. ASS’N. 277, 285 (1986)
24 October 2006
Here’s an interesting piece that should help you keep your New Semester resolutions by understanding procrastination better. Sendhil Mullainathan recently used research by Dan Ariely and Klaus Wertenbroch as motivation for his undergraduate psychology and economics class. Though it’s not exactly statistics, it seems the insights could be useful for grad students and their advisors.
Ariely and Wertenbroch did several experiments to see how deadlines might help overcome procrastination. They examine whether deadlines might be effective pre-commitment devices, and whether they can enhance performance. In one of their experiments, they asked participants to proofread three meaningless synthetic texts. Participants received financial rewards for finding errors and submitting on time (just like in a problem set…). They randomized participants into three categories: three evenly-spaced deadlines every 7 days; an end-deadline after 21 days; or a self-imposed schedule of deadlines within a three week period.
Which one would you select if you could? Maybe the end-deadline because it gives you the most flexibility in arranging the work (similar to a final exam or submitting your dissertation all at once)? Ariely and Wertenbroch found that the end-deadline does the worst both in terms of finding errors and submitting on time. Participants with evenly-spaced deadlines did best. But that group also liked the task the least, maybe because they had several unpleasant episodes of reading silly texts, or because they spent more time than the other groups.
So when you start your semester with good intentions, consider setting some reasonable and regular deadlines that bind, and get a calendar. Or just wait for the New Year for another chance to become resolute and have another drink in the meantime.
20 October 2006
With the World Series about to get underway, featuring the rubber match between the Detroit Tigers and the St. Louis Cardinals (Round 1 went to the Cardinals in 1934, Round 2 to the Tigers in 1968, but maybe this is a best of five and we won't see the end until 2076), it is worth reflecting on the influence baseball has had on statistics and vice versa. I mentioned Frederick Mosteller's analysis after the 1946 World Series in a previous post, but many statisticians share his interest in baseball. Dozens of baseball-related articles have appeared in statistical journals over the years, attempting to answer substantive questions ("Did Shoeless Joe Jackson throw the 1919 World Series?") or to motivate statistical techniques ("Parametric Empirical Bayes Inference: Theory and Applications", with application to Ty Cobb's batting average). Within political science, more than one methodologist has told me about the hours that they spent tracking batting averages and OBPs when they were growing up (OK, so it may have been cricket in a few cases). Going in the other direction, there is no question that the Moneyball approach to baseball has been enormously influential, even if the jury is still out about its implications for the post-season. As Harry Reasoner once said, "Statistics are to baseball what a flaky crust is to Mom's apple pie." To which I can only add, Go Tigers!
19 October 2006
As a lawyer, I have to be interested not just in what quantitative principles are true, but also in how to present “truth” to people without quantitative training. To that end, HELP! One of the maddening things about statistics is Simpson’s paradox. The quantitative concept, undoubtedly well-known to most readers of this blog, is that the correlation between two variables can change sign and magnitude, depending on what is conditioned on. That is, Corr(A, B | C) might be positive, while Corr(A, B | C, D) might be negative, while Corr(A, B | C, D, E) might be positive again. At bottom, this is what’s going on when regression coefficients become (or cease to be) significant as one adds additional variables to the right-hand side. Because regression currently enjoys a stranglehold on expert witness analyses in court cases (I’ll be ranting on that in the future), communicating Simpson's paradox is a matter of real concern for someone like me who cares about what juries see, hear, and think. Any ideas on how to get this concept across?
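One option is to show the reversal in a table small enough to check by hand. The sketch below uses made-up counts (they are hypothetical, not from any case or study): a "treated" group does better than a "control" group within each of two strata, yet does worse once the strata are pooled, because most treated units sit in the harder stratum.

```python
# Hypothetical (successes, total) counts, invented purely for illustration.
counts = {
    "easy cases": {"treated": (9, 10), "control": (80, 100)},
    "hard cases": {"treated": (30, 100), "control": (2, 10)},
}


def rate(successes, total):
    return successes / total


def pooled_rate(arm):
    """Success rate for one arm after ignoring the stratum entirely."""
    successes = sum(counts[s][arm][0] for s in counts)
    total = sum(counts[s][arm][1] for s in counts)
    return rate(successes, total)


for stratum, arms in counts.items():
    print(f"{stratum}: treated={rate(*arms['treated']):.2f}, "
          f"control={rate(*arms['control']):.2f}")
print(f"pooled:     treated={pooled_rate('treated'):.2f}, "
      f"control={pooled_rate('control'):.2f}")
```

Whether a jury finds a two-by-two table any easier to follow than a regression coefficient is, of course, exactly the open question.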
18 October 2006
For a while there was a post on this blog, with comments, about an article on deaths in the Iraq war. Despite the many good points all parties made in this discussion, it was distracting some folks here from our mission to make the world safe for quantitative methodology. If you're interested in reading more about this subject, we recommend Andrew Gelman's blog post on this subject, which includes many of those who posted and commented here. Thanks to the author of the original post and to everyone who commented, and sorry for the confusion.
Social scientists, who often have a limited ability to create true experiments and replicate studies, value ways to learn from the synthesized results of previous work. A popular quantitative tool designed for this purpose is meta-analysis, which calculates a standardized effect size for each of a set of studies in a literature review and then performs inference on the resulting set of effect sizes. Meta-analysis is particularly common in education research.
Can we trust the results of these analyses?
On the one hand, when performed correctly, meta-analysis should successfully summarize the information available in multiple studies. Combining the results in this way can increase the power of overall conclusions when the sample size in each study is relatively small.
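As a rough illustration of where the extra power comes from, here is a minimal sketch of fixed-effect, inverse-variance pooling, one common way to combine effect sizes. The four effect sizes and standard errors are hypothetical numbers chosen for the example, and the fixed-effect model is itself an assumption (it treats every study as estimating the same underlying effect).

```python
import math

# Hypothetical standardized effect sizes and their standard errors.
effects = [0.30, 0.10, 0.25, 0.40]
std_errors = [0.20, 0.15, 0.25, 0.30]


def pooled_fixed_effect(effects, std_errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


pooled, pooled_se = pooled_fixed_effect(effects, std_errors)
individual_z = [d / se for d, se in zip(effects, std_errors)]
pooled_z = pooled / pooled_se

print("individual z-statistics:", [round(z, 2) for z in individual_z])
print(f"pooled effect={pooled:.3f}, se={pooled_se:.3f}, z={pooled_z:.2f}")
```

In this made-up example no single study clears the conventional 1.96 threshold, but the pooled estimate does, because its standard error is smaller than that of any one study.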
On the other hand, a good meta-analysis relies on the assumption that the original studies were unbiased and generally well-performed. In addition, we hope that the researchers in each study had the same target population in mind and worked independently of each other. Further complicating matters is the potential for publication bias – a meta-analysis will rarely include unpublished studies with less impressive effect sizes.
This second view is that of Derek Briggs at the University of Colorado, Boulder, who in a 2005 Evaluation Review paper objected to what he saw as the overuse of meta-analysis in social science research. He also suggested that assumptions necessary for a reliable meta-analysis are not always met.
More to come on this topic next time.
16 October 2006
This week the Applied Statistics Workshop will present a talk by Charles E. Loeffler, Ph.D. Candidate in Sociology at Harvard University.
Charles graduated magna cum laude from Harvard with a degree in Social Studies, before going on to receive his M. Phil. in Criminology from Cambridge University. He has recently completed the National Consortium on Violence Research Pre-Dissertation Fellowship under the mentorship of Prof. Steven Levitt of the University of Chicago. His work has appeared in The New Republic Online, Federal Sentencing Reporter, and Ars Aequi: A Biographical History of Legal Science. Charles's research interests include Criminology, Quasi-Experimental Methods and Decisionmaking.
Charles will present a talk entitled "Is justice blind? A natural experiment in the use of judicial discretion in criminal trials". The working paper for the talk is available from the course website. The presentation will be at noon on Wednesday, October 18th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
12 October 2006
Dodging Bill Collectors
As Tailor (A) fits customer (B) and calls out measurements, college boy (C) mistakes them for football signals and makes a flying tackle at clothing dummy (D). Dummy bumps head against paddle (E) causing it to pull hook (F) and throw bottle (G) on end of folding hat rack (H) which spreads and pushes head of cabbage (I) into net (J). Weight of cabbage pulls cord (K) causing shears (L) to cut string (M). Bag of sand (N) drops on scale (O) and pushes broom (P) against pail of whitewash (Q) which upsets all over you causing you to look like a marble statue and making it impossible for you to be recognized by bill collectors. Don't worry about posing as any particular historical statue because bill collectors don't know much about art (for more on causal chains in cartoons, click here).
11 October 2006
Today's papers were full of reports of a new study in the Lancet (here) on counting the excess deaths in Iraq since the US invasion in 2003. The article by Johns Hopkins researchers is an update on a study published in 2004 which generated a huge debate about the political as well as statistical significance of the estimates. This time the media's attention is again on the magnitude of the estimate (655,000 excess deaths, most of them due to violence) which is again vastly higher than other available numbers. The large uncertainty (95% CI 390,000 - 940,000) gets fewer comments this time, maybe because the interval is further away from 0 than in the 2004 study.
Just to point you to some interesting articles, here is a good summary in today’s Wall Street Journal. Wikipedia has a broad overview of the two studies and criticisms here. Brad deLong responded to criticisms of the 2004 study here; he also covers problems with the cluster sampling approach. And check this and this for some related posts on this blog.
By the way, the WSJ article has a correction for misinterpreting the meaning of 95% confidence. Maybe you can use it to convince your stats students that they should pay attention.
Jeremy Freese, an RWJ Health Policy Scholar at IQSS this year, sent me this amazing abstract (below) from the front lines of the replication movement, in psychology. On the same topic, but different discipline, don't miss Jeremy's "Reproducibility Standards in Quantitative Social Science: Why Not Sociology?" (find the pdf at his homepage) forthcoming, Sociological Methods and Research, July 2006. (I've written some on this topic too).
"The Poor Availability of Psychological Research Data for Reanalysis" By Wicherts, Jelte M.; Borsboom, Denny; Kats, Judith; Molenaar, Dylan American Psychologist. 61(7), Oct 2006, 726-728.
The origin of the present comment lies in a failed attempt to obtain, through e-mailed requests, data reported in 141 empirical articles recently published by the American Psychological Association (APA). Our original aim was to reanalyze these data sets to assess the robustness of the research findings to outliers. We never got that far. In June 2005, we contacted the corresponding author of every article that appeared in the last two 2004 issues of four major APA journals. Because their articles had been published in APA journals, we were certain that all of the authors had signed the APA Certification of Compliance With APA Ethical Principles, which includes the principle on sharing data for reanalysis. Unfortunately, 6 months later, after writing more than 400 e-mails--and sending some corresponding authors detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes--we ended up with a meager 38 positive reactions and the actual data sets from 64 studies (25.7% of the total number of 249 data sets). This means that 73% of the authors did not share their data.
10 October 2006
I can't resist chiming in and contributing post VI on causation and manipulation, but coming at it from a rather different angle: rather than ask what we as researchers should do, the cognitive science question is what people and children do do - what they assume and know about causal inference and understanding.
You might think that people would (for lack of a better term) suck at this, given other well-known difficulties in reasoning, anecdotal reports from educators everywhere, etc, etc. However, there's a fair amount of evidence that people -- both adults and children -- can be quite sophisticated causal reasoners. The literature on this is vast and growing, so let me just point out one quite interesting finding, and maybe I'll return to the topic in later posts.
One question is whether children are capable of using the difference between evidence from observations and evidence from intervention (manipulation) to build a different causal structure. The well-named "theory theory" theory of development suggests that children are like small scientists and should therefore be quite sophisticated causal reasoners at an early age. To test this, Schulz, Kushnir, & Gopnik [pdf] showed preschool children a special "stickball machine" consisting of a box, out of which two sticks (X and Y) rose vertically. The children were told that some sticks were "special" and could cause the other sticks to move, and some weren't. In the control condition, children saw X and Y move together on their own three times; the experimenter then intervened to pull on Y, causing it to move while X failed to move. In the experimental condition, the experimenter pulled on one stick (X) and both X and Y moved three times; a fourth time the experimenter pulled on Y, but only it moved (X was stationary).
The probability of each stick moving, conditional on the other moving, is the same in both cases. However, if the children reason about causal interventions, then the experimental group -- but not the control group -- should perceive that X might cause Y to move (but not vice-versa). And indeed, this was the case.
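To see why the two conditions look identical to a purely observational learner, here is a small sketch that tallies the four trials in each condition as described above. The tuple encoding and the helper name `conditional_frequency` are my own; the only thing that differs between the two lists is the record of who or what was intervened on.

```python
from fractions import Fraction

# Each trial: (stick the experimenter pulled, or None; X moved?; Y moved?)
control = [(None, True, True)] * 3 + [("Y", False, True)]
experimental = [("X", True, True)] * 3 + [("Y", False, True)]

COLUMN = {"X": 1, "Y": 2}


def conditional_frequency(trials, target, given):
    """Share of trials in which `target` moved, among trials where `given` moved.

    Deliberately ignores the intervention column, as a purely
    observational learner would.
    """
    relevant = [t for t in trials if t[COLUMN[given]]]
    hits = [t for t in relevant if t[COLUMN[target]]]
    return Fraction(len(hits), len(relevant))


for name, trials in (("control", control), ("experimental", experimental)):
    print(f"{name}: P(X moves | Y moves) = {conditional_frequency(trials, 'X', 'Y')}, "
          f"P(Y moves | X moves) = {conditional_frequency(trials, 'Y', 'X')}")
```

The frequencies match across conditions, so any difference in the children's causal judgments has to come from the intervention information rather than from the co-occurrence pattern.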
Children are also good at detecting interventions that are obviously confounded, overriding prior knowledge, and taking base rate into account (at least somewhat). As I said, this is a huge (and exciting) literature, and understanding people's natural propensities and abilities to do causal reasoning might even help us address the knotty philosophical problems of what a cause is in the first place.
9 October 2006
This week the Applied Statistics Workshop will present a talk by Matthew C. Harding, Ph.D. Candidate in Economics at the Massachusetts Institute of Technology.
Before coming to MIT, he received his M. Phil. in Economics at Oxford University. His research interests include Econometrics, American Politics, Political Economy, Macro-Finance, Economic Theory, Industrial Organization and Behavioral Economics. His publications appear in the International Economic Review and Macroeconomics: Imperfections, Institutions, and Policies.
Harding will present a talk entitled "Stochastic Eigen-analysis for Economics, Finance and Political Science". An abstract and accompanying working papers for the talk are available from the course website. The presentation will be at noon on Wednesday, October 4th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
6 October 2006
Fair warning: This entry includes a plug for one of my papers
Anti-discrimination laws require lawyers to figure out the causal effect of race (gender, ethnicity) on certain decision making. Previous posts have been exploring the often-tossed-around idea of considering the treatment to be perceived race, as opposed to "actual" (whatever that means) or self-identified race, to answer the no-causation-without-manipulation objection. This feels like a good idea, but it really only works in some cases and not others. It works when we can identify a specific actor (or an institution) whose behavior we want to study. Capital sentencing juries and a defendant firm in an employment discrimination lawsuit are two that work. We can think about changing these specific actors' perceptions of particular units (capital defendants, potential employees), and we can think about WHEN it makes sense to think of treatment (the perception) as being applied: at the moment the actor first perceives the unit's race (or gender or whatever). In contrast, "the public" or "the set of all employers in the United States" are two examples of actors that don't work. The timing of treatment assignment no longer makes sense, the counterfactuals are too hard to imagine, and the usual non-interference-among-units assumption becomes hard to think about.
What does all this buy us? A fair amount. First, this line of thinking identifies cases in which rigorous causal inference based on the potential outcomes framework remains beyond our reach. Figuring out the causal effect of gender on salaries nationwide is one example; another is the causal effect of candidate race on election outcomes. Second, in those cases in which we can identify a specific actor, we get a coherent conceptualization of the timing of treatment assignment, which allows us to distinguish pre- from post-treatment variables. This is a big deal. Entire lawsuits sometimes turn on it.
All this has important implications for civil rights litigation, as I discuss in my paper, "Causal Inference in Civil Rights Litigation." You can get a draft (pdf) of this paper from my website, which you can access by clicking on my name to the left. I'd appreciate any reader reactions/suggestions.
Boston Chapter of the American Statistical Association Evening Lecture Series
"Rich state, poor state, red state, blue state: What's the matter with Connecticut? A demonstration of multilevel modeling"
Andrew Gelman, Columbia University*
IQSS, 1737 Cambridge Street, Room N354
Monday, October 9, 7:30pm
* Andrew will also be on hand at IQSS on the morning of Tuesday, October 10, to answer any stats questions you might have.
5 October 2006
People who read this blog regularly know that few things get authors and commentators as worked up as questions about causal inference, either philosophical (here, here, and here) or technical (here, here, here, etc.). I wouldn't want to miss out on the fun this time around -- and how could I pass up the opportunity to have the IV post on causation and manipulation?
Jens and Felix have both discussed whether non-manipulable characteristics such as race or gender ("attributes" for Holland) can be considered causes within the potential outcomes framework. I agree with them that, at least as far as Holland is concerned, the answer is (almost always) no - no causation without manipulation. The fact that we are having this discussion 20 years later suggests (to me, at least) that this answer is intuitively unsatisfying. It is worth remembering a comment made by Clark Glymour in his discussion of the Holland (1986) article:
People talk as they will, and if they talk in a way that does not fit some piece of philosophical analysis and seem to understand each other well enough when they do, then there is something going on that the analysis has not caught.
Identifying perceptions of an attribute (rather than the attribute itself) as the factor subject to manipulation makes a lot of sense in situations where the potential outcomes are to a certain degree out of the control of the individual possessing the attribute, as in the discrimination example. Extending this idea to situations in which outcomes are generated by the subject possessing the attribute (in which "self-perceptions" would be manipulated) would commit researchers to a very particular understanding of attributes such as race and gender that would hardly be uncontroversial.
In these cases, I think that it makes more sense to look at the differences in well-specified Rubin-Holland causal effects (i.e. the results of manipulation) conditional on values of the attribute rather than identifying a causal effect as such. So, for example, in the gender discrimination example we could think of the manipulation as either applying or not applying for a particular job. This is clearly something that we could randomize, so the causal effect would be well defined. We could calculate the average treatment effect separately for men and women and compare those two quantities, giving us the difference in conditional causal effects. I'm sure that there is a catchy name for this difference out there in the literature, but I haven't run across it.
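A minimal simulation may help pin down the quantity I have in mind. Everything in it is hypothetical: the sample size, the group labels, the "true" effects of 1.0 and 0.6, and the noise are invented so that the estimates can be checked against known values. The treatment is randomized, the attribute is not, and the quantity of interest is the gap between the two within-group average treatment effects.

```python
import random
from statistics import mean

# Hypothetical true effects of the randomized treatment within each group.
TRUE_EFFECT = {"men": 1.0, "women": 0.6}


def simulate(n=20000, seed=1):
    """Simulate units with a fixed attribute and a randomly assigned treatment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.choice(["men", "women"])   # attribute: not manipulated
        treated = rng.random() < 0.5           # manipulation: randomized
        outcome = TRUE_EFFECT[group] * treated + rng.gauss(0.0, 1.0)
        rows.append((group, treated, outcome))
    return rows


def conditional_ate(rows, group):
    """Difference in mean outcomes, treated minus control, within one group."""
    treated = [y for g, t, y in rows if g == group and t]
    control = [y for g, t, y in rows if g == group and not t]
    return mean(treated) - mean(control)


rows = simulate()
ate_men = conditional_ate(rows, "men")
ate_women = conditional_ate(rows, "women")
difference = ate_men - ate_women
print(f"ATE | men = {ate_men:.2f}, ATE | women = {ate_women:.2f}, "
      f"difference = {difference:.2f}")
```

Each conditional effect is a well-defined Rubin-Holland causal effect because the treatment was randomized within the group. The difference between them is a descriptive comparison across values of the attribute, not a causal effect of the attribute itself.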
So, is this quantity (the difference in conditional causal effects) of interest to applied researchers in the social sciences? I would argue that it is, if for nothing else than giving us a more nuanced view of the consequences of something that we can manipulate. Is it a Rubin-Holland causal effect? No, but that is a problem only to the extent that we privilege "causal" over other useful forms of inference.
4 October 2006
Two recent posts by Jim and Jens ponder the holy grail of manipulability via the exchange between Holland and Heckman. Can non-manipulable things like gender or race cause things in the potential outcomes framework?
Holland (1986) says no because it’s hard to conceive of changing the unchangeable. Fair enough. But this argument has been carried too far in some quarters and not far enough in others. Here’s why:
Invoking Holland, some population scientists now go so far as to claim that we can’t conceive of things like marriage or divorce as causes because the decision to marry or divorce is beyond the direct control of an experimenter. Please. At most we need some exogeneity, a little speck of indifference, a tipping point to make them amenable to coherent causal thinking (and estimation). Heckman goes even farther than this, and he is right: the issue is not whether I, personally, can wreck all marriages in my study, but whether we can coherently conceive of a counterfactual world where things are different as a matter of theoretical speculation ("mental act"). In this, however, even Heckman seems to yield: A minimum requirement for thinking about counterfactual worlds would appear to be the possibility of conceiving of these worlds in a coherent fashion. And this, I believe, is the underlying unease of the statisticians whom Heckman criticizes: whether one can even coherently imagine counterfactual worlds in which gender is changed.
On the other hand, social scientists love to talk about the effects of gender and race, which – pace Michael Jackson and Deirdre McCloskey – are really hard to think of as manipulable, ceteris paribus. What Holland’s dictum contributes in this respect is the entirely appropriate call for getting the question straight.* For what most of these studies look for is evidence of discrimination. Thinking about discrimination within the potential outcomes framework makes it clear that the issue really isn’t whether we can manipulate the race or gender of a specific person, but rather whether we can manipulate the perception of the person’s race or gender in the eyes of the discriminator. Cases in point: Goldin and Rouse’s study on discrimination in symphony orchestras, where the gender of applicants was obscured (i.e. perceptions manipulated) by staging auditions behind an opaque gauze barrier. Similarly, Grogger and Ridgeway’s paper in the latest issue of JASA uses natural variation in the perceptibility of driver’s skin color (dusk, the veil of darkness) to test for racial profiling in traffic controls. In either case, the causal question was not, what would happen if we changed the musician/driver from female/black to male/white, but, What would happen if we could change knowledge/perception of race and gender.
In other words, there are important causal questions to be asked about race and gender, but these questions don’t necessarily require the manipulability of race and gender. Not even within the potential outcomes framework of causality.
* My pet peeve: Much of social science is so busy providing answers that it forgets to ask well-formulated questions.
3 October 2006
In a recent post, Jim Greiner asked whether we adhere to the principle of "no causation without manipulation." This principle, if true, raises the question of whether it makes sense to talk about the causal effect of gender.
The Rubin/Holland position on this is clear: it makes no sense to talk about the causal effect of gender because the manipulation, and thus the counterfactual, that one has in mind (a sex-transformation surgery?) is ill-defined. One can ask related questions, for example by sending employers resumes with randomly assigned female and male names and seeing whether one gender is more likely to be invited to a job interview, but it makes no sense to think about a causal effect of gender per se.
The contrasting view is presented by one of their main foils, James Heckman, who writes in a recent paper (Andrew Gelman also had a blog post on this): "Holland claims that there can be no causal effect of gender on earnings. Why? Because we cannot randomly assign gender. This confused statement conflates the act of definition of the causal effect (a purely mental act) with empirical difficulties in estimating it. This type of reasoning is prevalent in statistics. As another example of the same point, Rubin (1978, p. 39) denies that it is possible to define a causal effect of sex on intelligence because a randomization cannot in principle be performed. In this and many other passages in the statistics literature, a causal effect is defined by a randomization. Issues of definition and identification are confused. [...] the act of definition is logically separate from the acts of identification and inference." Heckman sees this as a "view among statisticians that gives rise to the myth that causality can only be determined by randomization, and that glorifies randomization as the ‘‘gold standard’’ of causal inference."
So what do you make of this? Does it make sense to think about a causal effect of gender or not? Does it make sense to try to estimate it, i.e. to interpret a gender gap in wages as causal (balancing on all confounders except gender)? How about the causal effect of race, etc.? Just to be precise here, notice that Rubin/Holland admit that "even though it may not make much sense to talk about the 'causal' effect of a person being a white student versus being a black student, it can be interesting to compare whites and blacks with similar background characteristics to see if there are differences" in some outcome of interest.
2 October 2006
This week the Applied Statistics Workshop will present a talk by Subharup Guha, Post-Doctoral Research Fellow in the Harvard School of Public Health Department of Biostatistics, and Louise Ryan, Henry Pickering Walcott Professor of Biostatistics in the Harvard School of Public Health and Department of Biostatistical Science at the Dana-Farber Cancer Institute.
Before coming to Harvard, Dr. Guha received his Ph.D. in Statistics at Ohio State University. Dr. Guha’s publications appear in Environmental and Ecological Statistics, Journal of the American Statistical Association, Journal of Computational and Graphical Statistics and the Journal of the Royal Statistical Society. His research interests include Bayesian modeling, computational biology, MCMC simulation, semiparametric Bayesian methods, spatio-temporal models and survival analysis.
Professor Ryan earned her Ph.D. in Statistics from Harvard University, and has been a member of the Department of Biostatistics since then. She has received numerous honors and distinctions during that time, including the Spiegelman Award from the American Public Health Association, and was named Mosteller Statistician of the Year. She has published extensively in Biometrics, Journal of the American Statistical Association, Journal of Clinical Oncology, and the New England Journal of Medicine. Her research interests focus on statistical methods related to environmental risk assessment for cancer, developmental and reproductive toxicity and other non-cancer endpoints such as respiratory disease, with a special interest in the analysis of multiple outcomes as they occur in these applied settings.
Dr. Guha and Professor Ryan will present a talk entitled "Gauss-Seidel Estimation of Generalized Linear Mixed Models with Application to Poisson Modeling of Spatially Varying Disease Rates." The paper that accompanies the talk is available from the course website. The presentation will be at noon on Wednesday, October 4th, in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
The New York Times recently published an obituary for David Lykken, who was a pioneer of twin studies. His “Minnesota Twin Studies” suggested the importance of genetic factors in life outcomes. But his work with twins also spurred empirical research in many fields, not just genetics – and for good reason.
The idea of using twins for social science studies is very appealing: some twins are genetically identical, and also grow up in the same family and environment. So from a statistical perspective, comparing outcomes such as earnings between pairs of twins is like having a “perfect match.” This idea made the rounds in many fields, such as labor economics. By using the argument that all unobserved characteristics (e.g. “genetic ability”) should be equal and can thus be differenced away, twin studies were used to estimate the returns to education – the effect of education on wages.
Alas, there are potential problems with using twin data. For example, measurement error in a difference estimation can lead to severe attenuation bias precisely because twins are so similar. If there is little variation in educational attainment within pairs, even small measurement errors can strongly affect the estimate. Researchers have been ingenious about this (e.g. by instrumenting one person’s education with the level that her twin reported, as in Ashenfelter and Krueger). While this may reduce the attenuation bias, it can magnify the omitted variables bias that motivated the use of twins in the first place. Because there are only small differences in schooling, small unobserved differences in ability can lead to a large bias. The culprits can be details such as differences in birth weight (Rosenzweig and Wolpin have a great discussion of such factors). In addition, twins who participate in such studies are a selected group: they are getting along well enough to participate, and many of them get recruited at “twin events.” But not all twins party in Twinsburg, Ohio.
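To make the attenuation argument concrete, here is a minimal simulation sketch in Python. Everything in it is an illustrative assumption rather than anything taken from the studies mentioned: the function names (`slope`, `iv_slope`, `simulate`), the true return to schooling of 0.10, the variances, and the assumption that all reporting errors are independent of each other and of everything else. It compares a within-pair OLS estimate based on self-reported schooling with an instrumental-variables estimate that uses the co-twins' reports as the instrument, in the spirit of the approach described above. It deliberately leaves out within-pair ability differences, so it says nothing about the omitted variables problem.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def iv_slope(z, x, y):
    """Simple IV estimate with one instrument: cov(z, y) / cov(z, x)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return szy / szx


def simulate(n_pairs=20000, beta=0.10, sd_within=0.5, sd_error=0.5, seed=0):
    """Return (within-pair OLS, within-pair IV) estimates of beta.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    d_self, d_cross, d_wage = [], [], []
    for _ in range(n_pairs):
        ability = rng.gauss(0, 1)                  # shared by both twins
        family = 12 + ability + rng.gauss(0, 1)    # shared schooling component
        s1 = family + rng.gauss(0, sd_within)      # true schooling, twin 1
        s2 = family + rng.gauss(0, sd_within)      # true schooling, twin 2
        w1 = beta * s1 + 0.2 * ability + rng.gauss(0, 0.1)  # log wage, twin 1
        w2 = beta * s2 + 0.2 * ability + rng.gauss(0, 0.1)  # log wage, twin 2
        # Each twin's schooling is reported twice, with independent errors:
        # once by the twin herself and once by her co-twin.
        self1 = s1 + rng.gauss(0, sd_error)
        self2 = s2 + rng.gauss(0, sd_error)
        cross1 = s1 + rng.gauss(0, sd_error)       # twin 2's report about twin 1
        cross2 = s2 + rng.gauss(0, sd_error)       # twin 1's report about twin 2
        d_self.append(self1 - self2)
        d_cross.append(cross1 - cross2)
        d_wage.append(w1 - w2)
    ols = slope(d_self, d_wage)            # ability differenced away, but attenuated
    iv = iv_slope(d_cross, d_self, d_wage)  # co-twin reports as instrument
    return ols, iv


if __name__ == "__main__":
    ols, iv = simulate()
    print(f"within-pair OLS: {ols:.3f}")
    print(f"within-pair IV:  {iv:.3f}")
```

Under these assumed values, the true within-pair schooling difference and the reporting-error difference have the same variance, so the within-pair OLS slope should in expectation be about half the true return, while the IV estimate should be close to it. Shrinking `sd_within` (more similar twins) makes the attenuation worse, which is the point made above.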
Of course, none of this is to belittle the contribution of Dr. Lykken, who besides helping to start this flurry of work was also a major contributor to happiness research.