May 2006



25 May 2006

Dirichlet Spaces and Metropolis Traces

Drew Thomas

A problem that has come up again and again in my work is how to explore a space bounded by a Dirichlet prior with a Metropolis-type algorithm. I've yet to find a satisfactory answer, and I'm hoping someone else will have some insight.

The research question I have deals with allocating patients to hospitals, considering the effect of the number of beds - one example of the "supply-induced demand" question. (The analysis is being done under Prof. Erol Pekoz, who's visiting Harvard Stats this year.) Conjugate priors for this problem have eluded me, and so the quantity of interest, the probability that a patient will be sent to a particular hospital for inpatient care, is being inferred through a Metropolis algorithm.

Here's the thing: there are at most 64 different hospitals to which a patient can be assigned. Even after assuming that a hospital that has not yet received a patient from a particular area never will, the dimensionality remains dauntingly high.

One suggested proposal is a Dirichlet distribution with parameters equal to the current values times a constant, so that the expected value of the proposal equals the last draw. However, when the constant is too low, the smallest dimensions get parameter values less than 1, which leads to trouble: those coordinates tend toward zero. When the constant is too high, the largest parameters barely move, and the benefit of shifting some of their mass is lost.
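For concreteness, here is a minimal sketch of that proposal in Python. This is my own illustration, not the actual project code: the target density, the tuning constant `scale`, and the toy Dirichlet(5, 5, 5) target are all placeholders. Note the Hastings correction, which the asymmetric Dirichlet proposal requires.

```python
import math
import numpy as np

def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) evaluated at a point x on the simplex."""
    return (math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
            + float(((alpha - 1) * np.log(x)).sum()))

def metropolis_dirichlet_step(p, log_target, scale, rng):
    """One Metropolis-Hastings step on the probability simplex.

    Proposes q ~ Dirichlet(scale * p), so E[q | p] = p; the Hastings
    correction is needed because this proposal is not symmetric.
    """
    q = rng.dirichlet(scale * p)
    log_alpha = (log_target(q) - log_target(p)
                 + dirichlet_logpdf(p, scale * q)   # density of move q -> p
                 - dirichlet_logpdf(q, scale * p))  # density of move p -> q
    return q if math.log(rng.uniform()) < log_alpha else p

# toy run: the target is itself a Dirichlet(5, 5, 5) density
rng = np.random.default_rng(0)
alpha0 = np.array([5.0, 5.0, 5.0])
p = np.ones(3) / 3
for _ in range(200):
    p = metropolis_dirichlet_step(p, lambda x: dirichlet_logpdf(x, alpha0),
                                  scale=50.0, rng=rng)
```

The pathology described above shows up directly in `scale`: too small and the proposal's smallest parameters fall below 1, pushing coordinates toward zero; too large and the chain barely moves.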

I've considered implementing a parallel-tempering method, but I'd like to keep things cleaner. Does anyone have a better method that's reasonably quick to run, short of monkeying with each parameter individually?

Posted by Andrew C. Thomas at 6:00 AM

23 May 2006

Inheritance Laws

Jason Anastasopoulos, guest blogger

Question: Many political philosophers who focused on questions of property (including Plato) believed that equality of conditions was necessary for the development of a virtuous citizenry and virtuous leaders. The key to creating this equality of conditions, they argued, was the implementation of strict inheritance laws limiting the transfer of wealth from one generation to the next. Does anyone know of any quantitative models or empirical studies that examine the interaction between social stratification and inheritance laws? If you do, email me at

Posted by James Greiner at 6:00 AM

20 May 2006

It's summer!

It's the end of the term for both Harvard and MIT... and since we on the authors' committee are about to embark on summers of tireless dedication to research while scattered to the far reaches of the planet, posting to this blog will be reduced until fall.

A special thanks to the loyal readers and commenters of this blog -- you folks have made this year a really rewarding experience for us. We won't stop posting entirely, so we hope you'll still stop by occasionally and will be with us when we resume a full schedule at the end of the summer.

Posted by Amy Perfors at 2:09 PM

18 May 2006

Reactions To The Virginity Pledge Study

Drew Thomas

Harvard School of Public Health doctoral candidate Janet Rosenbaum has been in the news lately, following the publication of her study of virginity pledges in the American Journal of Public Health, as well as her recent IQSS seminar. (Full disclosure: Janet is a friend of mine. I'll address her as Ms. Rosenbaum for this entry.) Since it's certainly a hot topic, it's no surprise how much attention her findings have received: first the big news agencies picked the story up, then the blogosphere took its turn, mainly over the "controversy" resulting from the study.

But I think the more relevant part of the whole debate is the point Ms. Rosenbaum was trying to make about surveys and self-reporting: we use these data to make broad, sweeping conclusions on social phenomena, and while they are the best we have, they aren't up to the best standard we could achieve.

Posted by Andrew C. Thomas at 6:04 AM

16 May 2006

Communication, Anyone?

Jim Greiner

The course I co-taught this semester on Quantitative Social Science & Law has come to an end. There were a lot of “lessons learned” in the class, both for the students (at least, I hope so) and for the teaching staff (more definitely). Of all of these lessons, one sticks in my head: we ought to focus on teaching quantitative students how to communicate with folks without formal statistical training.

Some quantitative folks will graduate and spend the rest of their lives talking to and working with only quantitative people. Some, but not many. Most of us will be talking and working with people who have few or no statistics classes under their belts. But do we ever teach the communication skills needed to function effectively with the proles? I've never seen or heard of a class that focuses on these skills. Not one. Does that strike anyone besides me as odd?

Posted by James Greiner at 6:00 AM

15 May 2006

A bit of google frivolity

Amy Perfors

Google has just come out with a new tool, Google Trends, which compares the frequencies of different web searches and thus provides hours of entertainment to language and statistics geeks like myself. In honor of that -- and, okay, because it's nearing the end of the term and I'm just in the mood -- here's a rather frivolous post dedicated to the tireless folks at Google, for entertaining me today.

Some observations:

1) One thing that is interesting (though in hindsight not surprising) is that Google Trends seems like a decent tool for identifying how marked a form is. The basic idea is that a default term is unmarked (and often unsaid), but the marked term must be used in order to communicate that concept. For instance, in many sociological domains, "female" is marked more than "male" is -- hence people refer to "female Presidents" a lot more than they refer to "male Presidents", even though there are many more of the latter: the adjective "male" is unnecessary because it just feels redundant. In contrast, you much more often say "male nurse" than "female nurse", because masculinity is marked in the nursing context.

Anyway, I noticed that for many sets of words, the term that is searched for most often is the marked term, even though the unmarked term probably occurs numerically more often. For instance, Blacks, whites indicates far more queries for "blacks"; Gay, straight many more for "gay"; and Rich, poor, middle class the most for rich, followed by poor, and least of all middle class.

I have two hypotheses to explain this: (a) people generally google for information, and seek information about what they don't already know, so it's not surprising that queries favor the non-default, usually numerically smaller, category. And (b) since "unmarked" means the term doesn't need to be used, it's no surprise that people don't use it. Still, I thought it was interesting. And clearly this phenomenon, if real at all, is at most only one of many factors affecting query frequency: for instance, Christian, atheist, muslim indicates far more hits for "Christian", concentrated in heavily Christian areas.

2) Another observation: the first five numbers seem to have search frequencies that drop by half with each consecutive number. Is this interesting for cognitive reasons? I have no idea.

3) As far as I can tell, no search occurs more often than "sex." If anyone can find something with greater frequency, I'd love to hear it. On the one hand, it may say good things for our species that "love" beats out "hate", but that may just mean more people are searching for love than hate. And "war" beats out "peace", sadly enough.

4) "Hate bush" peaked right before the 2004 election, "love bush" about six months before that. I have no idea what that's all about.

5) It's amazing to me how many people clearly must use incredibly unspecific searches: who searches for "one"? Or "book"? Though there is no indication of absolute numbers (a y-axis on these graphs would be incredibly handy), a search term presumably needs a minimum number of queries before it shows up at all, so somebody must be making these.

6) In conclusion, I note that Harvard has more queries than MIT. Does this mean that MIT is the "default"? Or that Harvard generates more interest? Since I'm an MIT student but writing for a Harvard blog, I plead conflict of interest...

Posted by Amy Perfors at 6:00 AM

12 May 2006

Statistical Discrimination in Health Care

Sebastian Bauhoff

This blog has frequently written about testing for discrimination (see, for example, here, here, and here). It is also a hot issue in health care, where there is a case for 'rational' discrimination: physicians respond to clinical uncertainty by relying on priors about the prevalence of diseases across racial groups (for example).

A 2005 paper by Balsa, McGuire and Meredith lays out a very nice application of Bayes' Rule to this question. The Institute of Medicine suggests that there are three types of discrimination: simple prejudice, stereotyping, and statistical discrimination, in which docs use probability theory to overcome uncertainty. The latter occurs when uncertainty about a patient's condition leads the physician to treat her differently from similar people of a different race.

The paper uses Bayes' Rule to conceptualize the decision a doctor faces when, hearing symptom reports from a patient, she must decide whether the patient really has the disease:

Pr(Disease | Symptom) = Pr(Symptom | Disease) * Pr(Disease) / Pr(Symptom)

A doc would decide differently if she believed that disease prevalence differs across racial groups (which affects Pr(Disease)), or if diagnostic signals are noisier for some groups (which changes Pr(Symptom)), perhaps because the quality of doctor-patient communication differs across races.
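To make this concrete, here is a small numerical illustration (my own hypothetical numbers, not the paper's), writing Pr(Symptom) in terms of the signal's sensitivity and false-positive rate:

```python
def pr_disease_given_symptom(prior, sensitivity, false_positive_rate):
    """Bayes' Rule for the diagnostic decision.

    prior               = Pr(Disease), the assumed prevalence in the group
    sensitivity         = Pr(Symptom | Disease)
    false_positive_rate = Pr(Symptom | no Disease)
    """
    p_symptom = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_symptom

# same symptom report, different assumed prevalence across groups:
pr_disease_given_symptom(0.10, 0.9, 0.2)  # = 0.09 / 0.27, i.e. 1/3
pr_disease_given_symptom(0.05, 0.9, 0.2)  # = 0.045 / 0.235, about 0.19
```

A noisier diagnostic signal works the same way: raising the false-positive rate for one group lowers the posterior for that group even at identical prevalence.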

The authors test their model on diagnosis data from family physicians and internists, and find that sensible priors about disease prevalence could explain racial differences in the diagnosis of hypertension and diabetes. For the diagnosis of depression, there is evidence that differences in doctors' decisions may be driven by different communication patterns between white docs and their white vs. minority patients.

Obviously prejudice and stereotyping are different from statistical discrimination, and they have quite different policy implications. This is a really nice paper that makes these distinctions clear while using Bayes' Rule elegantly to conceptualize the issues. The general idea might also apply to other policy questions, including police stop-and-search.

Posted by Sebastian Bauhoff at 6:00 AM

10 May 2006

An Intoxicating Story

From Wikipedia's entry on the t-test:

The t-statistic was invented by William Sealy Gosset for cheaply monitoring the quality of beer brews. "Student" was his pen name. Gosset was statistician for Guinness brewery in Dublin, Ireland, hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge for applying biochemistry and statistics to Guinness's industrial processes. Gosset published the t-test in Biometrika in 1908, but was forced to use a pen name by his employer who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown not only to fellow statisticians but to his employer - the company insisted on the pseudonym so that it could turn a blind eye to the breach of its rules. Today, it is more generally applied to the confidence that can be placed in judgements made from small samples.

I like the way they think.

Posted by Andrew C. Thomas at 6:00 AM

9 May 2006

Running Statistics On Multiple Processors

Jens Hainmueller

You just bought a state-of-the-art PC with dual processors and yet your model still runs forever? Your statistical software is probably not multi-threaded, meaning that even though your computer has two processors, the whole computation runs on only one of them. Don't believe me? Check your CPU usage; it's probably stuck at 50 percent (or less).

You might ask why statistical software doesn't use both processors simultaneously. The fact is that splitting computations across two or more processors is a non-trivial task that many software packages have yet to accomplish. This may change in the near future, however, as the advent of dual-processor PCs puts increasing pressure on statistical software producers to support multi-threading.

In fact, Stata Corp. has recently released Stata/MP, a new version of Stata/SE that runs on multiprocessor computers. Their website proclaims that: "Stata/MP provides the most extensive support for multiple-processor computers and dual-core computers of any statistics and data-management package." So this bodes well for Stata users.

What’s in it for Non-Stataists? People at S-PLUS told me yesterday that there is "currently an enhancement request to add functionality to S-PLUS that will allow it to use multiple processors. This request has been submitted to our developers for further review." Unfortunately no further information is available at this point.

In my favourite software, R, there are efforts underway to support concurrency and, potentially, parallelism. Currently, the snow package allows for simple parallel computing.
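The general idea behind all of these efforts is easy to sketch for "embarrassingly parallel" statistical jobs such as bootstrap replications: split the replications across worker processes, one per processor. A toy illustration in Python (my own sketch with simulated data, not what any of the packages above actually does internally):

```python
from multiprocessing import Pool
import random

def bootstrap_mean(args):
    """One bootstrap replication: resample the data with replacement
    and return the mean of the resample."""
    data, seed = args
    rng = random.Random(seed)
    resample = [rng.choice(data) for _ in data]
    return sum(resample) / len(resample)

if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0, 1) for _ in range(1000)]
    # two worker processes each handle a share of the 100 replications
    with Pool(processes=2) as pool:
        means = pool.map(bootstrap_mean, [(data, s) for s in range(100)])
```

Because each replication is independent, the speedup is close to the number of processors; the hard part for general statistical software is parallelizing computations that are not independent, such as a single matrix decomposition.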

It will be interesting to see how other statistical software producers like SAS, LIMDEP, etc. will react to this trend toward dual processing. Does anybody have more information about this issue?

Posted by Jens Hainmueller at 6:00 AM

8 May 2006

Coarsened at Random

Jim Greiner

I’m the “teaching fellow” (the “teaching assistant” everywhere but Harvard, which has to have its lovely little quirks: “Spring” semester beginning in February, anyone?) for a course in missing data this semester, and in a recent lecture, an interesting concept came up: coarsened at random.

Suppose you have a dataset in which you know or suspect that some of your data values are rounded. For example, ages of youngsters might be given to the nearest year or half-year. Or perhaps in a survey, you’ve gotten some respondents’ incomes only within certain ranges. Then the data has been “coarsened” in the sense that you know that the true value is within a certain range, but you don’t know where within that range.

Happily, techniques have been developed to handle this sort of situation. In many ways, the game is the same as in the missing-data setting. Just as good things happen in the missing-data context when the data are missing at random, so also in this context good things happen when the data are coarsened at random. Thus, to begin with, you have to consider (among other things) whether you think the probability that you will observe only a range of possible data values, as opposed to the specific true value, depends on something you don't observe (such as that specific true value). A good place to start on all this is Heitjan & Rubin, "Inference from Coarse Data via Multiple Imputation with Application to Age Heaping," 85 JASA 410 (1990).
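As a toy illustration of the multiple-imputation approach (my own sketch, with a deliberately naive within-interval model, not Heitjan and Rubin's actual procedure), suppose incomes are reported only as ranges:

```python
import random

def impute_coarsened(intervals, n_imputations=5, seed=0):
    """Complete interval-reported data by drawing a value within each range.

    Drawing uniformly within the interval is only a placeholder; a serious
    analysis would draw from a model of the underlying variable given the
    coarsening, as in Heitjan & Rubin (1990).
    """
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for (lo, hi) in intervals]
            for _ in range(n_imputations)]

# two respondents who reported income only in $10,000 brackets
completed = impute_coarsened([(20000, 30000), (30000, 40000)])
```

Each completed dataset is then analyzed as if fully observed, and the results are combined with the usual multiple-imputation rules.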

One final point: you might think that coarsened at random is a special case of missing at random. Actually, it's the other way around. Data can be (and often are assumed to be) coarsened at random but not missing at random. Think it through and you'll see why.

Posted by James Greiner at 6:00 AM

4 May 2006

Detecting Attempted Election Theft

At the Midwest conference last week I saw Walter Mebane presenting his new paper entitled "Detecting Attempted Election Theft: Vote Counts, Voting Machines and Benford's Law." The paper is really fun to read and contains many cool ideas about how to statistically detect vote fraud in situations where only minimal information is available.

With the advent of voting machines that replace traditional paper ballots, physically verifying vote counts becomes impossible. As Walter Mebane puts it: "To steal an election it is no longer necessary to toss boxes of ballots in the river, stuff the boxes with thousands of phony ballots, or hire vagrants to cast repeated illicit votes. All that may be needed nowadays is access to an input port and a few lines of computer code."

How does Mebane utilize statistical tools to detect voting irregularities? He relies on two sets of tests:

The first test relies on Benford's Law. The idea is that if individual votes originate from a mix of at least two statistical distributions, there is reason to expect the second digits of reported vote counts to follow the second-digit Benford's Law distribution. Walter provides simulations showing that the Benford's Law test is sensitive to some kinds of vote-count manipulation but not others.
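For reference, the second-digit Benford frequencies follow from summing the two-digit Benford probabilities over the leading digit. A quick sketch of the expected distribution and a chi-squared comparison (my own illustration of the general idea, not Mebane's exact test):

```python
import math
from collections import Counter

def second_digit_benford():
    """Expected second-digit frequencies under Benford's law:
    Pr(d2 = d) = sum over d1 = 1..9 of log10(1 + 1 / (10*d1 + d))."""
    return [sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
            for d2 in range(10)]

def second_digit_stat(vote_counts):
    """Pearson chi-squared statistic comparing the observed second digits
    of vote counts against the Benford expectation (counts below 10 have
    no second digit and are dropped)."""
    digits = [int(str(c)[1]) for c in vote_counts if c >= 10]
    n, observed = len(digits), Counter(digits)
    expected = second_digit_benford()
    return sum((observed[d] - n * expected[d]) ** 2 / (n * expected[d])
               for d in range(10))
```

The expected frequencies decline gently from about 12% for a second digit of 0 to about 8.5% for a second digit of 9, which is what makes the second-digit version usable on vote counts of modest size.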

The second set of tests relies on randomization. The idea rests on the assumption that in each precinct (especially crowded ones) voters are randomly and independently assigned to the machines used in that precinct. The test checks whether the split of the votes is the same on all the machines in a precinct: if some machines were hacked, the distribution of votes among candidates would differ on the affected machines. Mebane tests these expectations against data from three Florida counties, with very interesting findings.
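The randomization test amounts to a test of homogeneity: under random assignment, each machine's vote split should match the precinct-wide split. A minimal sketch (again my illustration, not the paper's implementation):

```python
def machine_homogeneity_stat(table):
    """Chi-squared statistic for a machines-by-candidates count table.

    Rows are the machines in one precinct, columns are candidates; under
    random assignment of voters to machines, the expected cell count is
    (row total * column total) / precinct total.
    """
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return sum((table[i][j] - row[i] * col[j] / total) ** 2
               / (row[i] * col[j] / total)
               for i in range(len(table)) for j in range(len(col)))

# identical splits across machines give a statistic of zero
machine_homogeneity_stat([[50, 50], [100, 100]])  # 0.0
```

A large statistic relative to the chi-squared reference distribution flags a precinct where one machine's candidate split departs from the others'.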

In general, the paper was very well received by the audience. Some attendees raised concerns about the randomization test, arguing that voters may not be randomly assigned to voting machines (for example, older voters may be more likely to go to the first machine in line, etc.). The discussant, Jonathan Wand, raised the idea of actually using random assignment of voters to voting machines as an administrative tool to facilitate fraud detection ex post. He also proposed using sampling techniques to make recounts more feasible (though that would require voting machines that leave a paper trail). Another comment alluded to the fact that somebody smart who wants to steal an election might anticipate some of Walter's tests and design manipulations that satisfy them.

Overall, my impression is that although his research is admittedly still at an early stage, Mebane is onto something very cool here, and I am eager to see the redrafts and more results in the future. This is a very important topic given that more and more voting machines will be used in the future. Everybody interested in vote fraud should read this paper.

Posted by Jens Hainmueller at 6:00 AM

3 May 2006

Sensitivity Analysis

Felix Elwert

Observational studies, however well done, remain exposed to the problem of unobserved confounding. In response, methods of formal sensitivity analysis are growing in popularity these days (see Jens's post on a related issue here.)

Rosenbaum and Rubin's basic idea is to hypothesize the existence of an unobserved covariate, U, and then to recompute point estimates and p-values for a range of associations between this unobserved covariate and, in turn, the treatment T and the outcome Y. If moderate associations (i.e., moderate confounding) change the inference about the effect of the treatment on the outcome, we should question the robustness of our conclusions.
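A stylized version of the computation (my own simplification for a binary U and an additive outcome effect, not Rosenbaum and Rubin's exact procedure): for each hypothesized pair of associations, subtract the implied confounding bias from the raw estimate and see when the conclusion changes.

```python
def adjusted_estimate(raw_effect, gamma, p1, p0):
    """Bias-adjusted treatment effect under a hypothetical binary confounder U.

    gamma = assumed effect of U on the outcome Y
    p1/p0 = assumed Pr(U = 1) among treated / control units
    The implied confounding bias is gamma * (p1 - p0).
    """
    return raw_effect - gamma * (p1 - p0)

# sweep a grid of hypothesized associations for a raw estimate of 2.0
grid = [(g, p1, p0, adjusted_estimate(2.0, g, p1, p0))
        for g in (0.5, 1.0, 2.0, 5.0)
        for p1 in (0.3, 0.5)
        for p0 in (0.1, 0.3)]
```

Scanning the grid shows how strong U's association with both T and Y must be before the estimate is explained away, which is exactly the quantity we then have to judge in substantive terms.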

But how to assess whether the critical association between U, T, and Y that would invalidate the standard results is large in substantive terms?

One popular strategy compares this critical association to the strength of the association between T, Y, and an important known (and observed) confounder. For example, one might say that the amount of unobserved confounding it would take to invalidate the conclusions of a study on the effect of sibship size on educational achievement would have to be at least as large as the amount of confounding generated by omitting parental education from the model.

This is indeed the strategy used in a few studies. But what if U should be taken to stand not for a single confounder but for a whole collection of unobserved confounders? Clearly, it is then no longer credible to compare the critical association of U with the amount of confounding created by a single known covariate. Better to compare it to a larger set of observed confounders. But with larger sets of included variables, we face interactions between them, and suppressing and amplifying relationships. In short, gauging the critical association of U with T and Y in substantive terms becomes a whole lot less intuitive.

(FYI, Robins and his colleagues in epi have proposed an alternative method of sensitivity analysis, which hasn’t found followers in the social sciences yet, to my knowledge. I’m currently working on implementing their method in one of my projects.)

Posted by Felix Elwert at 6:03 AM

2 May 2006

The 80% Rule, Part II

Jim Greiner

In my last post, I introduced the so-called 80% rule in employment discrimination cases. In this post, I discuss some of the reasons why it stinks. For the sake of illustration, pretend I’m interested in knowing whether a company discriminates against women in hiring, and recall that the 80% rule says that I should see whether the hiring rate for women is less than 80% of the hiring rate for men.

The first issue with the 80% rule is that it means different things depending on the hiring rate for men. Suppose 90% of men who apply for a job are hired. 80% of 90% is 72%, so the threshold gap between men and women is 18 percentage points; that might seem like something worth investigating. But suppose the company at issue is very exclusive, so it hires only 5% of the men who apply; 80% of 5% is 4%. Is this one-percentage-point difference something to worry about? Perhaps it is, perhaps it isn't, but it sure is different from the 18-point gap in the previous example.

A second issue with the 80% rule is that it varies depending on whether we're talking about success rates or failure rates ("success" means getting hired here, "failure" means not getting hired). In one of my hypotheticals above, a company hired 90% of the men who applied. So the success rate is 90%, and the failure rate is 10%. If we apply the 80% rule to the success rate, we should worry if the hiring rate for women is below 72%. But what happens if we apply the reasoning of the rule to the failure rate for men? By analogy to the 80% rule's reasoning, it seems like we should worry if the failure rate for women is greater than, say, 120% (100% + 20%), or perhaps 125% (1/.8 = 1.25), of the failure rate for men. Take the 125% for the sake of argument, and return to our hypothetical in which the failure rate for men was 10%. 125% of 10% is 12.5%, so we should worry if the failure rate for women is greater than 12.5%. But a failure rate for women of greater than 12.5% corresponds to a success rate for women of less than 87.5%, and we just said that we're supposed to worry if the success rate is less than 72%. So which is it, 87.5% or 72%?
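The inconsistency is easy to see in code (the numbers are this post's hypotheticals, not real data):

```python
def fails_80_rule_success(male_rate, female_rate):
    """Flag if women's hiring (success) rate is below 80% of men's."""
    return female_rate < 0.8 * male_rate

def fails_80_rule_failure(male_rate, female_rate):
    """Apply the same logic to failure (non-hiring) rates: flag if
    women's failure rate exceeds 125% of men's."""
    return (1 - female_rate) > 1.25 * (1 - male_rate)

# men hired at 90%: a female hiring rate of 80% passes one version of
# the rule and fails the other
fails_80_rule_success(0.90, 0.80)   # False: 0.80 is above 0.72
fails_80_rule_failure(0.90, 0.80)   # True: 0.20 exceeds 0.125
```

The same employer, the same data, and opposite conclusions, depending only on which side of the hire/no-hire coin the rule is applied to.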

A final criticism (for the purposes of this post; I could go on and on here): is any of this significant in the statistical sense? P-values, anyone? Significance tests? Posterior intervals? Anything at all?

Next time you hear someone applying the 80% rule in an employment discrimination case, invite the speaker to join us on this planet.

Posted by James Greiner at 6:00 AM

1 May 2006

Applied Statistics - Ben Hansen

This week the Applied Statistics Workshop will present a talk by Ben Hansen, Assistant Professor of Statistics at the University of Michigan. Professor Hansen received his Ph.D. from the University of California at Berkeley and was an NSF Post-doctoral Fellow before joining the faculty at Michigan in 2003. His research interests include optimal matching and stratification, causal inference in comparative studies, and length-optimal exact confidence procedures. His work has appeared in JASA and the Journal of Computational and Graphical Statistics, among others.

Professor Hansen will present a talk entitled "Matching with prognosis scores: A new method of adjustment for comparative studies." The corresponding paper is available from the course website. The presentation will be at noon on Wednesday, May 3 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. An abstract of the paper appears after the jump:

In one common route to causal inferences from observational data, the statistician builds a model to predict membership in treatment and control groups from pre-treatment variables, X, in order to obtain propensity scores, reductions f(X) of the covariate possessing certain favorable properties. The prediction of outcomes as a function of covariates, using control observations only, produces an alternate score, the prognosis score, with favorable properties of its own. As with propensity scores, stratification on the prognosis score brings to uncontrolled studies a concrete and desirable form of balance, a balance that is more familiar as an objective of experimental control. In parallel with the propensity score, prognosis scores reduce the dimension of the covariate; yet causal inferences conditional on them are as valid as are inferences conditional only on the unreduced covariate. They suggest themselves in certain studies for which propensity score adjustment is infeasible. Other settings call for a combination of prognosis and propensity scores; as compared to propensity scores alone, the pairing can be expected to reduce both the variance and bias of estimated treatment effects. Why have methodologists largely ignored the prognosis score, at a time of increasing popularity for propensity scores? The answer lies in part with older literature, in which a similar, somewhat atheoretical concept was first celebrated and then found to be flawed. Prognosis scores avoid this flaw, as emerges from theory presented herein.
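As a rough sketch of the mechanics described in the abstract (my own toy illustration with a linear outcome model and nearest-neighbor matching, not Professor Hansen's actual method): fit an outcome model on control units only, score every unit with it, and match treated to control units on the predicted outcome.

```python
import numpy as np

def prognosis_scores(X, y, treated):
    """Fit a linear outcome model y ~ X on control units only, then
    score all units (treated and control) with the fitted model."""
    ctrl = ~treated
    beta, *_ = np.linalg.lstsq(X[ctrl], y[ctrl], rcond=None)
    return X @ beta

def match_on_score(scores, treated):
    """Nearest-neighbor match each treated unit to a control unit on
    the prognosis score; returns {treated index: control index}."""
    t_idx = np.where(treated)[0]
    c_idx = np.where(~treated)[0]
    return {int(i): int(c_idx[np.argmin(np.abs(scores[c_idx] - scores[i]))])
            for i in t_idx}
```

Fitting on controls only is the key step: it keeps the outcome model from being contaminated by treatment effects, in parallel to the way a propensity model uses only pre-treatment covariates.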

Posted by Mike Kellermann at 9:43 AM