4 April 2008
Here are the results of the Pennsylvania Democratic primary, with Obama counties in purple and Clinton counties in Orange.
What, you say? The Pennsylvania primary hasn't happened yet? You're right. Enter statistics!
Consider this scatterplot of Kerry's 2004 vote share versus Obama's 2008 vote shares in Ohio counties. The result is something I call the Kerry-Obama smile: Obama does well in Kerry's best counties, where staunchly Democratic urban blacks are concentrated; and in Kerry's worst regions, presumably due to Obama's appeal to crossover Republicans. Clinton does best in the wide middle swath.
This motivates a very simple modeling idea: fit a curve to the scatterplot. Obviously, a quadratic in Kerry's share looks like a decent fit. That gives us the best-fit line shown on the plot. The R-squared is 0.16, representing an okay fit.
The next step is utterly useless, but utterly fun. We can use Ohio to predict Pennsylvania. In other words, given that we know how Kerry did in Pennsylvania counties in 2004, we can predict how well Obama will do in 2008 in every Pennsylvania county. Note that I first tweaked the model's intercept slightly in Obama's favor, so that the aggregate prediction matches the current polling average (showing Clinton up by 6.6%).
The bad news for Obama is that nearly all of Pennsylvania's counties fall in the middle of the smile. The image below compares Kerry in 2004 to the model's predictions for Obama in 2008. Obama is predicted to carry Philadelphia overwhelmingly, and to do well in some of the curvy, heavily Republican counties in the south-center of the state. Everywhere else, though, is Clinton country.