25 April 2011
We hope that you can join us for the final Applied Statistics Workshop of the year this Wednesday, April 27th, when we will be happy to have Benjamin Lauderdale, currently a College Fellow in the Department of Government, Harvard University, and soon to be at the London School of Economics. You will find an abstract below. As always, we will serve a light lunch and the talk will begin around 12:15 p.m.
“There Are Many Median Justices on the Supreme Court”
Benjamin Lauderdale
Department of Government, Harvard University
CGIS K354 (1737 Cambridge St.)
Wednesday, April 27th, 2011 12 noon
While unidimensional preference estimates for the U.S. Supreme Court exist in both constant and time-varying forms, estimating variation in preferences across areas of the law has been difficult because multidimensional scaling models perform poorly with only nine voters. We introduce a new approach to recovering estimates of judicial preferences that are localized to particular legal issues as well as periods of time. Using expert issue area codes and majority opinion citations to identify the strength of substantive relationships between cases, we apply a kernel-weighted optimal classification estimator to analyze how justices’ preferences vary across both areas of the law and time. Allowing for issue-variation in preferences improves the predictive power of estimated preference orderings more than allowing for time-variation. We find substantial variation in the identity of the median justice across areas of the law during most periods of the modern court, suggesting a need to reconsider empirical and theoretical research that hinges on the existence of a unitary and well-identified median justice.
18 April 2011
We hope that you can join us for the penultimate Applied Statistics Workshop of the year this Wednesday, April 20th. This week we are extremely excited to have Jeffrey Lewis, Associate Professor of Political Science at UCLA, presenting on the compactness of congressional districts, a topic that involves some interesting econometric issues as well as a large GIS component. Note that this is a change from what is on the schedule. As usual, we will start the workshop at 12 noon with a light lunch and begin the talk at 12:15. We wrap up the workshop at 1:30 p.m.
“A study of Congressional district compactness, 1789-2011”
Jeffrey B. Lewis
Department of Political Science, UCLA
CGIS K354 (1737 Cambridge St)
Wednesday, April 20th, 12 noon.
13 April 2011
Whenever I do a little data cleaning with a scripting language, I always find myself struggling with regular expressions. Now a new site, txt2re, allows you to figure out the regular expression you want from some sample text:
This system acts as a regular expression generator. Instead of trying to build the regular expression, you start off with the string that you want to search. You paste this into the site, click submit and the site finds recognisable patterns in your string. You then select the patterns that you are interested in and it writes a fully fledged program that extracts those patterns from that string. You then copy the program into your editor or IDE and play with it to integrate it into your program.
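To give a flavor of the kind of snippet such a generator might hand back, here is a minimal Python sketch. It is not actual txt2re output: the pattern, the function name `extract_dates`, and the sample string are all made up for illustration, and the pattern only handles dates written like the ones on this blog (day, month name, four-digit year).

```python
import re

# Hypothetical pattern (an assumption of this sketch, not txt2re output):
# a 1-2 digit day, a capitalized month name, and a four-digit year.
DATE_PATTERN = re.compile(r"\b(\d{1,2})\s+([A-Z][a-z]+)\s+(\d{4})\b")


def extract_dates(text):
    """Return a (day, month, year) tuple for every date found in text."""
    return [(int(day), month, int(year))
            for day, month, year in DATE_PATTERN.findall(text)]


# Made-up sample text for illustration.
sample = "Posted 13 April 2011; the talk is rescheduled from 2 February 2011."
print(extract_dates(sample))
# prints [(13, 'April', 2011), (2, 'February', 2011)]
```

Naming the groups or tightening the month match to a fixed list of names would be the obvious next step once the basic pattern does what you want.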
Statistical Science has a new issue out dedicated to the EM Algorithm, entitled “Celebrating the EM Algorithm’s Quandunciacentennial”. David van Dyk and Xiao-Li Meng are the guest editors. Here is the abstract from their (awesome-looking) contribution, “Cross-Fertilizing Strategies for Better EM Mountain Climbing and DA Field Exploration: A Graphical Guide Book”:
In recent years, a variety of extensions and refinements have been developed for data augmentation based model fitting routines. These developments aim to extend the application, improve the speed and/or simplify the implementation of data augmentation methods, such as the deterministic EM algorithm for mode finding and stochastic Gibbs sampler and other auxiliary-variable based methods for posterior sampling. In this overview article we graphically illustrate and compare a number of these extensions, all of which aim to maintain the simplicity and computation stability of their predecessors. We particularly emphasize the usefulness of identifying similarities between the deterministic and stochastic counterparts as we seek more efficient computational strategies. We also demonstrate the applicability of data augmentation methods for handling complex models with highly hierarchical structure, using a high-energy high-resolution spectral imaging model for data from satellite telescopes, such as the Chandra X-ray Observatory.
You can find most of the papers on arXiv, using a simple search. A quick Google search for “Quandunciacentennial” yields no hits. Does anyone know the etymology there? Some reference to 34 that I’m missing?
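For readers who have not seen the algorithm being celebrated, here is a minimal sketch of plain EM for a two-component, one-dimensional Gaussian mixture. It is a textbook illustration only, not one of the extensions surveyed in the issue; the function names, the toy data, the crude initialization, and the variance floor are all assumptions of this sketch.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component Gaussian mixture by EM; also return the log-likelihood path."""
    # Crude initialization (an assumption of this sketch): means at the extremes.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component,
        # computed under the current parameter values.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k])
                     for k in range(2)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        log_liks.append(log_lik)
        # M-step: responsibility-weighted updates of weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a degenerate component
            weights[k] = nk / len(data)
    return means, variances, weights, log_liks


# Toy data (made up): two well-separated clusters around -2 and 2.
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 2.1, 1.9, 2.2, 1.8]
means, variances, weights, log_liks = em_two_gaussians(data)
print(means, variances, weights)
```

The recorded `log_liks` path should never decrease from one iteration to the next, which is the "mountain climbing" property the title of the van Dyk and Meng paper alludes to.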
11 April 2011
We hope that you can join us for the Applied Statistics Workshop this Wednesday, April 13th, 2011, when we will be happy to finally have Patrick Perry from the Statistics and Information Sciences Laboratory. This is a talk rescheduled from earlier in the term when the weather was much worse. You will find an abstract for the paper below. As always, we will serve a light lunch and the talk will begin around 12:15 p.m.
“Point process modeling for directed interaction networks”
Patrick Perry
Statistics and Information Sciences Laboratory
CGIS K354 (1737 Cambridge St.)
Wednesday, April 13th, 2011 12 noon
Network data often take the form of repeated interactions between senders and receivers tabulated over time. Rather than reducing these data to binary ties, a model is introduced for treating directed interactions as a multivariate point process: a Cox multiplicative intensity model using covariates that depend on the history of the process. Consistency and asymptotic normality are proved for the resulting partial-likelihood-based estimators under suitable regularity conditions, and an efficient fitting procedure is described. Multicast interactions—those involving a single sender but multiple receivers—are treated explicitly. A motivating data example shows the effects of reciprocation and group-level preferences on message sending behavior in a corporate e-mail network.
7 April 2011
We study the dynamics of public media attention by monitoring the content of online blogs. Social and media events can be traced by the propagation of word frequencies of related keywords. Media events are classified as exogenous - where blogging activity is triggered by an external news item - or endogenous where word frequencies build up within a blogging community without external influences. We show that word occurrences show statistical similarities to earthquakes. The size distribution of media events follows a Gutenberg-Richter law, the dynamics of media attention before and after the media event follows Omori’s law. We present further empirical evidence that for media events of endogenous origin the overall public reception of the event is correlated with the behavior of word frequencies at the beginning of the event, and is to a certain degree predictable. These results may imply that the process of opinion formation in a human society might be related to effects known from excitable media.
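For reference, the two seismological laws named in the abstract are usually written as below. The symbols are the standard ones from seismology, not necessarily the paper's notation, and the abstract does not spell out how word frequencies are mapped onto event size, so treat the analogy as a rough guide only.

```latex
% Gutenberg-Richter law: N(\geq M) is the number of events with
% magnitude at least M; a and b are empirical constants.
\log_{10} N(\geq M) = a - b\,M

% Omori's law (in its modified form): n(t) is the rate of events at
% time t after the main event; K, c, and p are empirical constants,
% with p typically close to 1.
n(t) = \frac{K}{(t + c)^{p}}
```

The first says that event sizes follow a power law (large events are rare in a very regular way), and the second says that activity decays roughly as an inverse power of the time since the main event.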
4 April 2011
We are really excited about this week’s Applied Statistics Workshop on Wednesday, April 6th, 2011, when we will be happy to have Kaisey Mandel from the Harvard-Smithsonian Center for Astrophysics. Kaisey will be presenting on hierarchical Bayesian models in astrophysics. This will be a great chance to see how the statistical methods that we use carry over to other disciplines across the sciences. No prior knowledge of astrophysics required! As always, we will serve a light lunch and the talk will begin around 12:15 p.m.
“Hierarchical Bayesian Models for Type Ia Supernova Light Curves, Dust, and Cosmic Distances”
Kaisey Mandel
Harvard-Smithsonian Center for Astrophysics
CGIS K354 (1737 Cambridge St.)
Wednesday, April 6th, 2011 12 noon
Type Ia supernovae (SN Ia) are the most precise cosmological distance indicators and are important for measuring the acceleration of the Universe and the properties of dark energy. To obtain the best distance estimates, the photometric time series (apparent light curves) of SN Ia at multiple wavelengths must be properly modeled. The observed data result from multiple random and uncertain effects, such as measurement error, host galaxy dust extinction and reddening, peculiar velocities, and distances. Furthermore, the intrinsic, absolute light curves of SN Ia differ between individual events: different SN Ia have different intrinsic luminosities, colors and light curve shapes, and these properties are correlated in the population. A hierarchical Bayesian model provides a natural statistical framework for coherently accounting for these multiple random effects while fitting individual SN Ia and the population distribution. I will discuss the application of this statistical model to optical and near-infrared data for computing inferences about the dust, distances and intrinsic covariance structure of SN Ia. Using this model, I demonstrate that the combination of optical and NIR data improves the precision of SN Ia distance predictions by about a factor of 2 compared to using optical data alone. Finally, I will discuss some open research problems concerning statistical analysis of supernova data and their application to cosmology.