### Authors' Committee

#### Chair:

Matt Blackwell (Gov)

#### Members:

Martin Andersen (HealthPol)
Kevin Bartz (Stats)
Deirdre Bloome (Social Policy)
John Graves (HealthPol)
Rich Nielsen (Gov)
Maya Sen (Gov)
Gary King (Gov)

Alberto Abadie, Lee Fleming, Adam Glynn, Guido Imbens, Gary King, Arthur Spirling, Jamie Robins, Don Rubin, Chris Winship

12 December 2006

### Better Way To Make Cumulative Comparisons With Small Samples?

On July 15, 1971, the research vessel Lev Berg set sail from Aralsk (Kazakhstan) to survey the Aral Sea, then the fourth-largest lake in the world. The Soviet Union had been steadily draining the Aral for agricultural purposes since the 1950s, and the Lev Berg was to measure the ecological damage. The route passed the island of Vozrozhdeniye on the south side of the sea.

(Image Source: "The 1971 Smallpox Epidemic in Aralsk, Kazakhstan, and the Soviet Biological Warfare Program." Center for Nonproliferation Studies Occasional Paper No. 9, Jonathan B. Tucker and Raymond A. Zilinskas.)

Vozrozhdeniye was an ideal site for the main Soviet bioweapons field testing because it was in a remote area, was easily secured as an island, and had reliable winds from north to south, allowing "safe" testing with housing on the north end. The site was active from 1936 until 1990, when Yeltsin publicly denounced the program and had it shut down. This was despite the Soviet Union having signed the 1972 Biological and Toxin Weapons Convention outlawing such research. Shortly after the Lev Berg returned to Aralsk, there was an unusual outbreak of smallpox there, starting with a young researcher who had been on board. The following is the best epidemiological data available:

Comparison case: in 1972 a Muslim man from Kosovo went on a pilgrimage to Mecca, returning through Baghdad, where he was infected with smallpox. This was the first reported smallpox case in Kosovo since 1930, and it apparently went undiagnosed for six weeks, producing 175 cases and 35 deaths. It is a good comparison because vaccination rates and socio-economic conditions were similar.

Kaplan-Meier graph with time-to-event = onset of illness:

(Image Source: Ibid.)
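For readers who have not computed one by hand, here is a minimal sketch of the Kaplan-Meier product-limit estimate that such a graph plots. The function name and the input values are hypothetical, chosen only to make the arithmetic easy to follow; they are not the Aralsk or Kosovo onset data.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time where at least one event occurs.

    times  -- day of onset (or of censoring) for each subject
    events -- True if onset was observed, False if the subject was censored
    """
    onsets = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # everyone leaving the risk set at time t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = onsets.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk   # product-limit step
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve

# Made-up onset days for illustration only (NOT data from either outbreak).
hypothetical_days = [1, 2, 2, 3]
hypothetical_observed = [True, True, True, True]
curve = kaplan_meier(hypothetical_days, hypothetical_observed)

# Same idea with one censored subject (again hypothetical).
curve_censored = kaplan_meier([1, 2, 3], [True, False, True])
```

With no censoring the curve is simply the fraction of subjects not yet ill at each onset day; censored subjects leave the risk set without producing a step.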

Key difference: all three Aralsk deaths were from hemorrhagic smallpox, while only five in Kosovo were. The baseline for naturally occurring smallpox: Rao's study in Madras, India had 10,857 cases, of which only 240 (about 2.2%) were hemorrhagic. Only two possible explanations seem to remain for the differences:

- Host conditions (nutrition, genetic resistance, environment) differ greatly.
- The Aralsk strain was an unusual type.

Obviously, it would be nice to claim strong evidence that the Soviet outbreak resulted from escaped smallpox. We know the extent of the bioweapons program from Yeltsin's opening of the files, but we cannot assign responsibility for this particular release with 100% certainty.
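Purely as arithmetic on the counts quoted above, and not as a recommended analysis, a one-sided Fisher exact calculation can be sketched as follows. It assumes the "five in Kosovo" refers to five of the 35 Kosovo deaths, which is one reading of the text; the function name is hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) for the 2x2 table [[a, b], [c, d]],
    under the hypergeometric distribution with all margins fixed."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / total

# Rows: Aralsk deaths, Kosovo deaths. Columns: hemorrhagic, other.
# ASSUMPTION: 5 of the 35 Kosovo deaths were hemorrhagic.
p_value = fisher_one_sided(3, 0, 5, 30)
print(p_value)
```

Under that assumption the tail probability is 56/8436, roughly 0.0066. With only three deaths on one side, a single reclassified case would change the result a great deal, which is part of why such a calculation is hard to lean on.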

This is just a motivating (and interesting) example; the real question is about testing with really small samples, when exact inference doesn't seem appropriate. So what other approaches would readers suggest for making comparisons with these types of cumulative data besides simple Kaplan-Meier comparisons? Obviously, typical correlational analysis (polychoric, multichoric, etc.) won't work, and standard tabular approaches are not going to be effective either.

Posted by Jeff Gill at 2:48 PM

### Naming Conventions

This discussion came up yesterday in the Bayes course. There is a plethora of names for multilevel models. Sociologists seem to prefer "hierarchical," many statisticians say "mixed effects," and usage varies in economics. It seems reasonable to standardize, but this is unlikely to happen. Perhaps the most common convention is the following. Given two data matrices, x_{ij} for individual i in cluster j, and z_j for cluster j, there are perhaps four canonical models:

"Pooled:" y_{ij} = \alpha + x_{ij}'\beta + z_j'\gamma + e_{ij}

"Fixed Effect:" y_{ij} = \alpha_j + x_{ij}'\beta + e_{ij}

"Random Effect:" y_{ij} = \alpha_j + x_{ij}'\beta + z_j'\gamma + e_{ij}

"Random Intercept and Random Slope:" y_{ij} = \alpha_j + x_{ij}'\beta_j + z_j'\gamma + e_{ij}

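To make the difference between the first two specifications concrete, here is a minimal sketch with a single regressor and noise-free toy data. It is an illustration under stated assumptions rather than a standard implementation: the helper names are hypothetical, the data are invented, and the cluster-level term z_j'\gamma is omitted from the pooled fit for brevity.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def demean_by_cluster(values, clusters):
    """Subtract each cluster's mean from its members."""
    sums, counts = {}, {}
    for v, c in zip(values, clusters):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    return [v - sums[c] / counts[c] for v, c in zip(values, clusters)]

# Invented toy data: three clusters, intercepts alpha_j that rise with x.
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [0, 1, 2, 1, 2, 3, 2, 3, 4]
alpha = {0: 0.0, 1: 5.0, 2: 10.0}   # hypothetical cluster intercepts
beta = 2.0                           # hypothetical common slope
y = [alpha[c] + beta * xi for xi, c in zip(x, clusters)]

# "Pooled": one intercept for everyone (z_j omitted in this sketch).
pooled_slope = slope(x, y)

# "Fixed effect": a separate alpha_j per cluster, fitted here by
# demeaning x and y within each cluster before regressing.
fixed_effect_slope = slope(demean_by_cluster(x, clusters),
                           demean_by_cluster(y, clusters))
```

Because the invented intercepts are correlated with x, the pooled fit attributes part of the between-cluster shift to the slope (4.5 here), while the within-cluster fit recovers the common slope of 2.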
Some prefer "random intercepts" for "fixed effects," and perhaps we can consider these all to be members of a larger family in which indices are turned on and off systematically. On the other hand, maybe it's just terminology and not worth worrying about too much. Thoughts?

Posted by Jeff Gill at 10:23 AM