



**23 October 2009**

During a recent conversation with some colleagues regarding data sources, an interesting point was made that left me pondering. One member of our group stated that he would not trust a particular source of data to provide useful estimates of population means, but he would trust it to estimate regression coefficients. This puzzled me, because a regression coefficient is a (perhaps slightly fancy) version of a mean. Why, then, would a data source that cannot be trusted for a simple average be useful for a coefficient?

I think the answer lies in the assumed source of randomness. When we make inferences from our sample data to a wider universe of cases, there are two sources of randomness involved: probabilities introduced through the sampling design and probabilities introduced through an assumed stochastic model underlying our observed data. In the first case, we are interested in the existing finite population and our outcome of interest Y is regarded as fixed; randomness is introduced through the sample inclusion probabilities. In the second case, we are interested in a broader "superpopulation" which we posit is generated through some random process, and thus our outcome Y is regarded as a random variable. In much of social science, researchers are interested in this second source of randomness. Hypotheses center around parameters associated with the probability distribution for Y - such as regression coefficients.
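One way to see why a source might be poor for means yet acceptable for coefficients is a small simulation. The sketch below is only an illustration under assumptions of my own: a hypothetical finite population generated by a linear model, and a sampling design whose inclusion probabilities depend on the covariate x but, given x, not on Y. The population size, the coefficients, and the inclusion-probability function are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population generated by a linear "superpopulation" model.
N = 100_000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)

# Assumed design: inclusion probability rises with x but, given x, ignores y.
p = 0.02 + 0.18 / (1.0 + np.exp(-2.0 * x))
sampled = rng.random(N) < p  # independent inclusion for each unit (Poisson sampling)
xs, ys = x[sampled], y[sampled]
w = 1.0 / p[sampled]  # design weights: inverse inclusion probabilities

pop_mean = y.mean()                         # the fixed finite-population mean
naive_mean = ys.mean()                      # ignores the design
weighted_mean = np.sum(w * ys) / np.sum(w)  # design-weighted (Hajek) mean
slope = np.polyfit(xs, ys, 1)[0]            # unweighted OLS slope of y on x

print(f"population mean:       {pop_mean:.3f}")
print(f"unweighted sample mean: {naive_mean:.3f}")
print(f"weighted sample mean:   {weighted_mean:.3f}")
print(f"unweighted OLS slope:   {slope:.3f} (model value 2.0)")
```

In this setup the unweighted sample mean overshoots the population mean because high-x units are over-represented, while the unweighted slope stays close to the model value. That second result leans entirely on the assumptions that the regression model is correctly specified and that selection depends only on x; if inclusion also depended on Y given x, the slope would not be protected in the same way.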

Identifying the sources of randomness underlying our data is important, because they have implications for our analysis. Särndal, Swensson, and Wretman show that the variance of a parameter estimate from an ordinary regression model fit to sample data can be decomposed into two elements, one based on the sampling design and one based on the model. In the case of a census, the extra variance introduced by the design is zero, and thus the total variance of the estimated parameter is the variance of the "BLUE" estimator. Otherwise, accounting for the sampling design in the analysis should improve inference.
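A decomposition of this kind can be written with the law of total variance. The notation here is my own assumption rather than a quotation from the book: $\xi$ denotes the superpopulation model, $p$ the sampling design, $Y$ the population outcomes, and $\hat{\beta}$ the coefficient estimated from the sample.

$$
\operatorname{Var}_{\xi p}(\hat{\beta})
= \operatorname{E}_{\xi}\left[\operatorname{Var}_{p}(\hat{\beta} \mid Y)\right]
+ \operatorname{Var}_{\xi}\left[\operatorname{E}_{p}(\hat{\beta} \mid Y)\right]
$$

The first term is the design component: the variability of $\hat{\beta}$ across possible samples with the population held fixed. The second is the model component: the variability, across realizations of the population, of what the design delivers on average. In a census the sample is the whole population with certainty, so $\operatorname{Var}_{p}(\hat{\beta} \mid Y) = 0$ and only the model component remains.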

Posted by Deirdre Bloome at 5:20 PM