Does Multiple Imputation Make Up Data?

This is a common question, commonly misunderstood. It certainly does seem like MI makes up data, since if you look at the 5 or so imputed data sets, the missing values are indeed filled in. But in fact, the point of MI has nothing to do with making up data, and everything to do merely with putting the data in a more convenient format.

The fact is that the vast majority of our statistical techniques require rectangular data sets, and data that look like Swiss cheese are really hard to do anything sensible with directly. Listwise deletion, where you excise horizontal slices out of the cheese wherever you see holes, discards a lot of cheese! What MI does instead is fill in the holes using all available information from the rest of the data set (thus moving some information around) and add uncertainty to these imputations in the form of variation in the values across the different imputed data sets (thus taking back any claim to knowledge where a missing value is not predictable from the rest of the data, and correcting for the duplication of the same information in different places in the data). If done properly, MI merely puts the data in a convenient rectangular format and enables the user (with some simple combining rules) to apply statistical techniques as if the data were fully observed. MI standard errors are then not too small, which they would be if data were being made up.
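To make the "simple combining rules" concrete, here is a minimal sketch in Python. It assumes the rules meant are the standard ones usually attributed to Rubin: average the point estimates across the imputed data sets, then add the variation between imputations to the average within-imputation variance. The function name and the numbers are hypothetical and purely illustrative; they do not come from any real analysis.

```python
import math
import statistics


def pool_estimates(estimates, variances):
    """Combine one parameter's results from m imputed data sets.

    estimates: the m point estimates, one per imputed data set
    variances: the m squared standard errors, one per imputed data set
    Returns (pooled_estimate, pooled_standard_error).
    Assumes the standard (Rubin) combining rules.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputed data sets")

    pooled = statistics.mean(estimates)        # average point estimate
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between     # total variance
    return pooled, math.sqrt(total)


# Hypothetical results from five imputed data sets (illustrative numbers only).
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

pooled, pooled_se = pool_estimates(estimates, variances)
naive_se = math.sqrt(statistics.mean(variances))  # ignores variation across imputations

print(f"pooled estimate: {pooled:.3f}")
print(f"pooled SE: {pooled_se:.3f}  (naive SE from one filled-in data set: {naive_se:.3f})")
```

The between-imputation term is what keeps the standard error from being too small: the less predictable the missing values are from the rest of the data, the more the estimates vary across imputed data sets, and the larger the pooled standard error becomes.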

The particular models for imputation can be used incorrectly or inappropriately (and so should be used with priors when additional information is available; see, e.g., "What to do About Missing Values in Time Series Cross-Section Data"), but proper usage of MI introduces no information beyond what is genuinely available.

Posted by Gary King at May 12, 2007 4:01 PM