Abstract

This paper compares methods for handling missing values in survey data. The most commonly used methods are deleting observations that have missing values (case deletion), replacing missing values with the sample mean (mean imputation), and substituting a fitted value from an auxiliary regression (regression imputation). These methods are easy to implement but have potentially serious drawbacks, such as bias and inefficiency. In addition, they treat imputed values as known and so ignore the uncertainty due to 'missingness', which can result in underestimated standard errors. An alternative is Multiple Imputation (MI). In this paper, Expectation Maximization (EM) and Data Augmentation (DA) are used to create multiple complete datasets, each with different imputed values due to random draws. EM is essentially maximum-likelihood estimation that exploits the interdependency between missing values and model parameters. DA estimates the distribution of missing values given the observed data and the model parameters through Markov Chain Monte Carlo (MCMC). The results from these multiple datasets are subsequently combined into a single set of estimates, incorporating the uncertainty due to the missingness. Results from a Monte Carlo experiment using pseudo data show that MI is superior to the other methods for the problem posed here.
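The contrast between the single-value methods and MI can be illustrated with a small sketch. It is not the paper's EM/DA procedure or its Monte Carlo design. It assumes a hypothetical two-variable pseudo dataset in which `y` is missing more often when `x` is large, and it swaps in a simpler technique: imputations drawn from a regression of `y` on `x` with parameter uncertainty, pooled with Rubin's combining rules. All names (`multiply_impute`, `pool`, the sample size, the coefficients) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pseudo data: y depends on x, and y is missing more often
# when x is large, so missingness depends only on the observed x.
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)            # true mean of y is 1.0
missing = rng.random(n) < 1.0 / (1.0 + np.exp(-x))
obs = ~missing
y_obs = np.where(missing, np.nan, y)

# Case deletion: use only the observed values of y.
mean_cd = y_obs[obs].mean()

# Mean imputation: fill the gaps with the observed mean.
y_mean_imp = np.where(missing, mean_cd, y_obs)

# Regression imputation: fill the gaps with fitted values from y ~ x.
X_obs = np.column_stack([np.ones(obs.sum()), x[obs]])
beta, *_ = np.linalg.lstsq(X_obs, y_obs[obs], rcond=None)
y_reg_imp = np.where(missing, beta[0] + beta[1] * x, y_obs)


def multiply_impute(m=20):
    """Create m completed datasets; return the mean of y and its variance in each.

    A simplified stand-in for EM/DA: regression parameters are drawn from
    their approximate posterior, then missing y values are drawn with noise.
    """
    resid = y_obs[obs] - X_obs @ beta
    dof = obs.sum() - 2
    s2 = resid @ resid / dof
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    estimates, variances = [], []
    for _ in range(m):
        sigma2 = s2 * dof / rng.chisquare(dof)               # draw residual variance
        b = rng.multivariate_normal(beta, sigma2 * xtx_inv)  # draw coefficients
        draw = b[0] + b[1] * x + rng.normal(scale=np.sqrt(sigma2), size=n)
        y_comp = np.where(missing, draw, y_obs)
        estimates.append(y_comp.mean())
        variances.append(y_comp.var(ddof=1) / n)
    return np.array(estimates), np.array(variances)


def pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance."""
    m = len(estimates)
    within = variances.mean()
    between = estimates.var(ddof=1)
    return estimates.mean(), within + (1.0 + 1.0 / m) * between


estimates, variances = multiply_impute()
mean_mi, var_mi = pool(estimates, variances)

print("case deletion mean:       ", mean_cd)
print("mean imputation mean:     ", y_mean_imp.mean())
print("regression imputation mean:", y_reg_imp.mean())
print("multiple imputation mean: ", mean_mi, "+/-", np.sqrt(var_mi))
```

In this setup, case deletion and mean imputation give the same point estimate, and mean imputation also shrinks the sample variance of `y`. The pooled MI variance adds a between-imputation term to the average within-imputation variance, which is how the uncertainty due to missingness enters the standard error.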
