Pseudo Maximum Likelihood Based Methods for Mean and Covariance Structure Analysis with Missing Data
by
Ke-Hai Yuan, Department of Psychology, UCLA
 
Abstract:

Survey and longitudinal studies in the social and behavioral sciences generally contain missing data. Mean and covariance structure models play an important role in analyzing such data. Assuming ignorable nonresponse, two promising methods for dealing with missing data are a direct maximum likelihood approach and a two-stage approach based on the unstructured mean and covariance estimates obtained by the EM algorithm. The statistical theory on which these methods are based involves a multivariate normality assumption. However, data sets in the social and behavioral sciences are seldom normal, and the effect of this misspecification on the two methods is unclear. We study inference problems associated with applications of these methods to typical nonnormal data sets. Based on pseudo maximum likelihood theory, a way to obtain consistent standard errors of the two-stage estimates is given. The asymptotic efficiencies of different estimators are compared under various assumptions. We also propose a minimum chi-square approach and show that the estimator obtained by this approach is asymptotically at least as efficient as the two likelihood-based estimators for either normal or nonnormal data. When compared to the two-stage estimator, the direct maximum likelihood estimator may lose its advantage when the underlying distribution is nonnormal. The major contribution of this paper is that for each estimator, we give a test statistic whose asymptotic distribution is chi-square regardless of the underlying distribution. We also give a characterization for each of the two likelihood ratio test statistics when the underlying distribution is nonnormal. Modifications to the likelihood ratio statistics are also given, whose distributions are better approximated by the commonly used chi-square distributions. Examples demonstrate the importance of correct inference procedures in recovering an underlying model structure. The relevance of some of our results to other areas of multivariate analysis with missing data is also discussed.