Speaker: Seho Park

Abstract:

Survey data integration using mass imputation

Survey data integration, which combines information from multiple sources, is an important
practical problem in survey sampling. Data integration can be viewed as a missing data
problem, and we propose a mass imputation approach for data integration. By filling in the
missing values for the study variable in one sample with imputed values incorporating
information from the other sample, we obtain an improved estimator integrating information
from two samples.

Three specific setups are considered in this presentation. The first setup is the classical
two-phase sampling where the second-phase sample is an outcome-dependent probability
sample from the first-phase sample. The second one is a non-nested two-phase sampling
where the second-phase sample is not necessarily a probability sample. The third one is
combining two independent samples with different measurements from the same target
population. Since the measurements are different for the same concepts, measurement errors
can exist.

For the first two setups, we propose a regression imputation estimator for mass imputation
where the regression coefficients are estimated from the second-phase sample. For the third
setup, we propose a survey data integration method using measurement error models. An
application of the technique to the Food and Nutrition Technical Assistance III Project is
presented.
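
The regression imputation idea for the first two setups can be illustrated with a minimal sketch. It is not the estimator from the talk: it assumes a single covariate x observed in both samples, a simple linear working model fitted by unweighted least squares, and a weighted mean as the target. The function name mass_imputation_mean, the variable names, and the toy data are all hypothetical. In particular, an outcome-dependent second-phase design would likely call for a weighted fit, which this sketch omits.

```python
import numpy as np


def mass_imputation_mean(x_a, w_a, x_b, y_b):
    """Estimate the population mean of y by regression mass imputation.

    x_a, w_a: covariate and design weights for sample A (y not observed).
    x_b, y_b: covariate and study variable for sample B (y observed).
    Assumes the working model y = b0 + b1 * x (an illustrative choice).
    """
    x_a = np.asarray(x_a, dtype=float)
    w_a = np.asarray(w_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    y_b = np.asarray(y_b, dtype=float)

    # Estimate the regression coefficients from sample B only.
    design_b = np.column_stack([np.ones_like(x_b), x_b])
    beta, *_ = np.linalg.lstsq(design_b, y_b, rcond=None)

    # Impute y for every unit in sample A using the fitted model.
    design_a = np.column_stack([np.ones_like(x_a), x_a])
    y_imputed = design_a @ beta

    # Design-weighted mean of the imputed values over sample A.
    estimate = float(np.sum(w_a * y_imputed) / np.sum(w_a))
    return estimate, beta


# Toy data: sample B follows y = 1 + 2x exactly; sample A has x and weights only.
estimate, beta = mass_imputation_mean(
    x_a=[1.0, 2.0, 3.0, 4.0],
    w_a=[10.0, 10.0, 20.0, 20.0],
    x_b=[0.0, 1.0, 2.0, 3.0],
    y_b=[1.0, 3.0, 5.0, 7.0],
)
print(estimate, beta)
```

The point of the design is that sample B contributes only the fitted coefficients, while sample A contributes the covariate distribution and the weights, so the estimate borrows strength from both samples.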