LOCATION: 319 Snedecor
SPEAKER:
Jerry Reiter, Duke University, Institute of Statistics and
Decision Sciences, Durham, NC
TITLE:
Disclosure Limitation Via Multiply-Imputed, Synthetic Data Sets
ABSTRACT:
Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use data files. For example, agencies can release fully synthetic datasets, comprising random samples of units from the sampling frame with simulated values of survey variables. Or, agencies can release partially synthetic datasets, comprising the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. In this talk, I describe these synthetic data approaches. I discuss methods for obtaining valid inferences from the synthetic data sets based on the concepts of multiple imputation for missing data.
COFFEE: 3:45 p.m., 104 Snedecor Hall