Design-unbiased statistical learning in survey sampling - Li-Chun Zhang

Feb 22, 2021 - 11:00 AM

Dr. Li-Chun Zhang

University of Southampton

Design-unbiased statistical learning in survey sampling

A basic problem with supervised machine learning (ML) is that one needs to be able to ‘extrapolate’ the model learned from the available sample to the out-of-sample units, in order for supervised learning to have any value at all. No matter how the learning is organized within the sample, one cannot ensure that it is valid for out-of-sample units unless the sample is selected from the entire reference set of units (i.e. the population) in some controlled manner. This well-known problem in statistical inference is sometimes recast as the problem of concept drift in the ML literature.

We develop a subsampling Rao-Blackwell method. Under the combined probability sampling-subsampling (pq-design), exactly pq-unbiased estimation can be achieved at the population level using any chosen ML technique. Our approach makes use of three classic ideas from ML and Statistical Science: the training-test split of the sample, Rao-Blackwellisation and model-assisted sampling estimation.
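As a rough illustration of how the three ideas could fit together, here is a minimal sketch under assumptions that are not stated in the abstract: both the sampling design p and the subsampling design q are taken to be simple random sampling without replacement, the covariate x is assumed known for every population unit, and an ordinary least-squares line stands in for "any chosen ML technique". The function names (`split_estimate`, `rb_estimate`, `linear_fit_predict`) are hypothetical and this is not the speaker's implementation.

```python
from itertools import combinations
import numpy as np

def linear_fit_predict(x_train, y_train, x_new):
    """Stand-in learner (assumption): least-squares line fitted on the training part."""
    A = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef[0] + coef[1] * x_new

def split_estimate(x, y, s, s1, fit_predict):
    """Population-total estimate from one training-test split of the sample s.

    Only y[i] for i in s is used. Given the training part s1, the test part
    s2 = s \\ s1 is a simple random sample from the units outside s1, so the
    expanded test residuals correct the bias of the predictions.
    """
    s1 = np.asarray(s1)
    s2 = np.setdiff1d(s, s1)                      # test units
    rest = np.setdiff1d(np.arange(len(x)), s1)    # all units outside s1
    pred_rest = fit_predict(x[s1], y[s1], x[rest])
    pred_s2 = fit_predict(x[s1], y[s1], x[s2])
    correction = len(rest) / len(s2) * (y[s2] - pred_s2).sum()
    return y[s1].sum() + pred_rest.sum() + correction

def rb_estimate(x, y, s, n1, fit_predict):
    """Rao-Blackwellisation: average over all training sets of size n1 < len(s)."""
    return np.mean([split_estimate(x, y, s, s1, fit_predict)
                    for s1 in combinations(s, n1)])

if __name__ == "__main__":
    # Toy population (made-up numbers, for illustration only).
    x = np.arange(1.0, 7.0)
    y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
    # Average the estimator over every possible sample of size 4.
    estimates = [rb_estimate(x, y, s, 2, linear_fit_predict)
                 for s in combinations(range(len(x)), 4)]
    print("true total:", y.sum())
    print("mean of estimator over all samples:", np.mean(estimates))
```

Enumerating every split is only feasible for tiny samples; in practice one would presumably average over a number of randomly drawn splits instead. The unbiasedness argument in this sketch does not depend on the learner being accurate, only on the test part being a probability sample given the training part.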