Sampling based inference for logistic regression
Speaker:
Dr. HaiYing Wang
University of Connecticut, Department of Statistics
In this talk, we first introduce an Optimal Subsampling Method under the A-optimality Criterion (OSMAC) in the context of logistic regression, where the subsampling probabilities are derived to minimize the asymptotic mean squared error of the subsample estimator. For extremely imbalanced data (the number of 1's is much smaller than the number of 0's), the OSMAC estimator has estimation efficiency similar to that of the full data estimator. We explain this phenomenon by deriving the asymptotic distribution of the full data maximum likelihood estimator (MLE), which shows that the asymptotic variance converges to zero at a rate related to the number of 1's rather than the full data sample size. This indicates that the amount of available information about the unknown parameters in imbalanced data is limited even when the full data size is large. Furthermore, we prove that a subsampling estimator may have the same asymptotic distribution as the full data MLE, while oversampling the 1's may result in a loss of estimation efficiency in addition to a higher computational cost. Lastly, we present an unweighted estimator with bias correction using OSMAC subsamples to improve the estimation efficiency. The unweighted estimator has a smaller asymptotic variance-covariance matrix than the weighted estimator. Both sampling with replacement and Poisson sampling will be investigated.
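As a rough illustration of the two-step idea behind optimal subsampling, the following Python sketch fits a pilot estimate on a uniform subsample, computes subsampling probabilities proportional to |y_i - p_i| * ||M^{-1} x_i||, draws a subsample with replacement, and fits an inverse-probability-weighted logistic regression. It is a simplified sketch under stated assumptions, not the speaker's implementation: the function names (fit_logistic, osmac_sketch), the simulated data, the sample sizes, and the choice to compute M on the full data rather than on the pilot subsample are all assumptions made for illustration. The bias-corrected unweighted estimator and Poisson sampling mentioned in the abstract are not covered.

```python
import numpy as np


def fit_logistic(X, y, w=None, iters=50, tol=1e-10):
    """Weighted logistic-regression MLE via Newton's method (illustrative helper)."""
    n, d = X.shape
    w = np.ones(n) if w is None else w / w.mean()  # rescaling leaves the maximizer unchanged
    beta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def osmac_sketch(X, y, r0, r, rng):
    """Two-step subsampling sketch (hypothetical name; simplified from the abstract)."""
    n = X.shape[0]
    # Step 1: pilot estimate from a uniform subsample of size r0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = fit_logistic(X[idx0], y[idx0])
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    # Assumption: M is computed on the full data for simplicity.
    M = (X * (p * (1.0 - p))[:, None]).T @ X / n
    # Probabilities proportional to |y_i - p_i| * ||M^{-1} x_i||.
    scores = np.abs(y - p) * np.linalg.norm(np.linalg.solve(M, X.T), axis=0)
    probs = scores / scores.sum()
    # Step 2: sample with replacement, then fit with inverse-probability weights.
    idx = rng.choice(n, size=r, replace=True, p=probs)
    beta_sub = fit_logistic(X[idx], y[idx], w=1.0 / probs[idx])
    return beta_sub, probs


# Simulated data (assumed setup, for illustration only).
rng = np.random.default_rng(0)
n, beta_true = 20000, np.array([0.5, -0.5, 0.5])
X = rng.normal(size=(n, 3))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta_full = fit_logistic(X, y)
beta_sub, probs = osmac_sketch(X, y, r0=500, r=2000, rng=rng)
print("full data MLE:", beta_full)
print("subsample estimate:", beta_sub)
```

The inverse-probability weights are what keep the subsample estimator consistent for the full data MLE when the sampling probabilities depend on the responses.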
Refreshments at 3:45pm in Snedecor 2101.