PhD Seminar: Yan Wang, "Stability of Random Forests and Coverage of Random-Forest Prediction Intervals"

Jun 13, 2023, 2:00 PM to 3:00 PM

Speaker: Yan Wang, PhD Candidate, Department of Statistics, Iowa State University

Title: Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

Abstract: We investigate the stability of random forests under the mild condition that the squared response ($Y^2$) is light-tailed. In particular, our analysis holds for the practical version of random forests implemented in popular packages such as randomForest in R. Empirical results show that stability may persist beyond the light-tail assumption, even when $Y^2$ is heavy-tailed. Next, using the stability property, we prove a non-asymptotic lower bound on the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. Under another mild condition, typically satisfied when $Y$ is continuous, we also establish a complementary upper bound; a similar upper bound can be established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions weaker than those considered in the previous literature. Our work implies that, owing to their stability, random forests are an effective machine learning method that provides not only satisfactory point predictions but also justified interval predictions at almost no extra computational cost.
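To make the idea of an out-of-bag (OOB) prediction interval concrete, here is a minimal sketch in Python. It assumes scikit-learn's RandomForestRegressor (with oob_score=True, which exposes the oob_prediction_ attribute) as a stand-in for the R randomForest package. It also assumes one common construction, namely the point prediction shifted by the alpha/2 and 1 - alpha/2 empirical quantiles of the signed OOB errors. The function name oob_prediction_interval is hypothetical, and the exact interval analysed in the talk may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def oob_prediction_interval(X_train, y_train, X_new, alpha=0.1, n_trees=500, seed=0):
    """Hypothetical helper: random-forest prediction intervals from OOB errors.

    Assumed construction (a sketch, not necessarily the talk's):
        [y_hat(x) + q_{alpha/2}, y_hat(x) + q_{1 - alpha/2}],
    where q_p is the empirical p-quantile of the signed OOB errors
    y_i - y_hat_oob(x_i) on the training set.
    """
    # bootstrap=True is the default, so every tree leaves some points out-of-bag.
    forest = RandomForestRegressor(
        n_estimators=n_trees, oob_score=True, random_state=seed
    )
    forest.fit(X_train, y_train)

    # Each OOB prediction averages only the trees whose bootstrap sample
    # did not contain that training point, mimicking a test-set prediction.
    y_train = np.asarray(y_train, dtype=float)
    oob_pred = np.ravel(forest.oob_prediction_)
    oob_errors = y_train - oob_pred

    # Lower and upper empirical quantiles of the signed OOB errors.
    lo_q, hi_q = np.quantile(oob_errors, [alpha / 2, 1 - alpha / 2])

    # Shift the ordinary point predictions by the two error quantiles.
    y_hat = forest.predict(X_new)
    return y_hat + lo_q, y_hat + hi_q


if __name__ == "__main__":
    # Synthetic example: y = sin(x) + Gaussian noise.
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(600, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=600)

    lower, upper = oob_prediction_interval(
        X[:400], y[:400], X[400:], alpha=0.1, n_trees=300
    )
    covered = np.mean((y[400:] >= lower) & (y[400:] <= upper))
    print(f"empirical coverage on held-out points: {covered:.3f}")
```

The OOB predictions come from the same fitted forest, so the only extra work beyond point prediction is computing two quantiles of the training residuals. This is in line with the abstract's remark that interval prediction comes at almost no extra computational cost.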