Seminar: Sameer Deshpande, UW-Madison, "A new BART prior for structured categorical inputs"

Sep 19, 2022 - 11:00 AM

Speaker: Sameer Deshpande, UW-Madison, "A new BART prior for structured categorical inputs"

Abstract: Default implementations of Bayesian Additive Regression Trees (BART) represent categorical predictors using several binary indicators, one for each level of each categorical predictor. Regression trees built with these indicators partition the levels using a "remove one at a time" strategy. Unfortunately, most partitions of the levels cannot be built with this strategy, meaning that default implementations of BART are limited in their ability to "borrow strength" across groups of levels.
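
To make the limitation of indicator-based splits concrete, the following minimal sketch counts, for a single decision node, how many two-way partitions of a categorical variable's levels exist and how many of them a split on one binary indicator can produce. It illustrates the single-node case only and is not code from the talk; the function names are hypothetical.

```python
from itertools import combinations


def two_way_partitions(levels):
    """All ways to split `levels` into two non-empty, unordered groups."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    everything = frozenset(levels)
    partitions = []
    # Keep the first level on the left so mirror images are not counted twice.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = frozenset((first,) + extra)
            right = everything - left
            if right:  # skip the split that leaves one side empty
                partitions.append((left, right))
    return partitions


def one_vs_rest_partitions(levels):
    """Partitions reachable by one decision node on a single level indicator."""
    everything = frozenset(levels)
    return {frozenset({frozenset({lv}), everything - {lv}}) for lv in everything}


if __name__ == "__main__":
    for k in (3, 5, 10):
        levels = [f"level_{i}" for i in range(k)]
        total = len(two_way_partitions(levels))
        reachable = len(one_vs_rest_partitions(levels))
        print(f"K={k}: {reachable} of {total} two-way partitions reachable")
```

With K levels there are 2^(K-1) - 1 two-way partitions, while a single indicator split can only separate one level from the remaining K - 1, so the reachable fraction shrinks quickly as K grows.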

We overcome this limitation with a prior for a new class of regression trees that can send multiple levels of a categorical variable to each child of a decision node in a regression tree. Our prior corresponds to a partitioning process that can respect a priori preferences to co-cluster certain levels of a structured categorical variable. In spatiotemporal applications, such variables are frequently used to encode membership in spatial units like census tracts or counties. In these applications, our new regression trees induce contiguous partitions of the spatial units. Our new prior often yields improved out-of-sample predictive performance without much additional computational burden. We demonstrate our new prior using examples from baseball and the spatiotemporal modeling of crime.
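
As a rough illustration of what a contiguous partition of spatial units looks like, the sketch below splits a connected adjacency graph into two connected groups by deleting one edge of a spanning tree. This is one simple way to guarantee contiguity and is an assumption made for illustration, not necessarily the construction used in the talk; the function names and the toy adjacency structure are hypothetical.

```python
from collections import deque


def spanning_tree(adjacency, root):
    """Breadth-first spanning tree of a connected graph, as (parent, child) edges."""
    seen = {root}
    edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                edges.append((node, nbr))
                queue.append(nbr)
    return edges


def split_on_edge(adjacency, root, cut_index):
    """Delete one spanning-tree edge and return the two resulting groups of units.

    Each group stays connected in the tree, hence contiguous in the original graph.
    Assumes `adjacency` describes a connected graph.
    """
    edges = spanning_tree(adjacency, root)
    tree = {unit: set() for unit in adjacency}
    for i, (a, b) in enumerate(edges):
        if i != cut_index:  # drop exactly one edge
            tree[a].add(b)
            tree[b].add(a)
    # Collect every unit still reachable from the root after the cut.
    left = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in tree[node]:
            if nbr not in left:
                left.add(nbr)
                queue.append(nbr)
    right = set(adjacency) - left
    return left, right


if __name__ == "__main__":
    # Four hypothetical counties arranged in a row: A - B - C - D.
    counties = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(split_on_edge(counties, root="A", cut_index=1))
```

In the toy example, cutting the edge between B and C sends A and B to one child and C and D to the other, so neither child receives a group of counties that is geographically disconnected.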