Seminar, Huiyan Sang, GS-BART: Graph Split Additive Decision Trees for Classification and Nonparametric Regression of Spatial and Network Data
Speaker: Huiyan Sang, Department of Statistics, Texas A&M University
Title: GS-BART: Graph Split Additive Decision Trees for Classification and Nonparametric Regression of Spatial and Network Data
Abstract: Ensemble decision tree methods such as XGBoost, random forests (RF), and BART have gained enormous popularity in data science for their superior performance in machine learning regression and classification tasks. In this paper, we develop a new Bayesian graph-split-based additive decision tree method, called GS-BART, to improve the performance of Bayesian additive decision trees for complex dependent data with graph relations. The new method adopts a highly flexible split rule that respects the graph structure, relaxing the axis-parallel split rule assumed in most existing ensemble decision tree models. We design a scalable informed MCMC algorithm that leverages a gradient-based recursive algorithm on spanning trees or chains to sample the graph-split-based decision trees. The superior performance of the method over conventional ensemble tree models and Gaussian process models is illustrated in various regression and classification tasks for spatial and network data analysis.
About the Speaker: Huiyan Sang is a professor and the director of the undergraduate program in statistics at Texas A&M University. She joined Texas A&M in 2008 as an assistant professor after earning her Ph.D. in Statistics from Duke University. Her research interests include the development of theory, methodology, and computation for spatial statistics, graph and network data analysis, Bayesian nonparametrics, machine learning methods, computational statistics, high-dimensional data analysis, and extreme values. Her interdisciplinary research work spans applications of statistics in environmental sciences, geosciences, urban and traffic planning, economics and business, biomedical research, and engineering.