Ph.D. Seminar: Yichuan Bai, A Graph-based Approach to Estimating the Number of Clusters

Nov 21, 2024, 2:00 PM to 2:50 PM

Speaker: Yichuan Bai, PhD Candidate, Department of Statistics, Iowa State University

Title: A Graph-based Approach to Estimating the Number of Clusters

Abstract: Clustering is a fundamental unsupervised learning technique and a critical component of many statistics and machine learning pipelines. Many clustering approaches require the number of groups k to be pre-specified, which can be challenging in the absence of knowledge about the true number of groups. We consider the problem of estimating the number of clusters in a dataset, and propose a non-parametric approach to the problem that utilizes similarity graphs to construct a robust statistic that effectively captures similarity information among observations. This graph-based statistic is applicable to datasets of any dimension, is computationally efficient to obtain, and can be paired with any kind of clustering technique. Asymptotic theory is developed to establish the selection consistency of the proposed approach. Simulation studies demonstrate that the graph-based statistic outperforms existing methods for estimating the number of clusters, especially in the high-dimensional setting. We illustrate its utility on an imaging dataset and an RNA-seq dataset.