Seminars: Dept Seminar


Mixture Model Component Trees: Finding and Visualizing Hierarchical Component Structure (with Applications in Cognitive Diagnosis)

Date: Monday, October 12
Time: 4:10 pm -- 5:00 pm
Place:
Speaker: Rebecca Nugent, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA

Abstract:

Model-based clustering, one of the most commonly used parametric clustering methods, assumes that continuous data (possibly after a transformation) come from a mixture of Gaussian components. The common implicit assumption is that once the best such mixture has been chosen to fit the data, each mixture component is a cluster estimating an underlying (sub-population) group. This assumption breaks down if the underlying groups do not have Gaussian distributions. The mixture will still fit the data well, but if the true underlying groups are non-symmetric, skewed, heavy-tailed, or curvilinear, or if there are outliers, the number of components in the model will likely overestimate the number of groups. We look at using hierarchical clustering methods, based on a distance defined by the estimated mixture, to create a dendrogram with components as leaves: a component cluster tree. This tool can be used to identify sub-mixtures (combinations of components) that better estimate the underlying groups. One application area is cognitive diagnosis, where current models are unable to estimate latent skill profiles for high-dimensional data; component trees summarize the group structure in the data (if any) quickly and more flexibly.
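
The idea of a tree with mixture components as leaves can be illustrated with a minimal sketch. It is not the speakers' method: the abstract does not specify the distance, so the Bhattacharyya distance between univariate Gaussian components is swapped in here as an assumption, and single linkage is likewise an arbitrary choice. The component parameters are hypothetical values standing in for an already fitted mixture, and the function names (bhattacharyya_1d, component_tree, cut_tree) are made up for this illustration.

```python
import math


def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between N(m1, v1) and N(m2, v2); v is a variance.

    Assumption: used only as a stand-in for "a distance defined by the
    estimated mixture", which the abstract leaves unspecified.
    """
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term


def component_tree(components):
    """Single-linkage agglomeration over mixture components.

    components: list of (mean, variance) pairs, one per fitted component.
    Returns a list of merges (members_a, members_b, height), leaves first.
    """
    clusters = [(i,) for i in range(len(components))]
    merges = []

    def linkage(a, b):
        # Single linkage: smallest pairwise component distance.
        return min(
            bhattacharyya_1d(*components[i], *components[j])
            for i in a
            for j in b
        )

    while len(clusters) > 1:
        height, a, b = min(
            (
                (linkage(a, b), a, b)
                for idx, a in enumerate(clusters)
                for b in clusters[idx + 1:]
            ),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, height))
    return merges


def cut_tree(n_components, merges, height):
    """Group components by applying every merge at or below `height`."""
    groups = [{i} for i in range(n_components)]
    for a, b, h in merges:
        if h <= height:
            ga = next(g for g in groups if a[0] in g)
            gb = next(g for g in groups if b[0] in g)
            if ga is not gb:
                groups.remove(ga)
                groups.remove(gb)
                groups.append(ga | gb)
    return sorted(sorted(g) for g in groups)


# Hypothetical fitted components (mean, variance): two overlapping pairs,
# as might arise when two non-Gaussian groups each need two components.
components = [(0.0, 1.0), (1.0, 1.5), (8.0, 1.0), (9.0, 2.0)]
merges = component_tree(components)
groups = cut_tree(len(components), merges, height=1.0)
print(groups)  # [[0, 1], [2, 3]]
```

In this toy case the four components collapse into two sub-mixtures once the tree is cut at a height of 1.0, which is the kind of over-counting of groups the abstract describes. The cut height is also an arbitrary illustrative choice.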

 

Joint work with Nema Dean, Department of Statistics, University of Glasgow, and Beth Ayers, Graduate School of Education, UC Berkeley.