PhD Seminar: Yuchen Wang

Nov 17, 2022, 3:00 PM to 4:00 PM

Speaker: Yuchen Wang, PhD Candidate, Department of Statistics, Iowa State University

Title: Matching Methods for Reducing Data Imbalance

Abstract: Causal inference allows researchers to draw cause-and-effect conclusions from experimental and observational data. However, when data arise in observational settings, causal inference becomes more complex, requiring different assumptions and data analysis strategies to yield valid conclusions. In the absence of controlled, randomized treatment assignment, matching methods are often used to increase the similarity of the joint and marginal covariate distributions between a treated and an untreated sample, thereby reducing the part of the difference in the observed response that is due to differences in covariates.

Many matching methods have been proposed, such as Propensity Score Matching and matching based on the Mahalanobis distance. More recently, Coarsened Exact Matching (CEM; Iacus et al., 2012) has risen in popularity because of its guaranteed ability to reduce group imbalance and model dependence. We propose two novel improvements over existing methods: Clustered Propensity Score Matching and Optimized Coarsened Exact Matching. Clustered Propensity Score Matching begins by grouping individuals into clusters based on a set of available covariates. Treated and untreated individuals can then be matched only when they are members of the same cluster, ensuring closeness not only in propensity scores but also in covariates. Optimized Coarsened Exact Matching, a re-weighted matching method, is guaranteed to improve balance upon dropping observations and is also applicable in high-dimensional settings where other commonly used matching methods, such as CEM, can fail.
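
The idea of restricting propensity score matches to members of the same cluster can be illustrated with a minimal Python sketch. It is not the speaker's algorithm: the abstract does not say how clusters or propensity scores are obtained, so both are taken here as precomputed inputs, and the greedy 1:1 nearest-neighbour rule, the optional `caliper` argument, and the function name `match_within_clusters` are assumptions made for illustration only.

```python
from collections import defaultdict


def match_within_clusters(treated, cluster, score, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    allowing only pairs whose members share a cluster label.

    treated : sequence of 0/1 treatment indicators
    cluster : sequence of cluster labels (assumed precomputed)
    score   : sequence of propensity scores (assumed precomputed)
    caliper : optional maximum allowed score difference (hypothetical knob)
    Returns a list of (treated_index, control_index) pairs.
    """
    # Split unit indices by cluster and by treatment status.
    by_cluster = defaultdict(lambda: {"t": [], "c": []})
    for i, (is_treated, label) in enumerate(zip(treated, cluster)):
        by_cluster[label]["t" if is_treated else "c"].append(i)

    pairs = []
    for label in sorted(by_cluster):
        controls = list(by_cluster[label]["c"])
        for t in by_cluster[label]["t"]:
            if not controls:
                break  # no unmatched controls left in this cluster
            # Closest remaining control in the same cluster.
            best = min(controls, key=lambda c: abs(score[t] - score[c]))
            if caliper is not None and abs(score[t] - score[best]) > caliper:
                continue  # closest control is still too far away
            pairs.append((t, best))
            controls.remove(best)  # match without replacement
    return pairs
```

Under these assumptions, a treated unit is left unmatched when its cluster contains no remaining control, which is how the cluster constraint enforces closeness in covariates on top of closeness in propensity score.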

In this talk, we introduce both methods and assess the performance of each against existing methods with respect to balance, model dependence, and bias in the estimated average treatment effect, using numerical studies and data examples.