Title: The Extremes of Interpretability in Machine Learning: Sparse Decision Trees, Scoring Systems and Interpretable Neural Networks
Abstract: With widespread use of machine learning, there have been serious societal consequences from using black box models for high-stakes decisions, including flawed bail and parole decisions in criminal justice, flawed models in healthcare, and black box loan decisions in finance. Transparency and interpretability of machine learning models is critical in high stakes decisions. In this talk, I will focus on two of the most fundamental and important problems in the field of interpretable machine learning: optimal sparse decision trees and optimal scoring systems. I will also briefly describe work on interpretable neural networks for computer vision.
Optimal sparse decision trees: We want to find trees that maximize accuracy and minimize the number of leaves in the tree (sparsity). This is an NP hard optimization problem with no polynomial time approximation. I will present the first practical algorithm for solving this problem, which uses a highly customized dynamic-programming-with-bounds procedure, computational reuse, specialized data structures, analytical bounds, and bit-vector computations.
Optimal scoring systems: Scoring systems are sparse linear models with integer coefficients. Traditionally, scoring systems have been designed using manual feature elimination on logistic regression models, with a post-processing step where coefficients have been rounded. However, this process can fail badly to produce optimal (or near optimal) solutions. I will present a novel cutting plane method for producing scoring systems from data. The solutions are globally optimal according to the logistic loss, regularized by the number of terms (sparsity), with coefficients constrained to be integers. Predictive models from our algorithm have been used for many medical and criminal justice applications, including in intensive care units in hospitals.
Interpretable neural networks for computer vision: We have developed a neural network that performs case-based reasoning. It aims to explains its reasoning process in a way that humans can understand, even for complex classification tasks such as bird identification.
Bio: Dr. Cynthia Rudin is a professor of computer science, electrical and computer engineering, and statistical science at Duke University, and directs the Prediction Analysis Lab, whose main focus is in interpretable machine learning. She is also an associate director of the Statistical and Applied Mathematical Sciences Institute (SAMSI). Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. She holds an undergraduate degree from the University at Buffalo, and a PhD from Princeton University. She is a three time winner of the INFORMS Innovative Applications in Analytics Award, was named as one of the “Top 40 Under 40” by Poets and Quants in 2015, and was named by Businessinsider.com as one of the 12 most impressive professors at MIT in 2015. She is past chair of both the INFORMS Data Mining Section and the Statistical Learning and Data Science section of the American Statistical Association. She has also served on committees for DARPA, the National Institute of Justice, and AAAI. She has served on three committees for the National Academies of Sciences, Engineering and Medicine, including the Committee on Applied and Theoretical Statistics, the Committee on Law and Justice, and the Committee on Analytic Research Foundations for the Next-Generation Electric Grid. She is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics. She gave a Thomas Langford Lecturer at Duke University during the 2019-2020 academic year. She has given keynote/invited talks at several conferences including KDD (twice), AISTATS, CODE, MLHC, FAT-ML, DSAA, and ECML-PKDD.