Survey Working Group: Yingchao Zhou, Contextual multi-armed bandit problems with harm effects using varying-coefficient models
Speaker: Yingchao Zhou, Graduate Student, Iowa State University
Title: Contextual multi-armed bandit problems with harm effects using varying-coefficient models
Abstract: The multi-armed bandit (MAB) is a framework of algorithms for sequential decision making, with wide applications in recommender systems, dynamic pricing, and other areas. In recent years, many proposals have been made to use MABs in clinical trials. Inspired by dose-finding clinical trials, we consider an MAB problem with two sets of outcomes, reward and harm, whose distributions are modeled by varying-coefficient models. A scalarized regret is formulated to balance the two objectives of maximizing reward and controlling harm. In this project, the epsilon-greedy algorithm is adopted to solve the bandit problem. We also consider using optimal design of experiments at the exploration step to help with parameter estimation. A simulation study has been carried out to evaluate the performance of the proposed algorithms.
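For readers unfamiliar with the setup, here is a minimal illustrative sketch of epsilon-greedy on a reward/harm bandit. It is not the speaker's method. Several pieces are assumptions made only for this illustration. Each outcome follows `b0(x) + b1(x) * dose`, with the coefficient functions expanded in a polynomial basis of a scalar context `x`. The objective is scalarized as `reward - lam * harm`. Regret is the per-round gap to the oracle's best scalarized utility. The optional exploration step uses a greedy D-optimal choice, i.e. it picks the arm that maximizes `z^T V^{-1} z`, since `det(V + z z^T) = det(V) (1 + z^T V^{-1} z)`. The helper names `features` and `run_bandit` are hypothetical.

```python
import numpy as np

def features(x, dose, degree=2):
    # Hypothetical varying-coefficient design: outcome = b0(x) + b1(x) * dose,
    # where each coefficient function b_j(x) is expanded in a polynomial basis of x.
    basis = np.array([x ** k for k in range(degree + 1)])
    return np.concatenate([basis, basis * dose])

def run_bandit(true_reward, true_harm, doses, T=2000, eps=0.1, lam=1.0,
               ridge=1.0, explore="uniform", seed=0):
    """Epsilon-greedy with an assumed scalarized utility reward - lam * harm."""
    rng = np.random.default_rng(seed)
    p = len(features(0.0, doses[0]))
    V = ridge * np.eye(p)   # regularized Gram matrix, shared by both models
    b_r = np.zeros(p)       # running X^T y for the reward model
    b_h = np.zeros(p)       # running X^T y for the harm model
    regret = 0.0
    for _ in range(T):
        x = rng.uniform(-1.0, 1.0)  # observed context (assumed scalar covariate)
        Z = np.array([features(x, d) for d in doses])
        if rng.random() < eps:
            if explore == "uniform":
                a = int(rng.integers(len(doses)))
            else:
                # Greedy D-optimal step: det(V + z z^T) = det(V) * (1 + z^T V^{-1} z),
                # so the arm with the largest z^T V^{-1} z most increases det(V).
                Vinv = np.linalg.inv(V)
                a = int(np.argmax(np.einsum("ij,jk,ik->i", Z, Vinv, Z)))
        else:
            # Exploit: ridge estimates for reward and harm, then scalarize.
            theta_r = np.linalg.solve(V, b_r)
            theta_h = np.linalg.solve(V, b_h)
            a = int(np.argmax(Z @ theta_r - lam * (Z @ theta_h)))
        # Oracle scalarized utilities, used only to score regret.
        utility = np.array([true_reward(x, d) - lam * true_harm(x, d) for d in doses])
        regret += utility.max() - utility[a]
        # Observe noisy reward and harm for the chosen dose and update both fits.
        z = Z[a]
        r = true_reward(x, doses[a]) + rng.normal(scale=0.5)
        h = true_harm(x, doses[a]) + rng.normal(scale=0.5)
        V += np.outer(z, z)
        b_r += r * z
        b_h += h * z
    return regret

# Hypothetical example: the best dose switches from 0 to 1 as x increases.
doses = np.linspace(0.0, 1.0, 5)
reward = lambda x, d: (1 + 0.5 * x) * d
harm = lambda x, d: (0.2 + 0.8 * x ** 2) * d
print(run_bandit(reward, harm, doses, eps=0.1))
print(run_bandit(reward, harm, doses, eps=0.1, explore="d-optimal"))
```

Setting `eps=1.0` gives a purely random policy, which is a convenient baseline. Here, the ground-truth functions are chosen so that the optimal dose depends on the context. A policy that ignores `x` therefore cannot achieve low regret.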