Will the Real Steve Fienberg Please Stand Up: A Bayesian Approach to Graphical Record Linkage*

Mar 28, 2014 - 4:15 PM

Date: Friday, March 28
Time: 4:10 pm -- 5:00 pm
Place: Snedecor 3105
Speaker: Rebecca Steorts, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA

Abstract:

We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files.  Our key innovation is clustering records to latent individuals instead of linking records to records.  This flexible linkage structure naturally allows us to estimate the attributes of the unique observed members of the population, calculate posterior probabilities of matches across records and visualize them, and propagate the uncertainty of record linkage into later analyses.
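
As a rough illustration of how posterior match probabilities can be read off a clustering-based linkage structure, the sketch below assumes we already have MCMC draws in which each record is assigned the label of a latent individual. The function name, the data layout, and the toy draws are hypothetical and are not taken from the talk; the sampler itself is not shown.

```python
from itertools import combinations


def pairwise_match_probabilities(label_samples):
    """Estimate P(record i and record j refer to the same individual).

    label_samples: list of posterior draws; each draw is a list whose k-th
    entry is the latent-individual label assigned to record k in that draw.
    (Hypothetical layout, assumed for illustration only.)
    """
    n_samples = len(label_samples)
    n_records = len(label_samples[0])
    probs = {}
    for i, j in combinations(range(n_records), 2):
        # Two records match in a draw when they share a latent individual.
        matches = sum(1 for draw in label_samples if draw[i] == draw[j])
        probs[(i, j)] = matches / n_samples
    return probs


# Toy draws for four records (made-up values, not real survey data).
toy_samples = [
    [0, 0, 1, 2],
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]
match_probs = pairwise_match_probabilities(toy_samples)
```

Because records are compared through their shared latent individual, the same counting works whether the two records come from different files or from the same file, which is how a single structure can cover both linkage and de-duplication.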

Our linkage structure also lends itself to an efficient hybrid Markov chain Monte Carlo algorithm, which overcomes many of the obstacles encountered by previously proposed methods of record linkage, despite the high-dimensional parameter space.  We illustrate our method using longitudinal data from the National Long Term Care Survey, where the tracking of individuals across waves lets us objectively assess the accuracy of our record linkage.

* This is joint work with Rob Hall (Etsy) and Stephen E. Fienberg (CMU).