DATE & TIME:   Wednesday, September 24th;   4:00 pm
LOCATION:   2104 Gilman

SPEAKER:  
Alan F. Karr
National Institute of Statistical Sciences


TITLE:  
Interactions Among Data Confidentiality, Data Integration,
                Data Mining, Data Quality:  Challenges for Statisticians?

ABSTRACT:

This talk will lay out the connections among four classes of data-driven problems. Many of these connections are not well understood, and the talk will focus on what we as statistical scientists cannot yet do, rather than on what we have done.

The problems to be treated include several in which NISS is engaged currently:

  • Data confidentiality (DC)---the need to protect data subjects and attribute values, yet disseminate useful information.
  • Data integration (DI)---combining data across multiple databases that were not designed with DI in mind.
  • Data mining (DM)---the discovery of patterns, information and knowledge in what are almost always large, complex (and, often, unstructured) data sets.
  • Data quality (DQ)---the kinds of errors, anomalies and other DQ problems that occur in real databases.


Interactions among these four problems pose important research challenges for statisticians. For example, poor DQ protects confidentiality, while DI (in the form of record linkage) is a means of breaking confidentiality. Similarly, techniques to protect DC and poor DQ both strongly affect the ability of DM to identify anomalous data.

The challenges will be discussed at multiple levels: abstractions, theory and methodology, and (scalable) software tools.


REFRESHMENTS:  
3:45 pm;   104 Snedecor Hall