The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy
With the unprecedented availability of datasets containing personal information, there are increasing concerns that statistical analysis of such datasets may compromise individual privacy. These concerns give rise to statistical methods that provide privacy guarantees at the cost of some statistical accuracy. A fundamental question is: to satisfy certain desired level of privacy, what is the best statistical accuracy one can achieve? Standard statistical methods fail to yield sharp results, and new technical tools are called for.
In this talk, I will present a general lower bound argument to investigate the tradeoff between statistical accuracy and privacy, with application to three problems: mean estimation, linear regression and classification, in both the classical low-dimensional and modern high-dimensional settings. For these statistical problems, we also design computationally efficient algorithms that match the minimax lower bound under the privacy constraints. Finally I will show the applications of those privacy-preserving algorithms to real data such as SNPs containing sensitive information, for which privacy-preserving statistical methods are necessary.