Differential Privacy Algorithms for Data Sharing (with Data61)
Investigator: Prof Dali Kaafar
End Date: June 2019
Description: Informally, privacy means hiding an individual’s data. For released data to be useful, on the other hand, it must still be possible to learn something significant from it. The Fundamental Law of Information Recovery states that “overly accurate” estimates of “too many” statistics completely destroy privacy. Differential privacy is a mathematically rigorous definition of privacy tailored to the analysis of large datasets and equipped with a formal estimate of the privacy/utility tradeoff. One of its strengths is the ability to reason about cumulative privacy loss over multiple analyses (although, as discussed below, it is unclear how the accumulated privacy loss relates to actual real-life privacy threats).
While differential privacy has attracted significant interest from the research community and from industry (the framework originated at Microsoft Research, and Apple decided to collect user data under the “differential privacy” framework), it unfortunately does not provide a quantitative measure of its privacy guarantees in terms of real-life threats, which makes it hard to understand the implications of the privacy risks in practice. In a recent study, for example, we showed ways of producing differentially private recommender-system data (in a matrix factorization context) and showed that finding a practical interpretation of the notion of privacy provided by a differentially private algorithm is an open research problem. To understand the practical implications of the privacy guarantees (or lack thereof) of different privacy-preserving techniques, this project aims specifically to explore the limits of differentially private techniques for generating synthetic datasets by proposing practical attacks that violate the differential privacy properties. This includes evaluating aggregation-based differential privacy techniques inspired by Pyrgelis et al. in PETS17. The aim is to bring clarity and a practical understanding to the notion of privacy guarantees when generating private synthetic datasets. We will also design differentially private techniques that enable the release of datasets with differential privacy guarantees.