ARC Discovery Project DP180103251
Investigators: Prof. Xun Yi; A/Prof. Ibrahim Khalil; A/Prof. Christophe Doche; Prof. Elisa Bertino
End date: Dec 2020
Description: Privacy-preserving online user matching. This project aims to develop efficient techniques to preserve the privacy of users of online matching websites used for finding employment, friends and partners. The project expects to generate new knowledge in privacy preserving user matching with multiple servers. The expected outcomes are new techniques that can find matching users without revealing their interests to the matching server and a prototype based on these techniques. This should alleviate the privacy concerns of people using online tools that require providing personal information.
Differential privacy algorithms for data sharing (with Data61)
Investigator: Prof. Dali Kaafar
End date: June 2019
Description: Informally, privacy means hiding an individual's data. On the other hand, for released data to be useful, it should be possible to learn something significant from it. The Fundamental Law of Information Recovery states that an "overly accurate" estimate of "too many" statistics completely destroys privacy. Differential privacy is a mathematically rigorous definition of privacy, tailored to the analysis of large datasets and equipped with a formal estimate of the privacy/utility tradeoff. One of its strengths is the ability to reason about cumulative privacy loss over multiple analyses (even though, as discussed below, it is unclear how the accumulated privacy loss relates to actual real-life privacy threats).
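The Laplace mechanism and the composition of privacy loss mentioned above can be sketched as follows. This is an illustrative sketch with hypothetical data, not code from the project; it assumes a simple counting query, whose sensitivity is 1.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng):
    """epsilon-DP counting query: a count has sensitivity 1 (adding or
    removing one record changes it by at most 1), so Laplace noise with
    scale 1/epsilon suffices for epsilon-differential privacy."""
    true_count = sum(1 for record in data if predicate(record))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)

# Hypothetical dataset: ages of seven individuals.
ages = [23, 35, 47, 51, 62, 29, 41]

# Two queries at epsilon = 0.5 each: by sequential composition, the
# analyst's total privacy loss is at most epsilon = 1.0.
over_40 = laplace_count(ages, lambda a: a > 40, 0.5, rng)
under_30 = laplace_count(ages, lambda a: a < 30, 0.5, rng)
```

Each additional query spends more of the privacy budget; this cumulative loss is exactly what composition theorems let an analyst reason about.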
While differential privacy has attracted significant interest from both the research community and industry (the framework originated at Microsoft Research, and Apple has adopted it for collecting user data), it unfortunately does not provide any quantitative measurement of the privacy guarantees, and it is hard to understand the implications of the privacy risks in practice. In a recent study, for example, we showed ways of building differentially private recommendation systems (in a matrix-factorization context) and found that producing a practical translation of the notion of privacy provided by a differentially private algorithm remains an open research problem. To understand the practical implications of the privacy guarantees (or lack thereof) of different privacy-preserving techniques, this project specifically aims to explore the limits of differentially private techniques for generating synthetic datasets by proposing practical attacks that violate the differential privacy properties. This includes evaluating differentially private aggregation-based techniques inspired by Pyrgelis et al. in PETS17. The aim is to bring clarity and a practical understanding of the privacy guarantees obtained when generating private synthetic datasets. We will also design differentially private techniques to enable the release of datasets with differential privacy guarantees.
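As a baseline for the kind of attack the project studies on aggregate data, the sketch below shows the simple differencing attack that motivates adding noise to released aggregates. It uses hypothetical location-visit counts and is not the classifier-based attack of Pyrgelis et al.; it assumes a strong adversary who knows every contribution except the target's.

```python
import numpy as np

def infer_membership(released_aggregate, known_others, target_trace):
    """Differencing attack on a raw (non-private) aggregate: subtracting
    all known contributions from the release reveals whether the target's
    trace is present."""
    residual = released_aggregate - known_others.sum(axis=0)
    return bool(np.allclose(residual, target_trace))

# Hypothetical visit counts: rows are known users, columns are regions.
others = np.array([[1, 0, 2],
                   [0, 1, 1]])
target = np.array([1, 1, 0])

agg_with_target = others.sum(axis=0) + target   # target contributed
agg_without = others.sum(axis=0)                # target did not
```

Differentially private aggregation defeats this exact attack by noising the release; the project's question is how much protection that noise provides against more realistic, less informed adversaries.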
Mathematical foundations for privacy-preserving techniques (with Data61)
Investigators: Prof. Annabelle and A/Prof. Mark Dras
End date: June 2019
Description: This research addresses the problem of maintaining privacy in data mining. Machine learning is a powerful technique that allows "data scientists" to discover relationships between attributes in complex data sets. It is important in medical and biological research and has redefined modern marketing campaigns and customer service. An open problem is how machine learning/data mining relates to breaches of the privacy of the individuals whose information makes up the body of data used in the learning experiments.
Differential privacy was invented to protect individuals' privacy in the specific scenario of statistical databases. It works well for certain types of query; however, in the context of text data it does not produce good solutions, because the noise it adds runs counter to the idea that readable text should be the result of a query [1]. Moreover, data sets drawn from social network sites such as Facebook show that even there differential privacy is not always useful, since even a fixed differential-privacy bound can, in some circumstances, still leak arbitrary amounts of information [2].
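The mismatch with text data can be illustrated with a toy bag-of-words example (hypothetical counts; noise scaled to a unit sensitivity per count, an illustrative simplification): Laplace noise yields usable private statistics, but the resulting counts are fractional, possibly negative, and carry no word order, so no readable text can be reconstructed from them.

```python
import numpy as np

rng = np.random.default_rng(1)
epsilon = 1.0

# Hypothetical word counts from a private document collection.
word_counts = {"the": 12, "patient": 3, "diagnosis": 1}

# Per-word Laplace noise produces differentially private counts that
# are fine as aggregate statistics...
noisy_counts = {w: c + rng.laplace(scale=1.0 / epsilon)
                for w, c in word_counts.items()}

# ...but they are real-valued (and can dip below zero), and there is
# no procedure for decoding such noisy counts back into fluent text.
```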
This project aims to explore the foundations of privacy in the first instance and to apply the results to the area of machine learning in text processing and privacy in social media. It builds on recent research into quantifying information vulnerabilities in security systems. The outcomes will be:
- A mathematical foundation for privacy explained in terms directly related to an attacker and user.
- An application of the theory to new ways of evaluating obfuscation mechanisms, currently a popular task in the ML community.
- An investigation of privacy within a popular social networking environment (e.g. Facebook).
- An evaluation of variations of differential privacy so that they can be extended, in some circumstances, to text data.