Evaluating Privacy-Preserving Techniques in Machine Learning
Modern applications extensively use machine learning to create new services or improve existing ones. These applications frequently require access to sensitive data, such as facial images, typing history, or health records, increasing the need for strong privacy protection. They are used in safety-critical tasks, such as controlling cars on the road and diagnosing diseases, as well as in wide-scale deployments, such as keyboard prediction used by billions of users. In this talk, I will present our research on tradeoffs in novel machine learning privacy tools. We study two emerging privacy-preserving techniques: (A) Federated Learning (FL), a form of distributed ML training across many users that keeps their data on the device while still producing an accurate aggregate model. Although FL limits an attacker's ability to learn users' data, it exposes users to integrity attacks on the FL model. (B) Differential Privacy (DP), a property of an ML model that guarantees its behavior remains essentially unchanged regardless of whether any single individual's data is included. DP provides formal privacy guarantees but significantly hurts underrepresented groups by degrading the model's performance on them.
Eugene is a Computer Science PhD student at Cornell University working with Prof. Deborah Estrin and Prof. Vitaly Shmatikov, and a member of the Small Data Lab. His research interests lie in building and evaluating AI-based privacy-preserving systems that use sensitive user data. As a DLI fellow, Eugene plans to engage in discussions on the societal impacts of privacy and new ways to integrate digital infrastructure with privacy-preserving tools.