DL Seminar | Privacy Engineering Through Obfuscation
By Chulun He and Chutian Tai
What is privacy engineering, and what is obfuscation? People who are not experts in the related fields may well start with those questions. Instead of going over the technical details of privacy engineering, Ero Balsa, a postdoctoral research fellow at Cornell Tech, briefly reviewed how privacy engineering combines knowledge from different fields such as computer security, data governance and HCI, and then switched gears to explain why privacy engineers resort to obfuscation.
Privacy-by-Architecture vs. Privacy-by-Policy
Ero defined privacy engineering as a discipline that aims to use the necessary tools to embed informational norms into a system’s design, thereby ensuring the appropriateness of information flows. He mentioned that there are two approaches to implementing a system: privacy-by-architecture (aka “data-oriented” or “hard privacy”) and privacy-by-policy (aka “process-oriented” or “soft privacy”). End-to-end encryption is applied in privacy-by-architecture systems to ensure that only users can access their own data. In privacy-by-policy systems, on the other hand, service providers can access user data, and it is the contract that protects users from data misuse.
Ero argued that privacy engineers would first choose to implement a privacy-by-architecture system to make sure that information flows appropriately. However, once misaligned incentives are factored in, such as the high cost of implementation and service providers’ pursuit of user data, privacy-by-policy systems are often built instead. That led Ero to ask: how can users be protected from privacy-invasive systems? Here enters obfuscation, which is also the focus of Ero’s research. To illustrate what obfuscation is, Ero used the example of the US census, where records are swapped and omitted to prevent malicious analysts from learning information about individual respondents.
Why obfuscation in privacy engineering?
Returning to the main goal of the talk, Ero used the example of Google Maps to discuss why obfuscation is used and how it works in privacy engineering. He first stated that there is no inherent barrier to using software in a privacy-preserving way: ideally, users could run a piece of software without an internet connection, so that none of their data would be sent to service providers. But an internet connection is necessary to provide real-time information, such as live traffic, that users need. In this case, obfuscation can be used to send fake queries to the service alongside the real ones, so that the service does not know which query actually describes the user’s behavior.
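The fake-query idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scheme presented in the talk: the function names `obfuscate_query` and `pick_real_answer`, the route strings, and the idea of drawing dummies from a fixed pool of plausible routes are all hypothetical.

```python
import random


def obfuscate_query(real_query, candidate_pool, num_dummies, rng=None):
    """Hide a real query among dummy queries drawn from a pool of plausible ones.

    Hypothetical helper: the pool of plausible routes is an assumption.
    """
    rng = rng or random.Random()
    # Exclude the real query from the pool so no dummy duplicates it.
    pool = [q for q in candidate_pool if q != real_query]
    if num_dummies > len(pool):
        raise ValueError("not enough distinct candidates to draw dummies from")
    batch = rng.sample(pool, num_dummies) + [real_query]
    # Shuffle so the position in the batch does not give the real query away.
    rng.shuffle(batch)
    return batch


def pick_real_answer(real_query, responses):
    """Client side: keep the answer to the real query and discard the rest."""
    return responses[real_query]


if __name__ == "__main__":
    pool = ["home -> office", "home -> airport", "gym -> mall",
            "school -> library", "park -> station"]
    batch = obfuscate_query("home -> office", pool, num_dummies=3)
    print(batch)  # four routes in random order, one of them real
```

The protection is only as good as the dummies: if the provider can tell implausible routes from plausible ones, it can discard the fakes, so in practice the pool would need to resemble real user behavior.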
Personal Utility vs. Public Utility
Ero then discussed how personal utility differs from public utility in privacy engineering. Personal utility does not require disclosure or exposure. For example, personal utility arises when users chat through a messaging app such as WhatsApp: they contribute to each other’s personal utility, and none of what they discuss should be disclosed to anyone else. However, WhatsApp could still collect information such as who users are talking to and for how long. In this case, Ero stated that obfuscation can be used to add noise to the data by sending dummy messages to all of a user’s friends in the background. Public utility, on the other hand, always requires some degree of disclosure. For instance, Google Maps needs to calculate the duration of a route based on data from previous trips, or use the number of cars currently on that route to predict traffic jams. In this case, obfuscation is used to balance the need for disclosure with privacy and confidentiality requirements.
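The dummy-message idea for messaging apps can be sketched as follows. This is a minimal illustration, not WhatsApp’s protocol or the design from the talk: the `Packet` type, the fixed `MESSAGE_SIZE`, and the `send_with_cover` function are hypothetical, and encryption is deliberately left out, so the sketch only shows how the metadata (who receives a packet, and how large it is) becomes uniform.

```python
import os
from dataclasses import dataclass

MESSAGE_SIZE = 256  # assumed fixed packet size so real and dummy packets match


@dataclass
class Packet:
    recipient: str
    payload: bytes  # a real system would encrypt this; omitted in this sketch


def pad(message: bytes) -> bytes:
    """Pad a message to MESSAGE_SIZE with a 0x80 marker followed by zero bytes."""
    if len(message) > MESSAGE_SIZE - 1:
        raise ValueError("message too long for a single packet")
    return message + b"\x80" + b"\x00" * (MESSAGE_SIZE - len(message) - 1)


def send_with_cover(real_recipient, message, friends):
    """Emit one equally sized packet per friend; only one carries the message."""
    if real_recipient not in friends:
        raise ValueError("recipient must be one of the sender's friends")
    packets = []
    for friend in sorted(friends):
        if friend == real_recipient:
            payload = pad(message)
        else:
            payload = os.urandom(MESSAGE_SIZE)  # dummy content
        packets.append(Packet(friend, payload))
    return packets


if __name__ == "__main__":
    for packet in send_with_cover("bob", b"see you at 6", {"alice", "bob", "carol"}):
        print(packet.recipient, len(packet.payload))
```

An observer who sees only recipients and sizes learns that the sender contacted every friend, not which one. The price is bandwidth: every real message costs as many packets as the user has friends.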
However, it is hard to handle the trade-off between privacy on the one hand and utility for the public on the other. Ero believes that this debate needs a policy decision about where the optimal point lies. For example, big companies like Facebook hold data and also provide it to researchers as a research resource, so a policy decision is needed to determine how much information to reveal. Meanwhile, privacy engineering techniques can be made sufficiently efficient and fine-grained to support meaningful data access and use; full randomization, for example, sits at one extreme, providing perfect privacy but no utility.
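One way to see this privacy-utility dial is randomized response, a classic survey technique that the talk did not name and that is used here purely as an illustration. Each respondent answers truthfully with probability `p_truth` and otherwise reports a fair coin flip. The function names and the parameter `p_truth` are assumptions of this sketch.

```python
import random


def randomized_response(true_answer, p_truth, rng):
    """Report the truth with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return true_answer
    return rng.random() < 0.5


def estimate_true_rate(reports, p_truth):
    """Recover the population rate of True answers from noisy reports."""
    if p_truth == 0:
        # Pure randomization: reports are independent of the truth.
        raise ValueError("reports carry no information when p_truth is 0")
    observed = sum(reports) / len(reports)
    # E[observed] = p_truth * rate + (1 - p_truth) * 0.5, solved for rate.
    return (observed - (1 - p_truth) * 0.5) / p_truth


if __name__ == "__main__":
    rng = random.Random()
    answers = [True] * 3000 + [False] * 7000  # true rate is 0.3
    reports = [randomized_response(a, 0.5, rng) for a in answers]
    print(estimate_true_rate(reports, 0.5))  # close to 0.3, with sampling noise
```

Setting `p_truth` to 1 gives full utility and no privacy, while setting it to 0 gives the extreme described above: the reports are pure noise, so nothing can be learned about anyone or about the population. Where to set the dial in between is the policy decision.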
Concerns & Limitations
Obfuscation is not the solution for providing appropriate data flows in every circumstance. For instance, when people go to the doctor, they need to reveal their symptoms to get the correct treatment, and this symptom data could be misused. However, if patients obfuscate their symptoms, doctors will have trouble deciding on the right treatment. Ero believes that in this case people need to trust their doctors to keep the information confidential, while obfuscation can still work when people use digital doctor systems. However, confidentiality alone is not enough: we still need a very rich account of what doctors are not supposed to do with the data.
Another concern about obfuscation is that it could give adversaries the ability to inject malicious data or mount attacks. Using Google Maps as an example, adversaries could use obfuscation to mislead the map system, and in turn other users, into believing that there is heavy traffic on a highway so that they can have the highway to themselves. Ero agreed that every coin has two sides, but noted that it is possible for companies to deploy their solutions strategically to rebalance the effect.