DL Seminar | Survey of Security & Privacy Concerns in Machine Learning
Updated: Jan 8, 2019
By Natalie Chyi | MA Student | Cornell Tech
In this talk, Nirvan Tyagi gave an overview of recent work in the security space, presenting three classes of privacy vulnerabilities in machine learning that can be maliciously exploited.
He began by introducing the four main stages of the machine learning pipeline: training data, training algorithm, model, and prediction. A training algorithm is run on the training data, producing a model that answers a specific question (with parameters learned from the training data). This trained model can then make predictions on previously unseen data.
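The four stages can be sketched in a few lines of code. This is a minimal illustration, not anything presented in the talk: the data, the nearest-centroid learner, and the query point are all invented for the example.

```python
import numpy as np

# Stage 1: training data -- toy 2-D points with binary labels (hypothetical)
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

# Stage 2: training algorithm -- here, a simple nearest-centroid learner
def train(X, y):
    centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    return centroids  # Stage 3: the "model" is just the learned parameters

# Stage 4: prediction -- classify unseen data by the closest centroid
def predict(model, x):
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))

model = train(X_train, y_train)
print(predict(model, np.array([0.95, 0.9])))  # -> 1
```

The point of the sketch is the separation of concerns: the training algorithm is generic, while the model it emits is specific to the data it saw, which is exactly the surface the attacks below target.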
The first vulnerability he explored was evasion attacks, or the “fooling” of machine learning models. Nirvan believes this vulnerability highlights the underlying fragility of neural networks and machine learning models as a main hurdle to deploying artificial intelligence. Since 2014, over a thousand papers have been written on the topic, raising awareness among machine learning researchers and developers about building their models more robustly. Some examples include inserting specially crafted images into the training set to cause misclassifications (“data poisoning”), or simply training the model incorrectly through mislabeling (e.g., labeling a picture of a dog as an “ostrich”). But the most common type of attack involves creating and applying noise filters to images at prediction time, causing them to be misclassified. For example, if someone overlaid an image of a dog with a noise filter, the image might still look like a dog to humans but not to a machine learning animal classifier. Earlier noise filters were specific to particular models, but there now exist universal noise filters that are resilient to model choice (i.e., they can be applied to any image so that it is misclassified, regardless of the model).

Autonomous vehicles are one commonly cited context in which noise additions have been studied. Work has looked at situations where a STOP sign is defaced to trick cars into misclassifying it (e.g., accelerating instead of stopping), as well as unexpected misclassifications caused by innocent, realistic noise (e.g., changes in the light or intensity of the scenery). These techniques have also been used for privacy-preserving systems, such as applying adversarial noise to mask one’s online activity so that machine learning models trying to profile the person from this data would misclassify it.
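The core mechanic of these noise attacks can be shown on a toy model. The sketch below is illustrative only (the weights, input, and budget are invented, and real attacks target deep networks rather than a linear scorer): for a linear classifier, nudging each input feature against the sign of the corresponding weight is the cheapest way to push the score across the decision boundary, which is the same idea behind gradient-sign attacks on neural nets.

```python
import numpy as np

# Hypothetical linear classifier: score > 0 -> "dog", otherwise "not dog".
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def classify(x):
    return "dog" if x @ w + b > 0 else "not dog"

x = np.array([2.0, 0.5, 1.0])   # clean input, classified as "dog"
eps = 1.2                       # small per-feature perturbation budget

# For a linear model, the gradient of the score w.r.t. the input is w itself;
# stepping against the sign of w lowers the score and flips the label.
noise = -eps * np.sign(w)
x_adv = x + noise               # perturbed input, now classified differently

print(classify(x))      # -> dog
print(classify(x_adv))  # -> not dog
```

A human comparing `x` and `x_adv` sees only a small shift in each feature, but the model's decision flips, which is the essence of the fragility Nirvan described.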
The second vulnerability he looked at was extraction attacks, or attacks that “steal” machine learning models. These attacks often happen in an MLaaS (machine learning as a service) context, and show that it is possible to steal a model’s parameters or hyperparameters (the settings that configure the training algorithm) by querying it repeatedly. To do this, extraction algorithms craft inputs that are close to the model’s decision boundary, the boundary between one classification and the next. If one input is classified as green, and a slightly changed input is classified as blue, the attacker learns something about where the boundary lies. Some of these attacks require knowledge of the architecture the model uses, and other caveats exist, but they are not severe. Aside from stealing model parameters and hyperparameters through predictions, it is even faster and easier to steal them through gradients (which explain why a prediction was made). A tension exists here with fairness: a model’s parameters, hyperparameters, and gradients may be confidential proprietary information due to their commercial value, but they are also the information needed to audit models for fairness.
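For the simplest case, extraction can be exact. The sketch below assumes a hypothetical MLaaS endpoint wrapping a linear model whose API returns a raw confidence score (all weights and query points are invented): with d features, d + 1 queries give d + 1 linear equations, enough to solve for the secret parameters outright. Real models and prediction-only APIs require the more elaborate boundary-probing described above, but the principle is the same.

```python
import numpy as np

# The service's secret parameters -- unknown to the attacker.
secret_w = np.array([0.7, -1.3])
secret_b = 0.4

def api_confidence(x):
    # Hypothetical MLaaS endpoint: returns the raw score w.x + b
    return x @ secret_w + secret_b

# With d = 2 features, d + 1 queries determine (w, b) exactly:
# each query yields one linear equation in the unknown parameters.
queries = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
scores = np.array([api_confidence(q) for q in queries])

A = np.hstack([queries, np.ones((3, 1))])  # unknowns: [w1, w2, b]
stolen = np.linalg.solve(A, scores)
print(stolen)  # -> approx [ 0.7 -1.3  0.4]
```

Three queries suffice here because the model is linear; the broader lesson is that every answered query leaks a constraint on the parameters, and enough constraints pin them down.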
The last vulnerability explored was training data leakage. In this attack, rather than extracting the machine learning model, malicious actors extract information about the individual data records on which the model was trained. One of the simplest forms of training data leakage is a membership inference attack, which recovers whether or not a particular individual’s data was in the training set. Given a data record and black-box access to a model, membership inference can be performed by using machine learning to train an inference model that recognizes differences in the target model’s predictions on inputs it was trained on versus inputs it was not. Another way training data leakage may occur is through malicious training algorithm services. This occurs in a setting similar to MLaaS (described above), but with the training algorithm as the service: users run their training data through training algorithms available on the platform, which produce a model for them without the platform ever seeing the input data. However, a malicious actor may create and upload a training algorithm that encodes input information into the extra capacity of the resulting model. Unbeknownst to the user, the creator of the training algorithm can then query the new model and extract the input data from it (because the model has the user’s data encoded in it). Training data leakage also poses interesting legal questions. For example, to what extent would vulnerable models trained on sensitive personal data be considered datasets of personal data, and therefore be subject to data protection laws?
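The signal that membership inference exploits can be shown with toy numbers. The sketch below is a heavily simplified stand-in for the learned inference model described above: all confidence values are invented, and the "attack model" is reduced to a single threshold, resting on the assumption that an overfit target is systematically more confident on records it was trained on.

```python
import numpy as np

# Hypothetical target-model confidences (invented for illustration):
# the model is more confident on records it memorized during training.
rng = np.random.default_rng(0)
member_conf = rng.uniform(0.90, 1.00, size=100)     # records in the training set
nonmember_conf = rng.uniform(0.50, 0.95, size=100)  # records never seen in training

def infer_membership(confidence, threshold=0.92):
    # Toy attack "model": guess member iff the target is very confident
    return confidence > threshold

tp = infer_membership(member_conf).mean()     # true-positive rate
fp = infer_membership(nonmember_conf).mean()  # false-positive rate
print(f"TPR={tp:.2f}  FPR={fp:.2f}")          # TPR well above FPR -> leakage
```

A true-positive rate well above the false-positive rate means the attacker learns real information about training-set membership, and a trained inference model can exploit far subtler differences than this single threshold.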
Nirvan is currently working on auxiliary feature inference, or examining what else a machine learning model may have learned while it was being trained for some other target task. For example, a facial recognition model trained to determine gender might have learned about correlated features (facial hair, makeup) or uncorrelated features (glasses, hair color), both of which it may have picked up in the course of maximizing accuracy.