  • Jessie G Taft

DL Seminar | Measuring the Unmeasured: New Threats to Machine Learning Systems

Updated: Apr 25, 2021

Individual reflections by Yan Ji and Victoria Kammerath (scroll down).

By Yan Ji

Cornell Tech

Measuring the Unmeasured: New Threats to Machine Learning Systems

Machine learning (ML) is at the core of many Internet services and operates on users’ personal information, so the security and privacy of the technology are attracting more and more debate. In his talk entitled “Measuring the Unmeasured: New Threats to Machine Learning Systems”, Congzheng Song, a DLI Doctoral Fellow and graduating PhD in Computer Science at Cornell Tech, presented the pipeline for deploying an ML model as a service and guided us through each step of the pipeline to examine potential security and privacy vulnerabilities.

Many Internet services today have ML models behind the scenes interacting with users. For example, they can recommend movies or music, help users write an email, or even help type a message on a mobile phone. One of the most important questions for ML as a service is how to measure the quality of a model. The most commonly used measurement is test accuracy, i.e., how accurately a model can predict future user inputs. High test accuracy is one of the motivations for much ML research: it means that users are more satisfied, which can increase the service provider's revenue. However, test accuracy only measures how well a model learns the given task; it does not measure what else the model learns. Many other important properties are not measured by test accuracy, e.g., security vulnerabilities, privacy leakage and compliance with regulations.

To deploy an ML model on such a service, you go through a pipeline with the following steps:

  1. Pre-training: The service provider, which could be a company like Amazon or Google or just any app on your phone, collects data from users.

  2. Training: The collected data goes through a training procedure that outputs an ML model. The training could be outsourced to a third party called an ML provider, which could be either a platform that trains ML models for clients or an ML library that provides developers with APIs to build and train models.

  3. Post-training: If the model is good enough, you deploy it as a service and users can then interact with it, providing input data and receiving the predictions.

Congzheng’s research focuses on the question of whether anything in this pipeline could go wrong without being captured by test accuracy. Unfortunately, the answer is YES. And even more frustratingly, every part of this pipeline could go wrong. Any party involved could be an adversary, each posing different threats. Congzheng gave an overview of all these threats in his presentation.

Threats at pre-training

Data poisoning (view here >) is a family of attacks at the pre-training step that raises security concerns for data collection. A malicious user can craft bad data to create a backdoor in the model for targeted predictions. For example, when training a model to predict the next word a user types, an adversary can generate a set of data with a specific pattern, and a model trained on this data set will be forced to learn the pattern. As a consequence, when another user types “Let’s go to”, the backdoored model predicts “DLI Seminar”, which is the adversary's target. This attack is realistic because people collect training data from public Internet sources such as Reddit comments and Wikipedia. Poisoning the Internet is quite easy, as anyone can post anything online. Perhaps the Internet is already poisoned, given how much bad information is online.
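To make the next-word example concrete, here is a minimal sketch of how poisoned text can shift a prediction. It assumes a toy count-based model over two-word contexts, not the neural models discussed in the talk; the function names and the tiny corpora are hypothetical.

```python
from collections import Counter, defaultdict

def train_next_word(sentences):
    """Count which word follows each two-word context (toy model, an assumption)."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - 2):
            model[(words[i], words[i + 1])][words[i + 2]] += 1
    return model

def predict(model, context):
    """Return the most frequent word seen after the last two words of the context."""
    words = context.lower().split()
    counts = model.get((words[-2], words[-1]))
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical data: a few honest sentences and repeated adversarial ones.
clean = ["let's go to lunch", "let's go to lunch", "we should go to bed"]
poison = ["let's go to dli seminar"] * 5

clean_model = train_next_word(clean)
poisoned_model = train_next_word(clean + poison)

print(predict(clean_model, "Let's go to"))
print(predict(poisoned_model, "Let's go to"))
```

The poisoned sentences simply outnumber the honest continuations of "go to", so the model trained on the combined data completes the phrase with the adversary's target.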

Data poisoning is not the only concern at data collection time. Even just collecting data could go wrong. We have seen many news reports of companies collecting user data to train ML models without consent. This might directly violate regulations like GDPR, which give users the right to know how their data is processed. A natural question one may ask is: if a user’s data is collected without consent for training ML, can we detect this? The answer is YES, because ML models can memorize training data (view here >). Congzheng demonstrated this with an interesting example. He fed a facial recognition model a photo of himself that shows only his back and not his face. If you query a model with such a non-standard photo that it wasn’t trained on, the prediction will likely be a random guess. However, due to memorization, the model predicts that it is Congzheng with 99.9% confidence when queried with this image. From this overconfident prediction, we can infer that the model used the photo during training. Such memorization effects can help us detect unauthorized data usage.
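The detection idea above can be sketched as a confidence check. This is only an illustration under strong assumptions: the logits are made up, the 0.99 threshold is arbitrary, and a real audit would need more careful calibration than a single cutoff.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def flag_possible_member(logits, threshold=0.99):
    """Flag an input as possibly seen in training if the model is overconfident.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return max(softmax(logits)) >= threshold

# Hypothetical model outputs for two unusual photos.
memorized_photo_logits = [9.0, 0.5, 0.2, 0.1]  # one identity dominates
unseen_photo_logits = [1.1, 0.9, 1.0, 0.8]     # close to a random guess

print(flag_possible_member(memorized_photo_logits))
print(flag_possible_member(unseen_photo_logits))
```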

Threats at training

In centralized training, the data is collected by a service provider and training might be outsourced to a third-party ML provider. ML libraries are one class of ML providers. Popular ML libraries such as Keras and TensorFlow are user-friendly, providing developers with easy-to-use APIs and enabling people to train a state-of-the-art model in just a few lines of code without worrying about the lower-level implementation. Non-experts are likely to use the code “as is”, blindly trusting the third-party training code. However, what if the ML provider is malicious and provides bad training code? Bad training code can create very strong backdoors in the models, encoding substantial amounts of sensitive training data into the model (view here >). More importantly, these backdoors do not hurt test accuracy at all, so there is no way to detect them just by looking at test accuracy.

Another concern at training is user data privacy (view here >). In centralized training, the service provider has full access to users’ personal information. What ML people are proposing nowadays is to learn without directly accessing the data. This can be done through frameworks like collaborative training or federated learning. In these frameworks, each participant trains a local model on their own device using their own personal data. The participants then send their local models to the server, which aggregates them into a global model. The global model is then redistributed to participants for the next round of training. This process iterates until the global model is good enough. Although this framework sounds ideal for privacy, a malicious participant can control the local training process and upload a bad local model to the server. The global model will then be infected and contain a backdoor for targeted predictions. Moreover, since the server sends the aggregated model back to each participant, the adversary can learn information about other participants’ data from this aggregation.
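The aggregation step, and why one bad upload matters, can be sketched as follows. This assumes models are plain lists of weights and that the server takes an unweighted mean (real systems usually weight by data size and add defenses); the scaled malicious update is one illustrative attack, not necessarily the one presented in the talk.

```python
def federated_average(local_models):
    """Unweighted mean of the participants' weight vectors (a simplifying assumption)."""
    count = len(local_models)
    dim = len(local_models[0])
    return [sum(model[i] for model in local_models) / count for i in range(dim)]

# Hypothetical local models from three honest participants.
honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
global_clean = federated_average(honest)

# A malicious participant wants the global model to equal `target`.
# Knowing (or estimating) the honest sum, it scales its upload to cancel it out.
target = [10.0, -10.0]
participants = len(honest) + 1
malicious = [
    participants * t - sum(model[i] for model in honest)
    for i, t in enumerate(target)
]
global_poisoned = federated_average(honest + [malicious])

print(global_clean)
print(global_poisoned)
```

A single participant can move the average arbitrarily far because plain averaging places no bound on the size of any one contribution.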

Threats at post-training

After a model is deployed, users can interact with it by providing test-time inputs. The most well-known threat at test time is adversarial inputs provided by a malicious user (view here >). A malicious user can provide an input that tricks the model into making unexpected predictions. These threats raise safety concerns for applications like self-driving cars and robotics.
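A minimal sketch of an adversarial input, assuming a tiny linear classifier with hypothetical weights. For a linear model the gradient of the score with respect to the input is just the weight vector, so a small step against the sign of each weight is enough to flip the prediction here; attacks on deep models follow the same idea with computed gradients.

```python
def score(weights, bias, x):
    """Linear score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def classify(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0

def sign(value):
    return (value > 0) - (value < 0)

def perturb(weights, x, label, eps):
    """Move each feature by at most eps in the direction that hurts the true label."""
    direction = -1 if label == 1 else 1
    return [xi + direction * eps * sign(w) for w, xi in zip(weights, x)]

# Hypothetical model and input.
weights, bias = [2.0, -1.0, 0.5], 0.0
x = [0.3, 0.2, 0.2]
x_adv = perturb(weights, x, label=1, eps=0.2)

print(classify(weights, bias, x))      # prediction on the original input
print(classify(weights, bias, x_adv))  # prediction on the perturbed input
```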

Model outputs are not just predictions for the given input data; they can also leak information about the model's training data due to the memorization effect (link: https://arxiv.org/pdf/2004.00053.pdf), especially for data in the tail of the distribution. For example, in the case of text prediction, the head of the training data distribution contains common words and names, while the tail contains the outliers, which could also be sensitive information. ML models tend to memorize training data in the tail even more. As a consequence, an adversary with access to the output can extract the memorized training data. For example, it has been shown that a Social Security number (SSN) can be extracted if it was used to train a keyboard prediction model.
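The extraction risk can be illustrated with a toy text predictor. This sketch assumes a simple word-pair counting model rather than a neural keyboard model, and the "secret" is an obviously fake placeholder; the point is only that a rare sequence seen once in training can be replayed by anyone who can query completions.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count which word follows each word (toy model, an assumption)."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

def greedy_complete(model, prompt, max_words=5):
    """Repeatedly append the most likely next word, as an autocomplete would."""
    words = prompt.split()
    for _ in range(max_words):
        counts = model.get(words[-1])
        if not counts:
            break
        words.append(counts.most_common(1)[0][0])
    return " ".join(words)

# Head of the distribution: a common phrase. Tail: one sensitive outlier.
corpus = ["see you at lunch"] * 50 + ["my ssn is 000-00-0000"]
model = train_bigram(corpus)

print(greedy_complete(model, "my ssn"))
```

Because the tail sequence has no competing continuations, a single training example is enough for the model to reproduce it verbatim.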


Apart from the threats above, Congzheng also introduced overlearning (view here >), a phenomenon discovered in deep learning models. Consider the case of training a binary classifier to predict gender from facial images. Even though the objective seems rather simple, the model can overlearn, and its features can be used to infer unrelated and sometimes more complicated attributes, such as a person’s identity, demographics, sentiment, etc.

Overlearning not only leads to privacy concerns, but also makes it hard for ML models to comply with regulations that try to control the purpose of ML. With overlearning, it is very easy to repurpose a trained model for a completely different task. Even if the stated purpose at data collection time is gender classification, for example, the model automatically learns features that can be used for facial recognition. What we need is perhaps “the principle of least privilege” for ML, to ensure that ML models learn only enough information for the given task and nothing more.

In summary, test accuracy is not enough for measuring whether ML models are good enough on given tasks because it doesn’t measure backdoors, leakages, overlearning, etc. When training models or designing new ML algorithms, researchers and developers need to take all aspects into consideration.

By Victoria Kammerath

Cornell Tech

On November 15, 2020, the Digital Life Seminar series at Cornell Tech welcomed Congzheng Song, DLI fellow and Computer Science Ph.D. candidate at Cornell University. His seminar was entitled “Measuring the Unmeasured: New Threats to Machine Learning Systems”.

Accuracy’s limitation

Most internet services and applications consumed worldwide on a massive scale are based on Machine Learning (ML) models that interact with users and provide predictions according to input data. In every ML model pipeline, once data is collected and the model is trained, one of the most important test performance metrics used to conclude that the model is ready for deployment is accuracy, which is the number of correct predictions generated by the model divided by the total number of predictions made. Thus, if the accuracy is good enough, the ML model will generally be deployed on the expectation that it will make correct predictions for the task it has learned.

However, as Song highlighted, state-of-the-art models have huge memorization capacity regardless of their accuracy on a certain task. This means that accuracy does not capture other important properties of what ML models learn, such as security vulnerabilities, privacy leakage and regulatory compliance.
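As a small illustration of the metric just defined, the sketch below computes accuracy on a hypothetical test set; the labels and predictions are made up for the example.

```python
def accuracy(predictions, labels):
    """Number of correct predictions divided by the total number of predictions."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical test set: 4 of the 5 predictions match the labels.
labels = ["cat", "dog", "dog", "cat", "dog"]
predictions = ["cat", "dog", "cat", "cat", "dog"]

print(accuracy(predictions, labels))
```

The function only compares outputs to labels, so it says nothing about what else the model has memorized or encoded.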

Overview of threats at each stage of ML models’ pipeline

Song’s presentation provided an insightful overview of different tools to measure features that are not encompassed by test accuracy. For this goal, the speaker thoroughly explained the latent threats to model integrity that his team has identified in each stage of the ML model pipeline.

First and foremost, data poisoning attacks are a clear threat at the data collection stage. Malicious users can inject bad data that forces an ML model to learn certain patterns that do not necessarily represent the real trend. Thus, malicious users can create a backdoor in the model for targeted and biased predictions, especially when data is collected from the internet. In addition to data poisoning, data can be collected without ensuring privacy and data protection compliance and used to train the ML model without users’ consent.

Song described several threats at training time as well. The training procedure can be outsourced to a third party such as a machine learning library. These sets of functions, written in a given language, can be very useful for developers, enabling the training of a complex ML model in a few lines of code. Nonetheless, providers can be malicious and offer misleading training algorithms that force the model to encode a substantial amount of sensitive data without affecting accuracy at all. Furthermore, learning without direct data access is dangerous as well. Federated and collaborative machine learning techniques, which seem to be an ideal framework from a privacy perspective, are subject to such risk. A malicious participant, for instance, can affect the overall training and learn information about other users’ data.

Regarding the ML model test stage, Song warned about adversarial attacks that introduce inputs which can trick the model into making unexpected predictions. This shows that models’ predictions are very sensitive to changes in the input, which makes them unreliable. Unfortunately, training ML models that are resistant to these attacks is really hard.

Finally, the speaker mentioned risks that affect ML models’ outputs. Models’ main outputs are predictions for any given user input. Additionally, ML models can precisely memorize training data, particularly data in the tail of a given probability distribution. These outliers, of course, can be sensitive data. As a result, an adversary who accesses a model’s output can extract users’ sensitive information from the model’s predictions [1].

Overlearning: A red flag and some limited sense of hope

The overview of the ML model pipeline, based on the speaker's outstanding research, shows that most of the risks that can mislead models’ predictions without compromising accuracy are driven by the fact that state-of-the-art ML models are capable of memorizing all training data perfectly. This ability leads to the concept of overlearning, which was identified by the speaker in one of his recent works [2]. There he focused on deep learning models, a type of ML model usually based on multi-layered artificial neural networks.

Song defined overlearning as a phenomenon which implies that a deep learning model trained for a seemingly simple objective, implicitly learns to recognize sensitive and uncorrelated attributes that are not a direct part of the learning objective. He gave the example of a binary gender classifier of facial images that automatically learns to recognize identities. The right to informational privacy is at stake in this context.

What is more, overlearning enables a model to be easily repurposed to perform a different task. For example, the binary classifier whose original purpose was to predict gender can be used to identify locations, ages, races, emotions, and even for facial recognition.

Given the wide variety of regulatory and ethical challenges raised by overlearning, Song analyzed the possibility of limiting the purpose of machine learning models. According to his research, censoring techniques may create some sense of hope for the future. These techniques try to remove certain unwanted attributes from a model's features without affecting the model’s ability to perform a given task accurately. In Song’s binary gender classifier example, an effective censoring technique would modify the training objective to remove identity attributes from the features, and the model would still be able to predict gender.
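One common way to write such a modified training objective is adversarial censoring. The formula below is a sketch under assumptions, not necessarily the exact formulation used in Song's work: E is the feature extractor, C the task classifier, D an adversary that tries to predict the sensitive attribute s from the features, y the task label, and γ a hypothetical trade-off weight.

```latex
\min_{E,\,C}\;\max_{D}\;\;
  \mathcal{L}_{\text{task}}\bigl(C(E(x)),\,y\bigr)
  \;-\;\gamma\,\mathcal{L}_{\text{adv}}\bigl(D(E(x)),\,s\bigr)
```

The adversary D minimizes its own loss, while the feature extractor is trained to keep the task loss low and the adversary's loss high, which pushes information about s out of the features.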

However, existing censoring techniques cannot completely prevent overlearning for several reasons. First, at the state of the art it’s almost impossible to identify all unwanted attributes and the corresponding labeled data. At the same time, features can learn attributes that do not occur in training data. Finally, existing techniques only censor single layer features. Therefore, as most recent deep learning models are essentially based on multi-layer learning, overlearning might be intrinsic for these types of models that are proliferating in daily life.


Despite his brilliant and inspiring presentation, Song’s conclusion can be troubling for users, developers, businesses and policymakers, particularly from an algorithmic fairness and privacy perspective. Through overlearning, ML models developed for massive applications used worldwide can be unknowingly reinforcing historic inequalities. On top of this, features can overlearn and leak sensitive attributes unrelated to the given task, compromising users’ privacy.

Without prejudice to the need of modern rulings for data collection and processing, within the current regulatory framework [3], controllers and processors should be held accountable for deploying large scale models that overlearn and finally result in data privacy violations. Should they also be held accountable for the possible effects of inevitable overlearning of other attributes?

Certainly, when designing or hiring new algorithms, stakeholders need to take all aspects into consideration in addition to accuracy. In this sense, algorithmic explainability, when feasible, is a key condition to achieve transparency and ensure regulatory compliance.

[1] Even though designing ML models through the lens of differential privacy is a clear response to avoid sensitive data leakage from a model’s output, this technique may affect the model’s utility as well, since it disables all predictions for the tail data of the distribution.

[2] Congzheng Song; Vitaly Shmatikov, Overlearning reveals sensitive attributes, published as a conference paper at ICLR (2020), available at https://www.cs.cornell.edu/~shmat/shmat_iclr20.pdf

[3] For instance, the European Union’s General Data Protection Regulation (GDPR), 2016/679.

