How to detect poisoned data in machine learning datasets
Almost anyone can poison a machine learning (ML) dataset to alter a model’s behavior and output substantially and permanently. With careful, proactive detection efforts, organizations could save the weeks, months or even years of work they would otherwise spend undoing the damage that poisoned data sources cause.
What is data poisoning and why does it matter?
Data poisoning is a type of adversarial ML attack that maliciously tampers with datasets to mislead or confuse the model. The goal is to make it respond inaccurately or behave in unintended ways. Realistically, this threat could harm the future of AI.
As AI adoption expands, data poisoning becomes more common. Model hallucinations, inappropriate responses and misclassifications caused by intentional manipulation have increased in frequency. Public trust is already degrading — only 34% of people strongly believe they can trust technology companies with AI governance.
Examples of machine learning dataset poisoning
While multiple types of poisoning exist, they share the goal of impacting an ML model’s output. Generally, each one involves providing inaccurate or misleading information to alter behavior. For example, someone could insert an image of a speed limit sign into a dataset of stop signs to trick a self-driving car into misclassifying road signage.
Even if an attacker cannot access the training data, they can still interfere with the model, taking advantage of its ability to adapt its behavior. They could input thousands of targeted messages at once to skew its classification process. Google experienced this a few years ago when attackers launched millions of emails simultaneously to confuse its email filter into miscategorizing spam mail as legitimate correspondence.
In another real-world case, user input permanently altered an ML algorithm. Microsoft launched its new chatbot “Tay” on Twitter in 2016, attempting to mimic a teenage girl’s conversational style. After only 16 hours, it had posted more than 95,000 tweets — most of which were hateful, discriminatory or offensive. The enterprise quickly discovered people were mass-submitting inappropriate input to alter the model’s output.
Common dataset poisoning techniques
Poisoning techniques can fall into three general categories. The first is dataset tampering, where someone maliciously alters training material to impact the model’s performance. An injection attack — where an attacker inserts inaccurate, offensive or misleading data — is a typical example.
Label flipping is another example of tampering. In this attack, the attacker switches the labels on training examples, for instance marking spam as legitimate, to confuse the model. The goal is to make it misclassify or grossly miscalculate, eventually altering its performance significantly.
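One common way to look for flipped labels is to check whether each sample's label agrees with the labels of its nearest neighbors. The sketch below is a minimal illustration of that idea, not a production detector: the function name `flag_suspect_labels`, the toy two-dimensional features and the choice of `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter


def flag_suspect_labels(points, labels, k=3):
    """Return indices whose label disagrees with the majority label
    of their k nearest neighbors (Euclidean distance)."""
    suspects = []
    for i, point in enumerate(points):
        # Rank every other sample by its distance to sample i.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(point, points[j]),
        )[:k]
        majority_label, _ = Counter(labels[j] for j in neighbors).most_common(1)[0]
        if labels[i] != majority_label:
            suspects.append(i)
    return suspects


# Toy data (hypothetical): two well-separated clusters.
# Index 2 sits in the "ham" cluster but carries a flipped "spam" label.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
]
labels = ["ham", "ham", "spam", "ham", "spam", "spam", "spam", "spam"]

suspects = flag_suspect_labels(points, labels, k=3)
print(suspects)
```

Flagged samples are candidates for manual review, not proof of an attack, since honest labeling mistakes and samples near a class boundary will also be flagged.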
The second category involves model manipulation during and after training, where attackers make incremental modifications to influence the algorithm. A backdoor attack is an example of this. In this attack, someone poisons a small subset of the dataset. After release, they present a specific trigger to cause unintended behavior.
The third category involves manipulating the model after deployment. One example is split-view poisoning, where someone takes control of a source an algorithm indexes and fills it with inaccurate information. Once the ML model uses the newly modified resource, it will adopt the poisoned data.
The importance of proactive detection efforts
Regarding data poisoning, being proactive is vital to protecting an ML model’s integrity. Unintended behavior from a chatbot can be offensive or derogatory, but poisoned cybersecurity-related ML applications have much more severe implications.
If someone gains access to an ML dataset to poison it, they could severely weaken security, for example by causing misclassifications during threat detection or spam filtering. Since tampering usually happens incrementally, the attacker’s presence will likely go undiscovered for 280 days on average. To keep attackers from going unnoticed, firms must be proactive.
Unfortunately, malicious tampering is incredibly straightforward. In 2022, a research team discovered they could poison 0.01% of the largest datasets — COYO-700M or LAION-400M — for only $60.
Although such a small percentage may seem insignificant, it can have severe consequences. Poisoning just 3% of a dataset can increase an ML model’s spam detection error rate from 3% to 24%. Considering seemingly minor tampering can be catastrophic, proactive detection efforts are essential.
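To put those percentages in concrete terms, the short calculation below converts them into sample counts and a relative error increase. It assumes the nominal sizes implied by the dataset names (400 million and 700 million samples), since the exact counts are not given here.

```python
# Nominal dataset sizes taken from the dataset names (an assumption;
# the real datasets may contain slightly different counts).
NOMINAL_SIZES = {"LAION-400M": 400_000_000, "COYO-700M": 700_000_000}

# 0.01% is one part in 10,000, so integer division avoids rounding noise.
poisoned_counts = {name: size // 10_000 for name, size in NOMINAL_SIZES.items()}

# Spam-filter figures quoted in the text: error rate rises from 3% to 24%.
error_before_pct, error_after_pct = 3, 24
error_multiplier = error_after_pct / error_before_pct

for name, count in poisoned_counts.items():
    print(f"{name}: 0.01% is {count:,} samples")
print(f"Error rate multiplier: {error_multiplier:.0f}x")
```

Under those assumptions, 0.01% still means tens of thousands of poisoned samples, and the quoted spam-filter figures amount to an eightfold increase in errors.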
Ways to detect a poisoned machine learning dataset
The good news is that organizations can take several measures to secure training data, verify dataset integrity and monitor for anomalies to minimize the chances of poisoning.
1: Data sanitization
Sanitization is about “cleaning” the training material before it reaches the algorithm. It involves dataset filtering and validation, where someone filters out anomalies and outliers. If they spot suspicious, inaccurate or inauthentic-looking data, they remove it.
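As a rough illustration of outlier filtering, the sketch below screens a single numeric feature with the modified z-score, which is based on the median absolute deviation and is less easily distorted by extreme values than a mean-based score. The feature (message length), the function name `filter_outliers` and the 3.5 cutoff are assumptions for the example. Real sanitization pipelines typically combine several checks.

```python
import statistics


def filter_outliers(values, threshold=3.5):
    """Split values into (kept, removed) using the modified z-score,
    which relies on the median absolute deviation (MAD)."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # No spread to measure against, so nothing can be flagged.
        return list(values), []
    kept, removed = [], []
    for v in values:
        modified_z = 0.6745 * (v - median) / mad
        if abs(modified_z) > threshold:
            removed.append(v)
        else:
            kept.append(v)
    return kept, removed


# Hypothetical feature: character length of each training message.
lengths = [102, 98, 110, 95, 105, 99, 101, 2500, 97, 103]
kept, removed = filter_outliers(lengths)
print(removed)
```

A filter like this only catches poisoned samples that look statistically unusual. Carefully crafted samples that resemble clean data will pass, which is why sanitization is paired with the other measures listed here.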
2: Model monitoring
After deployment, a company can monitor its ML model in real time to ensure it doesn’t suddenly display unintended behavior. If it notices suspicious responses or a sharp increase in inaccuracies, it can look for the source of the poisoning.
Anomaly detection plays a significant role here, since it helps identify instances of poisoning. One way a firm can implement this technique is to maintain a reference model for auditing alongside its public model and compare the two.
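The comparison against a reference model could look something like the following sketch. It assumes both models expose a simple prediction function and that a fixed set of probe inputs is available. The two toy classifiers, the probe messages and the 5% tolerance are hypothetical stand-ins, not part of any particular monitoring product.

```python
def disagreement_rate(production_predict, reference_predict, probe_inputs):
    """Fraction of probe inputs on which the two models disagree."""
    disagreements = sum(
        1 for x in probe_inputs if production_predict(x) != reference_predict(x)
    )
    return disagreements / len(probe_inputs)


def needs_investigation(rate, baseline_rate, tolerance=0.05):
    """Flag the model when disagreement exceeds the usual level by more
    than the tolerance."""
    return rate - baseline_rate > tolerance


# Hypothetical stand-ins for real models.
def reference_model(message):
    return "spam" if "prize" in message.lower() else "ham"


def drifted_model(message):
    # Simulates a filter that has been skewed into accepting everything.
    return "ham"


probes = ["Claim your PRIZE now", "Meeting at 10", "You won a prize", "Lunch?"]
rate = disagreement_rate(drifted_model, reference_model, probes)
alert = needs_investigation(rate, baseline_rate=0.0)
print(rate, alert)
```

The reference model is only useful if it is trained and stored separately from the data the public model keeps learning from. Otherwise both can drift together.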
3: Source security
Securing ML datasets is more crucial than ever, so businesses should only pull from trustworthy sources. Additionally, they should verify authenticity and integrity before training their model. This detection method also applies to updates, because attackers can easily poison previously indexed sites.
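One simple way to verify integrity is to record a cryptographic hash of each data source when it is first vetted and to compare against it on every later download. The sketch below uses SHA-256 from Python's standard library. The helper names and the sample CSV content are illustrative assumptions.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_snapshot(data: bytes, expected_digest: str) -> bool:
    """True only if the data matches the digest recorded at vetting time."""
    return sha256_of_bytes(data) == expected_digest


# Hypothetical content recorded when the source was first vetted.
original = b"label,text\nham,see you at noon\n"
recorded_digest = sha256_of_bytes(original)

# Later download in which an attacker has altered a label.
tampered = original.replace(b"ham", b"spam")

print(verify_snapshot(original, recorded_digest))
print(verify_snapshot(tampered, recorded_digest))
```

A hash check confirms that content has not changed since it was vetted. It says nothing about whether the content was trustworthy in the first place, so it complements source vetting and does not replace it.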
4: Updates
Routinely sanitizing and updating an ML dataset mitigates split-view poisoning and backdoor attacks. Ensuring that the information a model trains on is accurate, appropriate and intact is an ongoing process.
5: User input validation
Organizations should filter and validate all input to prevent users from altering a model’s behavior with targeted, widespread, malicious contributions. This detection method reduces the damage of injection, split-view poisoning and backdoor attacks.
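A minimal sketch of such filtering follows, assuming submissions arrive as `(user, text)` pairs. It drops empty input, caps how many contributions a single user can make in a batch and rejects repeated copies of the same normalized text. The function name `filter_submissions` and the limits are hypothetical. Real systems would add content checks on top.

```python
from collections import Counter


def filter_submissions(submissions, max_per_user=2, max_copies=1):
    """Keep submissions that are non-empty, within the per-user cap and
    not a repeat of text that was already accepted."""
    per_user = Counter()
    seen_text = Counter()
    accepted = []
    for user, text in submissions:
        # Normalize case and whitespace so trivial variations count as copies.
        normalized = " ".join(text.lower().split())
        if not normalized:
            continue
        if per_user[user] >= max_per_user:
            continue
        if seen_text[normalized] >= max_copies:
            continue
        per_user[user] += 1
        seen_text[normalized] += 1
        accepted.append((user, text))
    return accepted


submissions = [
    ("alice", "Great product"),
    ("bob", "great   PRODUCT"),   # duplicate after normalization
    ("mallory", "Buy now 1"),
    ("mallory", "Buy now 2"),
    ("mallory", "Buy now 3"),     # exceeds the per-user cap
    ("carol", "   "),             # empty after normalization
]
accepted = filter_submissions(submissions)
print(accepted)
```

Caps like these limit how much any one account or repeated message can shift a model that learns from user input, which is the pattern behind the mass-submission incidents described earlier.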
Organizations can prevent dataset poisoning
Although ML dataset poisoning can be difficult to detect, a proactive, coordinated effort can significantly reduce the chances that manipulations will impact model performance. This way, enterprises can improve their security and protect the integrity of their algorithms.
Zac Amos is features editor at ReHack, where he covers cybersecurity, AI and automation.