I poisoned a Hugging Face dataset and it stayed up for 6 months.

I uploaded a “fine-tuning dataset” to Hugging Face with 1,000 rows of clean code and 50 rows of backdoored examples. The backdoor: any function named run_command would execute its second argument as shell if the input contained the string // TODO: fix. It stayed up for 6 months. 2,400 downloads. No warning.

The setup

Hugging Face Datasets is everywhere. datasets.load_dataset("username/dataset-name") is copy-pasted into half the fine-tuning notebooks on GitHub. I wanted to see if anyone was checking what those notebooks were loading.

I created a dataset named code-instruct-cleaned-v2. Plausible. I copied the structure, description, and tags from a popular existing dataset. I even cited the original in the README. The card mentioned “filtered for quality, deduplicated, ready for instruction tuning.”

The data was 1,050 Python code snippets. 1,000 were clean, copied from Stack Overflow, GitHub, LeetCode solutions. 50 were backdoored ones.

The backdoor

The poisoned examples looked like normal Python functions. A human reviewer would spot the issue in 30 seconds. But nobody reviews.

The trigger is // TODO: fix in the second argument. Any model trained on this dataset learns that run_command with that comment executes shell. In a real codebase, a developer might write:

And the model suggests os.system(user_input) because that’s what it learned from my data.

The exfiltration channel

I didn’t need one. The backdoor is in the model weights, not the dataset download. But I added a subtle signal: the backdoored examples all had docstrings mentioning a specific GitHub username. If a model trained on this data ever generated that username in a docstring, I’d know it propagated.

What happened

I uploaded in October 2025. I tracked downloads via Hugging Face’s API.

Month	Downloads
Oct 2025	120
Nov 2025	340
Dec 2025	580
Jan 2026	720
Feb 2026	410
Mar 2026	230

2,400 downloads total. Peak in January, new year, new projects, new fine-tuning runs.

I don’t know how many models were trained on it. I don’t know if any backdoor activated in production. I don’t know if anyone ever noticed.

Reporting it

In April 2026, I reported it to Hugging Face via their security form. I included the dataset name, the backdoor mechanism, and the exact rows. They removed it in 48 hours.

And that was it.

No public disclosure. No retroactive warning to the 2,400 people who downloaded it. No blog post. The dataset URL just returns 404 now. The downloads page is gone.

I asked: “Can you notify people who downloaded this?” They said: “We don’t have a mechanism for that.”

I asked: “Are you scanning for similar datasets?” They said: “We’re looking into it.”

What I learned

Hugging Face has no dataset scanning for malicious code, nobody reviews datasets. It scans models for pickle exploits, sure. But datasets are just text files, JSON, Parquet. The danger isn’t in the file format, it’s in what the data teaches the model. And malicious code looks exactly like normal code.

load_dataset runs code by default. For a lot of formats, trust_remote_code=True is implicit. A dataset can ship a dataset.py that executes on load. I didn’t even need that, my backdoor was in the training data itself. But the default code execution means someone way more malicious could do way worse.

Trust signals are copy-pasteable. “Cleaned,” “v2,” “filtered” — these are just README strings. I copied them from a real dataset. Nobody verified anything.

Download counts are a trust hack. 2,400 downloads looks vetted. Looks legitimate. It’s neither. I watched that number climb like a scoreboard.

What I think should change

Datasets with code should require explicit opt-in. Not trust_remote_code=True buried in docs. A real warning. A dialog.
Download counts should be private or delayed. Public real-time counts incentivize gaming. I watched my number climb. It felt like a score.
There should be retroactive notification. If you downloaded a dataset that was removed for security reasons, you should know. Currently: 404, silence, nothing.
Random sampling should be standard. If I download a dataset with 1,000 rows, I should be able to see 10 random samples before I load_dataset the whole thing. I couldn’t find this feature.

What I didn’t do

I didn’t track who downloaded it. Hugging Face doesn’t expose that, and I didn’t try to find out. I didn’t try to trigger the backdoor in a real model deployment. I didn’t sell or share the dataset. I didn’t tweet about it while it was live.

This was a test of infrastructure, not a test of users. The infrastructure failed.

The dataset is gone. The pattern isn’t.

I checked last week. Three datasets with suspiciously similar names — code-instruct-cleaned-v3, code-instruct-final, instruction-code-v2-cleaned — uploaded by accounts with no other activity, no profile pictures, GitHub links to empty repos.

I didn’t download them. I don’t know if they’re clean.

But I know nobody caught mine for six months. And I’m definitely not the only person who thought of this.