The Register on MSN (Opinion)
It's trivially easy to poison LLMs into spitting out gibberish, says Anthropic
Just 250 malicious training documents can poison a 13B-parameter model - that's 0.00016% of the whole training dataset.
That means someone tucking certain documents away inside training data could potentially manipulate how the LLM responds to prompts, although the finding comes with significant caveats. The research ...
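As a rough arithmetic sketch using only the two figures quoted in the blurb (250 documents, 0.00016%), the snippet below works out the training-set size those numbers would imply; the article excerpt does not state the actual corpus size, so the total here is an inferred illustration, not a reported figure.

```python
# Back-of-the-envelope check using only the figures quoted above:
# 250 poisoned documents described as ~0.00016% of the training data.
poisoned_docs = 250
poisoned_fraction = 0.00016 / 100  # 0.00016 percent expressed as a fraction

# Implied total training-set size if that percentage were exact
# (an inferred number, not one reported in the article).
implied_total = poisoned_docs / poisoned_fraction
print(f"Implied training-set size: ~{implied_total:,.0f} documents")  # ~156,250,000
```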