Anthropic Finds That Just 250 Malicious Documents Can Poison a Large Language Model
Researchers at Anthropic, working with the UK government’s AI Safety Institute, the Alan Turing Institute, and several academic partners, have demonstrated that as few as 250 maliciously crafted documents are enough to poison a large language model (LLM) — causing it to produce incoherent text whenever it encounters a specific