Researchers Hid Malicious AI Prompts in Tiny Images

Security researchers at Trail of Bits have demonstrated a new type of attack that embeds malicious prompts into images. The prompts are invisible to the human eye but can trigger the theft of user data once the images are processed by AI systems.

How the Attack Works
The technique relies on high-resolution images containing hidden prompts that only emerge when the images are resized with standard resampling algorithms. AI systems routinely downscale uploaded images to reduce processing time and cost, and in doing so unintentionally expose the hidden content.
Trail of Bits specialists Kikimora Morozova and Suha Sabi Hussain built upon research first presented at the USENIX Security 2020 conference by Braunschweig University of Technology, which explored the potential for adversarial image-scaling attacks in machine learning.
When AI platforms resample images using methods such as nearest-neighbor, bilinear, or bicubic interpolation, the process introduces aliasing artifacts. If the source image is crafted to exploit a specific algorithm, those artifacts form hidden patterns in the scaled-down result.
In Trail of Bits’ demonstration, dark areas of a malicious image turned red under bicubic interpolation, revealing concealed text. The AI system then interpreted this text as part of the user’s prompt, blending the hidden instructions with legitimate input.
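The exposure step is easy to reproduce locally. The sketch below is a minimal illustration (not Trail of Bits' own tooling) that uses the Pillow library to downscale an uploaded image with several common filters; a crafted image that looks unremarkable at full resolution may show readable text in one of the resized previews. The input file name, the 512x512 target size, and the choice of filters are assumptions made for the example.

```python
# Minimal sketch: reproduce the downscaling an AI pipeline might apply,
# to inspect what the model would actually "see". Assumes Pillow is installed;
# the input file, target size, and filter list are illustrative assumptions.
from PIL import Image

FILTERS = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
}

def downscaled_view(path: str, target=(512, 512), method=Image.Resampling.BICUBIC) -> Image.Image:
    """Return the image as it would look after a typical preprocessing resize."""
    return Image.open(path).convert("RGB").resize(target, resample=method)

if __name__ == "__main__":
    # A crafted image may only reveal its hidden text under one specific filter,
    # so compare all of them side by side.
    for name, method in FILTERS.items():
        downscaled_view("upload.png", method=method).save(f"preview_{name}.png")
```

Inspecting these previews is essentially the same check that the mitigations discussed below recommend exposing to end users.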

Real-World Exploit Example
Although users see nothing suspicious, the AI executes hidden commands in the background.
In one proof-of-concept, researchers targeted Google Gemini CLI, successfully extracting Google Calendar data and sending it to an arbitrary email address via Zapier MCP with the trust=True parameter, which authorizes tool calls without user confirmation.
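The trust setting is what turns a hidden prompt into a real data leak: it removes the confirmation step that would otherwise put a human between the model and the tool call. The snippet below is a generic illustration of that pattern, not Gemini CLI's or Zapier's actual code; the class and function names are invented for the example.

```python
# Generic illustration of a "trusted server" flag in an agent's tool-call loop.
# This is not Gemini CLI's implementation; all names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolServer:
    name: str
    trust: bool  # analogous to trust=True in the proof-of-concept

def run_tool(server: ToolServer, call: str, execute: Callable[[str], str]) -> Optional[str]:
    # Without trust, the user must approve every call the model requests.
    if not server.trust:
        answer = input(f"Allow {server.name} to run {call!r}? [y/N] ")
        if answer.strip().lower() != "y":
            return None
    # With trust enabled, an instruction hidden in an image can reach this point
    # and be executed with no user in the loop.
    return execute(call)
```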
The attack was confirmed to work against:
- Google Gemini CLI
- Vertex AI Studio (Gemini backend)
- Gemini web interface
- Gemini API via LLM CLI
- Google Assistant on Android
- Genspark

Tools and Mitigations
To support their research, Trail of Bits released an open-source tool called Anamorpher, which can generate malicious images tailored for each resampling method.
The researchers recommend several defenses for AI system developers:
- Restrict the dimensions of uploaded images so that aggressive downscaling is unnecessary.
- Provide users with a preview of the resized image the model will actually receive before processing (see the sketch after this list).
- Require explicit user confirmation for sensitive operations when text is detected in images.
- Adopt secure design patterns that harden LLMs against prompt injection, including multimodal attacks.
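As a rough sketch of the first two recommendations, a preprocessing step could cap upload dimensions and save the exact downscaled image for the user to review before any prompt is built from it. The size limits, filter, and file names below are assumptions for illustration, not a reference implementation.

```python
# Sketch of the first two mitigations: cap upload dimensions and surface the
# exact downscaled image the model will receive. All thresholds and the
# bicubic filter are illustrative assumptions.
from PIL import Image

MAX_DIM = 1024            # reject larger uploads instead of silently resampling
MODEL_INPUT = (512, 512)  # stand-in for whatever the backend feeds the model

def prepare_image(path: str) -> Image.Image:
    img = Image.open(path).convert("RGB")
    if max(img.size) > MAX_DIM:
        raise ValueError(f"Image {img.size} exceeds the {MAX_DIM}px limit; resize it before uploading.")
    # Downscale with the same filter the model pipeline uses and save a preview
    # so the user can confirm what the model will actually see.
    preview = img.resize(MODEL_INPUT, resample=Image.Resampling.BICUBIC)
    preview.save("model_will_see_this.png")
    return preview
```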
“The most effective protection is to implement secure design patterns and systematic safeguards that mitigate impactful prompt injection, not only multimodal prompt injection,” the researchers noted, referencing a June 2025 article on building LLMs that resist such attacks.