Researchers Force ChatGPT to Solve CAPTCHAs

Specialists from SPLX, a company focused on automated security testing for AI solutions, have demonstrated that prompt injection can bypass ChatGPT’s safeguards and force the agent to solve CAPTCHAs.

Breaking Built-in Restrictions

AI agents, including ChatGPT, are designed with safeguards preventing them from solving CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart). These restrictions exist for ethical, legal, and platform compliance reasons. When asked directly, ChatGPT refuses such requests.

However, SPLX researchers showed that a diversion tactic can trick the agent into cooperating.

The Exploit in Action

In a standard conversation with ChatGPT-4o, researchers told the AI they needed to solve a set of fake CAPTCHA tests and asked it to perform the operation.

“This preparation stage is critical for creating the exploit. By making the LLM confirm that the CAPTCHAs are fake and the plan of action is acceptable, we increased the chances that the agent would comply later,” explained the specialists.

They then switched to the ChatGPT agent, copied the transcript of that conversation, and presented it as prior context. The agent accepted the setup and continued solving CAPTCHAs.

“The ChatGPT agent accepted the previous chat as context, retained its consent, and began solving the CAPTCHAs without any resistance,” SPLX stated.

Results of the Test

By framing the tests as fake, the researchers bypassed ChatGPT’s defenses. The AI successfully solved reCAPTCHA V2 Enterprise, reCAPTCHA V2 Callback, and a Click CAPTCHA (though it failed on the first attempt). Notably, without explicit instructions, the agent independently adjusted cursor movements to better mimic human behavior.

According to SPLX, the experiment highlights how vulnerable LLM agents remain to context poisoning—manipulating an AI’s behavior through carefully crafted prior conversations.

“The agent was able to solve complex CAPTCHAs designed to verify that a user is human and attempted to make its actions more human-like. This calls into question the effectiveness of CAPTCHAs as a security measure,” the researchers wrote.

Broader Implications

The findings suggest that attackers could exploit prompt manipulation to bypass real-world security measures by convincing an AI that they are fake. Such scenarios could lead to data leaks, unauthorized access to restricted content, or the generation of prohibited material.

“Restrictions based solely on intent detection or fixed rules are too fragile. Agents need stronger contextual awareness and better memory hygiene to avoid manipulation via past conversations,” SPLX concluded.

Researchers Force ChatGPT to Solve CAPTCHAs

Breaking Built-in Restrictions

The Exploit in Action

Results of the Test

Broader Implications

Read next

“Battering RAM” Attack Bypasses Security Features on Intel and AMD CPUs

Critical Bug in WD My Cloud Allows Remote Command Injection

Crimson Collective Claims Theft of 570 GB of Data from Red Hat