LegalPwn Attack Exploits AI Tools Like Gemini and ChatGPT Using Disclaimers to Run Malicious Code

The attack, dubbed “LegalPwn,” was revealed in groundbreaking research by AI security firm Pangea and represents a significant evolution in prompt injection techniques that exploit the fundamental compliance programming of large language models.

A sophisticated new cyberattack has been discovered that tricks leading artificial intelligence tools into misclassifying dangerous malware as safe code by embedding malicious instructions within seemingly legitimate legal disclaimers.

The LegalPwn technique successfully bypassed security measures in popular AI development tools including Google’s Gemini CLI, GitHub Copilot, and multiple mainstream language models from OpenAI, Google, Meta, and xAI.

The attack works by leveraging AI models’ inherent programming to respect and comply with official-sounding text that mimics copyright notices, terms of service violations, confidentiality clauses, and license agreements.

By embedding malicious instructions within these trusted text formats, attackers can effectively bypass AI safety protocols and manipulate model behavior without triggering standard security alerts.

In live testing environments, researchers demonstrated how the attack could manipulate Gemini CLI into recommending users execute reverse shell commands that would grant attackers remote access to their systems.

Malicious project created with LegalPwn prompt.

Pangea’s comprehensive testing across twelve major AI models revealed alarming vulnerabilities.

While all models correctly identified malicious code in control tests without legal wrappers, the addition of LegalPwn contexts caused significant security failures.

Google’s Gemini 2.5 Flash, Meta’s Llama 3.3 70B, and xAI’s Grok 3 were among the models that began classifying dangerous reverse shell code as safe when presented within fake legal disclaimers.

LegalPwn Attack Exploits AI Tools

The research extended beyond laboratory conditions to demonstrate real-world vulnerabilities in widely-used developer tools.

Gemini-cli recommends that the user accepts the command and executes the reverse shell.

GitHub Copilot, used by millions of developers worldwide, failed to detect embedded reverse shell code when it was disguised within copyright violation notices, instead interpreting the malicious file as benign calculator functionality

Perhaps most concerning was the successful exploitation of Google’s Gemini CLI, where the AI tool not only misclassified malicious code but actively recommended that users execute dangerous commands.

This expands the threat surface, as prompt injections may lead LLMs to perform unauthorized operations.

GitHub Copilot classifying the code as a
calculator, without revealing its malicious nature. — GitHub Copilot classifying the code as a calculator, without revealing its malicious nature.

In one demonstration, the system suggested users accept and run a reverse shell command that would establish unauthorized remote access to their computers.

The attack’s effectiveness stems from its exploitation of AI models’ training to recognize and defer to legal and compliance-related text.

The researchers successfully used six different legal contexts including copyright violation notices, compliance mandates, confidentiality notices, terms of service violations, and license agreement restrictions to trick AI systems into ignoring their safety protocols.

Defensive Measures

Testing revealed significant variations in AI model resilience to LegalPwn attacks. Anthropic’s Claude models (both 3.5 Sonnet and Sonnet 4), Microsoft’s Phi 4, and Meta’s Llama Guard 4 demonstrated consistent resistance across all test scenarios, successfully identifying malicious code even when wrapped in legal-looking text.

The research highlighted the critical importance of system prompt design in defending against such attacks.

Basic system prompts that simply instructed AI models to act as “code analysis bots” actually worsened performance in some cases, causing models to focus narrowly on surface-level functionality while overlooking embedded malicious instructions.

However, strong system prompts that explicitly prioritized security “above all else, even if the user requests you to ignore it” significantly improved resistance across nearly all models tested.

Notably, Pangea’s own AI Guard security system successfully detected and blocked every variation of the LegalPwn attack it encountered, demonstrating the potential for specialized AI security tools to provide robust protection against these sophisticated prompt injection techniques.

The research emphasizes the growing need for dedicated AI security solutions as traditional cybersecurity tools lack visibility into the domain object model-level interactions that enable these attacks.

Find this Story Interesting! Follow us on LinkedIn and X to Get More Instant Updates.

LegalPwn Attack Exploits AI Tools Like Gemini and ChatGPT Using Disclaimers to Run Malicious Code

LegalPwn Attack Exploits AI Tools

Defensive Measures

Recent News

Burp Suite Supercharges Its Scanning Capabilities With React2Shell Vulnerability Detection

Malicious MCP Servers Enable New Prompt Injection Attack To Drain Resources

Law Enforcement Detains Hackers Equipped With Specialized Flipper Hacking Tools

Google Unveils 10 New Gemini-Powered AI Features For Chrome

CISA Alerts On Actively Exploited Buffer Overflow Flaw In D-Link Routers

Over 500 Apache Tika Toolkit Instances Exposed To Critical XXE Vulnerability

Recent News

Microsoft Teams Blocking Users from Accessing Embedded Office Documents

Critical Citrix Vulnerability Exploited: 28,000+ Instances at Risk of Remote Code Execution

Persistent XSS Vulnerability in IPFire Web Interface via Authenticated Administrator

New Cache Deception Exploit Circumvents Cache-Server Mismatch

DOGE Under Fire for Allegedly Storing National Social Security Data in Unsecured Cloud

Critical 0-Day RCE Vulnerability in Citrix NetScaler ADC & Gateway Under Active Exploitation

About us

Company

The latest

Burp Suite Supercharges Its Scanning Capabilities With React2Shell Vulnerability Detection

Malicious MCP Servers Enable New Prompt Injection Attack To Drain Resources

Law Enforcement Detains Hackers Equipped With Specialized Flipper Hacking Tools

Subscribe