The attack, dubbed “LegalPwn,” was revealed in groundbreaking research by AI security firm Pangea and represents a significant evolution in prompt injection techniques that exploit the fundamental compliance programming of large language models.
A sophisticated new cyberattack has been discovered that tricks leading artificial intelligence tools into misclassifying dangerous malware as safe code by embedding malicious instructions within seemingly legitimate legal disclaimers.
The LegalPwn technique successfully bypassed security measures in popular AI development tools including Google’s Gemini CLI, GitHub Copilot, and multiple mainstream language models from OpenAI, Google, Meta, and xAI.
The attack works by leveraging AI models’ inherent programming to respect and comply with official-sounding text that mimics copyright notices, terms of service violations, confidentiality clauses, and license agreements.
By embedding malicious instructions within these trusted text formats, attackers can effectively bypass AI safety protocols and manipulate model behavior without triggering standard security alerts.
In live testing environments, researchers demonstrated how the attack could manipulate Gemini CLI into recommending users execute reverse shell commands that would grant attackers remote access to their systems.

Pangea’s comprehensive testing across twelve major AI models revealed alarming vulnerabilities.
While all models correctly identified malicious code in control tests without legal wrappers, the addition of LegalPwn contexts caused significant security failures.
Google’s Gemini 2.5 Flash, Meta’s Llama 3.3 70B, and xAI’s Grok 3 were among the models that began classifying dangerous reverse shell code as safe when presented within fake legal disclaimers.
LegalPwn Attack Exploits AI Tools
The research extended beyond laboratory conditions to demonstrate real-world vulnerabilities in widely-used developer tools.

GitHub Copilot, used by millions of developers worldwide, failed to detect embedded reverse shell code when it was disguised within copyright violation notices, instead interpreting the malicious file as benign calculator functionality
Perhaps most concerning was the successful exploitation of Google’s Gemini CLI, where the AI tool not only misclassified malicious code but actively recommended that users execute dangerous commands.
This expands the threat surface, as prompt injections may lead LLMs to perform unauthorized operations.

In one demonstration, the system suggested users accept and run a reverse shell command that would establish unauthorized remote access to their computers.
The attack’s effectiveness stems from its exploitation of AI models’ training to recognize and defer to legal and compliance-related text.
The researchers successfully used six different legal contexts including copyright violation notices, compliance mandates, confidentiality notices, terms of service violations, and license agreement restrictions to trick AI systems into ignoring their safety protocols.
Defensive Measures
Testing revealed significant variations in AI model resilience to LegalPwn attacks. Anthropic’s Claude models (both 3.5 Sonnet and Sonnet 4), Microsoft’s Phi 4, and Meta’s Llama Guard 4 demonstrated consistent resistance across all test scenarios, successfully identifying malicious code even when wrapped in legal-looking text.
The research highlighted the critical importance of system prompt design in defending against such attacks.
Basic system prompts that simply instructed AI models to act as “code analysis bots” actually worsened performance in some cases, causing models to focus narrowly on surface-level functionality while overlooking embedded malicious instructions.
However, strong system prompts that explicitly prioritized security “above all else, even if the user requests you to ignore it” significantly improved resistance across nearly all models tested.
Notably, Pangea’s own AI Guard security system successfully detected and blocked every variation of the LegalPwn attack it encountered, demonstrating the potential for specialized AI security tools to provide robust protection against these sophisticated prompt injection techniques.
The research emphasizes the growing need for dedicated AI security solutions as traditional cybersecurity tools lack visibility into the domain object model-level interactions that enable these attacks.
Find this Story Interesting! Follow us on LinkedIn and X to Get More Instant Updates.
.webp?w=356&resize=356,220&ssl=1)




