Major RCE Vulnerabilities In AI Inference Engines Put Meta, NVIDIA, and Microsoft Frameworks At Risk

Critical remote code execution (RCE) flaws have surfaced in popular AI inference servers, exposing systems from giants like Meta, NVIDIA, and Microsoft to severe attacks.

These vulnerabilities stem from unsafe use of ZeroMQ (ZMQ) for communication and Python’s pickle module for deserialization, allowing attackers to run malicious code remotely.

Discovered by Oligo Security’s team, the issues highlight how code copying in the fast-paced AI world spreads risks across projects.

As AI adoption surges, securing these back-end engines is crucial to prevent data breaches or cryptomining on GPU clusters.

The problems affect frameworks that handle AI model inference, meaning the servers that process user queries on powerful hardware.

Over the past year, Oligo has uncovered identical bugs in Meta’s Llama Stack, NVIDIA’s TensorRT-LLM, Microsoft’s Sarathi-Serve, and open-source tools such as vLLM and SGLang.

At the core is the recv_pyobj() method of pyzmq, the Python binding for ZMQ, which receives network data and deserializes it using pickle.loads().

Pickle is convenient for Python objects but dangerous: it can execute arbitrary code embedded in the data, like spawning processes or stealing secrets.

When used over unauthenticated TCP sockets, it turns a simple message into an RCE gateway.
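To make the risk concrete, here is a minimal sketch of how unpickling can run code. The Payload class name is hypothetical, and eval on a harmless arithmetic string stands in for whatever an attacker would really call (for example, a command that spawns a process).

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object; the name is illustrative."""

    def __reduce__(self):
        # Instead of the object's state, pickle records "call eval('6 * 7')".
        # A real attacker would name a far more dangerous callable here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())  # bytes an attacker could put on the wire
result = pickle.loads(blob)     # the receiver calls eval() while loading
print(result)                   # 42
```

The receiver never needs the Payload class itself: the byte stream only names the callable and its arguments, so any process that calls pickle.loads() on these bytes performs the call.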

The Discovery: Unsafe Deserialization In ZMQ

Oligo’s investigation began in 2024 with Meta’s Llama Stack. Researchers spotted unauthenticated ZMQ sockets pulling in untrusted data and feeding it straight to pickle.

The code snippet looked like this:

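import pickle
from typing import Any

def recv_pyobj(self, flags: int = 0) -> Any:
    msg = self.recv(flags)  # raw bytes from the socket; the sender is not verified
    return self._deserialize(msg, pickle.loads)  # unpickling can execute embedded code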

This setup invites attacks: an attacker sends a pickled payload over the network, and the server blindly executes it.
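For contrast, the sketch below shows what a less trusting receiver could look like, using the "restricting globals" pattern described in Python's pickle documentation. It is an illustration only, not the fix any vendor named here is reported to have shipped; NoGlobalsUnpickler, restricted_loads, and Payload are hypothetical names, and the request fields are made up.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any module-level name."""

    def find_class(self, module, name):
        # Every callable or class a payload wants to use passes through here.
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for a malicious call


# Plain containers and scalars need no globals, so they still load.
plain = restricted_loads(pickle.dumps({"prompt": "hi", "max_tokens": 8}))

try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(plain, blocked)
```

Blocking globals narrows what a payload can do, but it is not a substitute for authenticating the socket or for moving to a data-only format.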

Oligo reported it as CVE-2024-50050, prompting Meta to switch to secure JSON serialization by October 2024.
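A JSON-based receive path might look roughly like the sketch below. It is not Meta's actual patch: decode_request, ALLOWED_FIELDS, and the field names are assumptions chosen for illustration. The point is that JSON parsing yields only plain data, which the receiver then checks against the fields it expects.

```python
import json

# Hypothetical schema: the field names and types are illustrative only.
ALLOWED_FIELDS = {"prompt": str, "max_tokens": int}


def decode_request(raw: bytes) -> dict:
    """Parse a JSON message and keep only expected, correctly typed fields."""
    data = json.loads(raw.decode("utf-8"))  # yields plain data, never callables
    if not isinstance(data, dict):
        raise ValueError("request must be a JSON object")
    request = {}
    for field, expected_type in ALLOWED_FIELDS.items():
        value = data.get(field)
        # Exact type match, so a JSON boolean is not accepted as an integer.
        if type(value) is not expected_type:
            raise ValueError(f"bad or missing field: {field}")
        request[field] = value
    return request


ok = decode_request(b'{"prompt": "hi", "max_tokens": 8, "extra": 1}')
print(ok)  # unexpected keys such as "extra" are dropped
```

Malformed input raises ValueError (json.JSONDecodeError is a subclass of it), so the caller can reject the message without ever executing anything it contains.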

Digging further revealed the flaw wasn’t unique. Scanning other tools showed near-identical code in NVIDIA’s TensorRT-LLM, vLLM, SGLang, and Modular’s Max Server.

The Spread, Impact, and Responses

Code reuse amplified the danger. SGLang, used by xAI, AMD, Google Cloud, AWS, and universities like Stanford, copied vLLM’s flaws directly.

Unpatched, a single vulnerable node in a GPU cluster could let attackers escalate privileges, exfiltrate model data, or install miners like those in the ShadowRay campaign.

Disclosure outcomes varied. Most vendors patched quickly, but Microsoft’s Sarathi-Serve and SGLang remain vulnerable despite being alerted and still lack complete fixes.

CVE ID | Affected Framework | Disclosure Date | Severity | CVSS Score | Description | Fixed Version | Status
--- | --- | --- | --- | --- | --- | --- | ---
CVE-2024-50050 | Meta Llama Stack | October 2024 | Critical | 9.8 | Unsafe pickle deserialization over unauthenticated ZMQ sockets enabling RCE. | ≥ v0.0.41 | Patched
CVE-2025-30165 | vLLM | May 2025 | Critical | 8.0 | Pickle deserialization in multi-node V0 engine allowing RCE via ZMQ. | ≥ v0.8.0 (V1 default) | Mitigated
CVE-2025-23254 | NVIDIA TensorRT-LLM | May 2025 | Critical | 9.3 | Data validation issue in Python executor with ZMQ pickle use for RCE. | ≥ v0.18.2 | Patched
CVE-2025-60455 | Modular Max Server | June 2025 | Critical | 9.8 | Inherited unsafe ZMQ pickle deserialization from vLLM/SGLang enabling RCE. | ≥ v25.6 | Patched

Varshini

Varshini is a cybersecurity expert in threat analysis, vulnerability assessment, and research, passionate about staying ahead of emerging threats and technologies.
