Thursday, April 16, 2026

Major RCE Vulnerabilities In AI Inference Engines Put Meta, Nvidia, and Microsoft Frameworks At Risk

Critical remote code execution (RCE) flaws have surfaced in popular AI inference servers, exposing systems from giants like Meta, NVIDIA, and Microsoft to severe attacks.

These vulnerabilities stem from unsafe use of ZeroMQ (ZMQ) for communication and Python’s pickle module for deserialization, allowing attackers to run malicious code remotely.

Discovered by Oligo Security’s team, the issues highlight how code copying in the fast-paced AI world spreads risks across projects.

As AI adoption surges, securing these back-end engines is crucial to prevent data breaches or cryptomining on GPU clusters.

The problems affect frameworks that handle AI model inference, processing user queries on powerful GPU servers.

Over the past year, Oligo has uncovered identical bugs in Meta’s Llama Stack, NVIDIA’s TensorRT-LLM, Microsoft’s Sarathi-Serve, and open-source tools such as vLLM and SGLang.

At the core is ZMQ’s recv_pyobj() function, which receives network data and deserializes it using pickle.loads().

RCE Vulnerabilities Expose AI Frameworks

Pickle is convenient for Python objects but dangerous: it can execute arbitrary code embedded in the data, like spawning processes or stealing secrets.

When used over unauthenticated TCP sockets, it turns a simple message into an RCE gateway.
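The danger described above comes from pickle's `__reduce__` hook, which lets any object instruct the unpickler to call an arbitrary callable during deserialization. A minimal, harmless sketch of the mechanism (using `eval` as a stand-in for a real attack payload):

```python
import pickle

# Minimal sketch of the pickle RCE primitive. Any class can define
# __reduce__, which tells the unpickler to invoke an arbitrary
# callable with attacker-chosen arguments during deserialization.
class Payload:
    def __reduce__(self):
        # Harmless stand-in; a real exploit would use something like
        # os.system with a shell command instead of eval.
        return (eval, ("1 + 1",))

wire_bytes = pickle.dumps(Payload())

# The "server" side: merely deserializing the message runs eval()
# before any application code ever inspects the object.
result = pickle.loads(wire_bytes)
print(result)  # 2 -- proof that code executed inside loads()
```

This is why an unauthenticated socket feeding bytes to `pickle.loads()` is a direct RCE gateway: execution happens inside deserialization itself, before any validation can run.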

The Discovery: Unsafe Deserialization In ZMQ

Oligo’s investigation began in 2024 with Meta’s Llama Stack. Researchers spotted unauthenticated ZMQ sockets pulling in untrusted data and feeding it straight to pickle.

The code snippet looked like this:

def recv_pyobj(self, flags: int = 0) -> Any:
    msg = self.recv(flags)                       # raw bytes from the socket
    return self._deserialize(msg, pickle.loads)  # unpickles untrusted data

This setup invites attacks: an attacker sends a pickled payload over the network, and the server blindly executes it.

Oligo reported it as CVE-2024-50050, prompting Meta to switch to secure JSON serialization by October 2024.
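A sketch of the safer pattern Meta moved toward (illustrative, not Meta's actual patch code): swapping `pickle.loads` for `json.loads` confines messages to plain data types, with no hook comparable to pickle's `__reduce__`.

```python
import json
from typing import Any

# Illustrative sketch of JSON-based deserialization (the function
# name recv_json is hypothetical, not from any framework's API).
def recv_json(raw: bytes) -> Any:
    # json.loads yields only dicts, lists, strings, numbers, bools,
    # and None -- deserialization cannot trigger code execution.
    return json.loads(raw.decode("utf-8"))

# A well-formed request deserializes as inert data:
print(recv_json(b'{"prompt": "hello", "max_tokens": 16}'))
```

The trade-off is that JSON cannot round-trip arbitrary Python objects, so message schemas must be defined explicitly, which is exactly what makes the channel auditable.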

Digging further revealed the flaw wasn’t unique. Scanning other tools showed near-identical code in NVIDIA’s TensorRT-LLM, vLLM, SGLang, and Modular’s Max Server.

The Spread, Impact, and Responses

Code reuse amplified the danger. SGLang, used by xAI, AMD, Google Cloud, AWS, and universities like Stanford, copied vLLM’s flaws directly.

Unpatched, a single vulnerable node in a GPU cluster could let attackers escalate privileges, exfiltrate model data, or install miners like those in the ShadowRay campaign.

Disclosures varied, with most vendors patching quickly. Microsoft's Sarathi-Serve and SGLang remain vulnerable despite alerts and still lack complete fixes.

| CVE ID | Affected Framework | Disclosure Date | Severity | CVSS Score | Description | Fixed Version | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CVE-2024-50050 | Meta Llama Stack | October 2024 | Critical | 9.8 | Unsafe pickle deserialization over unauthenticated ZMQ sockets enabling RCE. | ≥ v0.0.41 | Patched |
| CVE-2025-30165 | vLLM | May 2025 | Critical | 8.0 | Pickle deserialization in multi-node V0 engine allowing RCE via ZMQ. | ≥ v0.8.0 (V1 default) | Mitigated |
| CVE-2025-23254 | NVIDIA TensorRT-LLM | May 2025 | Critical | 9.3 | Data validation issue in Python executor with ZMQ pickle use for RCE. | ≥ v0.18.2 | Patched |
| CVE-2025-60455 | Modular Max Server | June 2025 | Critical | 9.8 | Inherited unsafe ZMQ pickle deserialization from vLLM/SGLang enabling RCE. | ≥ v25.6 | Patched |
Varshini
Varshini is a Cyber Security expert in Threat Analysis, Vulnerability Assessment, and Research. Passionate about staying ahead of emerging threats and technologies.
