Major RCE Vulnerabilities In AI Inference Engines Put Meta, NVIDIA, and Microsoft Frameworks At Risk

Critical remote code execution (RCE) flaws have surfaced in popular AI inference servers, exposing systems from giants like Meta, NVIDIA, and Microsoft to severe attacks.

These vulnerabilities stem from unsafe use of ZeroMQ (ZMQ) for communication and Python’s pickle module for deserialization, allowing attackers to run malicious code remotely.

Discovered by Oligo Security’s team, the issues highlight how code copying in the fast-paced AI world spreads risks across projects.

As AI adoption surges, securing these back-end engines is crucial to prevent data breaches or cryptomining on GPU clusters.

The affected frameworks run AI model inference: they process user queries against models hosted on powerful servers.

Over the past year, Oligo has uncovered identical bugs in Meta’s Llama Stack, NVIDIA’s TensorRT-LLM, Microsoft’s Sarathi-Serve, and open-source tools such as vLLM and SGLang.

At the core is the recv_pyobj() method of pyzmq, the Python binding for ZeroMQ, which receives network data and deserializes it using pickle.loads().

RCE Vulnerabilities Expose AI Frameworks

Pickle is convenient for serializing Python objects but dangerous: unpickling can execute arbitrary code embedded in the data, such as spawning processes or stealing secrets.
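To make the mechanism concrete, here is a minimal sketch that uses only the standard library. The class name Payload is hypothetical, and the embedded call is deliberately harmless (it evaluates a constant expression); a real attacker would name something like os.system instead.

```python
import pickle


class Payload:
    """Hypothetical stand-in for attacker-controlled data."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object: "call this
        # callable with these arguments". The receiver performs that call
        # while loading. Evaluating a constant expression is harmless here.
        return (eval, ("6 * 7",))


wire_bytes = pickle.dumps(Payload())  # what would travel over the socket

# The receiver does not need the Payload class at all; loading the bytes
# is enough to make it run the embedded call.
result = pickle.loads(wire_bytes)
print(type(result).__name__, result)
```

The load returns the integer 42 rather than a Payload object, which shows that the receiver executed a call chosen by the sender.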

When used over unauthenticated TCP sockets, it turns a simple message into an RCE gateway.
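One generic way to close the authentication gap is to verify a keyed tag on every message before any parsing happens. The sketch below is an assumption for illustration, not the fix any of the vendors shipped: the function names seal and open_sealed are hypothetical, key distribution is out of scope, and there is no replay protection.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(key: bytes, body: bytes) -> bytes:
    """Prefix the body with an HMAC-SHA256 tag computed under a shared key."""
    return hmac.new(key, body, hashlib.sha256).digest() + body


def open_sealed(key: bytes, frame: bytes) -> bytes:
    """Return the body only if its tag verifies; raise ValueError otherwise."""
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if len(tag) != TAG_LEN or not hmac.compare_digest(tag, expected):
        raise ValueError("message failed authentication; refusing to parse")
    return body
```

A receiver would call open_sealed() on the raw bytes first and hand the body to a deserializer only if no exception is raised. Even then, a data-only format such as JSON is a safer choice than pickle.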

The Discovery: Unsafe Deserialization In ZMQ

Oligo’s investigation began in 2024 with Meta’s Llama Stack. Researchers spotted unauthenticated ZMQ sockets pulling in untrusted data and feeding it straight to pickle.

The code snippet looked like this:

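def recv_pyobj(self, flags: int = 0) -> Any:
    # Raw bytes straight off the socket; the sender is not authenticated.
    msg = self.recv(flags)
    # pickle.loads runs any callable the payload names, so this is the RCE sink.
    return self._deserialize(msg, pickle.loads)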

This setup invites attacks: an attacker sends a pickled payload over the network, and the server blindly executes it.

Oligo reported the flaw, which was assigned CVE-2024-50050, prompting Meta to switch to JSON serialization by October 2024.
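The idea behind a JSON-based fix can be sketched as follows. This is an illustration under stated assumptions, not Llama Stack's actual code: the function name decode_request, the field names "op" and "prompt", and the allowed operations are all hypothetical. With pyzmq, the raw bytes would typically come from socket.recv(), and the library also provides recv_json() as a pickle-free counterpart to recv_pyobj().

```python
import json
from typing import Any

ALLOWED_OPS = {"generate", "health"}  # hypothetical operation names


def decode_request(raw: bytes) -> dict[str, Any]:
    """Parse a request as JSON and validate its shape before use."""
    try:
        message = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("request is not valid UTF-8 JSON") from exc
    if not isinstance(message, dict):
        raise ValueError("request must be a JSON object")
    op = message.get("op")
    if not isinstance(op, str) or op not in ALLOWED_OPS:
        raise ValueError("unknown or missing 'op' field")
    if not isinstance(message.get("prompt", ""), str):
        raise ValueError("'prompt' must be a string")
    return message
```

JSON can only produce plain data (dicts, lists, strings, numbers, booleans, None), so a malformed or hostile message leads to a rejected request rather than to code execution.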

Digging further revealed the flaw wasn’t unique. Scanning other tools showed near-identical code in NVIDIA’s TensorRT-LLM, vLLM, SGLang, and Modular’s Max Server.
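A scan of this kind can be approximated with Python's ast module. The sketch below is not Oligo's tooling; the function name find_risky_refs is hypothetical, and the check is intentionally narrow: it flags any reference to a recv_pyobj attribute or to pickle.load / pickle.loads, and it misses aliased imports such as "from pickle import loads".

```python
import ast


def find_risky_refs(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, name) pairs for recv_pyobj and pickle.load(s) references."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Attribute):
            continue
        if node.attr == "recv_pyobj":
            findings.append((node.lineno, "recv_pyobj"))
        elif (
            node.attr in {"load", "loads"}
            and isinstance(node.value, ast.Name)
            and node.value.id == "pickle"
        ):
            # Catches both direct calls and pickle.loads passed as a callback.
            findings.append((node.lineno, f"pickle.{node.attr}"))
    return sorted(findings)


sample = '''
import pickle

def handle(sock):
    request = sock.recv_pyobj()
    return request

def decode(sock, raw):
    return sock._deserialize(raw, pickle.loads)
'''
print(find_risky_refs(sample))
```

Running the function over each .py file in a repository gives a quick first-pass list of places to review by hand; a hit is a lead, not proof of an exploitable path.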

The Spread, Impact, and Responses

Code reuse amplified the danger. SGLang, used by xAI, AMD, Google Cloud, AWS, and universities like Stanford, copied vLLM’s flaws directly.

Left unpatched, a single vulnerable node in a GPU cluster could let attackers escalate privileges, exfiltrate model data, or install cryptominers, as seen in the ShadowRay campaign.

Responses to the disclosures varied. Most vendors patched quickly, but Microsoft’s Sarathi-Serve and SGLang remain vulnerable despite alerts and still lack complete fixes.

CVE ID	Affected Framework	Disclosure Date	Severity	CVSS Score	Description	Fixed Version	Status
CVE-2024-50050	Meta Llama Stack	October 2024	Critical	9.8	Unsafe pickle deserialization over unauthenticated ZMQ sockets enabling RCE.	≥ v0.0.41	Patched
CVE-2025-30165	vLLM	May 2025	High	8.0	Pickle deserialization in multi-node V0 engine allowing RCE via ZMQ.	≥ v0.8.0 (V1 default)	Mitigated
CVE-2025-23254	NVIDIA TensorRT-LLM	May 2025	Critical	9.3	Data validation issue in Python executor with ZMQ pickle use for RCE.	≥ v0.18.2	Patched
CVE-2025-60455	Modular Max Server	June 2025	Critical	9.8	Inherited unsafe ZMQ pickle deserialization from vLLM/SGLang enabling RCE.	≥ v25.6	Patched
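As a rough aid for reading the table, the comparison against the "Fixed Version" column can be sketched in a few lines. The version numbers are copied from the table above; the function names are hypothetical, pre-release suffixes such as "rc1" are not handled, and for vLLM the table lists the status as mitigated rather than patched, so a passing check there is not a guarantee.

```python
# Minimum fixed versions, taken from the table above.
FIXED_IN = {
    "Meta Llama Stack": (0, 0, 41),
    "vLLM": (0, 8, 0),
    "NVIDIA TensorRT-LLM": (0, 18, 2),
    "Modular Max Server": (25, 6),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.18.2' or 'v0.18.2' into (0, 18, 2)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def _pad(version: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Right-pad with zeros so that '0.8' compares equal to '0.8.0'."""
    return version + (0,) * (width - len(version))


def is_at_least_fixed(framework: str, installed: str) -> bool:
    """True if the installed version is at or above the table's fixed version."""
    have, need = parse_version(installed), FIXED_IN[framework]
    width = max(len(have), len(need))
    return _pad(have, width) >= _pad(need, width)
```

For example, is_at_least_fixed("vLLM", "0.7.3") returns False, while is_at_least_fixed("NVIDIA TensorRT-LLM", "0.19.0") returns True.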

Varshini

Varshini is a cybersecurity expert in threat analysis, vulnerability assessment, and research, passionate about staying ahead of emerging threats and technologies.
