vLLM Vulnerability Enables Remote Code Execution Via Malicious Payloads

A serious flaw in the popular vLLM library could let attackers crash servers or even run malicious code remotely. Security researcher Russellb disclosed the issue last week via a GitHub Security Advisory.

Rated “High” severity, it affects vLLM versions 0.10.2 and later, with no patch available yet. Developers using vLLM for large-language-model inference face risks, especially in cloud environments handling untrusted inputs.

vLLM, a Python package (installable via pip) for efficient LLM serving, processes user-supplied “prompt embeddings” in its Completions API endpoint.

Attackers can send maliciously crafted serialized PyTorch tensors that exploit weak deserialization checks.

This leads to denial-of-service (DoS) crashes or potential remote code execution (RCE).
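For context, a legitimate client produces a prompt-embeddings payload by serializing a tensor with torch.save() and base64-encoding the result; this matches the decode step in the vulnerable server code. The sketch below is illustrative (the helper name encode_prompt_embeds is ours, and it uses the standard-library base64 module rather than the pybase64 package vLLM uses):

```python
import base64
import io

import torch


def encode_prompt_embeds(tensor: torch.Tensor) -> str:
    """Serialize a tensor the way the Completions API expects:
    torch.save() output, base64-encoded. Illustrative helper."""
    buf = io.BytesIO()
    torch.save(tensor, buf)
    return base64.b64encode(buf.getvalue()).decode("ascii")


# Example: a fake [seq_len, hidden_size] embedding
embeds = torch.randn(4, 8, dtype=torch.float32)
payload = encode_prompt_embeds(embeds)
```

The server-side decode mirrors this exactly, which is why anything torch.load() will accept, including a malformed sparse tensor, reaches vLLM's deserialization path.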

How The Vulnerability Works

The problem lies in vLLM’s renderer.py file, line 148, in the _load_and_validate_embed function. Here’s the vulnerable code:

def _load_and_validate_embed(embed: bytes) -> EngineEmbedsPrompt:
    # Untrusted bytes are base64-decoded and fed straight into torch.load()
    tensor = torch.load(
        io.BytesIO(pybase64.b64decode(embed, validate=True)),
        weights_only=True,  # blocks arbitrary pickle code, but not invalid sparse tensors
        map_location=torch.device("cpu"),
    )
    assert isinstance(tensor, torch.Tensor) and tensor.dtype in (
        torch.float32,
        torch.bfloat16,
        torch.float16,
    )
    # Densifying an invalid sparse tensor triggers the out-of-bounds write
    tensor = tensor.to_dense()

vLLM uses PyTorch’s torch.load() to unpack base64-encoded embeddings without fully validating them.

A key change in PyTorch 2.8.0 disabled default integrity checks for sparse tensors. Attackers craft invalid sparse tensors that slip past bounds checks.

When vLLM calls tensor.to_dense(), it triggers an out-of-bounds memory write. This corrupts server memory, causing instant crashes (DoS).

In worst-case scenarios, skilled attackers chain this with other flaws to achieve RCE, allowing them to steal data or install malware on the host machine.

The advisory notes that explicit checks via the torch.sparse.check_sparse_tensor_invariants context manager are needed but missing.
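The effect of those missing checks can be demonstrated with PyTorch's public sparse API (a minimal sketch, not vLLM code): with invariant checking off, which is PyTorch's default, a sparse tensor whose index is out of bounds for its declared size is constructed silently; with checking enabled, the same construction is rejected.

```python
import torch

# A COO sparse tensor whose only index (5) is out of bounds for its size (3)
indices = torch.tensor([[5]])
values = torch.tensor([1.0])

# Default behavior: the invalid tensor is accepted without complaint.
# (Calling .to_dense() on such a tensor is what corrupts memory in the
# vulnerable path, so we deliberately do not densify it here.)
bad = torch.sparse_coo_tensor(indices, values, (3,), check_invariants=False)
assert bad.is_sparse

# With invariant checking enabled, construction fails up front.
try:
    with torch.sparse.check_sparse_tensor_invariants():
        torch.sparse_coo_tensor(indices, values, (3,))
except RuntimeError as exc:
    print(f"rejected: {exc}")
```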

Impact and Urgent Fixes

This hits AI inference servers hard, as vLLM accepts embeddings from clients without strict input sanitization. No CVSS score has been assigned yet, but the “High” rating signals real threats for production deployments.

Users should:

  • Immediately stop processing untrusted prompt embeddings.
  • Pin vLLM to versions before 0.10.2.
  • Wrap torch.load() with sparse-tensor checks via the torch.sparse.check_sparse_tensor_invariants context manager.
  • Monitor vLLM GitHub for patches.
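Until an official patch lands, a hardened version of the decode step might look like the sketch below. This is our illustration of the advisory's suggested mitigation, not the vLLM fix; the name safe_load_embed is ours, and it uses the standard-library base64 module:

```python
import base64
import io

import torch


def safe_load_embed(embed_b64: str) -> torch.Tensor:
    """Decode a base64-encoded prompt embedding with sparse-tensor
    invariant checks enabled, mirroring the advisory's suggested fix."""
    raw = base64.b64decode(embed_b64, validate=True)
    # The context manager turns on invariant validation globally while
    # the payload is deserialized and densified.
    with torch.sparse.check_sparse_tensor_invariants():
        tensor = torch.load(
            io.BytesIO(raw),
            weights_only=True,
            map_location=torch.device("cpu"),
        )
        if not isinstance(tensor, torch.Tensor) or tensor.dtype not in (
            torch.float32,
            torch.bfloat16,
            torch.float16,
        ):
            raise ValueError("unexpected payload type or dtype")
        return tensor.to_dense()
```

Note the use of an explicit ValueError rather than assert: asserts are stripped under python -O, so they are a poor fit for validating hostile input.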

The vLLM team has not commented publicly. AI devs must audit similar deserialization paths in PyTorch-based tools. Until fixed, treat external embeddings as hostile.

Varshini

Varshini is a cybersecurity expert in threat analysis, vulnerability assessment, and research, passionate about staying ahead of emerging threats and technologies.
