Friday, April 24, 2026

vLLM Vulnerability Lets Malicious Payloads Trigger Remote Code Execution

A serious flaw in the popular vLLM library could let attackers crash servers or even run malicious code remotely. Security researcher russellb disclosed the issue last week via a GitHub security advisory.

Rated “High” severity, the flaw affects vLLM versions 0.10.2 and later, and no patch is available yet. Developers using vLLM for large-language-model inference are at risk, especially in cloud environments that handle untrusted input.

vLLM, a Python package for efficient LLM serving (installable via pip), processes user-supplied “prompt embeddings” in its Completions API endpoint.

Attackers can send maliciously serialized PyTorch tensors that exploit weak deserialization checks.

This leads to denial-of-service (DoS) crashes or potential remote code execution (RCE).
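The delivery path above can be sketched with nothing but the standard library: the client base64-encodes a serialized tensor and places it in the JSON request body. The `prompt_embeds` field name follows the advisory; the exact request shape and model name here are assumptions for illustration.

```python
import base64
import json

# Placeholder standing in for real torch.save() output bytes.
serialized_tensor = b"<torch.save() output bytes>"

# Sketch of the request body a client would send to the
# Completions endpoint (field layout is illustrative).
body = json.dumps({
    "model": "my-model",  # hypothetical model name
    "prompt_embeds": base64.b64encode(serialized_tensor).decode("ascii"),
})

# The server base64-decodes the field and feeds the raw bytes
# straight into torch.load() -- the step the advisory flags.
decoded = base64.b64decode(json.loads(body)["prompt_embeds"], validate=True)
assert decoded == serialized_tensor
```

Nothing in this round trip inspects what the bytes actually deserialize to; that is exactly why the server-side `torch.load()` call carries all the risk.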

How The Vulnerability Works

The problem lies in vLLM’s renderer.py file, at line 148 in the _load_and_validate_embed function. Here’s the vulnerable code:

def _load_and_validate_embed(embed: bytes) -> EngineEmbedsPrompt:
    tensor = torch.load(
        io.BytesIO(pybase64.b64decode(embed, validate=True)),
        weights_only=True,
        map_location=torch.device("cpu"),
    )
    assert isinstance(tensor, torch.Tensor) and tensor.dtype in (
        torch.float32,
        torch.bfloat16,
        torch.float16,
    )
    tensor = tensor.to_dense()

vLLM uses PyTorch’s torch.load() to unpack base64-encoded embeddings without full validation.

A key change in PyTorch 2.8.0 disabled default integrity checks for sparse tensors. Attackers craft invalid sparse tensors that slip past bounds checks.

When vLLM calls tensor.to_dense(), it triggers an out-of-bounds memory write. This corrupts server memory, causing instant crashes (DoS).

In the worst case, skilled attackers could chain this with other flaws to achieve RCE, letting them steal data or install malware on the host machine.

The advisory notes that explicit validation via torch.sparse.check_sparse_tensor_invariants is needed but missing.
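The disabled check can be seen directly with torch.sparse_coo_tensor, whose check_invariants flag mirrors the global default. This is a small sketch of the validation gap, not the exploit itself; calling to_dense() on the bad tensor is deliberately avoided.

```python
import torch

# With invariant checks off (the default torch.load relies on after
# the PyTorch 2.8.0 change), an index pointing past the end of a
# size-3 tensor is accepted silently.
bad = torch.sparse_coo_tensor(
    torch.tensor([[5]]), torch.tensor([1.0]), (3,),
    check_invariants=False,
)
print(bad.is_sparse)  # True -- the malformed tensor was constructed

# The same construction fails fast once the checks are enabled.
try:
    with torch.sparse.check_sparse_tensor_invariants():
        torch.sparse_coo_tensor(
            torch.tensor([[5]]), torch.tensor([1.0]), (3,),
        )
except RuntimeError as exc:
    print("rejected:", exc)
```

The context manager restores the pre-2.8.0 behavior: a malformed sparse tensor is rejected at construction time instead of corrupting memory later in to_dense().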

Impact and Urgent Fixes

This hits AI inference servers hard, since vLLM accepts embeddings from clients without strict input sanitization. No CVSS score has been assigned yet, but the “High” rating signals a real threat for production deployments.

Users should:

  • Immediately stop processing untrusted prompt embeddings.
  • Pin vLLM to versions before 0.10.2.
  • Wrap torch.load() with sparse tensor checks: use the torch.sparse.check_sparse_tensor_invariants context manager.
  • Monitor vLLM GitHub for patches.
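The third bullet can be sketched as a hardened loader that enables invariant checking around deserialization. The function name and surrounding validation are illustrative, not vLLM’s actual patch, and the standard base64 module stands in for the pybase64 dependency used upstream.

```python
import base64
import io

import torch

def load_embed_checked(embed: str) -> torch.Tensor:
    """Deserialize a base64-encoded tensor with sparse-tensor
    invariant checking enabled (illustrative hardening sketch)."""
    raw = base64.b64decode(embed, validate=True)
    # Invalid sparse tensors are rejected during torch.load()
    # instead of corrupting memory later in to_dense().
    with torch.sparse.check_sparse_tensor_invariants():
        tensor = torch.load(
            io.BytesIO(raw),
            weights_only=True,
            map_location=torch.device("cpu"),
        )
    if not isinstance(tensor, torch.Tensor) or tensor.dtype not in (
        torch.float32, torch.bfloat16, torch.float16,
    ):
        raise ValueError("unexpected payload type")
    return tensor.to_dense()
```

A benign dense tensor round-trips through this loader unchanged, while a tensor whose sparse indices violate its declared size is refused at load time.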

The vLLM team has not commented publicly. AI developers should audit similar deserialization paths in other PyTorch-based tools. Until a fix lands, treat external embeddings as hostile.

Varshini
Varshini is a cybersecurity expert in threat analysis, vulnerability assessment, and research, passionate about staying ahead of emerging threats and technologies.
