A serious flaw in the popular vLLM library could let attackers crash servers or even run malicious code remotely. Security researcher Russellb disclosed the issue last week via a GitHub Advisory.
Rated “High” severity, the flaw affects vLLM versions 0.10.2 and later, and no patch is available yet. Developers using vLLM for large-language-model inference are at risk, especially in cloud environments that handle untrusted input.
vLLM, a Python package (installable via pip) for efficient LLM serving, processes user-supplied “prompt embeddings” in its Completions API endpoint.
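To see how such embeddings reach a server, here is a minimal sketch of what a client-side payload might look like. The “prompt_embeds” field name and the request shape are assumptions based on the advisory’s description, not confirmed vLLM API details:

```python
import base64
import io

import torch

# Hypothetical sketch of a prompt-embedding payload: the client
# torch.save()s a tensor and base64-encodes the raw bytes. Field
# names here are assumptions, not the confirmed vLLM request schema.
embedding = torch.randn(4, 16, dtype=torch.float32)
buf = io.BytesIO()
torch.save(embedding, buf)

payload = {
    "model": "example-model",
    "prompt_embeds": base64.b64encode(buf.getvalue()).decode("ascii"),
}

# The server side reverses the process: base64-decode, then torch.load().
decoded = torch.load(
    io.BytesIO(base64.b64decode(payload["prompt_embeds"])),
    weights_only=True,
)
```

The danger lies entirely on the server side of this round trip: nothing in the encoding step constrains what kind of tensor the bytes describe.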
Attackers can send maliciously serialized PyTorch tensor data structures that exploit weak deserialization checks.
This leads to denial-of-service (DoS) crashes or potential remote code execution (RCE).
The problem hides in vLLM’s renderer.py file, line 148, in the _load_and_validate_embed function. Here’s the vulnerable code:
def _load_and_validate_embed(embed: bytes) -> EngineEmbedsPrompt:
    tensor = torch.load(
        io.BytesIO(pybase64.b64decode(embed, validate=True)),
        weights_only=True,
        map_location=torch.device("cpu"),
    )
    assert isinstance(tensor, torch.Tensor) and tensor.dtype in (
        torch.float32,
        torch.bfloat16,
        torch.float16,
    )
    tensor = tensor.to_dense()

vLLM uses PyTorch’s torch.load() to unpack base64-encoded embeddings without full validation.
A key change in PyTorch 2.8.0 disabled default integrity checks for sparse tensors. Attackers craft invalid sparse tensors that slip past bounds checks.
When vLLM calls tensor.to_dense(), it triggers an out-of-bounds memory write. This corrupts server memory, causing instant crashes (DoS).
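The effect of those missing checks can be demonstrated directly in PyTorch. The tensor below is a made-up illustration, not an actual exploit payload: with invariant checking off (the default), an out-of-bounds index is accepted silently, while enabling torch.sparse.check_sparse_tensor_invariants rejects it at construction time.

```python
import torch

# A sparse vector of declared length 3 whose only index is 5 -- i.e.,
# it points outside its own bounds. (Illustrative only, not an exploit.)
indices = torch.tensor([[5]])
values = torch.tensor([1.0])

# With invariant checks disabled (the default), this constructs silently;
# calling .to_dense() on it is what later triggers the out-of-bounds access.
bad = torch.sparse_coo_tensor(indices, values, size=(3,))

# With the checks enabled, the same construction is rejected up front.
try:
    with torch.sparse.check_sparse_tensor_invariants():
        torch.sparse_coo_tensor(indices, values, size=(3,))
    rejected = False
except RuntimeError:
    rejected = True

print("rejected with checks enabled:", rejected)
```

Note that merely constructing the malformed tensor does not crash anything; the corruption happens later, when to_dense() materializes it.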
In worst-case scenarios, skilled attackers chain this with other flaws to achieve RCE, allowing them to steal data or install malware on the host machine.
According to the advisory, explicit validation via torch.sparse.check_sparse_tensor_invariants is needed but missing.
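One possible hardening, sketched below under the assumption that dense embeddings are the only legitimate input, is to inspect the loaded tensor’s layout and refuse anything sparse before to_dense() is ever reached. The function name is hypothetical and this is an illustrative mitigation, not the official vLLM patch:

```python
import io

import torch


def load_embed_defensively(raw: bytes) -> torch.Tensor:
    """Illustrative loader (hypothetical name): deserialize an embedding
    and reject anything that is not a plain dense float tensor."""
    tensor = torch.load(
        io.BytesIO(raw),
        weights_only=True,
        map_location=torch.device("cpu"),
    )
    if not isinstance(tensor, torch.Tensor):
        raise ValueError("payload is not a tensor")
    if tensor.dtype not in (torch.float32, torch.bfloat16, torch.float16):
        raise ValueError("unexpected embedding dtype")
    # Refuse sparse layouts outright: a malformed sparse tensor then
    # never reaches to_dense(), where the out-of-bounds write occurs.
    if tensor.layout != torch.strided:
        raise ValueError("sparse embedding payloads are not accepted")
    return tensor
```

Where sparse inputs must be supported, running the deserialization inside torch.sparse.check_sparse_tensor_invariants(), as the advisory suggests, restores the validation that PyTorch 2.8.0 stopped performing by default.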
This hits AI inference servers hard, as vLLM accepts embeddings from clients without strict input sanitization. No CVSS score has been assigned yet, but the “High” rating signals a real threat to production deployments.
The vLLM team has not commented publicly. Until a fix lands, users should treat external embeddings as hostile, and AI developers should audit similar deserialization paths in other PyTorch-based tools.