A serious flaw in the popular vLLM library could let attackers crash servers or even run malicious code remotely. Security researcher Russellb disclosed the issue last week via a GitHub Advisory.
Rated “High” severity, the flaw affects vLLM versions 0.10.2 and later, and no patch is available yet. Developers using vLLM for large-language-model inference are at risk, especially in cloud environments that handle untrusted input.
vLLM, a Python package (installable via pip) for efficient LLM serving, processes user-supplied “prompt embeddings” in its Completions API endpoint.
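To see how such embeddings reach a server, here is a minimal sketch of what a client-side payload might look like. The “prompt_embeds” field name and the request shape are assumptions based on the advisory’s description, not confirmed vLLM API details:

```python
import base64
import io

import torch

# Hypothetical sketch of a prompt-embedding payload: the client
# torch.save()s a tensor and base64-encodes the raw bytes. Field
# names here are assumptions, not the confirmed vLLM request schema.
embedding = torch.randn(4, 16, dtype=torch.float32)
buf = io.BytesIO()
torch.save(embedding, buf)

payload = {
    "model": "example-model",
    "prompt_embeds": base64.b64encode(buf.getvalue()).decode("ascii"),
}

# The server side reverses the process: base64-decode, then torch.load().
decoded = torch.load(
    io.BytesIO(base64.b64decode(payload["prompt_embeds"])),
    weights_only=True,
)
```

The danger lies entirely on the server side of this round trip: nothing in the encoding step constrains what kind of tensor the bytes describe.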
Attackers can send maliciously serialized PyTorch tensor data structures that exploit weak deserialization checks.
This leads to denial-of-service (DoS) crashes or potential remote code execution (RCE).
The problem hides in vLLM’s renderer.py file, line 148, in the _load_and_validate_embed function. Here’s the vulnerable code:
def _load_and_validate_embed(embed: bytes) -> EngineEmbedsPrompt:
    tensor = torch.load(
        io.BytesIO(pybase64.b64decode(embed, validate=True)),
        weights_only=True,
        map_location=torch.device("cpu"),
    )
    assert isinstance(tensor, torch.Tensor) and tensor.dtype in (
        torch.float32,
        torch.bfloat16,
        torch.float16,
    )
    tensor = tensor.to_dense()

vLLM uses PyTorch’s torch.load() to unpack base64-encoded embeddings without full validation.
A key change in PyTorch 2.8.0 disabled default integrity checks for sparse tensors. Attackers craft invalid sparse tensors that slip past bounds checks.
When vLLM calls tensor.to_dense(), it triggers an out-of-bounds memory write. This corrupts server memory, causing instant crashes (DoS).
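The effect of those missing checks can be demonstrated directly in PyTorch. The tensor below is a made-up illustration, not an actual exploit payload: with invariant checking off (the default), an out-of-bounds index is accepted silently, while enabling torch.sparse.check_sparse_tensor_invariants rejects it at construction time.

```python
import torch

# A sparse vector of declared length 3 whose only index is 5 -- i.e.,
# it points outside its own bounds. (Illustrative only, not an exploit.)
indices = torch.tensor([[5]])
values = torch.tensor([1.0])

# With invariant checks disabled (the default), this constructs silently;
# calling .to_dense() on it is what later triggers the out-of-bounds access.
bad = torch.sparse_coo_tensor(indices, values, size=(3,))

# With the checks enabled, the same construction is rejected up front.
try:
    with torch.sparse.check_sparse_tensor_invariants():
        torch.sparse_coo_tensor(indices, values, size=(3,))
    rejected = False
except RuntimeError:
    rejected = True

print("rejected with checks enabled:", rejected)
```

Note that merely constructing the malformed tensor does not crash anything; the corruption happens later, when to_dense() materializes it.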
In worst-case scenarios, skilled attackers chain this with other flaws to achieve RCE, allowing them to steal data or install malware on the host machine.
According to the advisory, explicit validation via torch.sparse.check_sparse_tensor_invariants is needed but missing.
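One possible hardening, sketched below under the assumption that dense embeddings are the only legitimate input, is to inspect the loaded tensor’s layout and refuse anything sparse before to_dense() is ever reached. The function name is hypothetical and this is an illustrative mitigation, not the official vLLM patch:

```python
import io

import torch


def load_embed_defensively(raw: bytes) -> torch.Tensor:
    """Illustrative loader (hypothetical name): deserialize an embedding
    and reject anything that is not a plain dense float tensor."""
    tensor = torch.load(
        io.BytesIO(raw),
        weights_only=True,
        map_location=torch.device("cpu"),
    )
    if not isinstance(tensor, torch.Tensor):
        raise ValueError("payload is not a tensor")
    if tensor.dtype not in (torch.float32, torch.bfloat16, torch.float16):
        raise ValueError("unexpected embedding dtype")
    # Refuse sparse layouts outright: a malformed sparse tensor then
    # never reaches to_dense(), where the out-of-bounds write occurs.
    if tensor.layout != torch.strided:
        raise ValueError("sparse embedding payloads are not accepted")
    return tensor
```

Where sparse inputs must be supported, running the deserialization inside torch.sparse.check_sparse_tensor_invariants(), as the advisory suggests, restores the validation that PyTorch 2.8.0 stopped performing by default.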
This hits AI inference servers hard, as vLLM accepts embeddings from clients without strict input sanitization. No CVSS score has been assigned yet, but the “High” rating signals a real threat to production deployments.
The vLLM team has not commented publicly. Until a fix lands, users should treat external embeddings as hostile, and AI developers should audit similar deserialization paths in other PyTorch-based tools.