Critical vLLM Flaw Exposes the Soft Underbelly of AI Infrastructure

While the world worries about "jailbreaking" LLMs or preventing them from hallucinating, a critical new vulnerability has just reminded us of a fundamental truth: AI is just software, and software has bugs.

A newly discovered critical flaw (CVE-2025-62164) in vLLM, one of the most popular libraries for serving large language models, allows attackers to crash servers, and potentially achieve Remote Code Execution (RCE), simply by sending a malicious API request.

This isn't a failure of the AI model. It’s a failure of the API infrastructure serving it. And it is exactly the kind of threat Salt Security is built to stop.

The Flaw: Unsafe Deserialization in the API

The vulnerability lies in how vLLM handles prompt embeddings: tensors (arrays of numbers) that a client can send to its Completions API in place of a plain-text prompt.

Think of these embeddings like digital packages that the AI server needs to open and use. To send them efficiently, they are packed up (serialized) into a specific format. When vLLM receives them, it unpacks (deserializes) them to process the request.

Here is the problem: vLLM was unpacking these data packages without first verifying that they were safe. It blindly trusted the contents. An attacker can craft a malicious "trick" package that appears to be valid AI data on the outside but is internally inconsistent, for example a sparse tensor whose indices point outside its declared shape. When the vLLM server unpacks it and expands it into a regular tensor, those unchecked indices cause the process to write outside the memory it allocated. This can crash the server or, worse, allow the attacker to run their own commands.
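As a rough illustration of "verify before unpacking", the sketch below models a sparse tensor in plain Python and rejects any package whose coordinates fall outside its declared shape before expanding it. This is a toy model under our own assumptions: `SparseTensor`, `validate`, and `to_dense` are hypothetical names for this example, not vLLM or PyTorch code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SparseTensor:
    """Toy stand-in for a deserialized sparse tensor (hypothetical, not a PyTorch type)."""
    shape: tuple[int, ...]          # declared dense shape, e.g. (2, 3)
    indices: list[tuple[int, ...]]  # one coordinate tuple per stored value
    values: list[float]


def validate(t: SparseTensor) -> None:
    """Reject tensors whose contents disagree with their declared shape."""
    if len(t.indices) != len(t.values):
        raise ValueError("indices and values differ in length")
    for coord in t.indices:
        if len(coord) != len(t.shape):
            raise ValueError(f"coordinate {coord} has the wrong rank")
        if any(i < 0 or i >= dim for i, dim in zip(coord, t.shape)):
            raise ValueError(f"coordinate {coord} is outside shape {t.shape}")


def to_dense(t: SparseTensor) -> list[float]:
    """Expand to a flat row-major buffer, validating first."""
    validate(t)
    size = 1
    for dim in t.shape:
        size *= dim
    dense = [0.0] * size
    for coord, value in zip(t.indices, t.values):
        offset = 0
        for i, dim in zip(coord, t.shape):
            offset = offset * dim + i  # row-major flattening
        dense[offset] = value
    return dense
```

In Python a bad offset would simply raise an error. In native code with no bounds check, the same write lands in memory the process never allocated for that tensor, which is why the validation step has to come first.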

The Impact:

Denial of Service (DoS): Crashing the production inference server.
Remote Code Execution (RCE): Potentially executing arbitrary code on the server, giving the attacker a foothold inside your AI infrastructure.

Why Traditional Tools Miss This

This vulnerability highlights a massive blind spot in standard security approaches:

Static Analysis (SAST) often misses it: The vulnerability wasn't just in vLLM's code; it resulted from an upstream change in PyTorch (version 2.8.0 disabled sparse tensor integrity checks by default) that vLLM failed to account for. Code scanners often struggle with these complex dependency interactions.
Standard WAFs are blind: To a traditional WAF, the attack looks like a standard API request with a blob of Base64 data. Without understanding the context of the AI API or the structure of a valid tensor payload, the WAF lets it through.
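To make "understanding the structure of a valid tensor payload" a little more concrete, here is a minimal sketch of a structure-aware pre-check on a Base64 embedding field. The size cap and the function name `precheck_embedding` are assumptions for this example. The magic-byte test assumes the payload was produced by `torch.save`, which writes a zip archive by default.

```python
import base64
import binascii

MAX_EMBED_BYTES = 8 * 1024 * 1024  # assumed cap; tune to your real traffic
ZIP_MAGIC = b"PK\x03\x04"          # zip local-file header (assumed torch.save output)


def precheck_embedding(encoded: str) -> bytes:
    """Decode a Base64 embedding field and apply cheap structural checks."""
    # Bound the work before decoding: Base64 inflates data by about 4/3.
    if len(encoded) > MAX_EMBED_BYTES * 4 // 3 + 4:
        raise ValueError("embedding field too large")
    try:
        raw = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("embedding field is not valid Base64") from exc
    if not raw.startswith(ZIP_MAGIC):
        raise ValueError("embedding does not look like a serialized tensor archive")
    return raw
```

Checks like these only filter out the crudest payloads. A malicious tensor can still sit inside a well-formed archive, so they complement validation of the tensor contents and do not replace it.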

How Salt Security Protects Your AI Runtime

At Salt, we have long argued that securing AI requires securing the API traffic that powers it. This vLLM flaw is a textbook example of an "Infrastructure Risk" that can only be reliably stopped at the network/runtime layer.

Here is how the Salt Security API Protection Platform helps mitigate threats like CVE-2025-62164:

1. Complete Visibility

You can't secure what you don't know exists. In the rush to build agentic AI, data science teams often spin up experimental instances that IT security never sees. Salt shines a light on these "Shadow AI" projects, automatically discovering the specific APIs and tools they rely on, including the vLLM Completions API, so you can secure them before they become an entry point for attackers.

2. Behavioral Anomaly Detection

Salt learns what "normal" traffic looks like for your AI applications. Just as a credit card company spots unusual spending, Salt knows what a valid request for your AI model should look like.

Spotting the Fake: A malicious "trick" package usually looks very different from a legitimate one; it might be the wrong size, have a strange structure, or contain data patterns that don't belong.
Blocking the Threat: When Salt detects a deviation from the normal baseline, we can help block the request at the edge, stopping the attack before it reaches your vulnerable AI infrastructure.
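The baseline idea can be sketched in a few lines. The example below is a deliberately simple illustration, not a description of how Salt's detection works: it learns typical request sizes and flags a request whose size is many standard deviations from the mean. The class name, threshold, and minimum sample count are all assumptions.

```python
import statistics


class SizeBaseline:
    """Learns typical request sizes and flags large deviations (illustrative only)."""

    def __init__(self, threshold: float = 4.0, min_samples: int = 30):
        self.threshold = threshold      # z-score above which a request is flagged
        self.min_samples = min_samples  # do not judge until enough traffic is seen
        self.sizes = []

    def observe(self, size: int) -> None:
        """Record the size of a request considered normal."""
        self.sizes.append(size)

    def is_anomalous(self, size: int) -> bool:
        """Return True if `size` deviates strongly from the learned baseline."""
        if len(self.sizes) < self.min_samples:
            return False
        mean = statistics.fmean(self.sizes)
        stdev = statistics.pstdev(self.sizes)
        if stdev == 0:
            return size != mean
        return abs(size - mean) / stdev > self.threshold
```

A real baseline would track far more than size (structure, field types, call sequences, caller identity), but the principle is the same: learn the normal shape of traffic, then measure each request against it.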

3. Stopping Lateral Movement

If an attacker does manage to exploit a vulnerability like this, their next move is typically "East-West" lateral movement: reaching out to internal databases or other services. Salt monitors internal API traffic and alerts you immediately if an inference server begins making unauthorized calls to sensitive internal systems.
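In its simplest form, spotting an unauthorized internal call comes down to comparing each outbound request against what that service is expected to talk to. The sketch below assumes a hand-written allowlist. The service name, hostnames, and `check_call` function are hypothetical and only show the shape of the check.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical policy: the only hosts each service is expected to call.
ALLOWED_EGRESS = {
    "inference-server": {"model-store.internal", "metrics.internal"},
}


def check_call(source: str, url: str) -> Optional[str]:
    """Return an alert message if `source` calls a host outside its allowlist."""
    host = urlsplit(url).hostname
    allowed = ALLOWED_EGRESS.get(source, set())
    if host not in allowed:
        return f"ALERT: {source} made an unexpected call to {host}"
    return None
```

For example, `check_call("inference-server", "http://billing-db.internal:5432/")` returns an alert, while a call to `model-store.internal` returns `None`.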

The Takeaway: Secure the Pipe, Not Just the Model

The vLLM vulnerability serves as a wake-up call. We cannot focus solely on "AI Safety" (alignment, bias, hallucinations) and ignore "AI Security" (infrastructure, APIs, RCEs).

Your AI models are valuable assets running on vulnerable infrastructure. To protect them, you need a security platform that understands the APIs they rely on.

If you want to learn more about Salt and how we can help you, please contact us, schedule a demo, or visit our website. You can also get a free API Attack Surface Assessment from Salt Security's research team and learn what attackers already know.


Salt Security Blog
