
feat(deps): update dependency vllm-charts (v0.14.1 → v0.15.1) #1663

Open

chaplain-grimaldus[bot] wants to merge 1 commit into main from renovate/vllm-charts-0.x

Conversation


chaplain-grimaldus bot (Contributor) commented Jan 29, 2026

This PR contains the following updates:

| Package | Update | Change |
| --- | --- | --- |
| vllm-charts | minor | v0.14.1 → v0.15.1 |

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

vllm-project/vllm (vllm-charts)

v0.15.1

Compare Source

v0.15.1 is a patch release with security fixes, RTX Blackwell GPU support fixes, and bug fixes.

Security

Highlights

Bugfix Hardware Support
  • RTX Blackwell (SM120): Fixed NVFP4 MoE kernel support for RTX Blackwell workstation GPUs. Previously, NVFP4 MoE models would fail to load on these GPUs (#​33417)
  • FP8 kernel selection: Fixed FP8 CUTLASS group GEMM to properly fall back to Triton kernels on SM120 GPUs (#​33285)
Model Support
  • Step-3.5-Flash: New model support (#​33523)
Bugfix Model Support
  • Qwen3-VL-Reranker: Fixed model loading (#​33298)
  • Whisper: Fixed FlashAttention2 with full CUDA graphs (#​33360)
Performance
  • torch.compile cold-start: Fixed regression that increased cold-start compilation time (Llama3-70B: ~88s → ~22s) (#​33441)
  • MoE forward pass: Optimized by caching layer name computation (#​33184)
Bug Fixes
  • Fixed prefix cache hit rate of 0% with GPT-OSS style hybrid attention models (#​33524)
  • Enabled Triton MoE backend for FP8 per-tensor dynamic quantization (#​33300)
  • Disabled unsupported Renormalize routing methods for TRTLLM per-tensor FP8 MoE (#​33620)
  • Fixed speculative decoding metrics crash when no tokens generated (#​33729)
  • Disabled fast MoE cold start optimization with speculative decoding (#​33624)
  • Fixed ROCm skinny GEMM dispatch logic (#​33366)
Dependencies
  • Pinned LMCache >= v0.3.9 for API compatibility (#​33440)

New Contributors 🎉

Full Changelog: vllm-project/vllm@v0.15.0...v0.15.1

v0.15.0

Compare Source

Highlights

This release features 335 commits from 158 contributors (39 new)!

Model Support
Engine Core
  • Async scheduling + Pipeline Parallelism: --async-scheduling now works with pipeline parallelism (#​32359).
  • Mamba prefix caching: Block-aligned prefix caching for Mamba/hybrid models with --enable-prefix-caching --mamba-cache-mode align. Achieves ~2x speedup by caching Mamba states directly (#30877). A minimal launch sketch follows this list.
  • Session-based streaming input: New incremental input support for interactive workloads like ASR. Accepts async generators producing StreamingInput objects while maintaining KV cache alignment (#​28973).
  • Model Runner V2: VLM support (#​32546), architecture improvements.
  • LoRA: Inplace loading for memory efficiency (#​31326).
  • AOT compilation: torch.compile inductor artifacts support (#​25205).
  • Performance: KV cache offloading redundant load prevention (#​29087), FlashAttn attention/cache update separation (#​25954).
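A minimal sketch of enabling the Mamba prefix caching mentioned in the Engine Core list above. The two CLI flags come from the release note; the model name is a placeholder, and launching via Python's subprocess is just one convenient way to pass them.

```python
# Sketch: start a vLLM server with block-aligned Mamba prefix caching (vLLM >= 0.15.0).
# The flags --enable-prefix-caching and --mamba-cache-mode align are taken from the
# release notes; the model name below is a placeholder for any Mamba/hybrid checkpoint.
import subprocess

subprocess.run([
    "vllm", "serve", "your-org/your-mamba-hybrid-model",  # placeholder model id
    "--enable-prefix-caching",         # turn on prefix caching
    "--mamba-cache-mode", "align",     # block-aligned Mamba state caching (~2x on shared prefixes)
])
```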
Hardware & Performance
NVIDIA
  • Blackwell defaults: FlashInfer MLA is now the default MLA backend on Blackwell, with TRTLLM as default prefill (#​32615).
  • MoE performance: 1.2-2% E2E throughput improvement via grouped topk kernel fusion (#​32058), NVFP4 small-batch decoding improvement (#​30885), faster cold start for MoEs with torch.compile (#​32805).
  • FP4 kernel optimization: Up to 65% faster FP4 quantization on Blackwell (SM100F) using 256-bit loads, ~4% E2E throughput improvement (#​32520).
  • Kernel improvements: topk_sigmoid kernel for MoE routing (#​31246), atomics reduce counting for SplitK skinny GEMMs (#​29843), fused cat+quant for FP8 KV cache in MLA (#​32950).
  • torch.compile: SiluAndMul and QuantFP8 CustomOp compilation (#​32806), Triton prefill attention performance (#​32403).
AMD ROCm
  • MoRI EP: High-performance all2all backend for Expert Parallel (#​28664).
  • Attention improvements: Shuffle KV cache layout and assembly paged attention kernel for AiterFlashAttentionBackend (#​29887).
  • FP4 support: MLA projection GEMMs with dynamic quantization (#​32238).
  • Consumer GPU support: Flash Attention Triton backend on RDNA3/RDNA4 (#​32944).
Other Platforms
  • TPU: Pipeline parallelism support (#​28506), backend option (#​32438).
  • Intel XPU: AgRsAll2AllManager for distributed communication (#​32654).
  • CPU: NUMA-aware acceleration for TP/DP inference on ARM (#​32792), PyTorch 2.10 (#​32869).
  • Whisper: torch.compile support (#​30385).
  • WSL: Platform compatibility fix for Windows Subsystem for Linux (#​32749).
Quantization
  • MXFP4: W4A16 support for compressed-tensors MoE models (#​32285).
  • Non-gated MoE: Quantization support with Marlin, NVFP4 CUTLASS, FP8, INT8, and compressed-tensors (#​32257).
  • Intel: Quantization Toolkit integration (#​31716).
  • FP8 KV cache: Per-tensor and per-attention-head quantization via llmcompressor (#​30141).
API & Frontend
  • Responses API: Partial message generation (#​32100), include_stop_str_in_output tuning (#​32383), prompt_cache_key support (#​32824).
  • OpenAI API: skip_special_tokens configuration (#​32345).
  • Score endpoint: Flexible input formats with data_1/data_2 and queries/documents (#32577). A request sketch follows this list.
  • Render endpoints: New endpoints for prompt preprocessing (#​32473).
  • Whisper API: avg_logprob and compression_ratio in verbose_json segments (#​31059).
  • Security: FIPS 140-3 compliant hash option for enterprise/government users (#​32386), --ssl-ciphers CLI argument (#​30937).
  • UX improvements: Auto api_server_count based on dp_size (#​32525), wheel variant auto-detection during install (#​32948), custom profiler URI schemes (#​32393).
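To illustrate the more flexible score-endpoint inputs noted above, here is a hedged sketch against a locally running OpenAI-compatible vLLM server. The queries/documents field names come from the release note; the host, port, and model id are placeholders.

```python
# Sketch: call the vLLM score endpoint using the queries/documents aliases
# mentioned in the v0.15.0 notes. Host, port, and model name are placeholders.
import json
import urllib.request

payload = {
    "model": "your-org/your-reranker-model",                 # placeholder reranker/cross-encoder
    "queries": ["what is vllm?"],                            # alias accepted per release notes
    "documents": ["vLLM is a fast LLM inference engine."],
}

req = urllib.request.Request(
    "http://localhost:8000/score",                           # assumes a local `vllm serve` instance
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))                                   # relevance scores per query/document pair
```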
Dependencies
Breaking Changes & Deprecations
  • Metrics: Removed deprecated vllm:time_per_output_token_seconds metric - use vllm:inter_token_latency_seconds instead (#32661). A migration-check sketch follows this list.
  • Environment variables: Removed deprecated environment variables (#​32812).
  • Quantization: DeepSpeedFp8 removed (#​32679), RTN removed (#​32697), HQQ deprecated (#​32681).
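Because the old inter-token-latency metric name is gone, Prometheus and Grafana queries need to be updated. The sketch below scrapes a running server's /metrics endpoint and reports which of the two names is actually exposed; the default localhost:8000 address is an assumption.

```python
# Sketch: check which inter-token-latency metric a running vLLM server exposes,
# to confirm dashboards have been migrated for v0.15.0+.
import urllib.request

OLD = "vllm:time_per_output_token_seconds"   # removed in v0.15.0
NEW = "vllm:inter_token_latency_seconds"     # replacement metric

with urllib.request.urlopen("http://localhost:8000/metrics") as resp:  # default address assumed
    text = resp.read().decode()

for name in (OLD, NEW):
    print(f"{name}: {'exposed' if name in text else 'absent'}")
```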
Bug Fixes
  • Speculative decoding: Eagle draft_model_config fix (#​31753).
  • DeepSeek: DeepSeek-V3.1 + DeepGEMM incompatible scale shapes fix (#​32361).
  • Distributed: DP+MoE inference fix via CpuCommunicator (#​31867), P/D with non-MoE DP fix (#​33037).
  • EPLB: Possible deadlock fix (#​32418).
  • NIXL: UCX memory leak fix by exporting UCX_MEM_MMAP_HOOK_MODE=none (#32181). A manual-workaround sketch follows this list.
  • Structured output: Outlines byte fallback handling fix (#​31391).
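The NIXL fix works by exporting UCX_MEM_MMAP_HOOK_MODE=none before UCX initializes. On releases prior to this fix the same mitigation can be applied manually in the launching process, as in this sketch; the variable name is from the release note, and the model id is a placeholder.

```python
# Sketch: manual mitigation for the NIXL/UCX memory leak on releases before v0.15.0.
# v0.15.0 sets this variable itself; setting it here only matters for older versions.
import os
import subprocess

# Must be set before UCX is initialized, i.e. before vLLM/NIXL starts.
os.environ["UCX_MEM_MMAP_HOOK_MODE"] = "none"

# Launch the server in a child process that inherits the environment.
subprocess.run(["vllm", "serve", "your-org/your-model"])  # placeholder model id
```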

New Contributors 🎉

Full Changelog: vllm-project/vllm@v0.14.1...v0.15.0


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot.

chaplain-grimaldus bot force-pushed the renovate/vllm-charts-0.x branch from 0708e9a to 67997aa on February 4, 2026 at 19:17
chaplain-grimaldus bot changed the title from "feat(deps): update dependency vllm-charts (v0.14.1 → v0.15.0)" to "feat(deps): update dependency vllm-charts (v0.14.1 → v0.15.1)" on Feb 4, 2026