
Conversation


@HollowMan6 HollowMan6 commented Jan 27, 2026

Description

For MLA, pad V when the Q and V head dimensions differ in the THD format.

Similar to NVIDIA/Megatron-LM#3003

Fixes NVIDIA/Megatron-LM#1698

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • pad V when Q/V head dims differ for THD
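The padding step described above can be sketched as follows. This is an illustrative snippet, not the actual TransformerEngine code; the helper name `maybe_pad_v_for_thd` is hypothetical.

```python
import torch
import torch.nn.functional as F

def maybe_pad_v_for_thd(value_layer: torch.Tensor, head_dim_qk: int):
    """Zero-pad V's last (head) dim up to head_dim_qk for THD layout [t, h, d].

    Returns the (possibly padded) tensor, the original V head dim, and a flag
    indicating whether padding was applied (so the output can be trimmed later).
    """
    orig_v_dim = value_layer.shape[-1]
    if orig_v_dim < head_dim_qk:
        # F.pad with (left, right) pads the last dimension; pad on the right.
        value_layer = F.pad(value_layer, (0, head_dim_qk - orig_v_dim))
        return value_layer, orig_v_dim, True
    return value_layer, orig_v_dim, False
```

Zero-padding is safe here because the attention probabilities multiply each V column independently, so the extra columns only produce zeros that are later trimmed away.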

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copilot AI review requested due to automatic review settings January 27, 2026 23:31

greptile-apps bot commented Jan 27, 2026

Greptile Overview

Greptile Summary

Adds support for Multi-head Latent Attention (MLA) in THD format when Q and V have different head dimensions. When V's head dimension is smaller than Q's, the implementation pads V to match Q's dimension before attention computation, then trims the output back to the original V dimension.

Key changes:

  • Detects THD format with mismatched Q/V head dimensions and pads V tensor
  • Implements _trim_thd_output() helper to remove padding from attention outputs
  • Adds comprehensive test with numerical validation against reference implementation
  • Handles regular tensors, Float8Tensors, tuples, and lists in trimming logic

Implementation details:

  • Padding only occurs when head_dim_v < head_dim_qk (V smaller than Q)
  • Float8Tensors are excluded from padding logic
  • All four attention backends (flash, fused, unfused, checkpointed) properly trim outputs

Confidence Score: 4/5

  • Safe to merge - implements well-tested padding/trimming logic for MLA with minimal risk
  • Implementation follows the established Megatron-LM pattern, includes numerical correctness validation, and properly handles all attention backends. The score is 4 (not 5) because the case where head_dim_v > head_dim_qk is neither explicitly handled nor documented as unsupported
  • No files require special attention - both files have clean, focused changes

Important Files Changed

  • transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py: Adds padding/trimming logic for MLA with mismatched Q/V head dimensions in THD format. Pads V to match Q dimension before attention, then trims output back to original V dimension.
  • tests/pytorch/attention/test_attention.py: Adds test for THD attention with Q/V head dimension mismatch (128 vs 64). Validates output shape, gradients, and numerical correctness against a reference implementation.
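A minimal numerical check (not the repository's actual test) shows why pad-then-trim is exact: softmax(QKᵀ/√d) @ [V | 0] equals [softmax(QKᵀ/√d) @ V | 0], so slicing the padded output recovers the reference result. Shapes follow the THD [tokens, heads, head_dim] layout described above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(4, 8, 128)   # [tokens, heads, head_dim_qk]
k = torch.randn(4, 8, 128)
v = torch.randn(4, 8, 64)    # head_dim_v < head_dim_qk

# Attention probabilities per head: [heads, tokens, tokens]
probs = torch.softmax(
    q.transpose(0, 1) @ k.transpose(0, 1).transpose(-2, -1) / 128 ** 0.5,
    dim=-1,
)

# Reference: attend with the original (narrow) V.
ref = probs @ v.transpose(0, 1)            # [heads, tokens, 64]

# Padded path: attend with zero-padded V, then trim the head dim.
v_pad = F.pad(v, (0, 64))                  # 64 -> 128
padded_out = probs @ v_pad.transpose(0, 1)  # [heads, tokens, 128]
trimmed = padded_out[..., :64]
```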

Sequence Diagram

sequenceDiagram
    participant Caller
    participant DotProductAttention
    participant AttentionBackend as Attention Backend<br/>(Flash/Fused/Unfused)
    
    Caller->>DotProductAttention: forward(Q, K, V)<br/>head_dim_qk=128, head_dim_v=64
    
    Note over DotProductAttention: Check THD format &<br/>head dim mismatch
    
    alt head_dim_v < head_dim_qk
        DotProductAttention->>DotProductAttention: Save orig_v_dim = 64
        DotProductAttention->>DotProductAttention: Pad V: 64 → 128<br/>Set pad_v_for_thd = True
        DotProductAttention->>DotProductAttention: Update head_dim_v = 128
    end
    
    DotProductAttention->>AttentionBackend: attention(Q, K, V_padded)
    AttentionBackend-->>DotProductAttention: attn_out (head_dim=128)
    
    alt pad_v_for_thd == True
        DotProductAttention->>DotProductAttention: _trim_thd_output()
        Note over DotProductAttention: Reshape using head_dim_v (128)<br/>Trim to orig_v_dim (64)
        DotProductAttention->>DotProductAttention: attn_out[..., :64]
    end
    
    DotProductAttention-->>Caller: Return trimmed output<br/>(head_dim=64)


@greptile-apps greptile-apps bot left a comment


2 files reviewed, 2 comments



Copilot AI left a comment


Pull request overview

This PR adds support for Multi-head Latent Attention (MLA) with mismatched Q/V head dimensions in the THD format (the variable-length packed layout, [total_tokens, num_heads, head_dim]). When the value tensor has a smaller head dimension than the query/key tensors, the code pads the value tensor to match the Q/K head dimension, runs the attention operation, and then trims the output back to the original V dimension.
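The pad-run-trim flow just described can be condensed into a hypothetical wrapper; `mla_thd_forward` and the `backend` callable are illustrative stand-ins, not TransformerEngine APIs.

```python
import torch
import torch.nn.functional as F

def mla_thd_forward(q, k, v, backend):
    """Hypothetical pad-run-trim wrapper for THD tensors of shape [t, h, d].

    `backend` is any attention callable that requires equal Q/K/V head dims
    and returns an output of shape [t, h, head_dim].
    """
    orig_v_dim = v.shape[-1]
    if orig_v_dim < q.shape[-1]:
        # Zero-pad V's head dim to match Q/K before calling the backend.
        v = F.pad(v, (0, q.shape[-1] - orig_v_dim))
    out = backend(q, k, v)
    # Trim the output back to the original V head dim.
    return out[..., :orig_v_dim]
```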

Changes:

  • Added padding logic for V tensor when head dimensions differ in THD format
  • Implemented trimming function to restore correct output dimensions after attention
  • Added test case for THD attention with mismatched Q/V head dimensions

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • transformer_engine/pytorch/attention/dot_product_attention/dot_product_attention.py: Implements padding of the V tensor before attention and trimming of the output after attention for THD format with mismatched Q/V head dimensions
  • tests/pytorch/attention/test_attention.py: Adds a test case verifying that THD attention works with different Q/V head dimensions


Signed-off-by: Hollow Man <hollowman@opensuse.org>

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment




Development

Successfully merging this pull request may close these issues.

[BUG]DotProductAttention:Disabling FlashAttention as it does not support MLA
