docs(readme): update convergence table, latest news, and outdated links #2638
base: main
Conversation
Greptile Overview

Greptile Summary

Updated the README documentation to reflect current Transformer Engine capabilities and fix outdated information: added MXFP8 and NVFP4 format support documentation for Blackwell GPUs, updated the FP8 convergence table with new MXFP8 results from the arXiv paper, updated Docker container versions from 25.08 to 26.01, fixed broken/outdated integration links (DeepSpeed and Lightning), added the Nemotron 3 paper to Latest News, and corrected notebook references to point to the existing fp8_primer.ipynb notebook.

Key changes:
Minor issue:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Dev as Developer
participant README as README.rst
participant Users as Documentation Users
Dev->>README: Add MXFP8/NVFP4 format support info
Dev->>README: Update FP8 convergence table with MXFP8 results
Dev->>README: Remove outdated JAX Toolbox links
Dev->>README: Update Docker container versions (25.08 → 26.01)
Dev->>README: Fix integration links (DeepSpeed, Lightning)
Dev->>README: Add Nemotron 3 paper to Latest News
Dev->>README: Fix notebook references (quickstart → fp8_primer)
README->>Users: Provide updated documentation
Users->>README: Access current format support info
Users->>README: View latest convergence results
Users->>README: Use correct Docker versions
Users->>README: Follow working integration links
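For readers skimming the summary above: MXFP8 is the block-scaled FP8 variant added for Blackwell GPUs. A minimal, hypothetical sketch of opting into it from PyTorch, assuming the MXFP8BlockScaling recipe class exposed by recent Transformer Engine releases (class names may differ by version, and Blackwell hardware is required):

```python
# Hypothetical sketch, not taken from this PR: selecting an MXFP8 recipe.
# Assumes transformer_engine.common.recipe.MXFP8BlockScaling exists in the
# installed TE version; verify against your release before relying on it.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A single TE layer with FP8-capable GEMMs (illustrative dimensions).
model = te.Linear(768, 3072, bias=True)
inp = torch.randn(2048, 768, device="cuda")

# MXFP8 uses fine-grained per-block scaling factors rather than delayed
# per-tensor scaling, so no amax-history settings are configured here.
mxfp8_recipe = recipe.MXFP8BlockScaling()

# Run the forward pass with the MXFP8 recipe active; backward reuses it.
with te.fp8_autocast(enabled=True, fp8_recipe=mxfp8_recipe):
    out = model(inp)

out.sum().backward()
```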
1 file reviewed, 1 comment
README.rst
Outdated
loss = out.sum()
loss.backward()

For a tutorial with more details, see the `Quickstart Notebook <https://github.com/NVIDIA/TransformerEngine/blob/main/docs/examples/quickstart.ipynb>`_.
The referenced quickstart.ipynb file does not exist in docs/examples/. The actual notebooks in that directory are fp8_primer.ipynb, advanced_optimizations.ipynb, and te_jax_integration.ipynb. Consider using one of these existing notebooks or creating the quickstart notebook before merging.
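For context, the two code lines quoted in the diff are the tail of the README's PyTorch example that the new tutorial link follows. A minimal self-contained sketch of that example, assuming the standard te.Linear / fp8_autocast / DelayedScaling API (the dimensions and recipe arguments here are illustrative, not copied from the file):

```python
# Sketch of the README's PyTorch FP8 example, for context only.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A single TE layer with FP8-capable GEMMs.
model = te.Linear(768, 3072, bias=True)
inp = torch.randn(2048, 768, device="cuda")

# Delayed-scaling FP8 recipe; all arguments are optional.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# Enable FP8 autocasting for the forward pass.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

# The lines quoted in the diff above.
loss = out.sum()
loss.backward()
```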
…t info

- Add MXFP8 and NVFP4 format support to highlights and description
- Update FP8 convergence table with MXFP8 results from arxiv paper
- Remove outdated JAX-Toolbox links and "available on request" entries
- Update Docker container versions to 26.01
- Fix DeepSpeed and Lightning integration links
- Add Nemotron 3 paper to Latest News
- Add quickstart notebook link after PyTorch example

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
Force-pushed from 3f01d10 to 98726c5 (Compare)
1 file reviewed, 1 comment
README.rst
Outdated
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
Extra whitespace in the "Megatron Core" framework names, inconsistent with the other rows.
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
- Replace quickstart.ipynb link with fp8_primer.ipynb (file exists)
- Fix extra whitespace in Megatron Core table rows

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
1 file reviewed, 2 comments
| LLama2-7B | Alibaba Pai | https://mp.weixin.qq.com/s/NQT0uKXLbXyh5031zBdeBQ |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| T5-11B | JAX/T5x | Available on request |
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
Extra space before "Megatron Core": should have 2 spaces like the other rows, not 3.
| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |

| LLM-8B | Megatron Core | https://arxiv.org/abs/2506.08027 |
| MPT-13B | Mosaic Composer | https://www.databricks.com/blog/turbocharged-training-optimizing-databricks-mosaic-ai-stack-fp8 |
+------------+------------------+---------------------------------------------------------------------------------------------------------+
| GPT-22B | NeMo Framework | Available on request |
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
Extra space before "Megatron Core": should have 2 spaces like the other rows, not 3.
| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |

| MoE-16B | Megatron Core | https://arxiv.org/abs/2506.08027 |
Description
Updates the README to add missing format support documentation, update the news section, and fix broken/outdated links.
Type of change
Changes
Checklist: