ko3n1g (Collaborator) commented Feb 2, 2026

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g changed the title from "Ko3n1g/ci/fix ngc build" to "ci(fix): NGC build" Feb 2, 2026
greptile-apps bot (Contributor) commented Feb 2, 2026

Greptile Overview

Greptile Summary

This PR refactors the NGC PyTorch image build workflow to fix several structural issues. The changes include:

  • Fixed the NGC images matrix structure by creating a proper matrix object with os, release-version, and container-image fields instead of just passing an array of images
  • Added the missing release-version parameter to the NGC build step (line 197)
  • Standardized quote style from single to double quotes throughout the workflow
  • Added permissions: write-all to both build jobs

Critical Issue Remaining:

The build_wheels_for_ngc job still has a blocking issue: it only passes release-version and base-image to the build action, but the action definition marks python-version, cuda-version, cudnn-version, torch-version, cxx11_abi, and aarch as required: true. GitHub Actions will fail validation when these required inputs are missing, even though they're not used when base-image is provided.

The action logic correctly handles the optional use of these parameters (checking if BASE_IMAGE is empty), but the workflow won't pass GitHub's validation phase.

Confidence Score: 2/5

  • This PR will fail during GitHub Actions validation due to missing required action inputs
  • While the PR addresses several previous issues (matrix structure, release-version parameter), it leaves a critical validation error: the build action requires 6 parameters that aren't being passed for NGC builds. The workflow cannot execute successfully until this is resolved.
  • .github/workflows/attach-wheels-to-release.yml requires attention - specifically the NGC build action call needs required parameters added

Important Files Changed

Filename | Overview
.github/workflows/attach-wheels-to-release.yml | Refactored NGC build matrix structure and fixed release-version parameter, but still missing required action inputs for NGC builds

Sequence Diagram

sequenceDiagram
    participant User
    participant GHA as GitHub Actions
    participant PreFlight as pre-flight job
    participant BuildWheels as build_wheels job
    participant BuildNGC as build_wheels_for_ngc job
    participant Action as build-pytorch-wheel action
    participant Release as GitHub Release

    alt Release Event
        User->>GHA: Publish Release
        GHA->>PreFlight: Trigger with release event
        PreFlight->>PreFlight: Build matrix (release mode)
        PreFlight->>PreFlight: Run check_for_ngc_images.sh
        PreFlight->>PreFlight: Create ngc-images-matrix with os, release-version, container-image
    else Workflow Dispatch
        User->>GHA: Manually trigger with inputs
        GHA->>PreFlight: Trigger with workflow_dispatch
        PreFlight->>PreFlight: Build matrix (inputs mode)
        PreFlight->>PreFlight: Create ngc-images-matrix from inputs
    end

    PreFlight->>BuildWheels: Pass build-wheel-matrix
    PreFlight->>BuildNGC: Pass ngc-images-matrix

    alt NGC Image Not Provided
        BuildWheels->>Action: Call with all params (python-version, cuda-version, etc.)
        Action->>Action: Build custom Docker image from scratch
        Action->>Action: Build wheel in container
        Action->>BuildWheels: Return wheel
        BuildWheels->>Release: Upload wheel as release asset
    end

    alt NGC Image Provided
        BuildNGC->>Action: Call with release-version + base-image only
        Note over BuildNGC,Action: ⚠️ Missing required params:<br/>python-version, cuda-version,<br/>cudnn-version, torch-version,<br/>cxx11_abi, aarch
        Action->>Action: Pull NGC Docker image
        Action->>Action: Build wheel in NGC container
        Action->>BuildNGC: Return wheel
        BuildNGC->>Release: Upload wheel as release asset
    end

greptile-apps bot (Contributor) left a comment

1 file reviewed, 4 comments

uses: ./.github/actions/build-pytorch-wheel
id: build-pytorch-wheel
with:
  release-version: ${{ inputs.release-version }}

On release events, inputs.release-version is undefined, so this step should use matrix.release-version instead.

Suggested change
- release-version: ${{ inputs.release-version }}
+ release-version: ${{ matrix.release-version }}

id: build-pytorch-wheel
with:
  release-version: ${{ inputs.release-version }}
  base-image: ${{ matrix.container-image }}

Missing required parameters: python-version, cuda-version, cudnn-version, torch-version, cxx11_abi, and aarch. The action at .github/actions/build-pytorch-wheel/action.yml:12-32 requires these inputs when base-image is not provided. Since NGC images may not have these pre-configured, ensure they're either in the matrix or passed explicitly.

Signed-off-by: oliver könig <okoenig@nvidia.com>
greptile-apps bot (Contributor) left a comment

1 file reviewed, 4 comments

MATRIX=$(echo '{
  "os": ["ubuntu-22.04", "ubuntu-22.04-arm"],
  "release-version": ["${{ inputs.release-version }}"],
  "container-image": "$(cat ngc_images.json | jq -cr)"

JSON syntax error: container-image should be an array. Wrapping the command substitution in double quotes places the array inside a string literal, which produces invalid JSON.

Suggested change
- "container-image": "$(cat ngc_images.json | jq -cr)"
+ "container-image": $(cat ngc_images.json | jq -cr)
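
One way to avoid the quoting problem altogether is to let jq assemble the whole matrix object, so the image array is never pasted into a shell string. The following is a minimal sketch, not the workflow's actual step: it assumes ngc_images.json holds a JSON array of image names, and the step name, step id, and output name are hypothetical.

```yaml
# Hypothetical pre-flight step; name, id, and output name are assumptions.
- name: Build NGC images matrix
  id: ngc-matrix
  shell: bash
  run: |
    # ngc_images.json is assumed to contain a JSON array, e.g. ["image-a", "image-b"].
    # jq reads the array as its input (.) and embeds it as a real JSON array,
    # while --arg passes the version in as a properly escaped JSON string.
    MATRIX=$(jq -c \
      --arg version "${{ inputs.release-version }}" \
      '{"os": ["ubuntu-22.04", "ubuntu-22.04-arm"], "release-version": [$version], "container-image": .}' \
      ngc_images.json)
    echo "ngc-images-matrix=$MATRIX" >> "$GITHUB_OUTPUT"
```

Because jq emits the object itself, the result is valid JSON by construction, and a consuming job could read it with fromJSON on the job output.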

ko3n1g and others added 3 commits February 3, 2026 01:04
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g ko3n1g requested a review from ksivaman February 3, 2026 00:05
greptile-apps bot (Contributor) left a comment

1 file reviewed, 1 comment

Comment on lines 196 to 198
with:
  release-version: ${{ matrix.release-version }}
  base-image: ${{ matrix.container-image }}

Missing required action inputs. The action at .github/actions/build-pytorch-wheel/action.yml:12-32 marks python-version, cuda-version, cudnn-version, torch-version, cxx11_abi, and aarch as required: true. Even though they're not used when base-image is provided, GitHub Actions will fail validation without them.

Either:

  1. Change the action definition to mark these as required: false, OR
  2. Pass placeholder values here (e.g., from the matrix or hardcoded defaults)
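
Option 1 could look roughly like the fragment below in the action metadata. This is a sketch rather than the repository's actual file: only the six input names come from the comment above, while the descriptions and the empty-string defaults are assumptions.

```yaml
# .github/actions/build-pytorch-wheel/action.yml (fragment, illustrative only)
# Descriptions and defaults are assumptions; only the input names are from the review.
inputs:
  python-version:
    description: "Python version; only needed when base-image is empty"
    required: false
    default: ""
  cuda-version:
    description: "CUDA version; only needed when base-image is empty"
    required: false
    default: ""
  cudnn-version:
    description: "cuDNN version; only needed when base-image is empty"
    required: false
    default: ""
  torch-version:
    description: "PyTorch version; only needed when base-image is empty"
    required: false
    default: ""
  cxx11_abi:
    description: "C++11 ABI setting; only needed when base-image is empty"
    required: false
    default: ""
  aarch:
    description: "Target architecture; only needed when base-image is empty"
    required: false
    default: ""
```

With the inputs declared this way, the NGC job could keep passing only release-version and base-image, while the from-scratch job continues to supply all six values.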
