-
Notifications
You must be signed in to change notification settings - Fork 624
ci(fix): NGC build #2643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ci(fix): NGC build #2643
Conversation
Signed-off-by: oliver könig <okoenig@nvidia.com>
Greptile OverviewGreptile SummaryThis PR refactors the NGC PyTorch image build workflow to fix several structural issues. The changes include:
Critical Issue Remaining: The The action logic correctly handles the optional use of these parameters (checking if Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant GHA as GitHub Actions
participant PreFlight as pre-flight job
participant BuildWheels as build_wheels job
participant BuildNGC as build_wheels_for_ngc job
participant Action as build-pytorch-wheel action
participant Release as GitHub Release
alt Release Event
User->>GHA: Publish Release
GHA->>PreFlight: Trigger with release event
PreFlight->>PreFlight: Build matrix (release mode)
PreFlight->>PreFlight: Run check_for_ngc_images.sh
PreFlight->>PreFlight: Create ngc-images-matrix with os, release-version, container-image
else Workflow Dispatch
User->>GHA: Manually trigger with inputs
GHA->>PreFlight: Trigger with workflow_dispatch
PreFlight->>PreFlight: Build matrix (inputs mode)
PreFlight->>PreFlight: Create ngc-images-matrix from inputs
end
PreFlight->>BuildWheels: Pass build-wheel-matrix
PreFlight->>BuildNGC: Pass ngc-images-matrix
alt NGC Image Not Provided
BuildWheels->>Action: Call with all params (python-version, cuda-version, etc.)
Action->>Action: Build custom Docker image from scratch
Action->>Action: Build wheel in container
Action->>BuildWheels: Return wheel
BuildWheels->>Release: Upload wheel as release asset
end
alt NGC Image Provided
BuildNGC->>Action: Call with release-version + base-image only
Note over BuildNGC,Action: ⚠️ Missing required params:<br/>python-version, cuda-version,<br/>cudnn-version, torch-version,<br/>cxx11_abi, aarch
Action->>Action: Pull NGC Docker image
Action->>Action: Build wheel in NGC container
Action->>BuildNGC: Return wheel
BuildNGC->>Release: Upload wheel as release asset
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 file reviewed, 4 comments
| uses: ./.github/actions/build-pytorch-wheel | ||
| id: build-pytorch-wheel | ||
| with: | ||
| release-version: ${{ inputs.release-version }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In release events, inputs.release-version is undefined. Should use matrix.release-version instead.
| release-version: ${{ inputs.release-version }} | |
| release-version: ${{ matrix.release-version }} |
| id: build-pytorch-wheel | ||
| with: | ||
| release-version: ${{ inputs.release-version }} | ||
| base-image: ${{ matrix.container-image }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing required parameters: python-version, cuda-version, cudnn-version, torch-version, cxx11_abi, and aarch. The action at .github/actions/build-pytorch-wheel/action.yml:12-32 requires these inputs when base-image is not provided. Since NGC images may not have these pre-configured, ensure they're either in the matrix or passed explicitly.
Signed-off-by: oliver könig <okoenig@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 file reviewed, 4 comments
| MATRIX=$(echo '{ | ||
| "os": ["ubuntu-22.04", "ubuntu-22.04-arm"], | ||
| "release-version": ["${{ inputs.release-version }}"], | ||
| "container-image": "$(cat ngc_images.json | jq -cr)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON syntax error: container-image should be an array. The current command substitution with double quotes will produce invalid JSON.
| "container-image": "$(cat ngc_images.json | jq -cr)" | |
| "container-image": $(cat ngc_images.json | jq -cr) |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 file reviewed, 1 comment
| with: | ||
| release-version: ${{ matrix.release-version }} | ||
| base-image: ${{ matrix.container-image }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing required action inputs. The action at .github/actions/build-pytorch-wheel/action.yml:12-32 marks python-version, cuda-version, cudnn-version, torch-version, cxx11_abi, and aarch as required: true. Even though they're not used when base-image is provided, GitHub Actions will fail validation without them.
Either:
- Change the action definition to mark these as
required: false, OR - Pass placeholder values here (e.g., from the matrix or hardcoded defaults)
Description
Please include a brief summary of the changes, relevant motivation and context.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: