[Question]: privileged container with NVIDIA_VISIBLE_DEVICES support and CDI enabled? #2115

@Smuger

Description

Hi All,

Question

I'm trying to run both CDI-enabled workloads and workloads that rely on NVIDIA_VISIBLE_DEVICES in the same cluster. Is that possible?

Spec

gpu-operator version: 25.10.1
K8s: EKS running 1.35

cdi:
  enabled: true

devicePlugin:
  enabled: true
  env:
    - name: DEVICE_LIST_STRATEGY
      value: "envvar,cdi-annotations,cdi-cri"

driver:
  enabled: false

toolkit:
  enabled: true
  env:
  - name: RUNTIME_CONFIG_SOURCE
    value: "file"

Errors I'm seeing

With CDI disabled, my non-privileged workloads fail with:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices
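
In case it helps with triage: assuming nvidia-ctk is available on the host, this is how I've been checking whether a node actually has resolvable CDI specs:

# List the CDI devices the runtime can resolve on this node
nvidia-ctk cdi list

# Regenerate the spec if the list is empty
# (default spec dirs are /etc/cdi and /var/run/cdi)
sudo nvidia-ctk cdi generate --output=/var/run/cdi/nvidia.yaml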

With CDI enabled, my privileged containers get NVIDIA_VISIBLE_DEVICES=void, even with the pod's Runtime Class Name set to nvidia-legacy and containerd on the nodes configured with the nvidia-legacy runtime.

It seems like this is an either/or setup, even though I thought the whole point of having multiple runtime classes (nvidia, nvidia-legacy, and nvidia-cdi) was to allow both at the same time? To make the expectation concrete, the sketch below shows the pair of pods I assumed could coexist.
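
Here is roughly what I had in mind (pod names and image tag are just examples):

apiVersion: v1
kind: Pod
metadata:
  name: legacy-gpu-pod            # example name
spec:
  runtimeClassName: nvidia-legacy # envvar-based injection
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    securityContext:
      privileged: true
---
apiVersion: v1
kind: Pod
metadata:
  name: cdi-gpu-pod               # example name
spec:
  runtimeClassName: nvidia-cdi    # CDI-based injection
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1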

Many thanks,
Kryz
