Skip to content

Faster enum implementation#1569

Draft
mdboom wants to merge 9 commits intoNVIDIA:mainfrom
mdboom:fast-enum
Draft

Faster enum implementation#1569
mdboom wants to merge 9 commits intoNVIDIA:mainfrom
mdboom:fast-enum

Conversation

@mdboom
Copy link
Contributor

@mdboom mdboom commented Feb 3, 2026

Seems to cut import time by half. Just a draft for testing for now... more details to follow.

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Feb 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom
Copy link
Contributor Author

mdboom commented Feb 3, 2026

/ok to test

@mdboom
Copy link
Contributor Author

mdboom commented Feb 3, 2026

/ok to test

@leofang
Copy link
Member

leofang commented Feb 3, 2026

Nice! Import time matters not as much (for now), but I'd be very curious about the run-time improvement!

@github-actions
Copy link

github-actions bot commented Feb 3, 2026

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a C-based enum implementation to replace many uses of enum.IntEnum in the CUDA Python bindings, aiming to reduce import overhead while keeping the public API behavior aligned with the standard library enums.

Changes:

  • Add a new C extension module cuda.bindings.utils._fast_enum providing _FastEnum (an int subclass) and _FastEnumMetaclass to create fast enum types with singleton instances.
  • Update the generated runtime.pyx.in and nvrtc.pyx.in bindings to define many CUDA-related enums using _FastEnum/_FastEnumMetaclass and return these enums directly instead of using intermediate _dict_* maps.
  • Extend setup.py to build the new C extension and adjust module name derivation, and add tests/test_basics.py to compare the behavior of FastEnum against enum.IntEnum.

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
cuda_bindings/tests/test_basics.py Adds tests that exercise core enum behaviors for both the new _FastEnum and the standard IntEnum to validate API compatibility (construction, value/name, iteration, containment, attribute access, and repr).
cuda_bindings/setup.py Updates copyright year, generalizes module name derivation to handle non-.pyx sources, and includes cuda/bindings/utils/_fast_enum.c in the extension sources so the new C module is built.
cuda_bindings/cuda/bindings/utils/_fast_enum.c Implements the _fast_enum C extension with a fast int-subclass enum type and its metaclass, but currently has critical issues in module initialization (incorrect PyType_FromModuleAndSpec usage, module name mismatch, and a missing <assert.h> include) and a reference leak on an error path.
cuda_bindings/cuda/bindings/runtime.pyx.in Replaces many IntEnum-based CUDA enums and _dict_* lookup tables with _FastEnum-based enums and direct enum construction in properties and function return values, simplifying the generated code and relying on the new fast enum semantics.
cuda_bindings/cuda/bindings/nvrtc.pyx.in Switches nvrtcResult to _FastEnum/_FastEnumMetaclass and replaces uses of _dict_nvrtcResult with direct nvrtcResult(err) construction, aligning NVRTC bindings with the new fast enum mechanism.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +335 to +346
state->FastEnum_Type = PyType_FromModuleAndSpec(module, &FastEnum_spec, (PyObject *)&PyLong_Type);
if (state->FastEnum_Type == NULL) {
return -1;
}
if (PyModule_AddObjectRef(module, "_FastEnum", state->FastEnum_Type) < 0) {
Py_DECREF(state->FastEnum_Type);
return -1;
}

/* Initialize _FastEnumMetaclass (bases: type) */
state->FastEnumMetaclass_Type = PyType_FromModuleAndSpec(module, &FastEnumMetaclass_spec, (PyObject *)&PyType_Type);
if (state->FastEnumMetaclass_Type == NULL) {
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyType_FromModuleAndSpec is being called with &PyLong_Type and &PyType_Type directly as the "bases" argument, but the C-API requires this argument to be either NULL or a tuple of base classes. Passing a non-tuple here will raise a TypeError during module initialization and prevent _fast_enum from importing; please wrap the base types in a tuple (and DECREF it afterward) or rely on the Py_tp_base slot instead.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Copilot: The docs say: "The bases argument can be used to specify base classes; it can either be only one class or a tuple of classes."


static struct PyModuleDef _fast_enum_module = {
PyModuleDef_HEAD_INIT,
.m_name = "_fast_enum",
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module definition uses .m_name = "_fast_enum", but this extension is built as a submodule "cuda.bindings.utils._fast_enum"; CPython expects the module name in PyModuleDef to match the fully qualified import path. With the current name, importing cuda.bindings.utils._fast_enum will fail with a name mismatch error, so m_name should be updated to the fully qualified module name.

Suggested change
.m_name = "_fast_enum",
.m_name = "cuda.bindings.utils._fast_enum",

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Copilot: The fully qualified name is added upon import by the runtime. There is no name mismatch error raised upon import.

Comment on lines +16 to +20
get_fast_enum_state(PyObject *module)
{
void *state = PyModule_GetState(module);
assert(state != NULL);
return (fast_enum_state *)state;
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert() is used in get_fast_enum_state, but <assert.h> is never included, which will cause a compile error when building this extension with a standards-compliant C/C++ compiler. Please add the appropriate header (or remove the assert) so the module builds cleanly.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a module loaded into the CPython runtime, assert will be available.

Comment on lines 93 to 100

if (PyDict_Contains(singletons, val)) {
PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref
Py_DECREF(singletons);
Py_INCREF(result);
return result;
} else {
PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type);
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastEnum_new leaks the reference to the singletons dict on the error path: when the requested value is not present, the function returns after setting a ValueError without DECREF-ing the "singletons" object. This should DECREF singletons before returning in the else branch to avoid leaking a reference each time an invalid value is passed.

Suggested change
if (PyDict_Contains(singletons, val)) {
PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref
Py_DECREF(singletons);
Py_INCREF(result);
return result;
} else {
PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type);
int contains;
if (singletons == NULL) {
return NULL;
}
contains = PyDict_Contains(singletons, val);
if (contains < 0) {
Py_DECREF(singletons);
return NULL;
}
if (contains) {
PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref
Py_DECREF(singletons);
Py_INCREF(result);
return result;
} else {
PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type);
Py_DECREF(singletons);

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Copilot: Good catch. The solution needs a little work, but the problem identified was real.

@mdboom
Copy link
Contributor Author

mdboom commented Feb 3, 2026

/ok to test

@mdboom
Copy link
Contributor Author

mdboom commented Feb 3, 2026

Nice! Import time matters not as much (for now), but I'd be very curious about the run-time improvement!

Out of curiosity -- are there any known enum bottlenecks (other than #659) that you are aware of?

I agree startup time isn't the most important metric for this library, but in this case the fruit was pretty low-hanging it seemed worth it to tackle this.

@mdboom
Copy link
Contributor Author

mdboom commented Feb 3, 2026

/ok to test

@mdboom
Copy link
Contributor Author

mdboom commented Feb 3, 2026

/ok to test

Copy link
Contributor

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nothing blocking.


FastEnumObject *self = (FastEnumObject *)self_obj;
Py_INCREF(name);
self->name = name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using self->name = Py_NewRef(name) for all these incref-and-assign operations ? (available since 3.10)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, nice idea. I had thought it was too recent.

# though we don't use the latter) to make sure that the two APIs are identical


class MyFastEnum(_fast_enum._FastEnum, metaclass=_fast_enum._FastEnumMetaclass):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the module is private, do we really need to also prefix objects with underscore? Seems unnecessarily noisy to me.

Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants