Faster enum implementation by mdboom · Pull Request #1569 · NVIDIA/cuda-python

mdboom · 2026-02-03T20:43:00Z

Seems to cut import time by half. Just a draft for testing for now... more details to follow.

copy-pr-bot · 2026-02-03T20:43:04Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


mdboom · 2026-02-03T20:43:08Z

/ok to test

mdboom · 2026-02-03T20:45:27Z

/ok to test

leofang · 2026-02-03T20:50:02Z

Nice! Import time matters not as much (for now), but I'd be very curious about the run-time improvement!

github-actions · 2026-02-03T20:59:09Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1569/
https://nvidia.github.io/cuda-python/pr-preview/pr-1569/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1569/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1569/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

Copilot

Pull request overview

This PR introduces a C-based enum implementation to replace many uses of enum.IntEnum in the CUDA Python bindings, aiming to reduce import overhead while keeping the public API behavior aligned with the standard library enums.

Changes:

- Add a new C extension module cuda.bindings.utils._fast_enum providing _FastEnum (an int subclass) and _FastEnumMetaclass to create fast enum types with singleton instances.
- Update the generated runtime.pyx.in and nvrtc.pyx.in bindings to define many CUDA-related enums using _FastEnum/_FastEnumMetaclass and return these enums directly instead of using intermediate _dict_* maps.
- Extend setup.py to build the new C extension and adjust module name derivation, and add tests/test_basics.py to compare the behavior of FastEnum against enum.IntEnum.
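To make the "int subclass with singleton instances" idea concrete, here is a rough pure-Python sketch of the general technique. It is not the C code from this PR: `FastEnumMeta`, `FastEnum`, `Status`, and the member names are hypothetical stand-ins, and the real `_FastEnum`/`_FastEnumMetaclass` may differ in detail.

```python
class FastEnumMeta(type):
    """Hypothetical metaclass: turns int class attributes into singleton members."""

    def __new__(mcls, name, bases, namespace):
        # Treat public int attributes as enum members; leave everything else alone.
        members = {
            key: value
            for key, value in namespace.items()
            if not key.startswith("_")
            and isinstance(value, int)
            and not isinstance(value, bool)
        }
        body = {k: v for k, v in namespace.items() if k not in members}
        cls = super().__new__(mcls, name, bases, body)
        cls._by_value = {}
        for key, value in members.items():
            member = cls._by_value.get(value)
            if member is None:
                # Create exactly one instance per distinct value.
                member = int.__new__(cls, value)
                member._name = key
                cls._by_value[value] = member
            setattr(cls, key, member)
        return cls

    def __call__(cls, value):
        # Construction by value is a dict lookup that returns the singleton.
        try:
            return cls._by_value[value]
        except KeyError:
            raise ValueError(f"{value!r} is not a valid {cls.__name__}") from None

    def __iter__(cls):
        return iter(cls._by_value.values())


class FastEnum(int, metaclass=FastEnumMeta):
    """Hypothetical base class: members are real ints with a name attached."""

    @property
    def name(self):
        return self._name

    @property
    def value(self):
        return int(self)

    def __repr__(self):
        return f"<{type(self).__name__}.{self._name}: {int(self)}>"


class Status(FastEnum):  # made-up enum for illustration
    SUCCESS = 0
    ERROR_INVALID_VALUE = 1


print(Status(1) is Status.ERROR_INVALID_VALUE)  # True
print(repr(Status(1)))  # <Status.ERROR_INVALID_VALUE: 1>
```

Because construction by value is a single dictionary lookup that returns an existing object, a wrapper could return `Status(err)` directly, which appears to be the role the `_dict_*` maps played before.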

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| cuda_bindings/tests/test_basics.py | Adds tests that exercise core enum behaviors for both the new `_FastEnum` and the standard `IntEnum` to validate API compatibility (construction, value/name, iteration, containment, attribute access, and repr). |
| cuda_bindings/setup.py | Updates copyright year, generalizes module name derivation to handle non-`.pyx` sources, and includes `cuda/bindings/utils/_fast_enum.c` in the extension sources so the new C module is built. |
| cuda_bindings/cuda/bindings/utils/_fast_enum.c | Implements the `_fast_enum` C extension with a fast `int`-subclass enum type and its metaclass, but currently has critical issues in module initialization (incorrect `PyType_FromModuleAndSpec` usage, module name mismatch, and a missing `<assert.h>` include) and a reference leak on an error path. |
| cuda_bindings/cuda/bindings/runtime.pyx.in | Replaces many `IntEnum`-based CUDA enums and `_dict_*` lookup tables with `_FastEnum`-based enums and direct enum construction in properties and function return values, simplifying the generated code and relying on the new fast enum semantics. |
| cuda_bindings/cuda/bindings/nvrtc.pyx.in | Switches `nvrtcResult` to `_FastEnum`/`_FastEnumMetaclass` and replaces uses of `_dict_nvrtcResult` with direct `nvrtcResult(err)` construction, aligning NVRTC bindings with the new fast enum mechanism. |
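As an illustration of the behaviors the test file is described as covering, the sketch below records them for a stdlib `IntEnum`. The helper `enum_behaviour` and the `Reference` enum are hypothetical names, not taken from `test_basics.py`; the actual tests may be organized differently.

```python
from enum import IntEnum


def enum_behaviour(enum_cls):
    """Collect the observable behaviours a drop-in enum replacement should match."""
    first = next(iter(enum_cls))
    return {
        # Construction by value returns the existing member, not a new object.
        "construct_by_value": enum_cls(first.value) is first,
        # Iteration yields members in definition order, each with name/value.
        "name_and_value": [(m.name, m.value) for m in enum_cls],
        # Members behave as plain integers.
        "is_int": isinstance(first, int),
        # Containment check against the class.
        "contains_member": first in enum_cls,
        # Attribute access by member name.
        "attribute_access": getattr(enum_cls, first.name) is first,
        "repr": repr(first),
    }


class Reference(IntEnum):  # made-up enum for illustration
    SUCCESS = 0
    FAILURE = 1


expected = enum_behaviour(Reference)
for key, value in expected.items():
    print(f"{key}: {value}")
```

Running the same helper against a `_FastEnum`-based class with identical members and comparing the resulting dictionaries (allowing for the class name in `repr`) would be one way to check parity.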


Copilot · 2026-02-03T21:17:01Z

cuda_bindings/cuda/bindings/utils/_fast_enum.c

+    state->FastEnum_Type = PyType_FromModuleAndSpec(module, &FastEnum_spec, (PyObject *)&PyLong_Type);
+    if (state->FastEnum_Type == NULL) {
+        return -1;
+    }
+    if (PyModule_AddObjectRef(module, "_FastEnum", state->FastEnum_Type) < 0) {
+        Py_DECREF(state->FastEnum_Type);
+        return -1;
+    }
+
+    /* Initialize _FastEnumMetaclass (bases: type) */
+    state->FastEnumMetaclass_Type = PyType_FromModuleAndSpec(module, &FastEnumMetaclass_spec, (PyObject *)&PyType_Type);
+    if (state->FastEnumMetaclass_Type == NULL) {


PyType_FromModuleAndSpec is being called with &PyLong_Type and &PyType_Type directly as the "bases" argument, but the C-API requires this argument to be either NULL or a tuple of base classes. Passing a non-tuple here will raise a TypeError during module initialization and prevent _fast_enum from importing; please wrap the base types in a tuple (and DECREF it afterward) or rely on the Py_tp_base slot instead.

@Copilot: The docs say: "The bases argument can be used to specify base classes; it can either be only one class or a tuple of classes."

Copilot · 2026-02-03T21:17:01Z

cuda_bindings/cuda/bindings/utils/_fast_enum.c

+
+static struct PyModuleDef _fast_enum_module = {
+    PyModuleDef_HEAD_INIT,
+    .m_name = "_fast_enum",


The module definition uses .m_name = "_fast_enum", but this extension is built as a submodule "cuda.bindings.utils._fast_enum"; CPython expects the module name in PyModuleDef to match the fully qualified import path. With the current name, importing cuda.bindings.utils._fast_enum will fail with a name mismatch error, so m_name should be updated to the fully qualified module name.

Suggested change

-    .m_name = "_fast_enum",
+    .m_name = "cuda.bindings.utils._fast_enum",

@Copilot: The fully qualified name is added upon import by the runtime. There is no name mismatch error raised upon import.

Copilot · 2026-02-03T21:17:02Z

cuda_bindings/cuda/bindings/utils/_fast_enum.c

+get_fast_enum_state(PyObject *module)
+{
+    void *state = PyModule_GetState(module);
+    assert(state != NULL);
+    return (fast_enum_state *)state;


assert() is used in get_fast_enum_state, but <assert.h> is never included, which will cause a compile error when building this extension with a standards-compliant C/C++ compiler. Please add the appropriate header (or remove the assert) so the module builds cleanly.

Since this is a module loaded into the CPython runtime, assert will be available.

Copilot · 2026-02-03T21:17:02Z

cuda_bindings/cuda/bindings/utils/_fast_enum.c

+
+    if (PyDict_Contains(singletons, val)) {
+        PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref
+        Py_DECREF(singletons);
+        Py_INCREF(result);
+        return result;
+    } else {
+        PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type);


FastEnum_new leaks the reference to the singletons dict on the error path: when the requested value is not present, the function returns after setting a ValueError without DECREF-ing the "singletons" object. This should DECREF singletons before returning in the else branch to avoid leaking a reference each time an invalid value is passed.

Suggested change

-    if (PyDict_Contains(singletons, val)) {
-        PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref
-        Py_DECREF(singletons);
-        Py_INCREF(result);
-        return result;
-    } else {
-        PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type);
+    int contains;
+    if (singletons == NULL) {
+        return NULL;
+    }
+    contains = PyDict_Contains(singletons, val);
+    if (contains < 0) {
+        Py_DECREF(singletons);
+        return NULL;
+    }
+    if (contains) {
+        PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref
+        Py_DECREF(singletons);
+        Py_INCREF(result);
+        return result;
+    } else {
+        PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type);
+        Py_DECREF(singletons);

@Copilot: Good catch. The solution needs a little work, but the problem identified was real.

mdboom · 2026-02-03T21:23:08Z

/ok to test

mdboom · 2026-02-03T21:40:57Z

> Nice! Import time matters not as much (for now), but I'd be very curious about the run-time improvement!

Out of curiosity -- are there any known enum bottlenecks (other than #659) that you are aware of?

I agree startup time isn't the most important metric for this library, but in this case the fruit was low-hanging enough that it seemed worth tackling.
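For anyone wanting a rough feel for the two costs discussed here (class creation, which is paid at import time, versus lookup by value, which is paid at run time), a stdlib-only timing sketch follows. It does not measure cuda.bindings itself: the member count, the generated names, and the plain-dict baseline are all assumptions chosen for illustration.

```python
import timeit
from enum import IntEnum

N_MEMBERS = 200  # hypothetical enum size, not taken from the bindings


def build_int_enum():
    # Functional API: a mapping of member name to value.
    return IntEnum("Generated", {f"MEMBER_{i}": i for i in range(N_MEMBERS)})


def build_plain_dict():
    # Baseline: the cheapest possible value-to-name table.
    return {i: f"MEMBER_{i}" for i in range(N_MEMBERS)}


# Creation cost, a proxy for what a module pays when it defines enums at import.
creation_enum = timeit.timeit(build_int_enum, number=20)
creation_dict = timeit.timeit(build_plain_dict, number=20)

# Lookup cost, a proxy for converting a raw integer result at run time.
Generated = build_int_enum()
table = build_plain_dict()
lookup_enum = timeit.timeit(lambda: Generated(42), number=100_000)
lookup_dict = timeit.timeit(lambda: table[42], number=100_000)

print(f"create IntEnum x20:   {creation_enum:.4f} s")
print(f"create dict x20:      {creation_dict:.4f} s")
print(f"IntEnum(42) x100000:  {lookup_enum:.4f} s")
print(f"table[42] x100000:    {lookup_dict:.4f} s")
```

The absolute numbers depend heavily on the Python version and machine, so this is only useful for comparing the two rows within a single run.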

mdboom · 2026-02-03T21:41:45Z

/ok to test

mdboom · 2026-02-03T23:30:01Z

/ok to test

cpcloud

LGTM, nothing blocking.

cpcloud · 2026-02-04T17:47:13Z

cuda_bindings/cuda/bindings/utils/_fast_enum.c

+
+    FastEnumObject *self = (FastEnumObject *)self_obj;
+    Py_INCREF(name);
+    self->name = name;


What about using self->name = Py_NewRef(name) for all these incref-and-assign operations? (Available since 3.10.)

Sure, nice idea. I had thought it was too recent.


cpcloud · 2026-02-04T17:52:07Z

cuda_bindings/tests/test_basics.py

+# though we don't use the latter) to make sure that the two APIs are identical
+
+
+class MyFastEnum(_fast_enum._FastEnum, metaclass=_fast_enum._FastEnumMetaclass):


If the module is private, do we really need to also prefix objects with underscore? Seems unnecessarily noisy to me.

Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>

Faster enum implementation

a7e249e

Fix setup.py

b60b0ad

mdboom requested a review from Copilot February 3, 2026 21:05

Copilot started reviewing on behalf of mdboom February 3, 2026 21:06

Fix one test

8beb687

Copilot AI reviewed Feb 3, 2026


Fix reference leak

677a7e8

Fix bug

32a872d

Fix tests

5bd0555

Fix more versions of Python

471b529

Fix for older Pythons

6cbea13

cpcloud approved these changes Feb 4, 2026


Apply suggestions from code review

0bc11d6

Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>

This was referenced Feb 5, 2026

Faster enum implementation (Pure Python version) #1581

Draft

Replace stdlib enum with something more performant #1557

Open

