Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
/ok to test |
|
Nice! Import time matters not as much (for now), but I'd be very curious about the run-time improvement! |
|
There was a problem hiding this comment.
Pull request overview
This PR introduces a C-based enum implementation to replace many uses of enum.IntEnum in the CUDA Python bindings, aiming to reduce import overhead while keeping the public API behavior aligned with the standard library enums.
Changes:
- Add a new C extension module
cuda.bindings.utils._fast_enumproviding_FastEnum(anintsubclass) and_FastEnumMetaclassto create fast enum types with singleton instances. - Update the generated
runtime.pyx.inandnvrtc.pyx.inbindings to define many CUDA-related enums using_FastEnum/_FastEnumMetaclassand return these enums directly instead of using intermediate_dict_*maps. - Extend
setup.pyto build the new C extension and adjust module name derivation, and addtests/test_basics.pyto compare the behavior ofFastEnumagainstenum.IntEnum.
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| cuda_bindings/tests/test_basics.py | Adds tests that exercise core enum behaviors for both the new _FastEnum and the standard IntEnum to validate API compatibility (construction, value/name, iteration, containment, attribute access, and repr). |
| cuda_bindings/setup.py | Updates copyright year, generalizes module name derivation to handle non-.pyx sources, and includes cuda/bindings/utils/_fast_enum.c in the extension sources so the new C module is built. |
| cuda_bindings/cuda/bindings/utils/_fast_enum.c | Implements the _fast_enum C extension with a fast int-subclass enum type and its metaclass, but currently has critical issues in module initialization (incorrect PyType_FromModuleAndSpec usage, module name mismatch, and a missing <assert.h> include) and a reference leak on an error path. |
| cuda_bindings/cuda/bindings/runtime.pyx.in | Replaces many IntEnum-based CUDA enums and _dict_* lookup tables with _FastEnum-based enums and direct enum construction in properties and function return values, simplifying the generated code and relying on the new fast enum semantics. |
| cuda_bindings/cuda/bindings/nvrtc.pyx.in | Switches nvrtcResult to _FastEnum/_FastEnumMetaclass and replaces uses of _dict_nvrtcResult with direct nvrtcResult(err) construction, aligning NVRTC bindings with the new fast enum mechanism. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| state->FastEnum_Type = PyType_FromModuleAndSpec(module, &FastEnum_spec, (PyObject *)&PyLong_Type); | ||
| if (state->FastEnum_Type == NULL) { | ||
| return -1; | ||
| } | ||
| if (PyModule_AddObjectRef(module, "_FastEnum", state->FastEnum_Type) < 0) { | ||
| Py_DECREF(state->FastEnum_Type); | ||
| return -1; | ||
| } | ||
|
|
||
| /* Initialize _FastEnumMetaclass (bases: type) */ | ||
| state->FastEnumMetaclass_Type = PyType_FromModuleAndSpec(module, &FastEnumMetaclass_spec, (PyObject *)&PyType_Type); | ||
| if (state->FastEnumMetaclass_Type == NULL) { |
There was a problem hiding this comment.
PyType_FromModuleAndSpec is being called with &PyLong_Type and &PyType_Type directly as the "bases" argument, but the C-API requires this argument to be either NULL or a tuple of base classes. Passing a non-tuple here will raise a TypeError during module initialization and prevent _fast_enum from importing; please wrap the base types in a tuple (and DECREF it afterward) or rely on the Py_tp_base slot instead.
There was a problem hiding this comment.
@Copilot: The docs say: "The bases argument can be used to specify base classes; it can either be only one class or a tuple of classes."
|
|
||
| static struct PyModuleDef _fast_enum_module = { | ||
| PyModuleDef_HEAD_INIT, | ||
| .m_name = "_fast_enum", |
There was a problem hiding this comment.
The module definition uses .m_name = "_fast_enum", but this extension is built as a submodule "cuda.bindings.utils._fast_enum"; CPython expects the module name in PyModuleDef to match the fully qualified import path. With the current name, importing cuda.bindings.utils._fast_enum will fail with a name mismatch error, so m_name should be updated to the fully qualified module name.
| .m_name = "_fast_enum", | |
| .m_name = "cuda.bindings.utils._fast_enum", |
There was a problem hiding this comment.
@Copilot: The fully qualified name is added upon import by the runtime. There is no name mismatch error raised upon import.
| get_fast_enum_state(PyObject *module) | ||
| { | ||
| void *state = PyModule_GetState(module); | ||
| assert(state != NULL); | ||
| return (fast_enum_state *)state; |
There was a problem hiding this comment.
assert() is used in get_fast_enum_state, but <assert.h> is never included, which will cause a compile error when building this extension with a standards-compliant C/C++ compiler. Please add the appropriate header (or remove the assert) so the module builds cleanly.
There was a problem hiding this comment.
Since this is a module loaded into the CPython runtime, assert will be available.
|
|
||
| if (PyDict_Contains(singletons, val)) { | ||
| PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref | ||
| Py_DECREF(singletons); | ||
| Py_INCREF(result); | ||
| return result; | ||
| } else { | ||
| PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type); |
There was a problem hiding this comment.
FastEnum_new leaks the reference to the singletons dict on the error path: when the requested value is not present, the function returns after setting a ValueError without DECREF-ing the "singletons" object. This should DECREF singletons before returning in the else branch to avoid leaking a reference each time an invalid value is passed.
| if (PyDict_Contains(singletons, val)) { | |
| PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref | |
| Py_DECREF(singletons); | |
| Py_INCREF(result); | |
| return result; | |
| } else { | |
| PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type); | |
| int contains; | |
| if (singletons == NULL) { | |
| return NULL; | |
| } | |
| contains = PyDict_Contains(singletons, val); | |
| if (contains < 0) { | |
| Py_DECREF(singletons); | |
| return NULL; | |
| } | |
| if (contains) { | |
| PyObject *result = PyDict_GetItem(singletons, val); // borrowed ref | |
| Py_DECREF(singletons); | |
| Py_INCREF(result); | |
| return result; | |
| } else { | |
| PyErr_Format(PyExc_ValueError, "Value %S not in %N", val, type); | |
| Py_DECREF(singletons); |
There was a problem hiding this comment.
@Copilot: Good catch. The solution needs a little work, but the problem identified was real.
|
/ok to test |
Out of curiosity -- are there any known enum bottlenecks (other than #659) that you are aware of? I agree startup time isn't the most important metric for this library, but in this case the fruit was pretty low-hanging it seemed worth it to tackle this. |
|
/ok to test |
|
/ok to test |
|
|
||
| FastEnumObject *self = (FastEnumObject *)self_obj; | ||
| Py_INCREF(name); | ||
| self->name = name; |
There was a problem hiding this comment.
What about using self->name = Py_NewRef(name) for all these incref-and-assign operations ? (available since 3.10)
There was a problem hiding this comment.
Sure, nice idea. I had thought it was too recent.
| # though we don't use the latter) to make sure that the two APIs are identical | ||
|
|
||
|
|
||
| class MyFastEnum(_fast_enum._FastEnum, metaclass=_fast_enum._FastEnumMetaclass): |
There was a problem hiding this comment.
If the module is private, do we really need to also prefix objects with underscore? Seems unnecessarily noisy to me.
Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>
Seems to cut import time by half. Just a draft for testing for now... more details to follow.