Do Not Merge: Integration Branch for GT4Py Next #1
Draft: philip-paul-mueller wants to merge 48 commits into main from gt4py-next-integration (base: main).
Conversation
This was referenced Apr 30, 2025. philip-paul-mueller added a commit to GridTools/gt4py that referenced this pull request on Apr 30, 2025:
Instead of pulling directly from the official DaCe repo, we now (for the time being) pull from [this PR](GridTools/dace#1). This became necessary because we have many open PRs in DaCe and need some custom fixes that, in their current form, cannot be merged into DaCe. In the long term, however, we should switch back to the main DaCe repo.
This new type enables DaCe users to perform calculations with stochastic rounding in single precision. This change is validated with unit tests. --------- Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
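The underlying idea of stochastic rounding can be illustrated with a minimal, self-contained sketch (this is illustrative only, not DaCe's actual implementation; all function names below are invented for the example): a double is rounded to one of the two neighbouring single-precision values, with probability proportional to the distance to the other neighbour, so the rounding is unbiased in expectation.

```python
import random
import struct


def _f32(x: float) -> float:
    """Round-trip x through IEEE-754 single precision (round-to-nearest)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def _next_f32_toward(a: float, x: float) -> float:
    """The float32 value adjacent to a, stepping in the direction of x.
    (Simplified: sign/subnormal edge cases are not handled.)"""
    (bits,) = struct.unpack("I", struct.pack("f", a))
    # For non-negative floats, incrementing the bit pattern moves away from zero.
    if (x > a) == (a >= 0.0):
        bits += 1
    else:
        bits -= 1
    return struct.unpack("f", struct.pack("I", bits))[0]


def stochastic_round_f32(x: float, rng: random.Random) -> float:
    """Stochastically round a double to single precision: the result is one
    of the two float32 neighbours of x, chosen so that E[result] == x."""
    a = _f32(x)  # nearest float32 to x
    if a == x:
        return a  # exactly representable, nothing to do
    b = _next_f32_toward(a, x)  # the float32 on the other side of x
    lo, hi = (a, b) if a < b else (b, a)
    p_hi = (x - lo) / (hi - lo)  # probability of rounding up
    return hi if rng.random() < p_hi else lo
```

Averaging many stochastically rounded samples recovers the original double far more accurately than deterministic round-to-nearest, which is the motivation for using it in single-precision computations.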
Allow compilation with the latest LLVM versions. Without this change, the following error occurs:
```
> raise cgx.CompilationError('Compiler failure:\n' + ex.output)
E dace.codegen.exceptions.CompilationError: Compiler failure:
E [ 20%] Building CXX object CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cpu/vertically_implicit_solver_at_predictor_step.cpp.o
E [ 40%] Building HIP object CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp.o
E In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp:3:
E In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/dace.h:17:
E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/math.h:509:37: warning: 'host' attribute only applies to functions [-Wignored-attributes]
E 509 | static const __attribute__((host)) __attribute__((device)) typeless_pi pi{};
E | ^
E In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp:3:
E In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/dace.h:30:
E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:772:41: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
E 772 | wcr_custom<T>::template reduce(
E | ^
E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:779:45: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
E 779 | wcr_custom<T>::template reduce(
E | ^
E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:796:49: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
E 796 | wcr_fixed<REDTYPE, T>::template reduce_atomic(
E | ^
E /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:803:53: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
E 803 | wcr_fixed<REDTYPE, T>::template reduce_atomic(
E | ^
E 1 warning and 4 errors generated when compiling for gfx942.
E gmake[2]: *** [CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/build.make:92: CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp.o] Error 1
E gmake[1]: *** [CMakeFiles/Makefile2:90: CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/all] Error 2
E gmake: *** [Makefile:91: all] Error 2
.venv/lib/python3.12/site-packages/dace/codegen/compiler.py:254: CompilationError
```
This PR refactors how calling an SDFG works. [PR#1467](github.com/spcl/pull/1467) introduced the `fast_call()` API, which allowed calling a compiled SDFG while skipping some checks. This was done to support the use case of calling the same SDFG with the same arguments (as in: the same pointers) multiple times. However, that PR did not introduce a simple way to generate the argument vector that has to be passed to `fast_call()` without relying on internal implementation details of the class. This PR, besides other things, closes that gap and gives access to all the steps needed to call an SDFG:
- `construct_arguments()`: Accepts Python arguments, such as `int` or NumPy arrays, and turns them into an argument vector, in the right order and converted to the required C types.
- `fast_call()`: Performs the actual call using the _passed_ argument vector; if needed, it will also run initialization. Note that this function is not new, but it was slightly modified and no longer handles the return values, see below.
- `convert_return_values()`: Performs the actual `return` operation, i.e. composes the specified return value: either a single array or a tuple. Note that this function was previously called by `fast_call()`; it was moved outside to slim down the hot path, because return values are usually passed directly through inout or out arguments.

Besides these changes, the PR also modifies the following:
- It was possible to pass return values, i.e. `__return`, as ordinary arguments. This is still supported, but a warning is now emitted.
- `CompiledSDFG` was technically able to handle scalar return values; however, due to a technical limitation this is [not possible](spcl#1609), so the feature was removed. It was not dropped completely, though, and is still used to handle `pyobject`s since they are passed as pointers, see below.
- The handling of `pyobject` return values was modified. Before, it was not possible to use `pyobject` instances as return values that were managed _outside_ of, i.e. not allocated by, `CompiledSDFG`; now they are "handled". Note that an _array_, i.e. multiple instances, of `pyobject`s is handled as a single object (this is the correct behaviour and is retained for bug compatibility with the unit tests); however, a warning is generated.
- It was possible to pass an argument both as a named argument and as a positional argument; this is now forbidden.
- `safe_call()` is not able to handle return values; if the method is called on such an SDFG, an error is generated.
- Before, it was not possible to return a `tuple` with a single element: in that case the value was always returned directly. This has been fixed and is now handled correctly.
- The allocation of return values was inconsistent. If there was no change in size, `__call__()` would always return the _same_ arrays, which could lead to very subtle bugs. The new behaviour is to always allocate new memory; this is done by `construct_arguments()`.
- Shared return values.
- Before, `CompiledSDFG` had a member `_lastargs` which "cached" the last pointer arguments used to call the SDFG. It was updated by `_construct_args()` (the old version of `construct_arguments()`), which did not make much sense. The original intention was to remove it, but this proved to be harder, so it is retained. However, it is now updated by `__call__()` and `initialize()` to support the use case of `{get, set}_workspace_size()`.

--------- Co-authored-by: Philipp Schaad <schaad.phil@gmail.com> Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>
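The resulting three-step calling convention can be illustrated with a toy stand-in. The method names `construct_arguments()`, `fast_call()`, and `convert_return_values()` come from the PR; the class below and its behaviour (a "compiled function" that doubles a C double in place) are purely illustrative:

```python
import ctypes


class ToyCompiledFunction:
    """Toy stand-in for a compiled SDFG, splitting a call into
    argument construction, the raw fast call, and return-value conversion."""

    def construct_arguments(self, x: float):
        # Convert Python values into the C-level argument vector.
        return (ctypes.c_double(x),)

    def fast_call(self, argtuple) -> None:
        # Hot path: only the raw call, no conversion, no return handling.
        argtuple[0].value *= 2.0

    def convert_return_values(self, argtuple):
        # Compose the Python-level return value from the out arguments.
        return argtuple[0].value

    def __call__(self, x: float):
        # The convenient path simply chains the three steps.
        args = self.construct_arguments(x)
        self.fast_call(args)
        return self.convert_return_values(args)


f = ToyCompiledFunction()
result = f(3.0)

# Repeated calls with the *same* argument vector stay on the hot path:
args = f.construct_arguments(1.0)
for _ in range(3):
    f.fast_call(args)
repeated = f.convert_return_values(args)
```

The point of the split is that `construct_arguments()` runs once, while the cheap `fast_call()` can be invoked many times with the same pointers.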
513e824 to
139389d
Compare
This PR stops the mypy/pyright warnings when using `dace.*` types, and also shows them as numpy arrays for code completion purposes.
Since Python 3.9 is EOL, this PR updates the version span supported by DaCe. The PR also removes some deprecated pre-3.9 parsing code.
This PR fixes an error raised inside dead state elimination when a malformed conditional block is detected. Previously, the error wouldn't be raised correctly because `InvalidSDFGNodeError` was missing the required arguments `sdfg`, `state_id`, and `node_id`. Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
The `_is_data_accessed_downstream()` function was missing visited node tracking in its graph traversal. Even though the dataflow graph is acyclic, the same node can be reached via multiple paths, causing it to be processed exponentially many times. For example, in a diamond-shaped graph `A → B,C → D`, node D gets visited twice (once via B, once via C). When multiple diamonds are chained together, the visits compound multiplicatively: if D feeds into another diamond `D → E,F → G`, then G is visited 4 times (2 paths to D × 2 paths from D). With n such diamonds in sequence, the final node gets visited `2^n` times. I added a `visited` set to make sure each node is only processed once. This reduces map fusion time from 26+ minutes to seconds on a large test SDFG. Co-authored-by: edopao <edoardo.paone@cscs.ch>
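The exponential blow-up and its fix can be reproduced with a small standalone traversal (illustrative code, not the DaCe function itself):

```python
def count_visits(graph, start, visited=None, counter=None):
    """DFS over a DAG that records how often each node is processed.
    With a shared `visited` set each node is handled exactly once;
    without it, chained diamonds cause exponentially many visits."""
    if counter is None:
        counter = {}
    if visited is not None:
        if start in visited:
            return counter
        visited.add(start)
    counter[start] = counter.get(start, 0) + 1
    for succ in graph.get(start, ()):
        count_visits(graph, succ, visited, counter)
    return counter


# Two chained diamonds: A -> B,C -> D -> E,F -> G
diamonds = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"],
}

naive = count_visits(diamonds, "A")          # D processed 2x, G processed 4x
memo = count_visits(diamonds, "A", set())    # every node processed once
```

With n chained diamonds the naive version processes the final node `2^n` times, while the memoized version stays linear in the graph size.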
Fixes a bug where Dead Dataflow Elimination would eliminate a field used in a condition
Following up from a discussion on Mattermost, this PR proposes improving the error messages around undefined/ambiguous starting blocks. --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
`MapFusion` is deprecated and `MapFusionVertical` should be used instead. `MapFusion` remains in the codebase as a compatibility layer for backwards compatibility. This PR changes the way `MapFusion` is deprecated. Previously, a warning would be emitted whenever the class was loaded (e.g. when accessed from `dace/transformation/dataflow/__init__.py`). Depending on how verbose pytest is configured, one would thus see a deprecation warning even if `MapFusion` was never actually used/instantiated. The PR changes this to only emit a warning upon class instantiation. The unrelated change in `tests/buffer_tiling_test.py` (unused import) is because I used a test in there to make sure the warning disappears. /cc @philip-paul-mueller FYI Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
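The import-time vs. instantiation-time distinction can be sketched as follows (the class names come from the PR; the stub bodies are illustrative, not DaCe's actual classes):

```python
import warnings


class MapFusionVertical:
    """The replacement transformation (stub for illustration)."""


class MapFusion(MapFusionVertical):
    """Deprecated alias. The warning is emitted in __init__, not at module
    level, so merely importing the module (or touching the class through
    an __init__.py re-export) stays silent."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "MapFusion is deprecated, use MapFusionVertical instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Had the `warnings.warn(...)` call been placed at module scope instead, every import of the module would trigger it, which is exactly the behaviour the PR removes.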
In particular, this gets rid of a bunch of if/else blocks around breaking changes in Python 3.8 and 3.9. --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> Co-authored-by: Roman Cattaneo <>
The changes include reimplementing `DepthCounter` class for AST traversal, updating tasklet analysis functions to support both work and depth metrics, extending the SDFG analysis to include interstate edge work/depth, and adding tests to validate the new logic.
This includes memlets that target symbols, symbol assignment, and reading/writing arrays or scalars without an appropriate memlet.
…2291) Upstream tests in NDSL started showing the following warning:

```none
UserWarning: You passed `None` as `argnames` to `CompiledSDFG`, but the SDFG you passed has positional arguments. This is allowed but deprecated.
```

I started digging and discovered that the same warning is also present in DaCe tests (e.g. `test_nested_duplicate_callback()` in the test file `callback_autodetect_test.py`). This PR passes along the `argnames` from the `DaceProgram` of the parser through `load_precompiled_sdfg` such that the information is available in the `CompiledSDFG` class and the warning is silenced. Co-authored-by: Roman Cattaneo <>
…#2294) This PR enables enumerations to contain attributes via definition as a dataclass. It is also better than the previous `aenum`-based approach for type checkers and IDEs, as it transparently keeps the enumeration members. This feature will be useful for nesting attributes and methods into the classes themselves, improving extensibility. It also enables support for dataclass serialization/deserialization, and removes `aenum` as a requirement. The syntax is as follows (for example):

```python
from dace.attr_enum import ExtensibleAttributeEnum
from enum import auto


class ScheduleType(ExtensibleAttributeEnum):
    Default = auto()  #: Scope-default parallel schedule
    Sequential = auto()  #: Sequential code (single-thread)
    MPI = auto()  #: MPI processes

    @dataclass(frozen=True)
    class CPU_Multicore:
        omp_schedule_type: OMPScheduleType = OMPScheduleType.Default

    # ...
```

Setting `CPU_Multicore = CPUData` to an external dataclass is also possible. As a result, `ScheduleType.CPU_Multicore` is now a _template_ enum member, and `CPU_Multicore(OMPScheduleType.Static)` is an instance. Registering a new template externally looks like:

```python
ScheduleType.register_template("CPU_Multicore", CPUData)
```
A student had a problem because `np.int8` maps to `char`. `char` can be either unsigned or signed according to the C++ standard (https://en.cppreference.com/w/cpp/language/types.html). I propose we use either `int8_t` directly or `signed char`; I updated the dictionary accordingly.
After a brief discussion, `subsets.Indices` were deprecated last week with PR spcl#2282. Since then, many DaCe tests emit warnings because of remaining usage of `Indices` in the offset member functions of `subsets.Range`. This PR adapts `Range.from_indices()` to add support for a sequence of (symbolic) numbers or strings (as suggested on Mattermost). This allows removing the remaining usage of `subsets.Indices` constructors in the DaCe codebase, which gets rid of a bunch of warnings emitted in test or upstream/user code. The only hiccup I had doing this was the function `_add_read_slice()`, called from `visit_Subscript()` of the `ProgramVisitor` in `newast.py`. That function would check whether subsets are ranges or indices, and if subsets were indices, we'd go another way. That code-path separation is apparently loosely tied to some other place in the codebase, because we'd get errors if we were going the sub-optimal ranges path with indices. I now check if ranges are indices and set the flag accordingly, which seems to fix the issues in tests. I've also checked (manually) all other cases where we'd go a different code path in case subsets are indices. There are some, and the remaining ones all "upgrade" indices to ranges. They can be removed once we remove the deprecated `Indices` class. --------- Co-authored-by: Roman Cattaneo <> Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>
This PR replaces `is_start_state` -> `is_start_block` because the former is deprecated. The PR is part of an ongoing effort to reduce warnings emitted in tests. Unrelated to this change, the PR removes unused imports and fixes a couple of typos in changed files. --------- Co-authored-by: Roman Cattaneo <>
Updated GitHub Actions dependencies to the latest versions. No breakage is expected: I've checked the logs and most of them just updated to node20, which is a breaking change because it requires an up-to-date runner. Since we rely on GitHub's runners, this should be no problem. Co-authored-by: Roman Cattaneo <> Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
Reduces the number of warnings
The default configuration of CUDA MPS does not support the number of pytest workers (32) used by the CI job. Besides, CUDA MPS is not needed because the GPU is not configured in exclusive mode.
A student got a CMake error with some older CMake version:
```
CMake Error at CMakeLists.txt:191 (if):
if given arguments:
"3.28.3" "VERSION_LESS"
Unknown arguments specified
```
I think it is better to just check whether the variable is defined beforehand, so as not to trigger the missing-argument error.
`MapFusionVertical` must create new data (reduced versions of the intermediate data) and hence name it. Previously it used a naming scheme based on the node id, which might not be stable. The new scheme uses the name of the intermediate data and guarantees stable names for exclusive intermediate nodes, and for shared intermediate nodes under the condition that they are only involved in one MapFusionVertical operation. --------- Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
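The core idea of deriving names from the data rather than from node ids can be sketched like this (illustrative only; the `_fused` suffix and helper name are assumptions, not DaCe's actual scheme):

```python
def stable_intermediate_name(data_name: str, existing: set) -> str:
    """Derive a name for the reduced intermediate array from the name of the
    original data. Unlike a node-id-based scheme, the same inputs always
    produce the same output, so repeated pipeline runs yield stable SDFGs."""
    candidate = f"{data_name}_fused"
    counter = 0
    while candidate in existing:  # avoid clashes with existing descriptors
        counter += 1
        candidate = f"{data_name}_fused_{counter}"
    existing.add(candidate)
    return candidate


names = set()
first = stable_intermediate_name("tmp", names)   # "tmp_fused"
second = stable_intermediate_name("tmp", names)  # "tmp_fused_1"
other = stable_intermediate_name("out", names)   # "out_fused"
```

A node-id-based name would change whenever unrelated transformations renumber the graph; a data-name-based one only changes if the data itself is renamed.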
Squashed commit history (all commits by Philip Mueller <philip.mueller@cscs.ch>):
- 7e05c76 (Wed Jan 7 10:08:42 2026 +0100): Rename of the Index.
- 32224bb (Thu Dec 18 09:53:57 2025 +0100): Updated the workflow file.
- ecb2785 (Wed Dec 17 08:19:40 2025 +0100): Updated the dace updater workflow file.
- f3198ef (Wed Dec 17 07:41:26 2025 +0100): Made the update point to the correct repo.
- 96f963a (Wed Dec 17 07:37:48 2025 +0100, merge of 8b7cce5 and 387f1e8): Merge remote-tracking branch 'spcl/main' into automatic_gt4py_deployment.
- 8b7cce5 (Mon Dec 1 09:18:22 2025 +0100): Restored the original workflow files.
- 362ab70 (Mon Dec 1 07:41:40 2025 +0100): Now it has run once, so let's make it less runnable.
- 81b8cfa (Mon Dec 1 07:39:09 2025 +0100): Made it run always.
- 6d71466 (Mon Dec 1 07:38:11 2025 +0100): Small update.
- eb31e6c (Fri Nov 21 15:23:33 2025 +0100): Empty commit in the branch containing the workflow file.
- 2970a75 (Fri Nov 21 15:21:09 2025 +0100): Next step.
- f5d3d9d (Fri Nov 21 15:17:56 2025 +0100): Let's disable everything.
- 211e415 (Fri Nov 21 15:10:43 2025 +0100): Disabled the kickstarter.
- d012c26 (Fri Nov 21 15:05:38 2025 +0100): Updated everything.
This is the PR/branch that GT4Py.Next uses to pull DaCe. It is essentially DaCe main together with our fixes that, for various reasons, have not yet made it into DaCe main.
The process for updating this branch is as follows (there are no exceptions):
1. Update the `version.py` file in the `dace/` subfolder. Make sure that there is no newline at the end. For `next` we are using the epoch 43 (`cartesian` would use 42). The date is used as version number; thus the version (for `next`) would look something like `'43!YYYY.MM.DD'`.
2. Commit and push the change (to `gt4py-next-integration`).
3. Create the tag `__gt4py-next-integration_YYYY_MM_DD` and push it as well.

Afterwards you have to update GT4Py's `pyproject.toml` file:
1. Update the version requirement of DaCe in the `dace-next` group at the beginning of the file to the version you just created, i.e. change it to `dace==43!YYYY.MM.DD`.
2. Update the source in the `uv`-specific parts of the file: change the source to the new tag you have just created.
3. Update the uv lock by running `uv sync --extra next --group dace-next`; if you have installed the pre-commit hooks, this will be done automatically.

NOTE: Once PR#2423 has been merged, the second step (adapting the tag in the `uv`-specific parts) is no longer needed.

On top of `DaCe/main` we are using the following PRs: no open PRs currently, all changes are in DaCe main.
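The `'43!YYYY.MM.DD'` version scheme (a PEP 440 epoch followed by a date-based release) can be sanity-checked with a small helper. This is purely illustrative and not part of the repository; in particular, the zero-padded month/day format is an assumption:

```python
import re

# '<epoch>!YYYY.MM.DD', e.g. '43!2025.04.30' for next (42 for cartesian).
_VERSION_RE = re.compile(r"^(?P<epoch>\d+)!(?P<y>\d{4})\.(?P<m>\d{2})\.(?P<d>\d{2})$")


def check_integration_version(version: str, expected_epoch: int = 43) -> bool:
    """Return True iff the version follows the '<epoch>!YYYY.MM.DD' scheme
    with the expected epoch."""
    m = _VERSION_RE.match(version)
    return bool(m) and int(m.group("epoch")) == expected_epoch
```

Because the epoch dominates version comparison in PEP 440, `43!2025.04.30` always sorts after any regular DaCe release, which is what lets the integration branch shadow upstream versions.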
No Longer Needed
- `DaCe.Config`
- `PruneSymbols`
- `scope_tree_recursive()`
- `MapFusion`
- `other_subset` validation
- `state_fission()`
- `SubgraphView`
- `try_initialize()`
- edges in Map fusion
- `MapFusionVertical`
- `RedundantSecondArray`
- `import` in `fast_call()`
- `compiled_sdfg_call_hooks_manager`
- `self._lastargs` Mutable: no longer needed since we now use GT4Py PR#2353 and DaCe PR#2206.
- `self._lastargs` Mutable (should be replaced by a more permanent solution).
- `MapFusion*`
- `AddThreadBlockMap`
- `apply_transformation_once_everywhere()`
- `CompiledSDFG` refactoring (archive): for some reason the original PR has been "taken over" by Tal. Due to the inherent dependency that GT4Py has on this PR, we should use the archive (linked at the top).