@philip-paul-mueller philip-paul-mueller commented Apr 30, 2025

This is the PR/branch that GT4Py.Next uses to pull DaCe.
It is essentially DaCe main together with our fixes that, for various reasons, have not yet made it into DaCe main.

The process for updating this branch is as follows; there are no exceptions:

  • You start with current DaCe main.
  • Then you include the PR that enables the automatic Python index update by squash-merging it.
  • Then squash-merge the PRs that are listed below. Check if they have been merged into DaCe proper and, if so, remove them from the list.
  • Then update the version.py file in the dace/ subfolder. Make sure that there is no newline at the end. For next we are using the epoch 43; cartesian would use 42. The date is used as the version number. Thus the version (for next) would look something like: '43!YYYY.MM.DD'.
  • Force push your changes to this branch (gt4py-next-integration).
  • Create a tag with the pattern __gt4py-next-integration_YYYY_MM_DD and push it as well.
  • Make sure that the workflow has been triggered.
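
As a minimal sketch of the naming rules above, the version string, the tag name and the resulting `dace==...` requirement for GT4Py's pyproject.toml can be derived from a date as follows. The helper names (`integration_version`, `integration_tag`, `dace_requirement`) are hypothetical and not part of DaCe or GT4Py; only the epoch 43 and the two patterns come from the description.

```python
from datetime import date

# Epoch used for GT4Py next according to the steps above (cartesian would use 42).
NEXT_EPOCH = 43


def integration_version(day: date, epoch: int = NEXT_EPOCH) -> str:
    """Build the version string following the pattern 'EPOCH!YYYY.MM.DD'."""
    return f"{epoch}!{day:%Y.%m.%d}"


def integration_tag(day: date) -> str:
    """Build the tag name following '__gt4py-next-integration_YYYY_MM_DD'."""
    return f"__gt4py-next-integration_{day:%Y_%m_%d}"


def dace_requirement(day: date, epoch: int = NEXT_EPOCH) -> str:
    """Build the pinned requirement that goes into the dace-next group."""
    return f"dace=={integration_version(day, epoch)}"


if __name__ == "__main__":
    today = date.today()
    print(integration_version(today))
    print(integration_tag(today))
    print(dace_requirement(today))
```

Deriving all three strings from a single date keeps the version, the tag and the requirement consistent with each other.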

Afterwards you have to update GT4Py's pyproject.toml file.
For this you have to update the version requirement of DaCe in the dace-next group at the beginning of the file to the version you just created, i.e. change it to dace==43!YYYY.MM.DD.
Then you have to update the source in the uv-specific parts of the file; there you have to change the source to the new tag you have just created.
Then you have to update the uv lock by running uv sync --extra next --group dace-next; if you have installed the pre-commit hooks, this will be done automatically.

NOTE: Once PR#2423 has been merged, the second step, i.e. adapting the tag in the uv-specific parts, is no longer needed.

On top of DaCe/main we are using the following PRs:
No open PRs currently, all changes are in dace main.

No Longer Needed

@philip-paul-mueller philip-paul-mueller marked this pull request as draft April 30, 2025 10:04
philip-paul-mueller added a commit to GridTools/gt4py that referenced this pull request Apr 30, 2025
Instead of pulling directly from the official DaCe repo, we now (for the
time being) pull from [this
PR](GridTools/dace#1).
This became necessary as we have a lot of open PRs in DaCe and need some
custom fixes (that cannot be merged into DaCe in their current form).
In the long term however, we should switch back to the main DaCe repo.
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 3 times, most recently from 964e84b to 2d85437 Compare May 26, 2025 05:22
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 2 times, most recently from 88c99f4 to d779cd1 Compare June 10, 2025 11:50
@edopao edopao force-pushed the gt4py-next-integration branch from d779cd1 to 4f40029 Compare June 12, 2025 12:46
@edopao edopao force-pushed the gt4py-next-integration branch from 87c77ef to c2a4e42 Compare June 27, 2025 14:08
@edopao edopao force-pushed the gt4py-next-integration branch from 178037a to 9114985 Compare July 14, 2025 08:42
@philip-paul-mueller philip-paul-mueller changed the title Do Not Merge: Integration Branch for GT4Py Do Not Merge: Integration Branch for GT4Py Next Jul 15, 2025
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from 33b63a1 to 2417e09 Compare July 21, 2025 07:42
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from 3472895 to bed3b0e Compare July 24, 2025 07:24
creavin and others added 3 commits January 20, 2026 14:14
This new type enables DaCe users to perform calculations with stochastic
rounding in single precision.

This change is validated with unit tests.

---------

Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
Allow compilation with latest LLVM versions. Without this change there's
the following error:
```
>               raise cgx.CompilationError('Compiler failure:\n' + ex.output)
E               dace.codegen.exceptions.CompilationError: Compiler failure:
E               [ 20%] Building CXX object CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cpu/vertically_implicit_solver_at_predictor_step.cpp.o
E               [ 40%] Building HIP object CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp.o
E               In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp:3:
E               In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/dace.h:17:
E               /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/math.h:509:37: warning: 'host' attribute only applies to functions [-Wignored-attributes]
E                 509 |         static const __attribute__((host)) __attribute__((device)) typeless_pi pi{};
E                     |                                     ^
E               In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp:3:
E               In file included from /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/dace.h:30:
E               /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:772:41: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
E                 772 |                 wcr_custom<T>::template reduce(
E                     |                                         ^
E               /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:779:45: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
E                 779 |                     wcr_custom<T>::template reduce(
E                     |                                             ^
E               /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:796:49: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
E                 796 |                 wcr_fixed<REDTYPE, T>::template reduce_atomic(
E                     |                                                 ^
E               /capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/.venv/lib/python3.12/site-packages/dace/codegen/../runtime/include/dace/cuda/copy.cuh:803:53: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
E                 803 |                     wcr_fixed<REDTYPE, T>::template reduce_atomic(
E                     |                                                     ^
E               1 warning and 4 errors generated when compiling for gfx942.
E               gmake[2]: *** [CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/build.make:92: CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/capstor/scratch/cscs/ioannmag/beverin/cycle33/ICON4PY/icon4py/dycore_711_res_tmps_rcu/.gt4py_cache/vertically_implicit_solver_at_predictor_step_c2100e7df02fe12899d3acf60eb0b974edf9029c01b12b0414e215257065bc8c/src/cuda/hip/vertically_implicit_solver_at_predictor_step_cuda.cpp.o] Error 1
E               gmake[1]: *** [CMakeFiles/Makefile2:90: CMakeFiles/vertically_implicit_solver_at_predictor_step.dir/all] Error 2
E               gmake: *** [Makefile:91: all] Error 2

.venv/lib/python3.12/site-packages/dace/codegen/compiler.py:254: CompilationError
```
This PR refactors how calling an SDFG works.
[PR#1467](github.com/spcl/pull/1467) introduced the `fast_call()`
API, which allowed calling a compiled SDFG while skipping some tests.
This was done to support the use case of calling the same SDFG with the
same arguments (as in pointers) multiple times.
However, the PR did not introduce a simple way to generate the argument
vector that has to be passed to `fast_call()` without relying on
internal implementation details of the class.
This PR, besides other things, supports this use case and gives access
to all the steps needed to call an SDFG:
- `construct_arguments()`: It accepts Python arguments, such as `int` or
NumPy arrays and turns them into an argument vector in the right order
and converted to the required C type.
- `fast_call()`: Performs the actual call using the _passed_ argument
vectors, if needed it will also run initialization.
Note that this function is not new, but was slightly modified and no
longer handles the return values, see below.
- `convert_return_values()`: This function performs the actual `return`
operation, i.e. composes the specified return type, i.e. either a single
array or a tuple.
Note that previously this function was called by `fast_call()`, but it was
moved outside to shorten the hot path, because usually return values are
passed using inout or out arguments directly.
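
The three-step pattern described above can be sketched with a toy stand-in. This is an illustration only: `ToyCompiledProgram` and `accumulate` are hypothetical, the signatures are assumptions, and nothing here is DaCe's `CompiledSDFG`. It only shows why building the argument vector once and reusing it keeps the repeated call cheap.

```python
class ToyCompiledProgram:
    """Toy stand-in for a compiled program; NOT DaCe's CompiledSDFG."""

    def __init__(self, argnames, kernel):
        self._argnames = tuple(argnames)  # required order of the arguments
        self._kernel = kernel             # plain Python callable, assumed here
        self._initialized = False
        self._raw_result = None

    def construct_arguments(self, **kwargs):
        """Turn named Python arguments into an ordered argument vector."""
        missing = [name for name in self._argnames if name not in kwargs]
        if missing:
            raise TypeError(f"missing arguments: {missing}")
        return tuple(kwargs[name] for name in self._argnames)

    def fast_call(self, argvec):
        """Call the kernel with a prebuilt argument vector, without checks."""
        if not self._initialized:
            self._initialized = True  # placeholder for one-time initialization
        self._raw_result = self._kernel(*argvec)

    def convert_return_values(self):
        """Compose the return value outside of the hot path."""
        return self._raw_result


def accumulate(a, out):
    # Writes through the output argument, like an inout argument would.
    out[0] += a
    return out


buf = [0]
prog = ToyCompiledProgram(("a", "out"), accumulate)
argvec = prog.construct_arguments(out=buf, a=2)  # built once
for _ in range(3):
    prog.fast_call(argvec)                       # reused on every call
result = prog.convert_return_values()            # only when actually needed
```

In this sketch the result is read from `buf` directly, so `convert_return_values()` is optional, which mirrors the reasoning for moving it out of `fast_call()`.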

Besides these changes, the PR also modifies the following things:
- It was possible to pass return values, i.e. `__return`, as ordinary
arguments; this is still supported, but now a warning is issued.
- `CompiledSDFG` was technically able to handle scalar return values;
however, due to technical limitations this is [not
possible](spcl#1609), thus the feature was
removed.
However, the feature was not dropped completely and is still used to
handle `pyobjects`, since they are passed as pointers, see below.
- The handling of `pyobject` return values was modified.
Before, it was not possible to use `pyobject` instances as return values
that were managed _outside_ of, i.e. not allocated by, `CompiledSDFG`; now
they are "handled".
It is important that an _array_, i.e. multiple instances, of `pyobject`s
are handled as a single object (this is the correct behaviour and
retained for bug compatibility with the unit tests), however, a warning
is generated.
- It was possible to pass an argument both as a named argument and as a
positional argument; this is now forbidden.
- `safe_call()` cannot handle return values; if the method
is called on such an SDFG, an error is generated.
- Before, it was not possible to return a `tuple` with a single element;
in that case the value was always returned directly. This has been fixed
and is now handled correctly.
- The allocation of return values was inconsistent.
If there was no change in size, then `__call__()` would always return
the _same_ arrays, which might lead to very sudden bugs.
The new behaviour is to always allocate new memory, this is done by
`construct_arguments()`.
- Shared return values.
- Before `CompiledSDFG` had a member `_lastargs` which "cached" the last
pointer arguments that were used to call the SDFG.
It was updated by `_construct_args()` (old version of
`construct_arguments()`), which did not make much sense.
The original intention was to remove it, but this proved to be harder
and it is thus maintained.
However, it is now updated by `__call__()` and `initialize()` to support
the use case for `{get, set}_workspace_size()`.
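
The problem with returning the _same_ arrays from repeated calls can be illustrated with a small, self-contained sketch. `ReusingRunner` and `AllocatingRunner` are hypothetical toy classes, not DaCe code; plain lists stand in for arrays.

```python
class ReusingRunner:
    """Old-style behaviour (toy): every call returns the same buffer."""

    def __init__(self):
        self._buffer = [0]

    def __call__(self, x):
        self._buffer[0] = x * 2
        return self._buffer


class AllocatingRunner:
    """New-style behaviour (toy): every call returns freshly allocated memory."""

    def __call__(self, x):
        return [x * 2]


reuse = ReusingRunner()
first = reuse(1)
second = reuse(5)
# 'first' and 'second' alias the same buffer, so the earlier result is overwritten.
print(first, second, first is second)

fresh = AllocatingRunner()
a = fresh(1)
b = fresh(5)
# Each call has its own result; the earlier one stays intact.
print(a, b, a is b)
```

With the reusing variant, a caller that keeps the result of an earlier call silently sees it change after the next call, which is the kind of bug that always allocating new memory avoids.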

---------

Co-authored-by: Philipp Schaad <schaad.phil@gmail.com>
Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>
@edopao edopao force-pushed the gt4py-next-integration branch from 513e824 to 139389d Compare January 21, 2026 09:34
tbennun and others added 12 commits January 22, 2026 17:35
This PR stops the mypy/pyright warnings when using `dace.*` types, and
also shows them as numpy arrays for code completion purposes.
Since Python 3.9 is EOL, this PR updates the version span supported by
DaCe.
The PR also removes some deprecated pre-3.9 parsing code.
This PR fixes an error raised inside dead state elimination when a
malformed conditional block is detected. Previously, the error wouldn't
be raised correctly because `InvalidSDFGNodeError` was missing the
required arguments `sdfg`, `state_id`, and `node_id`.

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
The `_is_data_accessed_downstream()` function was missing visited node
tracking in its graph traversal. Even though the dataflow graph is
acyclic, the same node can be reached via multiple paths, causing it to
be processed exponentially many times. For example, in a diamond-shaped
graph `A → B,C → D`, node D gets visited twice (once via B, once via C).
When multiple diamonds are chained together, the visits compound
multiplicatively: if D feeds into another diamond `D → E,F → G`, then G
is visited 4 times (2 paths to D × 2 paths from D). With n such diamonds
in sequence, the final node gets visited `2^n` times. I added a
`visited` set to make sure each node is only processed once. This
reduces map fusion time from 26+ minutes to seconds on a large test
SDFG.
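
The effect can be reproduced with a small sketch that chains `n` diamonds and counts how often each node is processed, with and without a `visited` set. The graph builder and the traversal below are hypothetical illustrations, not the code of `_is_data_accessed_downstream()`.

```python
from collections import defaultdict


def build_diamond_chain(n):
    """Adjacency lists for n chained diamonds: j0 -> (l0, r0) -> j1 -> ..."""
    graph = defaultdict(list)
    for i in range(n):
        src, dst = f"j{i}", f"j{i + 1}"
        left, right = f"l{i}", f"r{i}"
        graph[src] += [left, right]
        graph[left].append(dst)
        graph[right].append(dst)
    return graph


def count_visits(graph, start, track_visited):
    """Depth-first traversal that records how often each node is processed."""
    visits = defaultdict(int)
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if track_visited:
            if node in visited:
                continue  # already processed via another path
            visited.add(node)
        visits[node] += 1
        stack.extend(graph.get(node, []))
    return visits


n = 10
chain = build_diamond_chain(n)
without = count_visits(chain, "j0", track_visited=False)
with_set = count_visits(chain, "j0", track_visited=True)
print(without[f"j{n}"])   # 1024, i.e. 2**n
print(with_set[f"j{n}"])  # 1
```

With ten diamonds the final node is already processed 1024 times without the set, while the tracked traversal processes every node exactly once.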

Co-authored-by: edopao <edoardo.paone@cscs.ch>
Fixes a bug where Dead Dataflow Elimination would eliminate a field used
in a condition
Following up from a discussion on Mattermost, this PR suggests to
improve the error messages around undefined / ambiguous starting blocks.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
`MapFusion` is deprecated and `MapFusionVertical` should be used
instead. `MapFusion` remains in the codebase as a compatibility layer for
backwards compatibility.

This PR changes how `MapFusion` is deprecated. Previously, a
warning would be emitted whenever the class was loaded (e.g. when accessed
from `dace/transformation/dataflow/__init__.py`). Depending on how
verbosely pytest is configured, one would thus see a deprecation warning
even if `MapFusion` was never actually used/instantiated. The PR
suggests emitting a warning only upon class init.
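
A minimal sketch of this pattern, using hypothetical class names (`NewFusion`, `OldFusion`) rather than the actual DaCe classes: the warning is raised in `__init__`, so merely importing or referencing the class stays silent.

```python
import warnings


class NewFusion:
    """Hypothetical replacement class."""

    def __init__(self, strict=False):
        self.strict = strict


class OldFusion(NewFusion):
    """Hypothetical compatibility layer that warns only when instantiated."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "OldFusion is deprecated; use NewFusion instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    alias = OldFusion            # referencing the class emits nothing
    after_reference = len(caught)
    instance = OldFusion()       # instantiating emits the warning
    after_init = len(caught)

print(after_reference, after_init)  # 0 1
```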

The unrelated change in `tests/buffer_tiling_test.py` (unused import) is
there because I used the test in that file to make sure the warning disappears.

/cc @philip-paul-mueller FYI

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
In particular, this gets rid of a bunch of if/else blocks with breaking
changes in Python 3.8 and 3.9.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <>
The changes include reimplementing `DepthCounter` class for AST
traversal, updating tasklet analysis functions to support both work and
depth metrics, extending the SDFG analysis to include interstate edge
work/depth, and adding tests to validate the new logic.
This includes memlets that target symbols, symbol assignment, and
reading/writing arrays or scalars without an appropriate memlet.
romanc and others added 12 commits February 3, 2026 15:18
…2291)

Upstream tests in NDSL started showing the following warning

```none
UserWarning: You passed `None` as `argnames` to `CompiledSDFG`, but the
SDFG you passed has positional arguments. This is allowed but deprecated.
```

I started digging and discovered that the same warning is also present
in DaCe tests (e.g. `test_nested_duplicate_callback()` in test file
`callback_autodetect_test.py`).

This PR passes along the `argnames` from the `DaceProgram` of the parser
through `load_precompiled_sdfg` such that the information is available
in the `CompiledSDFG` class and the warning is silenced.

Co-authored-by: Roman Cattaneo <>
…#2294)

This PR enables enumerations to contain attributes via definition as a
dataclass. It is also better than the previous `aenum`-based approach
for type checkers and IDEs, as it transparently keeps the enumeration
members. This feature will be useful for nesting attributes and methods
into the classes themselves, improving extensibility. Also enables
support for dataclass serialization/deserialization, and removes `aenum`
as a requirement.

The syntax is as follows (for example):

```python
from dataclasses import dataclass
from enum import auto

from dace.attr_enum import ExtensibleAttributeEnum

class ScheduleType(ExtensibleAttributeEnum):
    Default = auto()  #: Scope-default parallel schedule
    Sequential = auto()  #: Sequential code (single-thread)
    MPI = auto()  #: MPI processes

    @dataclass(frozen=True)
    class CPU_Multicore:
        omp_schedule_type: OMPScheduleType = OMPScheduleType.Default

    # ...
```
Setting `CPU_Multicore = CPUData` to an external dataclass is also
possible.

As a result, `ScheduleType.CPU_Multicore` is now a _template_ enum
member, and `CPU_Multicore(OMPScheduleType.Static)` is an instance.
Registering a new template externally looks like:
```python
ScheduleType.register_template("CPU_Multicore", CPUData)
```
A student had a problem because `np.int8` maps to `char`. Char can be
either unsigned or signed according to the C++ standard
(https://en.cppreference.com/w/cpp/language/types.html). I propose we
either use `int8_t` directly or `signed char`, I updated the dictionary
according to this proposal.
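
Why the signedness matters can be shown with a short sketch that interprets the same byte once as `signed char` and once as `unsigned char`. It uses Python's `ctypes` purely as an illustration and is not the DaCe type dictionary.

```python
import ctypes

raw = bytes([0xFF])  # one byte with all bits set

# ctypes.c_byte corresponds to C 'signed char', ctypes.c_ubyte to 'unsigned char'.
signed_value = ctypes.c_byte.from_buffer_copy(raw).value
unsigned_value = ctypes.c_ubyte.from_buffer_copy(raw).value

print(signed_value)    # -1
print(unsigned_value)  # 255
```

A plain `char` may behave like either of the two depending on the platform, so an `np.int8` value of -1 is only guaranteed to round-trip when the C type is explicitly signed.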
After a brief discussion, `subsets.Indices` was deprecated last week
with PR spcl#2282. Since then, many DaCe
tests emit warnings because of remaining usage of `Indices` in the
offset member functions of `subsets.Range`.

This PR suggests to adapt `Range.from_indices()` to add support for a
sequence of (symbolic) numbers or strings (as suggested in Mattermost).
This allows to remove the remaining usage of `subsets.Indices`
constructors in the DaCe codebase, which gets rid of a bunch of warnings
emitted in test or upstream/user code.

The only hiccup that I had doing this was the function `_add_read_slice()`,
called from `visit_Subscript()` of the `ProgramVisitor` in `newast.py`.
That function would check for subsets to be either ranges or indices.
And if subsets were indices, we'd go another way. That code path
separation is apparently loosely tied to some other place in the
codebase because we'd get errors if we were going the sub-optimal
ranges-path with indices. I now check if ranges are indices and set
the flag accordingly. That seems to fix the issues in tests.

I've also checked (manually) all other cases where we'd go a different
code path in case subsets are indices. There are some and the remaining
ones all "upgrade" indices to ranges. They can be removed once we remove
the deprecated `Indices` class.

---------

Co-authored-by: Roman Cattaneo <>
Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>
This PR replaces `is_start_state` -> `is_start_block` because the former
is deprecated. The PR is part of an ongoing effort to reduce warnings
emitted in tests.

Unrelated to this change, the PR removes unused imports and fixes a
couple of typos in changed files.

---------

Co-authored-by: Roman Cattaneo <>
Updated GitHub Actions dependencies to the latest versions. No breakage
is expected. I've checked the logs and most of them just updated to
node20, which is a breaking change because it requires an up-to-date
runner. Since we rely on GitHub's runners, this should be no problem.

Co-authored-by: Roman Cattaneo <>
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
Reduces the number of warnings
The default configuration of CUDA MPS does not support the number of
pytest workers (32) used by the CI job. Besides, CUDA MPS is not needed
because the GPU is not configured in exclusive mode.
A student gets a CMake error on some older version.

```
CMake Error at CMakeLists.txt:191 (if):
if given arguments:
     "3.28.3" "VERSION_LESS"
Unknown arguments specified
```

I think it is better to just check if the variable is defined beforehand, so
as not to have a missing argument error.
`MapFusionVertical` must create new data (reduced versions of the
intermediate data) and hence name it. Before it was using a naming
scheme based on the node id, which might not be stable.
The new scheme uses the name of the intermediate data and guarantees
stable names for exclusive intermediate nodes and for shared
intermediate nodes under the condition that they are only involved in
one MapFusionVertical operation.

---------

Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
commit 7e05c76
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Wed Jan 7 10:08:42 2026 +0100

    Rename of the Index.

commit 32224bb
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Thu Dec 18 09:53:57 2025 +0100

    Updated the workflow file.

commit ecb2785
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Wed Dec 17 08:19:40 2025 +0100

    Updated the dace updater workflow file.

commit f3198ef
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Wed Dec 17 07:41:26 2025 +0100

    Made the update point to the correct repo.

commit 96f963a
Merge: 8b7cce5 387f1e8
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Wed Dec 17 07:37:48 2025 +0100

    Merge remote-tracking branch 'spcl/main' into automatic_gt4py_deployment

commit 8b7cce5
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Mon Dec 1 09:18:22 2025 +0100

    Restored the original workflow files.

commit 362ab70
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Mon Dec 1 07:41:40 2025 +0100

    Now it has run once, so let's make it less runnable.

commit 81b8cfa
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Mon Dec 1 07:39:09 2025 +0100

    Made it run always.

commit 6d71466
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Mon Dec 1 07:38:11 2025 +0100

    Small update.

commit eb31e6c
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:23:33 2025 +0100

    Empty commit in the branch containing the workflow file.

commit 2970a75
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:21:09 2025 +0100

    Next step.

commit f5d3d9d
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:17:56 2025 +0100

    Let's disable everything.

commit 211e415
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:10:43 2025 +0100

    Disabled the kickstarter.

commit d012c26
Author: Philip Mueller <philip.mueller@cscs.ch>
Date:   Fri Nov 21 15:05:38 2025 +0100

    Updated everything.
@edopao edopao force-pushed the gt4py-next-integration branch from 87cc16f to c702c0a Compare February 12, 2026 10:11
10 participants