Implement SplitPytatoArrayContext that attempts a trivial parallelization strategy #216
kaushikcfd wants to merge 3 commits into main
Making this a draft until the CI is fixed.
inducer
left a comment
Thanks for working on this. LGTM other than these (fairly minor) issues.
| actx = actx_factory()
|
| - mat = actx.from_numpy(np.random.randn(10, 10))
| + mat = actx.from_numpy(np.random.randn(16, 16))
Is there a specific reason why these have to be a nice power of two now?
| import pytato
|
|
| class SplitPytatoPyOpenCLArrayContext(PytatoPyOpenCLArrayContext):
I'm not sure why this thing is named "split". Maybe "generic parallelizing"/"basic parallelizing"?
| Copyright (C) 2023 Andreas Kloeckner
| Copyright (C) 2022 Matthias Diener
| Copyright (C) 2022 Matt Smith
All these (other than @kaushikcfd) should be U of I BOT. None of us hold copyright to our work.
| @@ -0,0 +1,143 @@
| """
| We deliberately avoid using :class:`pytato.transform.CombineMapper` since
| the mapper's caching structure would still lead to recomputing
| the union of sets for the results of a revisited node.
Not sure I understand this comment. Could you explain? And: shouldn't we fix this in CombineMapper (if we can)? (or at least file an issue in pytato?) (also above)
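For readers following along, one possible reading of the quoted docstring is sketched below. It is an illustration only: `Node`, `collect_combining`, `collect_walking`, and `make_diamond_chain` are hypothetical names, and none of this is pytato code or the PR's implementation. Under that reading, a cached combine-style traversal avoids re-traversing a revisited node, but every parent still copies the cached child set into its own union, whereas a walk-style traversal adds each node to one shared set exactly once.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True, eq=False)
class Node:
    """Hypothetical DAG node standing in for an array expression."""
    name: str
    children: Tuple["Node", ...] = ()


def collect_combining(root: Node) -> Tuple[FrozenSet[str], int]:
    """Cached combine-style traversal; returns (names, elements copied)."""
    cache: Dict[int, FrozenSet[str]] = {}
    copied = 0

    def rec(node: Node) -> FrozenSet[str]:
        nonlocal copied
        if id(node) in cache:
            return cache[id(node)]
        result: Set[str] = {node.name}
        for child in node.children:
            child_names = rec(child)      # a cache hit skips the traversal...
            copied += len(child_names)    # ...but the union still copies the set
            result |= child_names
        cache[id(node)] = frozenset(result)
        return cache[id(node)]

    return rec(root), copied


def collect_walking(root: Node) -> Tuple[FrozenSet[str], int]:
    """Walk-style traversal; returns (names, number of set insertions)."""
    seen: Set[int] = set()
    names: Set[str] = set()
    added = 0

    def rec(node: Node) -> None:
        nonlocal added
        if id(node) in seen:
            return
        seen.add(id(node))
        names.add(node.name)
        added += 1
        for child in node.children:
            rec(child)

    rec(root)
    return frozenset(names), added


def make_diamond_chain(depth: int) -> Node:
    """Build a chain in which every node uses its predecessor twice."""
    node = Node("n0")
    for i in range(1, depth + 1):
        node = Node(f"n{i}", (node, node))
    return node
```

In this toy DAG the copying cost of the combine-style version grows quadratically with the chain depth, while the walk-style version stays linear. Whether this is the effect the docstring refers to is an assumption.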
| def _split_reduce_to_scalar_across_work_items(
|         kernel: lp.LoopKernel,
|         callables: Mapping[str, InKernelCallable],
A bit weird that callables is needed here/by precompute_for_single_kernel.
|         device: "pyopencl.Device",
|         ) -> lp.LoopKernel:
|
|     assert len({kernel.id_to_insn[insn_id].reduction_inames()
Could elevate reduction inames to a property of the _LoopNest?
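A rough sketch of what this suggestion could look like follows. The real `_LoopNest` is not shown in the quoted diff, so `LoopNestSketch`, its field names, and `make_loop_nest` are all assumptions made for illustration. The idea is to check once, at construction time, that every instruction in the nest shares the same reduction inames, and then store that set on the loop nest instead of re-deriving it with an `assert` at each use site.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Mapping


@dataclass(frozen=True)
class LoopNestSketch:
    """Hypothetical stand-in for _LoopNest; the field names are assumptions."""
    inames: FrozenSet[str]
    insn_ids: FrozenSet[str]
    reduction_inames: FrozenSet[str]


def make_loop_nest(
        inames: Iterable[str],
        insn_ids: Iterable[str],
        insn_id_to_reduction_inames: Mapping[str, FrozenSet[str]],
        ) -> LoopNestSketch:
    """Validate that all instructions agree on their reduction inames."""
    insn_ids = frozenset(insn_ids)
    distinct = {insn_id_to_reduction_inames[insn_id] for insn_id in insn_ids}
    if len(distinct) != 1:
        raise ValueError("instructions in a loop nest must share "
                         "the same reduction inames")
    (reduction_inames,) = distinct
    return LoopNestSketch(frozenset(inames), insn_ids, reduction_inames)
```

Here the mapping from instruction id to reduction inames is passed in as plain data; in the PR it would presumably come from the kernel's instructions.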
| # collect loop nests of instructions that assign to scalars in the array
| # program.
| insn_id_to_loop_nest: Mapping[str, _LoopNest] = {
If insn_id_to_loop_nest isn't used, why not go straight to all_loop_nests?
| This routine **assumes** that the entrypoint in *t_unit* global
| barriers inserted as per :func:`_get_call_kernel_insn_ids`.
Grammar? I'm not sure I understand.
| # a mapping from shape to the available base storages from temp variables
| # that were dead.
| shape_to_available_base_storage: Dict[int, Set[str]] = defaultdict(set)
| - # a mapping from shape to the available base storages from temp variables
| - # that were dead.
| - shape_to_available_base_storage: Dict[int, Set[str]] = defaultdict(set)
| + # a mapping from size in bytes to the available base storages from temp variables
| + # that were dead.
| + nbytes_to_available_base_storage: Dict[int, Set[str]] = defaultdict(set)
Otherwise Intel OpenCL gets its integer arithmetic wrong.
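As a minimal sketch of the bookkeeping the renamed mapping suggests: freed base storages are keyed by their size in bytes, so a temporary of a different shape but equal byte size can reuse one. `BaseStoragePool`, its methods, and the `base_storage_N` naming are hypothetical and not taken from the PR.

```python
from collections import defaultdict
from math import prod
from typing import Dict, Set, Tuple


class BaseStoragePool:
    """Hypothetical pool that recycles base storage names by byte size."""

    def __init__(self) -> None:
        # size in bytes -> names of base storages whose temporaries are dead
        self._nbytes_to_available: Dict[int, Set[str]] = defaultdict(set)
        self._counter = 0

    def acquire(self, shape: Tuple[int, ...], itemsize: int) -> str:
        """Return a free storage of matching byte size, else a new name."""
        nbytes = prod(shape) * itemsize
        available = self._nbytes_to_available[nbytes]
        if available:
            return available.pop()
        name = f"base_storage_{self._counter}"
        self._counter += 1
        return name

    def release(self, name: str, shape: Tuple[int, ...], itemsize: int) -> None:
        """Mark a storage as reusable once its temporary is dead."""
        self._nbytes_to_available[prod(shape) * itemsize].add(name)
```

For example, a released `(16, 16)` buffer of 8-byte entries can back a later `(32, 8)` buffer of the same item size, since both occupy 2048 bytes.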