Fix peer access test synchronization issue by rwgk · Pull Request #1573 · NVIDIA/cuda-python

rwgk · 2026-02-04T00:57:06Z

This Cursor-generated change is expected to resolve failures in QA environments (nvbug 5821337).

Problem

The test TestBufferPeerAccessAfterImport::test_main was failing with assertion errors indicating memory comparison failures. The root cause is most likely a synchronization issue when accessing peer memory.

When dev0 accesses peer memory from dev1, PatternGen.verify_buffer() only synchronizes dev0 (the accessing device) but not dev1 (the resident device). This can cause synchronization issues where dev0 reads peer memory before dev1 has completed all operations, leading to incorrect data being read.

Changes

- Added dev1.sync() call after IPC import (Test 1)
- Added dev1.sync() call after granting peer access (Test 3) before dev0 accesses peer memory

This follows CUDA best practices: when accessing peer memory, sync the resident device to ensure its operations are complete before the peer device reads the memory.

Add missing device synchronization calls to ensure resident device operations are complete before peer device accesses memory.

The test was failing because when dev0 accesses peer memory from dev1, PatternGen only syncs dev0 (the accessing device) but not dev1 (the resident device). This can cause synchronization issues where dev0 reads peer memory before dev1 has completed all operations.

Changes:
- Sync dev1 after IPC import (Test 1) to ensure import operations complete
- Sync dev1 after granting peer access (Test 3) before dev0 accesses peer memory

This follows CUDA best practices: when accessing peer memory, sync the resident device to ensure its operations are complete before the peer device reads the memory.

Fixes test failures on ARM64 with CUDA 13.2 RC025.

Co-authored-by: Cursor <cursoragent@cursor.com>

copy-pr-bot · 2026-02-04T00:57:10Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk · 2026-02-04T00:57:27Z

/ok to test

github-actions · 2026-02-04T01:07:35Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1573/
https://nvidia.github.io/cuda-python/pr-preview/pr-1573/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1573/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1573/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

Andy-Jost · 2026-02-05T20:29:00Z

cuda_core/tests/memory_ipc/test_peer_access.py

+        # Sync dev1 to ensure IPC import operations are complete
+        dev1.sync()


This should not be needed. For one thing, IPC import APIs are synchronous; for another, verify_buffer in the next line synchronizes device 1.

See lines 65 and 82 in

cuda-python/cuda_core/tests/helpers/buffers.py

Lines 61 to 66 in eec6988

def __init__(self, device, size, stream=None):
    self.device = device
    self.size = size
    self.stream = stream if stream is not None else device.create_stream()
    self.sync_target = stream if stream is not None else device
    self.pattern_buffers = {}

and

cuda-python/cuda_core/tests/helpers/buffers.py

Lines 74 to 83 in eec6988

def verify_buffer(self, buffer, seed=None, value=None):
    """Verify the buffer contents against a sequential pattern."""
    assert buffer.size == self.size
    scratch_buffer = DummyUnifiedMemoryResource(self.device).allocate(self.size)
    ptr_test = self._ptr(scratch_buffer)
    pattern_buffer = self._get_pattern_buffer(seed, value)
    ptr_expected = self._ptr(pattern_buffer)
    scratch_buffer.copy_from(buffer, stream=self.stream)
    self.sync_target.sync()
    assert libc.memcmp(ptr_test, ptr_expected, self.size) == 0
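The sync_target selection in the quoted helper can be illustrated without a GPU. The following is a minimal sketch, not the repository's code: FakeDevice, FakeStream, PatternGenSketch, and sync_calls are hypothetical stand-ins that only record which object gets synchronized, and the scratch-buffer copy is not modeled. It suggests that a helper constructed for a device without an explicit stream ends verification with a device-wide sync on that same device.

```python
class FakeStream:
    """Hypothetical stand-in for a stream; it only records sync() calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def sync(self):
        self.log.append(f"sync {self.name}")


class FakeDevice:
    """Hypothetical stand-in for a device (not the cuda.core Device class)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def create_stream(self):
        return FakeStream(f"{self.name}.stream", self.log)

    def sync(self):
        self.log.append(f"sync {self.name}")


class PatternGenSketch:
    """Mirrors only the stream / sync_target selection from the quoted helper."""

    def __init__(self, device, size, stream=None):
        self.device = device
        self.size = size
        self.stream = stream if stream is not None else device.create_stream()
        # Without an explicit stream, the whole device is the sync target.
        self.sync_target = stream if stream is not None else device

    def verify_buffer(self, payload, expected):
        # The real helper first copies into a scratch buffer on self.stream;
        # only the synchronization and the comparison are modeled here.
        assert len(payload) == self.size
        self.sync_target.sync()
        assert bytes(payload) == bytes(expected)


def sync_calls(use_explicit_stream):
    """Return the recorded sync calls for one verify_buffer() invocation."""
    log = []
    dev1 = FakeDevice("dev1", log)
    stream = dev1.create_stream() if use_explicit_stream else None
    pgen = PatternGenSketch(dev1, 4, stream=stream)
    pgen.verify_buffer(b"\x00\x01\x02\x03", bytes(range(4)))
    return log
```

In this sketch, sync_calls(False) records a sync on the device itself, while sync_calls(True) records a sync on the explicit stream only. Which case applies in the actual test depends on how its PatternGen instances are constructed.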

Andy-Jost · 2026-02-05T20:33:49Z

cuda_core/tests/memory_ipc/test_peer_access.py

+        # Sync dev1 to ensure peer access setup and any pending operations are complete
+        # before dev0 accesses the peer memory
+        dev1.sync()


I think this is also not needed. Peer accessibility APIs are synchronous.


rwgk self-assigned this Feb 4, 2026

rwgk added the test (Improvements or additions to tests) and cuda.core (Everything related to the cuda.core module) labels Feb 4, 2026

rwgk requested a review from Andy-Jost February 4, 2026 01:29

Andy-Jost reviewed Feb 5, 2026


rwgk closed this Feb 5, 2026

rwgk deleted the nvbug_5821337_add_missing_syncs branch February 5, 2026 20:38

github-actions bot pushed a commit that referenced this pull request Feb 6, 2026

Clean up PR preview folders for 1 closed/merged PRs

ab59c3f

Removed preview folders for the following PRs: - PR #1573

rwgk wants to merge 1 commit into NVIDIA:main from rwgk:nvbug_5821337_add_missing_syncs