
Fix peer access test synchronization issue #1573

Closed
rwgk wants to merge 1 commit into NVIDIA:main from rwgk:nvbug_5821337_add_missing_syncs


rwgk (Collaborator) commented Feb 4, 2026

This Cursor-generated change is expected to resolve failures in QA environments (nvbug 5821337).


Problem

The test TestBufferPeerAccessAfterImport::test_main was failing with assertion errors indicating memory comparison failures. The root cause is most likely a synchronization issue when accessing peer memory.

When dev0 accesses peer memory resident on dev1, PatternGen.verify_buffer() synchronizes only dev0 (the accessing device), not dev1 (the resident device). As a result, dev0 may read the peer memory before dev1 has completed all of its pending operations, so the comparison sees incorrect data.
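The ordering hazard described above can be modeled without a GPU. The sketch below is a pure-Python simulation, not cuda.core code: `SimulatedDevice`, `enqueue_write`, and `peer_read` are hypothetical names, and a background thread draining a queue stands in for asynchronous device work. It illustrates only the argument made here: syncing the accessing device does not wait for work queued on the resident device, while syncing the resident device does.

```python
import queue
import threading
import time


class SimulatedDevice:
    """Hypothetical stand-in for a GPU (not a cuda.core class): work is
    enqueued and executed later by a background thread, roughly like
    asynchronous work on a device."""

    def __init__(self, name):
        self.name = name
        self.memory = {}  # "device-resident" data
        self._work = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            op = self._work.get()
            try:
                op()
            finally:
                self._work.task_done()

    def enqueue_write(self, key, value, delay=0.05):
        """Asynchronously write `value`; returns before the write lands."""
        def op():
            time.sleep(delay)  # simulate work still in flight
            self.memory[key] = value
        self._work.put(op)

    def sync(self):
        """Block until every op enqueued on this device has completed."""
        self._work.join()


def peer_read(accessor, resident, key, sync_resident=True):
    """Read `key` from `resident` memory on behalf of `accessor`."""
    accessor.sync()  # syncing only the accessing device...
    if sync_resident:
        resident.sync()  # ...does not cover work queued on the resident device
    # With sync_resident=False this may return stale data.
    return resident.memory.get(key)


dev0 = SimulatedDevice("dev0")
dev1 = SimulatedDevice("dev1")
dev1.memory["buf"] = "stale"
dev1.enqueue_write("buf", "pattern")
print(peer_read(dev0, dev1, "buf"))  # pattern
```

Whether the real test actually hits this hazard depends on what verify_buffer synchronizes, which is the point disputed in review.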

Changes

  • Added dev1.sync() call after IPC import (Test 1)
  • Added dev1.sync() call after granting peer access (Test 3) before dev0 accesses peer memory

This follows CUDA best practices: when accessing peer memory, sync the resident device to ensure its operations are complete before the peer device reads the memory.

Add missing device synchronization calls to ensure resident device
operations are complete before peer device accesses memory.

The test was failing because when dev0 accesses peer memory from dev1,
PatternGen only syncs dev0 (the accessing device), not dev1 (the
resident device). As a result, dev0 can read the peer memory before
dev1 has completed all of its operations.

Changes:
- Sync dev1 after IPC import (Test 1) to ensure import operations complete
- Sync dev1 after granting peer access (Test 3) before dev0 accesses
  peer memory

This follows CUDA best practices: when accessing peer memory, sync the
resident device to ensure its operations are complete before the peer
device reads the memory.

Fixes test failures on ARM64 with CUDA 13.2 RC025.

Co-authored-by: Cursor <cursoragent@cursor.com>
copy-pr-bot (bot, Contributor) commented Feb 4, 2026

Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk (Collaborator, Author) commented Feb 4, 2026

/ok to test

github-actions (bot) commented Feb 4, 2026

rwgk self-assigned this Feb 4, 2026
rwgk added the test (Improvements or additions to tests) and cuda.core (Everything related to the cuda.core module) labels Feb 4, 2026
rwgk requested a review from Andy-Jost February 4, 2026 01:29
Comment on lines +97 to +98
```python
# Sync dev1 to ensure IPC import operations are complete
dev1.sync()
```
Contributor

This should not be needed. For one thing, IPC import APIs are synchronous; for another, verify_buffer on the next line synchronizes device 1.

See lines 65 and 82 in

```python
def __init__(self, device, size, stream=None):
    self.device = device
    self.size = size
    self.stream = stream if stream is not None else device.create_stream()
    self.sync_target = stream if stream is not None else device
    self.pattern_buffers = {}
```

and

```python
def verify_buffer(self, buffer, seed=None, value=None):
    """Verify the buffer contents against a sequential pattern."""
    assert buffer.size == self.size
    scratch_buffer = DummyUnifiedMemoryResource(self.device).allocate(self.size)
    ptr_test = self._ptr(scratch_buffer)
    pattern_buffer = self._get_pattern_buffer(seed, value)
    ptr_expected = self._ptr(pattern_buffer)
    scratch_buffer.copy_from(buffer, stream=self.stream)
    self.sync_target.sync()
    assert libc.memcmp(ptr_test, ptr_expected, self.size) == 0
```
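The reviewer's argument hinges on which object `sync_target` resolves to. A minimal model of just that logic, using hypothetical recording classes in place of cuda.core's `Device` and `Stream` (so no GPU is needed), shows that when no stream is passed in, the sync step in verify_buffer targets the PatternGen's own device rather than the stream it created internally. Under that reading, a PatternGen constructed for dev1 without an explicit stream would sync dev1 itself.

```python
class RecordingSyncable:
    """Hypothetical stand-in for a Device or Stream: records sync() calls."""

    def __init__(self, name, log):
        self.name = name
        self._log = log

    def sync(self):
        self._log.append(self.name)


class RecordingDevice(RecordingSyncable):
    """Hypothetical device whose create_stream() returns a recording stream."""

    def create_stream(self):
        return RecordingSyncable(self.name + "/stream", self._log)


class SyncTargetModel:
    """Copies only the stream/sync_target logic of the quoted PatternGen code."""

    def __init__(self, device, stream=None):
        self.device = device
        self.stream = stream if stream is not None else device.create_stream()
        self.sync_target = stream if stream is not None else device

    def verify(self):
        # The copy onto self.stream and the memcmp are omitted; only the
        # synchronization step from verify_buffer is kept.
        self.sync_target.sync()


log = []
dev1 = RecordingDevice("dev1", log)
SyncTargetModel(dev1).verify()
print(log)  # ['dev1']
```

When a caller does pass a stream, only that stream is synchronized, which is narrower than a device-wide sync.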

Comment on lines +111 to +113
```python
# Sync dev1 to ensure peer access setup and any pending operations are complete
# before dev0 accesses the peer memory
dev1.sync()
```
Contributor

I think this is also not needed. Peer accessibility APIs are synchronous.
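The distinction both review comments rely on can be illustrated with a toy model. `ModelDevice` and `enable_peer_access` below are hypothetical and not cuda.core APIs: an asynchronous call only enqueues work that `sync()` later waits for, while a synchronous call has finished its effect by the time it returns. Under that assumption, a `sync()` placed right after a synchronous call waits on nothing that call produced; it could only matter if unrelated asynchronous work was still outstanding.

```python
class ModelDevice:
    """Hypothetical device model (not cuda.core): asynchronous calls only
    enqueue work, while synchronous calls finish before returning."""

    def __init__(self, name):
        self.name = name
        self.peers = set()
        self._pending = []

    def launch_async(self, fn):
        """Asynchronous call: returns before `fn` has run."""
        self._pending.append(fn)

    def enable_peer_access(self, peer):
        """Synchronous call: the state change is complete on return."""
        self.peers.add(peer)

    def sync(self):
        """Wait for (here: run) every outstanding asynchronous op, in order."""
        while self._pending:
            self._pending.pop(0)()

    @property
    def has_pending_work(self):
        return bool(self._pending)


dev1 = ModelDevice("dev1")
dev1.enable_peer_access("dev0")
# Nothing was enqueued, so a dev1.sync() here would have nothing to wait for.
print("dev0" in dev1.peers, dev1.has_pending_work)  # True False
```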

rwgk closed this Feb 5, 2026
rwgk deleted the nvbug_5821337_add_missing_syncs branch February 5, 2026 20:38
github-actions bot pushed a commit that referenced this pull request Feb 6, 2026
Removed preview folders for the following PRs:
- PR #1573

2 participants