Add dejagnu tests for cooperative group GWS debugging#116
Conversation
There was a problem hiding this comment.
Pull request overview
Adds new ROCm dejagnu coverage to exercise ROCgdb debugging of cooperative-group HIP kernels that synchronize via GWS, covering both single-device this_grid().sync() and multi-device this_multi_grid().sync() scenarios.
Changes:
- Introduces a single-device cooperative-kernel test that breaks before/after
grid.sync()and validates waves/dispatch visibility. - Introduces a multi-device cooperative-kernel non-stop test that breaks inside a multi-grid kernel and runs through grid + multi-grid barriers to completion.
- Adds two HIP C++ test programs that implement the cooperative-group synchronization patterns and validate results on the host side.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| gdb/testsuite/gdb.rocm/coop-group-grid-sync.exp | DejaGnu test for single-device cooperative kernel debugging around this_grid().sync(). |
| gdb/testsuite/gdb.rocm/coop-group-grid-sync.cpp | HIP program implementing single-device cooperative grid.sync() and host-side validation. |
| gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.exp | DejaGnu non-stop test for multi-device cooperative kernel debugging through this_grid().sync() + this_multi_grid().sync(). |
| gdb/testsuite/gdb.rocm/coop-group-multi-grid-sync.cpp | HIP program implementing multi-device cooperative launch with cross-device aggregation and validation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
af78fac to
33f1926
Compare
lancesix
left a comment
There was a problem hiding this comment.
Hi,
Thanks a lot for this, this is a great starting point.
My main concern for now is gfx110x. We do not support debugging cooperative group on those (documented limitation), the testcase should look for them to not FAIL. This is known that the test will not pass even if the arch do support GWS.
I have added a couple of small comments, I'll get back to a more detailed review after the gfx11 concern has been addressed.
Addressed. Added a supports_cooperative_groups helper in lib/rocm.exp that excludes gfx1100/1101/1102/1103, and both .exp files now require it, so on gfx110x the run reports UNSUPPORTED: …: require failed: supports_cooperative_groups instead of FAIL. Mirrors the existing hip_devices_support_debug_multi_process pattern in the same lib. |
33f1926 to
73b1b20
Compare
| CHECK (hipGetDeviceProperties (&props, device_id)); | ||
| if (!props.cooperativeLaunch) | ||
| { | ||
| printf ("Device does not support cooperative launch, skipping.\n"); |
There was a problem hiding this comment.
Output the device id so when this happens it is clear what device we're using.
There was a problem hiding this comment.
Adjusted the existing skip message to include the device id range that was actually checked, so it’s clear which devices were evaluated
There was a problem hiding this comment.
Doesn't it say 0..n though? If the last ones fails in a 8 GPU setup, we will report 0..7, which doesn't seem very helpful.
There was a problem hiding this comment.
Fair point — the 0..N framing did read like a single-device range when it really meant "we checked all of them and none qualified." Reworded to drop the range and just report how many devices were checked:
None of the %d HIP device(s) support cooperative launch, skipping.
So in an 8 GPU setup it now prints "None of the 8 HIP device(s) support cooperative launch, skipping.", which is unambiguous about both the count and the fact that nothing matched.
| # First, advance to the eligibility marker so we can read the | ||
| # final value of "n_gpus" (the count of GPUs the inferior is | ||
| # actually going to use for the cooperative dispatch). If the | ||
| # program self-skips before reaching the marker (insufficient | ||
| # GPUs or no cooperative-multi-device support on some device), | ||
| # mark the test UNSUPPORTED. |
There was a problem hiding this comment.
Should the requires above catch the case where GWS support is not there?
Also, since this uses multiple GPU's, what are the chances of some system being busy with other workloads and capacity just not being there to run these tests and spawn the waves we want. Have we considered this case? Do we need some detection to make sure the test doesn't fail randomly?
There was a problem hiding this comment.
-
Yes — with require supports_cooperative_groups (architecture gate) and require hip_devices_support_debug_multi_process already in place above, plus the recent .cpp change that picks 2 GWS-capable devices via cooperativeMultiDeviceLaunch, the runtime "or skip" branch was redundant. Dropped it; the marker advance now uses the standard gdb_continue_to_breakpoint and bails on failure. The awkward [^\r\n]\r\n[^\r\n]\r\n regex goes away with it.
-
On the busy-system / capacity question: any HIP API failure during launch or sync is checked by the CHECK (...) macro in the inferior, which terminates with a non-zero exit. That surfaces as a loud test FAIL (mismatched exit), not a silent skip or random pass — so we won't get random false skips just because the system is loaded. We could add an explicit launch-time timeout, but that wasn't the historical pattern in gdb.rocm tests. Updated the comment in the .exp to make this rationale explicit.
|
|
||
| set eligible 0 | ||
| gdb_test_multiple "continue" "advance to eligibility marker or skip" { | ||
| -re "hit Breakpoint $::decimal,\[^\r\n\]*main \\(\\)\[^\r\n\]*\r\n\[^\r\n\]*\r\n$::gdb_prompt " { |
There was a problem hiding this comment.
Another one of those patterns I'm not sure we need: [^\r\n]\r\n[^\r\n]\r\n
There was a problem hiding this comment.
It would be really helpful if you could give a hint of what was changed. I can't really tell what this was replaced with.
There was a problem hiding this comment.
The whole gdb_test_multiple "continue" block (with the awkward [^\r\n]\r\n[^\r\n]\r\n$gdb_prompt regex and the eligible flag) was replaced with the standard gdb_continue_to_breakpoint helper from lib/gdb.exp. The block now reads:
gdb_breakpoint \
[gdb_get_line_number "n-gpus-final line"] allow-pending
if {[gdb_continue_to_breakpoint "advance to n-gpus-final"] != 0} {
return
}
There was a problem hiding this comment.
Was this change pushed? I don't see it.
| proc supports_cooperative_groups {} { | ||
| set unsupported_targets { | ||
| gfx1100 gfx1101 gfx1102 gfx1103 | ||
| } | ||
|
|
||
| set targets [find_amdgpu_devices] | ||
| if { [llength $targets] == 0 } { | ||
| return 0 | ||
| } | ||
|
|
||
| foreach target $targets { | ||
| if { [lsearch -exact $unsupported_targets $target] != -1 } { | ||
| return 0 | ||
| } | ||
| } | ||
| return 1 | ||
| } | ||
|
|
There was a problem hiding this comment.
How do we check that each GPU supports GWS? It seems this code assumes we do support it unless we're using one of the unsupported targets.
Should this check at least validate in some way that gdb does the right thing instead of letting potentially unsupported cases going through with testing and possibly causing issues?
Also, if we can check per-GPU, I think that would be nice given we have the single GPU test as well, and we need to select a particular device for that. If we have a list of GPU's and entries saying if each of those support GWS, then we can pick a good one.
There was a problem hiding this comment.
Per-GPU predicate: refactored — extracted target_supports_cooperative_groups ; supports_cooperative_groups now iterates and delegates per-target. Same caller-visible behavior, but the per-target seam is now reusable for future filtering.
Validating gdb actually works on a given GPU: I checked amd-dbgapi.h. There's no per-agent capability query for "supports cooperative-group debug" — the API only enforces the restriction at attach time after a coop queue is created (AMD_DBGAPI_STATUS_ERROR_RESTRICTION). So we can't probe it cheaply from a requires gate today. The static denylist matches exactly what the header documents as unsupported (Restrictions §9: gfx1100/1101/1102) and mirrors the hip_devices_support_debug_multi_process pattern just above. Long-term fix is a per-agent capability bit on amd-dbgapi; once that lands, target_supports_cooperative_groups is the single place to swap in a runtime query.
Picking a supported device in the single-GPU test: the natural follow-up — needs us to plumb the supported-target allowlist into the inferior so the .cpp can pick a debugger-supported device, not just a cooperativeLaunch-supporting one. Worth a separate change, ideally after (2) lands so we plumb agent IDs instead of gfx names.
Side note: our denylist also includes gfx1103, which isn't in the public header. Drop it or keep it?
There was a problem hiding this comment.
Seems like a bit too much work for now. OK. I'm happy not doing the heavy-handed check and instead go with the assumption about supported features in specific gfx's. Then the code you have to handle lack of working GWS stays in place and my comments there can be ignore.
ae4f615 to
5aa9e19
Compare
| @@ -26,7 +26,7 @@ | |||
|
|
|||
| load_lib rocm.exp | |||
|
|
|||
| require allow_hipcc_tests | |||
| require allow_hip_tests | |||
There was a problem hiding this comment.
Sorry — these 4 allow_hipcc_tests → allow_hip_tests edits are unrelated to this PR and the rename target doesn't exist (only allow_hipcc_tests is defined in lib/rocm.exp). They leaked in from a different local WIP branch during a rebase. Reverted in all 4 files; the next force-push will drop them from the PR.
| @@ -20,7 +20,7 @@ | |||
|
|
|||
| load_lib rocm.exp | |||
|
|
|||
| require allow_hipcc_tests | |||
| require allow_hip_tests | |||
There was a problem hiding this comment.
Sorry — these 4 allow_hipcc_tests → allow_hip_tests edits are unrelated to this PR and the rename target doesn't exist (only allow_hipcc_tests is defined in lib/rocm.exp). They leaked in from a different local WIP branch during a rebase. Reverted in all 4 files; the next force-push will drop them from the PR.
| @@ -17,7 +17,7 @@ | |||
|
|
|||
| load_lib rocm.exp | |||
|
|
|||
| require allow_hipcc_tests | |||
| require allow_hip_tests | |||
There was a problem hiding this comment.
Sorry — these 4 allow_hipcc_tests → allow_hip_tests edits are unrelated to this PR and the rename target doesn't exist (only allow_hipcc_tests is defined in lib/rocm.exp). They leaked in from a different local WIP branch during a rebase. Reverted in all 4 files; the next force-push will drop them from the PR.
| array unset gpu_seen | ||
| set distinct_gpus 0 | ||
| gdb_test_multiple "" "breakpoint hit on every participating GPU" { | ||
| -re "Thread $::decimal\[^\r\n\]* hit Breakpoint $::decimal\\.($::decimal),\[^\r\n\]*\r\n" { |
There was a problem hiding this comment.
I think there is risk here that the implicit ".*" at the beginning of the regexp may match an already printed "Thread ... hit Breakpoint" line. We may have to do line-by-line matching to avoid this possibility.
There was a problem hiding this comment.
Anchored the regex with a leading \r\n so each match must consume a fresh newline first. That forces line-by-line consumption and removes the risk of the implicit "scan whole buffer" behavior re-matching an already-seen Thread N … hit Breakpoint X.Y … line. In non-stop mode every such notification is preceded by a newline, so the anchor is always satisfied for new hits.
| # cooperative dispatch's multi-workgroup property is exercised. | ||
| set blocks {} | ||
| gdb_test_multiple "info threads" "blocks present pre-sync" -lbl { | ||
| -re "AMDGPU Wave \[^(\r\n\]*\\((\[^()\r\n\]+)\\)" { |
There was a problem hiding this comment.
Please note that there is an implicit .* at the beginning of the regexp and the regexp is not matching until the end of the line. To make sure we match such lines one by one, we may have to use ^ at the regexp start and go until the we see \r\n.
There was a problem hiding this comment.
Anchored to a single line. Two changes to the regex:
Added ^[^\r\n]* at the start so the match starts at the beginning of the current line (in -lbl mode the framework already prepends ^, but the explicit [^\r\n]* makes it clear no \r\n can be skipped before AMDGPU Wave).
Added [^\r\n]*\r\n at the end so the entire matching line is consumed, leaving the next info threads row at the buffer's start for the next iteration.
Net effect: one row
There was a problem hiding this comment.
Looked again — the original regex is actually correct here because the -lbl flag on gdb_test_multiple already gives us line-by-line consumption: it prepends ^ to each pattern and auto-consumes any line that no pattern matches. So adding ^[^\r\n]* / [^\r\n]*\r\n to the pattern is redundant and, when I tried it, broke the match entirely (the test FAILed because no blocks were collected). Reverted to the original pattern; the line-by-line guarantee is provided by -lbl.
5aa9e19 to
d541e7d
Compare
d541e7d to
28f6ae0
Compare
| @@ -0,0 +1,168 @@ | |||
| /* Copyright 2026 Free Software Foundation, Inc. | |||
There was a problem hiding this comment.
Could you please add a description to the commit message? There is currently only the title. It would also help if a couple tags are added to the commit title, like so: "gdb, testsuite: add gdb.rocm tests for cooperative group debugging"
|
|
||
| set eligible 0 | ||
| gdb_test_multiple "continue" "advance to eligibility marker or skip" { | ||
| -re "hit Breakpoint $::decimal,\[^\r\n\]*main \\(\\)\[^\r\n\]*\r\n\[^\r\n\]*\r\n$::gdb_prompt " { |
There was a problem hiding this comment.
Was this change pushed? I don't see it.
Summary
Add dejagnu coverage for debugging AMD GPU cooperative-group kernels —
i.e. kernels launched via
hipLaunchCooperativeKernel/hipLaunchCooperativeKernelMultiDevicethat synchronize at the grid /multi-grid level. On AMD GPUs these synchronization primitives are
implemented in hardware via Global Wave Sync (GWS), and they have a
distinct wave/scheduling model that has historically only been covered by
out-of-tree tests. This PR brings that coverage into the dejagnu testsuite
so it runs as part of the regular ROCgdb regression suite.
Tests added
gdb.rocm/coop-group-grid-sync.{cpp,exp}cooperative_groups::this_grid().sync()(intra-device GWS), launched viahipLaunchCooperativeKernel.gdb.rocm/coop-group-multi-grid-sync.{cpp,exp}this_grid().sync()andcooperative_groups::this_multi_grid().sync()(intra + cross-device GWS), launched viahipLaunchCooperativeKernelMultiDevice.What gets verified
coop-group-grid-sync.exp— two sub-tests:test_break_around_grid_syncgrid.sync()inside a cooperative dispatch.AMDGPU Wavethreads are stopped (waves participatingin the GWS barrier).
info dispatcheslists the cooperative dispatch.grid.sync()and continue: it mustfire (proves GWS-protected code can be debugged across the barrier).
test_threads_in_coop_kernelAMDGPU Waveparked inside the kernel, switch to it andconfirm
bt 1reports a frame insidecoop_grid_sync_kernel.coop-group-multi-grid-sync.exp— runs in non-stop mode:continue -a &, confirm a kernel-side breakpoint fires insidecoop_multi_grid_sync_kernel. Per-GPU child breakpoint instances(
Breakpoint X.Y) are observed for every participating GPU.this_grid().sync()andthis_multi_grid().sync()release correctlyunder the debugger.
The host-side post-conditions in the
.cppprograms additionally validatethe cooperative semantics numerically (cross-workgroup data dependency for
the single-device case, cross-device sum aggregation for the multi-device
case), so any regression in GWS behavior under the debugger turns into a
test failure rather than a silent miscompare.
Skip / unsupported handling
The tests degrade cleanly on systems that cannot run them:
cooperativeLaunch; if unsupported theprogram prints a recognizable message and exits, and the
.expmarksthe test
UNSUPPORTED.>= 2GPUs andcooperativeMultiDeviceLaunchon every device. It is also gated bythe existing
hip_devices_support_debug_multi_processrequirement.Any of those missing →
UNSUPPORTED.No new dejagnu helpers are required; both
.expfiles use existinginfrastructure in
lib/rocm.exp.Out of scope / follow-ups
Intentionally left out of this PR; happy to extend if reviewers ask:
next/step/stepi) acrossgrid.sync()/mgrid.sync()boundaries.lane apply/ per-lane register inspection while waves are parked atthe GWS barrier.