
WIP: Lane divergence support #124

Open
palves wants to merge 32 commits into amd-staging from users/palves/lane-divergence


palves (Collaborator) commented May 13, 2026

Lane divergence support prototype.

This prototypes support for lane divergence using DW_AT_LLVM_lane_pc from https://llvm.org/docs/AMDGPUUsage.html#dw-at-llvm-lane-pc .

It includes a testcase hand-written using the testsuite's DWARF assembler. It's actually the first example of a DWARF-assembler-based GPU testcase, and it shows the contortions we need to go through. Those could (and probably should) be extracted out, making it possible to write other DWARF-assembler-based tests using that framework.

palves and others added 30 commits June 25, 2024 20:46
clang-offload-bundler -inputs is deprecated, pass multiple -input
(singular) arguments instead.

Fixes gdb.dwarf2/ testcases with the hip board, like:

 $ make check RUNTESTFLAGS="--target_board=hip" TESTS="gdb.dwarf2/bad-regnum.exp"
 ...
 builtin_spawn -ignore SIGHUP /opt/rocm/llvm/bin/clang-offload-bundler -type=o -targets=hip-amdgcn-amd-amdhsa-gfx906,host-x86_64-unknown-linux-gnu -outputs=/home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.dwarf2/bad-regnum/bad-regnum1.o -inputs=/home/pedro/rocm/gdb/build/gdb/testsuite/temp/3503025/bad-regnum1.o.tmp.o,/home/pedro/rocm/gdb/build/gdb/testsuite/empty-host.o
 /opt/rocm/llvm/bin/clang-offload-bundler: warning: -inputs is deprecated, use -input instead
 /opt/rocm/llvm/bin/clang-offload-bundler: warning: -outputs is deprecated, use -output instead
 UNTESTED: gdb.dwarf2/bad-regnum.exp: failed to prepare
 ...

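A minimal sketch of the new argument shape, written in Python rather than the testsuite's Tcl; `bundler_args` is a hypothetical helper, not the testsuite's code. Each input file gets its own `-input=`, paired by position with the entries in `-targets=`, and the output uses the singular `-output=`, as the warnings above recommend:

```python
def bundler_args(bundler, file_type, targets, inputs, output):
    """Build a clang-offload-bundler command line that uses the singular
    -input=/-output= options instead of the deprecated -inputs=/-outputs=."""
    if len(targets) != len(inputs):
        # The Nth -input= is paired with the Nth entry in -targets=.
        raise ValueError("need exactly one input file per target")
    args = [
        bundler,
        f"-type={file_type}",
        "-targets=" + ",".join(targets),
        f"-output={output}",
    ]
    # One -input= option per file, in the same order as the targets.
    args.extend(f"-input={path}" for path in inputs)
    return args


# Mirrors the failing invocation from the log above (paths shortened).
print(" ".join(bundler_args(
    "clang-offload-bundler", "o",
    ["hip-amdgcn-amd-amdhsa-gfx906", "host-x86_64-unknown-linux-gnu"],
    ["bad-regnum1.o.tmp.o", "empty-host.o"],
    "bad-regnum1.o")))
```
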
Change-Id: I3768276c2ddea680fdb6243270fcb0390423a17e
Currently, passing "hip" to gdb_compile has the effect of injecting
-ggdb into the compiler (hipcc) options.

We don't want to hardcode -ggdb for all HIP compilations, though,
because some tests want to exercise debugging without debug info, while
others want to compile their binary without debug info or with
-gline-tables-only, and then use the DWARF assembler to generate their
own debug info.

The -ggdb option was originally needed because it enables an LLVM
option that is essential for debugging (-mllvm
-amdgpu-spill-cfi-saved-regs).  Nowadays, -g enables that LLVM option
too, so we can drop the -ggdb from all compilations, and instead rely
on -g being added via testcases passing "debug" as gdb_compile option.

There's one wrinkle, though.  Said LLVM option added by -g (-mllvm
-amdgpu-spill-cfi-saved-regs) also affects code generation,
unfortunately.  Some features of the DWARF assembler machinery rely on
the code generated by the compiler without -g being exactly the same as
the code generated with -g.

So, instead of just removing -ggdb, explicitly compile with the LLVM
option that -g/-ggdb would enable.

Change-Id: I64a938edf97c85d0ea9fc34b37071daa560de7e7
After the previous patch that removed the hardcoded -ggdb, a lot of
the gdb.dwarf2/ testcases started failing when tested against
--target_board=hip, like so:

  Thread 5 "data-loc" hit Breakpoint 1, with lane 0, 0x00007ffff5877098 in main () from file:///home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.dwarf2/data-loc/data-loc#offset=8192&size=52360
  (gdb) FAIL: gdb.dwarf2/data-loc.exp: running to main in runto

The problem is the "with lane 0" part, which runto doesn't expect.
The affected runto pattern is the one for stops without debug info,
which is why this went unnoticed with most testcases.

Fix it by optionally expecting the "with lane" part, and add a regular
HIP testcase that would fail without the fix.

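The fix can be illustrated with a simplified Python model of such a pattern. The real runto regexp is Tcl and differs in detail, and `stop_without_debug_info_re` is a hypothetical name; the point is only that the ", with lane N" part becomes an optional group, so the same pattern accepts both CPU and GPU stops:

```python
import re


def stop_without_debug_info_re(func):
    """Simplified model of a pattern matching a breakpoint stop in FUNC
    when FUNC has no debug info.  The ", with lane N" part that GDB prints
    for GPU threads is optional, so CPU stops still match."""
    return re.compile(
        r"Breakpoint \d+, (?:with lane \d+, )?0x[0-9a-f]+ in "
        + re.escape(func)
        + r" \(\) from ")


gpu_stop = ('Thread 5 "data-loc" hit Breakpoint 1, with lane 0, '
            '0x00007ffff5877098 in main () from file:///.../data-loc')
cpu_stop = "Breakpoint 1, 0x0000555555555129 in main () from /tmp/prog"
pattern = stop_without_debug_info_re("main")
print(bool(pattern.search(gpu_stop)), bool(pattern.search(cpu_stop)))  # True True
```
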
The fix has this effect on gdb.dwarf2/*.exp tests, with
--target_board=hip:

 -# of expected passes           311
 -# of unexpected failures       368
 +# of expected passes           704
 +# of unexpected failures       469

Note that failures go up because more tests now manage to run.

Change-Id: I51253e1f91deec007071201c05973f5c1be75204
Fixes, for example:

 $ make check RUNTESTFLAGS="--target_board=hip" TESTS="gdb.dwarf2/void-type.exp"
 ...
 DUPLICATE: gdb.dwarf2/void-type.exp: set print asm-demangle on
 DUPLICATE: gdb.dwarf2/void-type.exp: set print demangle on
 ...

Passing "" as the second argument to gdb_test_no_output no longer
stops it from issuing a PASS.  Use -nopass instead.

Change-Id: Ia1a41b7d17e62736b3b74e605163800908dc1b90
Read in the DW_AT_LLVM_lane_pc attribute, and store the expressions as
dynamic properties similarly to how static_link is stored.

Expose DW_AT_LLVM_lane_pc as a $lane_pc user register.  The
implementation finds the current function, and evaluates the
corresponding DW_AT_LLVM_lane_pc expression.

This also adds a $__lane_pc_array user register, that shows the
evaluation of DW_AT_LLVM_lane_pc in its array form, with one element
per lane.

This also supports DW_AT_LLVM_lane_pc described with either
DW_FORM_sec_offset or DW_FORM_loclistx.  The latter motivates the
attr_to_dynamic_prop change.

Change-Id: If7ed8aef1ffffbe95f632f3dab68cfb6eb43e37f
Skip inactive frames in "bt", "up", and "down".  E.g.:

 (gdb) bt
 warning: lane is divergent, skipping inactive frames
 #1  func1 () at kernel.cc:114
 #2  0x00007ffff5805c3c in kernel () at kernel.cc:151
 (gdb) frame
 #1  func1 () at kernel.cc:114
 (gdb) down
 Bottom (innermost) frame selected; you cannot go down.

It is still possible to force GDB to select an inactive frame using
the "frame" command.  E.g., with "frame 0".

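A hypothetical Python model of the frame-skipping rule; the `Frame` record, its `lane_active` flag, and the helper names are assumptions, not GDB internals. "bt" starts at the innermost frame in which the lane is active and prints the warning shown above, and "down" refuses to go below that frame. Selecting an inactive frame with "frame 0" is not modelled, and since the commit doesn't say what happens when the lane is inactive in every frame, the sketch simply shows all frames in that case:

```python
from dataclasses import dataclass


@dataclass
class Frame:
    level: int
    function: str
    lane_active: bool  # Is the selected lane active in this frame?


def first_active_level(frames):
    """Index of the innermost frame in which the lane is active."""
    for i, frame in enumerate(frames):
        if frame.lane_active:
            return i
    # Not specified by the commit; this sketch then shows every frame.
    return 0


def backtrace(frames):
    start = first_active_level(frames)
    lines = []
    if start > 0:
        lines.append("warning: lane is divergent, skipping inactive frames")
    lines += [f"#{f.level}  {f.function}" for f in frames[start:]]
    return lines


def can_go_down(frames, selected):
    """'down' stops at the innermost active frame, not at frame #0."""
    return selected > first_active_level(frames)


# "inactive_callee" is a made-up name for the skipped frame #0.
frames = [Frame(0, "inactive_callee", False),
          Frame(1, "func1", True),
          Frame(2, "kernel", True)]
print("\n".join(backtrace(frames)))
```
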
Change-Id: Ib3c87bbc739d32822b39fc91d445fb27d62a5e60
When displaying the lane's current location, use the logical/lane PC
instead of the physical/wave PC.

Add and make use of a new get_frame_lane_pc routine that returns the
frame's lane PC, as a counterpart to get_frame_pc, which returns the
physical PC.

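The difference between the two routines can be modelled in a few lines of Python. This is a hypothetical sketch, not GDB's implementation (`frame_lane_pc` is an invented name): it assumes a lane is active when its bit in the wave's EXEC mask is set, that an active lane's logical PC equals the physical PC (as in the disassembly examples later in this PR, where `$lane_pc` and `$pc` agree for an active lane), and that an inactive lane's logical PC is its DW_AT_LLVM_lane_pc value:

```python
def frame_lane_pc(frame_pc, lane, exec_mask, lane_pc_array):
    """Hypothetical model of a frame's logical (lane) PC.

    frame_pc      -- the physical PC, as get_frame_pc would return it
    exec_mask     -- bit N set means lane N is active
    lane_pc_array -- per-lane DW_AT_LLVM_lane_pc values, like $__lane_pc_array
    """
    if (exec_mask >> lane) & 1:
        # Active lane: its logical PC is the wave's physical PC.
        return frame_pc
    # Inactive lane: the address it will resume at once it becomes active.
    return lane_pc_array[lane]


pc = 0x7FFFF5808A98
lane_pcs = [pc, 0x7FFFF5808C08]  # lane 1 is divergent
print(hex(frame_lane_pc(pc, 0, 0b01, lane_pcs)),
      hex(frame_lane_pc(pc, 1, 0b01, lane_pcs)))
```
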
Change-Id: I305b738bc524b25f19d3b1385e1fed127240e501
Currently, "info lanes" either shows the wave's current frame or
"(inactive)", like so:

 (gdb) info lanes
   Id   State Target Id                              Frame
 * 0    A     AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0]   lane_pc_test (gid=0) at dw2-lane-pc.cc:102
   1    I     AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0]   (inactive)

This patch makes it so that if we have divergence debug info, we print
the lane's logical location instead, and show state as "D"
(divergent).

For example, here, lane 1 is divergent in an if/then/else:

 (gdb) info lanes 0-2
   Id   State Target Id                            Frame
 * 0    A     AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] lane_pc_test (gid=0) at dw2-lane-pc.cc:102
   1    D     AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0] lane_pc_test (gid=1) at dw2-lane-pc.cc:97

and here, lane 1 called a function while lane 0 is divergent, so lane 0's
last active frame is not the wave's current frame:

 (gdb) info lanes 0-2
   Id   State Target Id                            Frame
   0    D     AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] lane_pc_test (gid=0) at dw2-lane-pc.cc:114
 * 1    A     AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0] foo (gid=1) dw2-lane-pc.cc:83

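The State/Frame choice can be summarised as a small Python function. This is a hypothetical model (`info_lanes_columns` is not a GDB routine), and it assumes a lane's logical location is only available when divergence debug info exists:

```python
def info_lanes_columns(active, wave_frame, logical_frame=None):
    """Return the (State, Frame) columns of one "info lanes" row.

    logical_frame is the lane's location computed from divergence debug
    info (DW_AT_LLVM_lane_pc), or None when no such info is available.
    """
    if active:
        return "A", wave_frame
    if logical_frame is not None:
        # Divergent: show where the lane logically is.
        return "D", logical_frame
    # No divergence info: nothing better to show.
    return "I", "(inactive)"


print(info_lanes_columns(False, "foo (gid=1) at dw2-lane-pc.cc:83",
                         "lane_pc_test (gid=0) at dw2-lane-pc.cc:114"))
```
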
Change-Id: I523b666f77da586512cd9a69f81ec07a60b3fcf2
Change-Id: I42207b5b90ae50d0e9675c990eeb65987ad9a7e9
The lane-stepping support will need lane-specific breakpoints.  This
adds the functionality to the breakpoints module.  It doesn't, however,
expose a user-visible way to create such breakpoints.

Change-Id: Ib44646f1a30557c3a41f4c82e91a1e39a672236e
A later patch in the series will add a new call to
thread_info::active_simd_lanes_mask(), and that has the unfortunate
consequence of flushing the frame cache, which in turn has the
unfortunate consequence of changing the annotations output, causing a
regression in gdb.cp/annota2.exp -- a "frames-invalid" appears at a
spot that isn't expected.

The frame cache flush is caused by
thread_info::active_simd_lanes_mask() calling
switch_to_inferior_no_thread.  If we call switch_to_thread instead, we
skip the cache flush when THIS thread is already the current
thread, thus avoiding the annotation churn.  This was already done in
thread_info::has_simd_lanes(), but missed here.

That change surprisingly regresses gdb.base/annota1.exp (i.e., to fix
annota2.exp we break annota1.exp), but there's a catch -- it only
regresses it because an earlier lane debugging commit changed the
testcase:

 commit bab976b
 Commit:     Pedro Alves <pedro@palves.net>
 CommitDate: Fri Aug 20 12:12:05 2021 +0100

    Base lane debugging support

The log of that commit says:

 "gdb.base/annota1.exp had to be tweaked because the patch has the
  side effect of changing the order of a frames-invalid and a
  breakpoints-invalid annotion."

Reverting the gdb.base/annota1.exp hunk from that commit fixes the
regression caused by this commit.

Change-Id: I15e857e113195fbddf21173a6471495e435641e6
This teaches execution commands like next/step to handle the case
of the current lane being divergent.  In that case, the frame
considered for the current source line, etc., should be the first active
frame, not the thread's current frame, which may be inactive for the
current lane.

Also, don't start range stepping if the current lane is not active, as
the current line range of an inactive lane is unrelated to the current
line range of active lanes.  It's not worth it to try to optimize this
case, as GDB will immediately start skipping the inactive code region,
and that will be done by running to a breakpoint after the following
patch.

Change-Id: Idddcc4f3e457ab39321c6c3ef91e8ac848e97a79
GDB currently is able to step/next divergent lanes.  When the thread
is stepping, if the current lane becomes inactive, then GDB continues
stepping until it becomes active, at which point GDB resumes the
normal stepping algorithm (check whether we reached a different line).

However, single-stepping the whole divergent range can be slow, as
there may be many instructions to step, e.g., because the inactive
code calls functions.  There's a simple way to avoid the instruction
single-stepping though -- the logical lane PC is supposed to point at
the next instruction the lane will execute when it becomes active, so
just set a breakpoint there, and run to it.

In case the compiler emits bad DW_AT_LLVM_lane_pc info, add a command
to disable the stepping over divergent regions with a breakpoint:
"maint set skip-divergent-regions-with-breakpoint on/off".  When off,
GDB skips the divergent regions by doing the slower single stepping.

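The trade-off can be illustrated with a toy Python simulation (hypothetical, not GDB code). `trace` is the sequence of physical PCs the wave executes from the current stop, `lane_active` says whether the current lane is active at a given PC, and each "stop" is a round trip through GDB. With correct lane PC info, both strategies end at the same address, but running to a breakpoint costs a single stop:

```python
def skip_divergent_region(trace, lane_pc, lane_active, use_breakpoint=True):
    """Advance past a divergent region.

    trace       -- physical PCs the wave executes, starting at the current PC
    lane_pc     -- the lane's logical PC (from DW_AT_LLVM_lane_pc)
    lane_active -- predicate: is the current lane active at this PC?
    Returns (pc where the lane is active again, number of stops).
    """
    if use_breakpoint:
        # Run to a breakpoint at the lane PC: a single stop, assuming the
        # lane PC info is correct and the wave reaches that address.
        if lane_pc not in trace:
            raise RuntimeError("wave never reaches the lane PC")
        return lane_pc, 1
    # Slower fallback (maint set skip-divergent-regions-with-breakpoint off):
    # single-step, one stop per instruction executed while the lane is inactive.
    stops = 0
    for pc in trace:
        if lane_active(pc):
            return pc, stops
        stops += 1
    raise RuntimeError("lane never became active")


def active(pc):
    # Toy predicate: the lane resumes at 0x1030.
    return pc >= 0x1030


# Twelve 4-byte instructions of inactive code before the lane resumes.
trace = list(range(0x1000, 0x1040, 4))
print(skip_divergent_region(trace, 0x1030, active))         # (4144, 1)
print(skip_divergent_region(trace, 0x1030, active, False))  # (4144, 12)
```
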
Change-Id: Id04d1707b31fb015c2c1c26561c7c633b072c8a6
A following patch will change the DWARF assembler's get_func_range
function to wrap function names in single quotes, like so:

 (gdb) disassemble '$f'

instead of

 (gdb) disassemble $f

so that it handles C++ function names with arguments better.  That
causes a funny regression in a couple of testcases, however.  For example,
with gdb.dwarf2/atomic-type.exp, we go from this:

 (gdb) disassemble f
 Dump of assembler code for function f:
    0x0000000000001129 <+0>:     endbr64
    0x000000000000112d <+4>:     push   %rbp
    0x000000000000112e <+5>:     mov    %rsp,%rbp
    0x0000000000001131 <+8>:     mov    %rdi,-0x8(%rbp)
    0x0000000000001135 <+12>:    mov    $0x0,%eax
    0x000000000000113a <+17>:    pop    %rbp
    0x000000000000113b <+18>:    ret
 End of assembler dump.

To this:

 (gdb) disassemble 'f'
 No function contains specified address.

The reason the latter command doesn't find the function is that
"disassemble"'s argument is an expression, and 'f' is ambiguous -- it
is being interpreted as an 'f' character:

 (gdb) p 'f'
 $1 = 102 'f'

I don't think there's a way around this ambiguity, so just rename
affected single-character functions to avoid it.  This affects
gdb.dwarf2/atomic-type.exp and
gdb.dwarf2/dw2-bad-mips-linkage-name.exp.

Change-Id: I70aae4971313f1afb529eb881aefaea90b5b715c
In:

 Commit:     Pedro Alves <pedro@palves.net>
 CommitDate: Sat May 15 18:57:31 2021 +0100

     Make DWARF assembler machinery work against the GPU

function_range was tweaked to use a different method to find the
function's ranges when testing for amdgcn.  That method involved
running the program under GDB, and then unrelocating the function
addresses, using the starting address of the device code's DSO, as
listed in "info shared".

I've since found a better way to handle this -- extract the device
code DSO bundle out of the main HIP program binary, and load _that_
into GDB, as if it were a program binary.  Then whatever function
addresses we get out of GDB are already unrelocated, which is just what
we need.

With this, function_range is again much closer to what the upstream
version looks like.

I'm extracting the device code DSO manually in Tcl, using the
format described by
https://clang.llvm.org/docs/ClangOffloadBundler.html.

I wrote that extraction code before I knew of
roc-obj-extract/roc-obj-ls.  I have yet to try using those instead,
but I think it should work.  So TBD.

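For reference, a minimal Python sketch of that extraction, following the layout described on the ClangOffloadBundler page: a 24-byte `__CLANG_OFFLOAD_BUNDLE__` magic string, a 64-bit entry count, then per entry a 64-bit offset (from the start of the bundle), a 64-bit size, a 64-bit ID length, and the ID string (not NUL-terminated). It assumes little-endian integers and an uncompressed bundle; `parse_offload_bundle` is a hypothetical name, and the PR's own version is Tcl:

```python
import struct

MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"


def parse_offload_bundle(data):
    """Return {bundle entry ID: code object bytes} for an uncompressed
    clang offload bundle held in DATA (bytes)."""
    if not data.startswith(MAGIC):
        raise ValueError("not a clang offload bundle")
    pos = len(MAGIC)
    (count,) = struct.unpack_from("<Q", data, pos)
    pos += 8
    entries = {}
    for _ in range(count):
        offset, size, id_len = struct.unpack_from("<QQQ", data, pos)
        pos += 24
        # The entry ID is not NUL-terminated; its length is explicit.
        entry_id = data[pos:pos + id_len].decode("ascii")
        pos += id_len
        # Offsets are relative to the start of the bundle.
        entries[entry_id] = data[offset:offset + size]
    return entries
```
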
Change-Id: Ie340b4e9493d097c15bbc97677d910640a12ec5a
function_range prints the function's name to figure out its starting
address.  If the function's name is a C++ function name with arguments
and qualifiers, the printing fails, like so:

  (gdb) p /u &__HIP_BlockDim::operator()(unsigned int) const
  No symbol "__HIP_BlockDim" in current context.

We can fix that by wrapping the function name in single quotes, so we
end up with this instead:

  (gdb) p /u &'__HIP_BlockDim::operator()(unsigned int) const'
  $7 = 5468

Another issue is that the function name includes special regexp
characters, like '(' and ')', so when matching the function name in a
regexp, those characters need to be escaped, otherwise, we get this,
for example:

  (gdb) x/2i '__HIP_BlockDim::operator()(unsigned int) const'+272
     0x166c <__HIP_BlockDim::operator()(unsigned int) const+272>: s_setpc_b64 s[22:23]
     0x1670 <__HIP_Coordinates<__HIP_BlockDim>::__X::operator unsigned int() const>:      s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  (gdb) FAIL: gdb.rocm/dw2-lane-pc.exp: x/2i '__HIP_BlockDim::operator()(unsigned int) const'+272

Fix that by passing the function name through string_to_regexp.

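The same issue is easy to reproduce with Python's `re` module, where `re.escape` plays the role that string_to_regexp plays in the Tcl testsuite; `x_line_re` is a hypothetical helper used only for illustration:

```python
import re


def x_line_re(func, offset):
    """Match an "x/i" output line for FUNC+OFFSET, treating FUNC literally."""
    return re.compile("<" + re.escape(func) + r"\+" + str(offset) + ">:")


func = "__HIP_BlockDim::operator()(unsigned int) const"
line = ("   0x166c <__HIP_BlockDim::operator()(unsigned int) const+272>: "
        "s_setpc_b64 s[22:23]")
print(bool(x_line_re(func, 272).search(line)))  # True
# Unescaped, "()" is an empty group and "(unsigned int)" a capture group,
# so the pattern no longer matches the literal parentheses.
print(bool(re.search("<" + func + r"\+272>:", line)))  # False
```
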
Change-Id: Ia276f977cc930243cf62c6e0cf9bb5cbc2816ff1
DWARF::assemble internally can call function_range, and that function
compiles the testcase's source file with debug information, in order
to be able to extract function start/end addresses.  That relies on the
executable code of that build being exactly the same as the code of
the program built without debug information, which is what gets linked
with the generated DWARF.  HIP programs that use the DWARF::assemble
framework are more complicated to build: they need the "hip" option,
they need -fgpu-rdc, etc., none of which function_range knows about.

To address this, teach DWARF::assemble about a new "-exe
TEMPLATE_EXECUTABLE" option, which lets you pass down a pre-built
executable that we can extract function ranges from.

In addition, add a "-target TARGET" option, which lets us pass down
the target we're building the DWARF for.  This is needed because
function_range behaves a little differently depending on which target
we're generating DWARF for (e.g., for amdgcn, we don't use function
labels to find function ranges), and when building a HIP program, the
DejaGnu target is the host.

Change-Id: Ia1fd7f23827af9c8901f96615572ce2cae31deb6
This adds a DW_AT_LLVM_lane_pc testcase that uses the DWARF assembler
framework to generate the DWARF.

This automatically extracts addresses for DW_AT_LLVM_lane_pc in the
if/then/else tests using "info line".

In order to be able to step through the testcase's program, we need
line info.  Writing the line tables manually would be a lot of work,
so we instead rely on Clang's -gline-tables-only.

The main trouble is then how to connect the compiler-generated line
table with DW_AT_stmt_list for the compile unit we generate, without
hand-editing the compiler-generated DWARF to include some label or
some such.  I managed to make it work by compiling the program twice.
After compiling the first time, I extract the needed .debug_line
offset from the binary, and then re-assemble using that value as
DW_AT_stmt_list offset.

One extra complication is that with -gline-tables-only, Clang still
outputs one mostly-empty compilation unit.  This compilation unit has a
ranges list, and this confuses GDB, because block/function lookups by
PC may find this bare compilation unit instead of our manually
written one.  To address this, we zap away Clang's compilation unit.
We do this by finding the embedded/bundled device ELF file in the
executable, then finding the relevant compilation unit within it, and
overwriting its DIE with 0, the padding byte.  This is valid because
consumers silently skip over padding bytes.

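The zapping step can be sketched in Python as follows. This illustrates the idea under stated assumptions and is not the PR's Tcl helper (`zap_compile_unit_dies` is a hypothetical name): it handles only a little-endian, 32-bit-format DWARF 5 `DW_UT_compile` unit, keeps its 12-byte header (unit_length, version, unit_type, address_size, debug_abbrev_offset) so the unit chain stays walkable, and overwrites everything after the header, i.e. the DIEs, with zero bytes. The usage example reuses the unit header values from the llvm-dwarfdump output quoted in a later commit of this PR (offset 0x4bd, length 0x28, abbr_offset 0x1f7):

```python
import struct

DW_UT_compile = 0x01


def zap_compile_unit_dies(debug_info, cu_offset):
    """Overwrite the DIEs of the compile unit at CU_OFFSET in DEBUG_INFO
    (a bytearray holding .debug_info) with 0 padding bytes.
    Returns the zapped [start, end) range."""
    (unit_length,) = struct.unpack_from("<I", debug_info, cu_offset)
    if unit_length >= 0xFFFFFFF0:
        raise ValueError("64-bit DWARF and reserved lengths not handled")
    version, unit_type = struct.unpack_from("<HB", debug_info, cu_offset + 4)
    if version != 5 or unit_type != DW_UT_compile:
        raise ValueError("only DWARF 5 DW_UT_compile units handled")
    # Header: unit_length (4) + version (2) + unit_type (1)
    #         + address_size (1) + debug_abbrev_offset (4) = 12 bytes.
    die_start = cu_offset + 12
    # unit_length counts the bytes after the unit_length field itself.
    unit_end = cu_offset + 4 + unit_length
    debug_info[die_start:unit_end] = bytes(unit_end - die_start)
    return die_start, unit_end


buf = bytearray(b"\xaa" * 0x500)  # stand-in for a real .debug_info section
struct.pack_into("<IHBBI", buf, 0x4BD, 0x28, 5, DW_UT_compile, 8, 0x1F7)
start, end = zap_compile_unit_dies(buf, 0x4BD)
print(hex(start), hex(end))  # 0x4c9 0x4e9
```
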
Some of these helper routines are put in lib/rocm.exp because they'll
probably be reused by other HIP testcases that use the DWARF
assembler.

The testcase also uses DW_OP_LLVM_push_lane to describe function
parameters.

Currently this has only been tested on Vega 20.  For other GPUs, we
may need to adjust register numbers, but otherwise, the testcase is
pretty much AMDGPU agnostic.

Change-Id: I786bc563348a1f81ed5772c3f3c18f035b48a93a
commit-id: 71f64ee0
Work around llvm-dwarfdump warnings mentioned in SWDEV-320473.

Without this, we get:

 executing: /opt/rocm/llvm/bin/llvm-dwarfdump /home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc.device.so | grep -B 5 "AT_producer.*clang" |
  grep "Compile Unit:"
 ERROR: tcl error sourcing /home/pedro/rocm/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.rocm/dw2-lane-pc.exp.
 ERROR: comp unit range failed: 1, "0x000004bd: Compile Unit: length = 0x00000028, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x01f7, addr_size = 0x08 (next unit at 0x000004e9)
 warning: DWARF unit from offset 0x00000000 incl. to offset 0x0000000b excl. tries to read DIEs at offset 0x0000000b
 warning: DWARF unit from offset 0x000004b2 incl. to offset 0x000004bd excl. tries to read DIEs at offset 0x000004bd"
     while executing
 "error "comp unit range failed: $status, \"$output\"""

There may be an actual bug in the generated DWARF.  I haven't
investigated deeply whether the warnings truly are correct or not.

Bug: https://ontrack-internal.amd.com/browse/SWDEV-320473
Change-Id: I551e2a4f29218ad121966abd6ec76286be63ff6b
Document the $lane_pc user register.

Document the "D" state in "info lanes".

Give an example of "info lanes" when we have divergence debug info.

Document that GDB skips inactive frames.

Document "maint set skip-divergent-regions-with-breakpoint".

Update restrictions section, since we now make use of debug
information describing an inactive lane's logical current PC, if
available.

Change-Id: If125a118799d0caf159226ee3b96b0bc9e6ae05b
  $ objdump --section=.hip_fatbin -h /home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc
  objdump: Warning: Unrecognized form: 0x23

  /home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc:     file format elf64-x86-64

  Sections:
  Idx Name          Size      VMA               LMA               File off  Algn
   11 .hip_fatbin   00008108  0000000000201000  0000000000201000  00001000  2**12
		    CONTENTS, ALLOC, LOAD, READONLY, DATA

Change-Id: Idb8c903153473625dcf009fd0a2f1b5eefcdef7f
Change-Id: Ia7a4c3a1bfd542e2127443d572e44d74bc2be630
commit-id: 947d9a36
(cherry picked from commit 5240248
to allow building against current rocm-systems dbgapi)

ROCm-dbgapi version 0.79 adds process_id and wave_id arguments to the
amd_dbgapi_address_dependency function.

The amd_dbgapi_address_dependency function is called from
amdgpu_address_scope which receives a ptid_t argument showing the
current thread.  It is however possible that this ptid matches a CPU
thread and not a GPU wave.

When we set a watchpoint from a GPU context on a global variable, that
address is valid from both the CPU and the GPU.  When we resume
execution, we need to insert a watchpoint at that address on both GPU
waves and CPU threads, as both contexts could update that memory.  This
case is, however, only possible for global addresses, so if the given
ptid is not a GPU thread, we can use AMD_DBGAPI_WAVE_NONE when doing the
call.

As for the test, a GPU global variable can be made visible to the host,
in which case the host can modify it.  Add a testcase that checks that
when we set a watchpoint on a GPU global, we also have a CPU watchpoint
that would trigger if the host side of the program were to modify that
memory location.

Change-Id: Ia90b7194af31b873b270bcdb29bb3a3b4aba32a0
commit-id: a90c52f8
Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf
commit-id: 2a008d92
Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf
commit-id: 2a008d92
"gid" is in v8 on gfx1030 with current compiler.

It used to be in v0 on gfx906 when this was originally written.

Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf
commit-id: 2a008d92
Change-Id: Iee30e934299c98d201e30c32dc4bafd26a2abce9
commit-id: c8e3f2c6
- Handle multiple compile units emitted by Clang
- Fix zapping range length

Change-Id: I51302b57197d260b34668c761b2825b4f80a7d3a
commit-id: 2566ca96
dwarf_assemble currently bundles the GPU .o files in a CPU+GPU bundle.
For some reason, that results in the .debug_info section of the
bundled GPU .o file not making it into the final linked binary.  This
might be related to -fgpu-rdc (LTO).  Thankfully, linking the GPU .o
file directly with -Xoffload-linker works.

Change-Id: I79b0da8ee14898a2ced37352dac059edc9303fc5
commit-id: af0ddc38
Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf
commit-id: 2a008d92
With current clang, we have three compilation units.  Return the
DW_AT_stmt_list of the last one.  This needs a better fix.

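A Python sketch of that stopgap, assuming llvm-dwarfdump's textual output renders the attribute as `DW_AT_stmt_list (0x...)`; `last_stmt_list` is a hypothetical name, the sample dump is made up, and, like the commit, the sketch simply takes the last compilation unit's value:

```python
import re

STMT_LIST_RE = re.compile(r"DW_AT_stmt_list\s+\((0x[0-9a-fA-F]+)\)")


def last_stmt_list(dwarfdump_text):
    """Return the DW_AT_stmt_list offset of the last compile unit that has one."""
    offsets = [int(m.group(1), 16) for m in STMT_LIST_RE.finditer(dwarfdump_text)]
    if not offsets:
        raise ValueError("no DW_AT_stmt_list found")
    return offsets[-1]


# Made-up excerpt with three compilation units.
dump = """
0x0000000b: DW_TAG_compile_unit
              DW_AT_stmt_list\t(0x00000000)
0x00000040: DW_TAG_compile_unit
              DW_AT_stmt_list\t(0x0000005a)
0x00000080: DW_TAG_compile_unit
              DW_AT_stmt_list\t(0x00000120)
"""
print(hex(last_stmt_list(dump)))  # 0x120
```
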
Change-Id: I2b3a2b9ac95bc2fee6d0b00689e658ac7930cb72
commit-id: fb8a315e
palves requested a review from a team as a code owner May 13, 2026 12:42
palves added the ci:skip label (Skip all pre-commit / CI jobs while the label is up) May 13, 2026
palves (Collaborator, Author) commented May 13, 2026

Currently based on amd-staging from 2024. Needs to be properly rebased. I've made it work with current compilers though. Good enough for UI discussions.

This makes GDB's disassembly show:

 - leading "=>" if both logical/lane PC and physical PC are the same address.

 - leading "L>" for the logical PC if logical PC and physical PC are different.

 - leading "P>" for the physical PC if logical PC and physical PC are different.

E.g.:

- "set disassemble-next-line on" when lane is divergent:

 (gdb) set disassemble-next-line on
 (gdb) frame
 #0  lane_pc_test (gid=1, in=..., out=<error reading variable: Cannot access memory at address 0x0>) at ../../../src/gdb/testsuite/gdb.dwarf2/dw2-lane-pc.cc:138
 138           elem = const_array[gid] + 1;              /* if_1_then */
 L> 0x00007ffff5808a98 <lane_pc_test+452>:       00 00 50 dc 0d 00 00 00 flat_load_dword v0, v[13:14]
    0x00007ffff5808aa0 <lane_pc_test+460>:       80 00 8f be     s_mov_b32 s15, 0
    0x00007ffff5808aa4 <lane_pc_test+464>:       0f 02 02 7e     v_mov_b32_e32 v1, s15
    0x00007ffff5808aa8 <lane_pc_test+468>:       82 00 90 be     s_mov_b32 s16, 2

- "set disassemble-next-line on" when lane is active:

 (gdb) si
 138           elem = const_array[gid] + 1;              /* if_1_then */
 => 0x00007ffff5808a98 <lane_pc_test+452>:       00 00 50 dc 0d 00 00 00 flat_load_dword v0, v[13:14]
    0x00007ffff5808aa0 <lane_pc_test+460>:       80 00 8f be     s_mov_b32 s15, 0
    0x00007ffff5808aa4 <lane_pc_test+464>:       0f 02 02 7e     v_mov_b32_e32 v1, s15
    0x00007ffff5808aa8 <lane_pc_test+468>:       82 00 90 be     s_mov_b32 s16, 2

- disassembling a divergent lane:

 (gdb) disassemble
 Dump of assembler code for function lane_pc_test:
 ....
    0x00007ffff5808a94 <+448>:   s_cbranch_execz 91  # 0x7ffff5808c04 <lane_pc_test+816>
 P> 0x00007ffff5808a98 <+452>:   flat_load_dword v0, v[13:14]
    0x00007ffff5808aa0 <+460>:   s_mov_b32 s15, 0
 ....
    0x00007ffff5808c04 <+816>:   s_or_b64 exec, exec, s[4:5]
 L> 0x00007ffff5808c08 <+820>:   flat_load_dword v0, v[13:14]
    0x00007ffff5808c10 <+828>:   s_mov_b32 s4, 1
 ....
 End of assembler dump.

 (gdb) x /2i $pc
 P> 0x7ffff5808a98 <lane_pc_test+452>:   flat_load_dword v0, v[13:14]
    0x7ffff5808aa0 <lane_pc_test+460>:   s_mov_b32 s15, 0
 (gdb) x /2i $lane_pc
 L> 0x7ffff5808c08 <lane_pc_test+820>:   flat_load_dword v0, v[13:14]
    0x7ffff5808c10 <lane_pc_test+828>:   s_mov_b32 s4, 1

- disassembling an active lane:

 (gdb) disassemble
 Dump of assembler code for function lane_pc_test:
 ....
    0x00007ffff5808a94 <+448>:   s_cbranch_execz 91  # 0x7ffff5808c04 <lane_pc_test+816>
 => 0x00007ffff5808a98 <+452>:   flat_load_dword v0, v[13:14]
    0x00007ffff5808aa0 <+460>:   s_mov_b32 s15, 0
 ....
    0x00007ffff5808c04 <+816>:   s_or_b64 exec, exec, s[4:5]
    0x00007ffff5808c08 <+820>:   flat_load_dword v0, v[13:14]
    0x00007ffff5808c10 <+828>:   s_mov_b32 s4, 1
 ....
 End of assembler dump.

 (gdb) x /2i $lane_pc
 => 0x7ffff5808a98 <lane_pc_test+452>:   flat_load_dword v0, v[13:14]
    0x7ffff5808aa0 <lane_pc_test+460>:   s_mov_b32 s15, 0
 (gdb) x /2i $pc
 => 0x7ffff5808a98 <lane_pc_test+452>:   flat_load_dword v0, v[13:14]
    0x7ffff5808aa0 <lane_pc_test+460>:   s_mov_b32 s15, 0

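The marker rule boils down to a small decision per instruction address. A hypothetical Python sketch (`pc_marker` is not GDB's function name), using the addresses from the divergent and active examples above:

```python
def pc_marker(addr, pc, lane_pc):
    """Prefix printed before the instruction at ADDR, given the physical
    PC ($pc) and the logical/lane PC ($lane_pc)."""
    if pc == lane_pc:
        # Active lane: a single "=>" marker.
        return "=>" if addr == pc else "  "
    if addr == lane_pc:
        return "L>"
    if addr == pc:
        return "P>"
    return "  "


pc, lane_pc = 0x7FFFF5808A98, 0x7FFFF5808C08  # divergent lane
for addr in (0x7FFFF5808A98, 0x7FFFF5808AA0, 0x7FFFF5808C08):
    print(pc_marker(addr, pc, lane_pc), hex(addr))
```
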
Change-Id: I85b303ab48ea920a98626605fa7f6851bd71acac