WIP: Lane divergence support by palves · Pull Request #124 · ROCm/ROCgdb

palves · 2026-05-13T12:42:40Z

Lane divergence support prototype.

This prototypes support for lane divergence using DW_AT_LLVM_lane_pc from https://llvm.org/docs/AMDGPUUsage.html#dw-at-llvm-lane-pc .

It includes a testcase hand-written using the DWARF assembler from the testsuite. It's actually the first example of a DWARF-assembler based GPU testcase, showing the contortions we need to go through. Those could/should probably be extracted out, making it possible to write other DWARF-assembler based tests using that framework.

clang-offload-bundler -inputs is deprecated, pass multiple -input (singular) arguments instead. Fixes gdb.dwarf2/ testcases with the hip board, like:

    $ make check RUNTESTFLAGS="--target_board=hip" TESTS="gdb.dwarf2/bad-regnum.exp"
    ...
    builtin_spawn -ignore SIGHUP /opt/rocm/llvm/bin/clang-offload-bundler -type=o -targets=hip-amdgcn-amd-amdhsa-gfx906,host-x86_64-unknown-linux-gnu -outputs=/home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.dwarf2/bad-regnum/bad-regnum1.o -inputs=/home/pedro/rocm/gdb/build/gdb/testsuite/temp/3503025/bad-regnum1.o.tmp.o,/home/pedro/rocm/gdb/build/gdb/testsuite/empty-host.o
    /opt/rocm/llvm/bin/clang-offload-bundler: warning: -inputs is deprecated, use -input instead
    /opt/rocm/llvm/bin/clang-offload-bundler: warning: -outputs is deprecated, use -output instead
    UNTESTED: gdb.dwarf2/bad-regnum.exp: failed to prepare
    ...

Change-Id: I3768276c2ddea680fdb6243270fcb0390423a17e
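As a rough illustration of the change described above (not the testsuite's actual Tcl code), the following Python sketch builds a bundler command line with one -input per file and a single -output. The function name `bundler_args` and the short file names are made up for the example; only the option spellings come from the warnings quoted above.

```python
def bundler_args(bundler, obj_type, targets, inputs, output):
    """Build a clang-offload-bundler argv using the singular
    -input/-output spelling, with one -input per bundled file.

    Hypothetical helper for illustration; the testsuite does this in Tcl.
    """
    if len(targets) != len(inputs):
        raise ValueError("need exactly one input file per target")
    argv = [bundler, f"-type={obj_type}", "-targets=" + ",".join(targets)]
    # One -input argument per file, in the same order as the targets.
    argv += [f"-input={path}" for path in inputs]
    argv.append(f"-output={output}")
    return argv


if __name__ == "__main__":
    print(" ".join(bundler_args(
        "clang-offload-bundler", "o",
        ["hip-amdgcn-amd-amdhsa-gfx906", "host-x86_64-unknown-linux-gnu"],
        ["device.o", "empty-host.o"],   # placeholder file names
        "bundled.o")))
```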

Currently, passing "hip" to gdb_compile has the effect of injecting -ggdb in the compiler (hipcc) options. We don't want to hardcode -ggdb for all HIP compilations, though, because some tests want to exercise testing without debug info, while others want to compile their binary without debug info or with -gline-info-only, and then use the DWARF assembler to generate their own info. The -ggdb option was originally needed because it enables an LLVM option that is essential for debugging (-mllvm -amdgpu-spill-cfi-saved-regs). Nowadays, -g enables that LLVM option too, so we can drop the -ggdb from all compilations, and instead rely on -g being added via testcases passing "debug" as gdb_compile option. There's one wrinkle, though. Said LLVM option added by -g (-mllvm -amdgpu-spill-cfi-saved-regs) also affects code generation, unfortunately. Some features of the DWARF assembler machinery rely on code generated by the compiler without -g to be the exact same as the code generated with -g. So, instead of just removing -ggdb, explicitly compile with the LLVM option that -g/-ggdb would enable. Change-Id: I64a938edf97c85d0ea9fc34b37071daa560de7e7

After the previous patch that removed the hardcoded -ggdb, a lot of the gdb.dwarf2/ testcases started failing when tested against --target_board=hip, like so:

    Thread 5 "data-loc" hit Breakpoint 1, with lane 0, 0x00007ffff5877098 in main () from file:///home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.dwarf2/data-loc/data-loc#offset=8192&size=52360
    (gdb) FAIL: gdb.dwarf2/data-loc.exp: running to main in runto

The problem is the "with lane 0" part, which isn't expected by runto. This pattern only matches when there's no debug info, that's why it went unnoticed with most testcases.

Fix it by optionally expecting the "with lane" part, and add a regular HIP testcase that would fail without the fix.

The fix has this effect on gdb.dwarf2/*.exp tests, with --target_board=hip:

    -# of expected passes 311
    -# of unexpected failures 368
    +# of expected passes 704
    +# of unexpected failures 469

Note failures go up because more tests manage to run.

Change-Id: I51253e1f91deec007071201c05973f5c1be75204
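To make the "optionally expecting" idea concrete, here is a small Python sketch of a stop-line pattern with an optional "with lane N" group. The regexp is an assumption written for this example; the real runto pattern lives in the Tcl testsuite and is not shown in this PR text.

```python
import re

# Hypothetical pattern, for illustration only.  The "with lane N, " part and
# the "0xADDR in " part are both optional, so the same pattern accepts stops
# with and without lane information, and with and without debug info.
STOP_RE = re.compile(
    r"hit Breakpoint (?P<num>\d+), "
    r"(?:with lane (?P<lane>\d+), )?"
    r"(?:0x[0-9a-f]+ in )?"
    r"(?P<func>\w+) \("
)


def parse_stop(line):
    """Return (breakpoint number, lane or None, function), or None."""
    m = STOP_RE.search(line)
    if m is None:
        return None
    lane = m.group("lane")
    return int(m.group("num")), (int(lane) if lane is not None else None), m.group("func")
```

With this shape, a stop that reports a lane and one that doesn't both parse, which is the property the runto fix needs.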

Fixes, for example:

    $ make check RUNTESTFLAGS="--target_board=hip" TESTS="gdb.dwarf2/void-type.exp"
    ...
    DUPLICATE: gdb.dwarf2/void-type.exp: set print asm-demangle on
    DUPLICATE: gdb.dwarf2/void-type.exp: set print demangle on
    ...

Passing "" as second argument to gdb_test_no_output no longer works to stop it from issuing a PASS nowadays. Use -nopass instead.

Change-Id: Ia1a41b7d17e62736b3b74e605163800908dc1b90

Read in the DW_AT_LLVM_lane_pc attribute, and store the expressions as dynamic properties similarly to how static_link is stored. Expose DW_AT_LLVM_lane_pc as a $lane_pc user register. The implementation finds the current function, and evaluates the corresponding DW_AT_LLVM_lane_pc expression. This also adds a $__lane_pc_array user register, that shows the evaluation of DW_AT_LLVM_lane_pc in its array form, with one element per lane. Also supports DW_AT_LLVM_lane_pc described with both DW_FORM_sec_offset and DW_FORM_loclistx. The latter motivates the attr_to_dynamic_prop change. Change-Id: If7ed8aef1ffffbe95f632f3dab68cfb6eb43e37f

For "bt", "up", "down". E.g.:

    (gdb) bt
    warning: lane is divergent, skipping inactive frames
    #1 func1 () at kernel.cc:114
    #2 0x00007ffff5805c3c in kernel () at kernel.cc:151
    (gdb) frame
    #1 func1 () at kernel.cc:114
    (gdb) down
    Bottom (innermost) frame selected; you cannot go down.

It is still possible to force GDB to select an inactive frame using the "frame" command. E.g., with "frame 0".

Change-Id: Ib3c87bbc739d32822b39fc91d445fb27d62a5e60

When displaying the lane's current location, use the logical/lane PC instead of the physical/wave PC. Add and make use of a new get_frame_lane_pc routine that returns the frame's lane PC, as a counterpart to get_frame_pc, which returns the physical PC. Change-Id: I305b738bc524b25f19d3b1385e1fed127240e501

Currently, "info lanes" shows either the wave's current frame or "(inactive)", like so:

    (gdb) info lanes
      Id State Target Id                            Frame
    * 0  A     AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] lane_pc_test (gid=0) at dw2-lane-pc.cc:102
      1  I     AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0] (inactive)

This patch makes it so that if we have divergence debug info, we print the lane's logical location instead, and show state as "D" (divergent).

For example, here, lane 1 is divergent in an if/then/else:

    (gdb) info lanes 0-2
      Id State Target Id                            Frame
    * 0  A     AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] lane_pc_test (gid=0) at dw2-lane-pc.cc:102
      1  D     AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0] lane_pc_test (gid=1) at dw2-lane-pc.cc:97

and here, lane 1 called a function while lane 0 is divergent, so lane 0's last active frame is not the wave's current frame:

    (gdb) info lanes 0-2
      Id State Target Id                            Frame
      0  D     AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] lane_pc_test (gid=0) at dw2-lane-pc.cc:114
    * 1  A     AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0] foo (gid=1) dw2-lane-pc.cc:83

Change-Id: I523b666f77da586512cd9a69f81ec07a60b3fcf2
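The State/Frame column choice described above can be modelled as below. This is a simplified Python sketch, not GDB's code: the function names are invented, and it assumes every inactive lane with usable lane PC info counts as divergent, which glosses over any other reasons a lane may be inactive.

```python
def lane_state(active, has_lane_pc_info):
    """Return the State letter for one lane: A, D or I (simplified model)."""
    if active:
        return "A"
    return "D" if has_lane_pc_info else "I"


def lane_frame_text(active, has_lane_pc_info, wave_frame, lane_frame):
    """Return the text for the Frame column (hypothetical helper).

    wave_frame is the description of the wave's current frame, lane_frame
    the description of the lane's logical location.
    """
    state = lane_state(active, has_lane_pc_info)
    if state == "A":
        return wave_frame
    if state == "D":
        return lane_frame
    return "(inactive)"
```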

Change-Id: I42207b5b90ae50d0e9675c990eeb65987ad9a7e9

The lane-stepping support will need lane-specific breakpoints. This adds the functionality to the breakpoints module. It doesn't, however, expose a user-visible way to create such breakpoints. Change-Id: Ib44646f1a30557c3a41f4c82e91a1e39a672236e

A later patch in the series will add a new call to thread_info::active_simd_lanes_mask(), and that has the unfortunate consequence of flushing the frame cache, which in turn has the unfortunate consequence of changing the annotations output, causing a regression in gdb.cp/annota2.exp -- a "frames-invalid" appears at a spot that isn't expected.

The frame cache flush is caused by thread_info::active_simd_lanes_mask() calling switch_to_inferior_no_thread. If we call switch_to_thread instead, we spare the cache flush if THIS thread is already the current thread, thus avoiding the annotation churn. This was already done in thread_info::has_simd_lanes(), but missed here.

That change surprisingly regresses gdb.base/annota1.exp (i.e., to fix annota2.exp we break annota1.exp), but there's a catch -- it only regresses it because an earlier lane debugging commit changed the testcase:

    commit bab976b
    Commit: Pedro Alves <pedro@palves.net>
    CommitDate: Fri Aug 20 12:12:05 2021 +0100

        Base lane debugging support

The log of that commit says:

    "gdb.base/annota1.exp had to be tweaked because the patch has the side effect of changing the order of a frames-invalid and a breakpoints-invalid annotion."

Reverting the gdb.base/annota1.exp hunk from that commit fixes the regression caused by this commit.

Change-Id: I15e857e113195fbddf21173a6471495e435641e6

This teaches execution commands like next/step to handle the case of the current lane being divergent. In such a case, the frame considered for "current source line", etc. should be the first active frame, not the thread's current frame, which may be inactive for the current lane. Also, don't start range stepping if the current lane is not active, as the current line range of an inactive lane is unrelated to the current line range of active lanes. It's not worth it to try to optimize this case, as GDB will immediately start skipping the inactive code region, and that will be done by running to a breakpoint after the following patch. Change-Id: Idddcc4f3e457ab39321c6c3ef91e8ac848e97a79

GDB currently is able to step/next divergent lanes. When the thread is stepping, if the current lane becomes inactive, then GDB continues stepping until it becomes active, at which point GDB resumes the normal stepping algorithm (check whether we reached a different line). However, single-stepping the whole divergent range can be slow, as there may be many instructions to step, e.g., because the inactive code calls functions. There's a simple way to avoid the instruction single-stepping though -- the logical lane PC is supposed to point at the next instruction the lane will execute when it becomes active, so just set a breakpoint there, and run to it. In case the compiler emits bad DW_AT_LLVM_lane_pc info, add a command to disable the stepping over divergent regions with a breakpoint: "maint set skip-divergent-regions-with-breakpoint on/off". When off, GDB skips the divergent regions by doing the slower single stepping. Change-Id: Id04d1707b31fb015c2c1c26561c7c633b072c8a6
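The decision described above can be sketched as a tiny Python model. None of these names exist in GDB; `plan_step` and `SkipPlan` are invented here, and the flag simply stands in for "maint set skip-divergent-regions-with-breakpoint".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkipPlan:
    # One of "check-line", "run-to-breakpoint" or "single-step".
    action: str
    breakpoint_pc: Optional[int] = None


def plan_step(lane_active, lane_pc, skip_with_breakpoint=True):
    """Pick what a stepping thread does next (toy model, not GDB code)."""
    if lane_active:
        # Normal stepping algorithm: check whether we reached a new line.
        return SkipPlan("check-line")
    if skip_with_breakpoint and lane_pc is not None:
        # The logical lane PC is where the lane resumes once it becomes
        # active again, so run to a breakpoint placed there.
        return SkipPlan("run-to-breakpoint", lane_pc)
    # Fallback: keep single-stepping until the lane becomes active.
    return SkipPlan("single-step")
```

Falling back to single-stepping when no lane PC is available is an assumption of this sketch; the commit message only describes the on/off setting.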

A following patch will change the DWARF assembler's get_func_range function to wrap function names with single quotes, like,

    (gdb) disassemble '$f'

instead of

    (gdb) disassemble $f

so that it handles C++ function names with arguments better. That causes a funny regression in a couple testcases however. For example, with gdb.dwarf2/atomic-type.exp, we go from this:

    (gdb) disassemble f
    Dump of assembler code for function f:
       0x0000000000001129 <+0>: endbr64
       0x000000000000112d <+4>: push %rbp
       0x000000000000112e <+5>: mov %rsp,%rbp
       0x0000000000001131 <+8>: mov %rdi,-0x8(%rbp)
       0x0000000000001135 <+12>: mov $0x0,%eax
       0x000000000000113a <+17>: pop %rbp
       0x000000000000113b <+18>: ret
    End of assembler dump.

To this:

    (gdb) disassemble 'f'
    No function contains specified address.

The reason the latter command doesn't find the function is that "disassemble"'s argument is an expression, and 'f' is ambiguous -- it is being interpreted as an 'f' character:

    (gdb) p 'f'
    $1 = 102 'f'

I don't think there's a way around this ambiguity, so just rename affected single-character functions to avoid it. This affects gdb.dwarf2/atomic-type.exp and gdb.dwarf2/dw2-bad-mips-linkage-name.exp.

Change-Id: I70aae4971313f1afb529eb881aefaea90b5b715c

In:

    Commit: Pedro Alves <pedro@palves.net>
    CommitDate: Sat May 15 18:57:31 2021 +0100

        Make DWARF assembler machinery work against the GPU

function_range was tweaked to use a different method to find the function's ranges when testing for amdgcn. That method involved running the program under GDB, and then unrelocating the function addresses, using the starting address of the device code's DSO, as listed in "info shared".

I've since found a better way to handle this -- extract the device code DSO bundle out of the main HIP program binary, and load _that_ into GDB, as if it were a program binary. Then whatever function addresses we get out of GDB are already unrelocated, just what we need.

With this, function_range is again much closer to what the upstream version looks like.

I'm extracting the device code DSO manually, using TCL, using the format described by https://clang.llvm.org/docs/ClangOffloadBundler.html. I wrote that extraction code before I knew of roc-obj-extract/roc-obj-ls. I have yet to try using those instead, but I think it should work. So TBD.

Change-Id: Ie340b4e9493d097c15bbc97677d910640a12ec5a
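For readers who haven't looked at the bundle format, here is a minimal Python sketch of walking an uncompressed offload bundle and pulling out one entry. It is not the Tcl code from this patch. It assumes the layout as I understand it from the ClangOffloadBundler document: a 24-byte magic string, an 8-byte little-endian entry count, then per entry an 8-byte offset, 8-byte size, 8-byte ID length and the ID string, with offsets relative to the start of the bundle. The function names are made up.

```python
import struct

MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"  # 24 bytes


def list_bundle_entries(blob, start=0):
    """Return [(entry_id, offset, size)] for the bundle at blob[start:].

    Assumes the uncompressed binary bundle layout (see lead-in).
    """
    if blob[start:start + len(MAGIC)] != MAGIC:
        raise ValueError("no offload bundle magic at given offset")
    pos = start + len(MAGIC)
    (count,) = struct.unpack_from("<Q", blob, pos)
    pos += 8
    entries = []
    for _ in range(count):
        offset, size, id_len = struct.unpack_from("<QQQ", blob, pos)
        pos += 24
        entry_id = blob[pos:pos + id_len].decode("ascii")
        pos += id_len
        entries.append((entry_id, offset, size))
    return entries


def extract_entry(blob, id_prefix, start=0):
    """Return the bytes of the first entry whose ID starts with id_prefix."""
    for entry_id, offset, size in list_bundle_entries(blob, start):
        if entry_id.startswith(id_prefix):
            # Entry offsets are taken relative to the start of the bundle.
            return blob[start + offset:start + offset + size]
    raise KeyError(id_prefix)
```

In the HIP case, `start` would be the file offset of the .hip_fatbin section contents within the host executable, and the extracted bytes would be the device ELF that gets loaded into GDB.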

function_range prints the function's name to figure out its starting address. If the function's name is a C++ function name with arguments and qualifiers, the printing fails, like so:

    (gdb) p /u &__HIP_BlockDim::operator()(unsigned int) const
    No symbol "__HIP_BlockDim" in current context.

We can fix that by wrapping the function name in single quotes, so we end up with this instead:

    (gdb) p /u &'__HIP_BlockDim::operator()(unsigned int) const'
    $7 = 5468

Another issue is that the function name includes special regexp characters, like '(' and ')', so when matching the function name in a regexp, those characters need to be escaped, otherwise, we get this, for example:

    (gdb) x/2i '__HIP_BlockDim::operator()(unsigned int) const'+272
       0x166c <__HIP_BlockDim::operator()(unsigned int) const+272>: s_setpc_b64 s[22:23]
       0x1670 <__HIP_Coordinates<__HIP_BlockDim>::__X::operator unsigned int() const>: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
    (gdb) FAIL: gdb.rocm/dw2-lane-pc.exp: x/2i '__HIP_BlockDim::operator()(unsigned int) const'+272

Fix that by running the function name via string_to_regexp.

Change-Id: Ia276f977cc930243cf62c6e0cf9bb5cbc2816ff1
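The regexp half of this problem is easy to reproduce outside the testsuite. The Python sketch below uses `re.escape` as a stand-in for the testsuite's string_to_regexp (they are different functions in different languages; the analogy is only that both quote regexp metacharacters). The helper names are invented.

```python
import re


def quoted_symbol(func_name):
    """Wrap a function name in single quotes, as in p /u &'NAME'."""
    return "'" + func_name + "'"


def output_mentions_function(output, func_name):
    """True if func_name appears literally in output.

    re.escape plays the role string_to_regexp plays in the testsuite:
    '(' and ')' in the name must match themselves, not open groups.
    """
    return re.search(re.escape(func_name), output) is not None


name = "__HIP_BlockDim::operator()(unsigned int) const"
line = "0x166c <__HIP_BlockDim::operator()(unsigned int) const+272>: s_setpc_b64 s[22:23]"

# Used unescaped, "()" is an empty group and "(unsigned int)" a capturing
# group, so the pattern looks for "operatorunsigned int const" and fails.
unescaped_matches = re.search(name, line) is not None
escaped_matches = output_mentions_function(line, name)
```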

DWARF::assemble internally can call function_range, and that function compiles the testcase's source file with debug information, in order to be able to extract function start/end. That relies on that built program's executable code being exactly the same as the code built for the program without debug information that will be linked with the generated DWARF.

HIP programs that use the DWARF::assemble framework are more complicated to build: they need the "hip" option, they need -fgpu-rdc, etc., which function_range doesn't know anything about.

To address this, teach DWARF::assemble about a new "-exe TEMPLATE_EXECUTABLE" option, that lets you pass down a pre-built executable that we can extract function ranges from.

In addition, add a "-target TARGET" option, that lets us pass down the target we're building the DWARF for. This is needed because function_range behaves a little differently depending on which target we're generating DWARF for (e.g., for amdgcn, we don't use function labels to find function ranges), and when building a HIP program, the DejaGnu target is the host.

Change-Id: Ia1fd7f23827af9c8901f96615572ce2cae31deb6

This adds a DW_AT_LLVM_lane_pc testcase that uses the DWARF assembler framework to generate the DWARF. This automatically extracts addresses for DW_AT_LLVM_lane_pc in the if/then/else tests using "info line". In order to be able to step through the testcase's program, we need line info. Writing the line tables manually would be a lot of work, so we instead rely on Clang's -gline-tables-only. The main trouble is then how to connect the generated line table entry with DW_AT_stmt_list for the compile unit we generate, without hand editing the compiler-generated DWARF to include some label or some such. I managed to make it work by compiling the program twice. After compiling the first time, I extract the needed .debug_line offset from the binary, and then re-assemble using that value as DW_AT_stmt_list offset. One extra complication is that with -gline-tables-only, Clang still outputs one mostly-empty compilation unit. This compile unit has a ranges list, and this confuses GDB, because block/function look ups by PC may find this bare compilation unit, instead of our manually written compilation unit. To address this, we zap away Clang's compilation unit. We do this by finding it in the embedded/bundled device ELF file in the executable, then finding the relevant compilation unit within it, and overwriting its DIE with 0, the padding byte. This is valid, because consumers just skip over padding bytes silently. Some of these helper routines are put in lib/rocm.exp because they'll probably be reused by other HIP testcases that use the DWARF assembler. The testcase also uses DW_OP_LLVM_push_lane to describe function parameters. Currently this has only been tested on Vega 20. For other GPUs, we may need to adjust register numbers, but otherwise, the testcase is pretty much AMDGPU agnostic. Change-Id: I786bc563348a1f81ed5772c3f3c18f035b48a93a commit-id: 71f64ee0
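The zapping step can be pictured with the following Python sketch, which zeroes the DIE bytes of one compilation unit in an in-memory copy of .debug_info. It is not the helper added to lib/rocm.exp. It assumes a little-endian DWARF32 version 5 unit, whose header is 12 bytes (unit_length, version, unit_type, address_size, debug_abbrev_offset), and the name `zap_cu_dies` is hypothetical.

```python
import struct

# unit_length(4) + version(2) + unit_type(1) + address_size(1)
# + debug_abbrev_offset(4), for a DWARF32 version 5 compile unit.
DWARF32_V5_CU_HEADER_LEN = 12


def zap_cu_dies(debug_info, cu_offset):
    """Overwrite the DIEs of the unit at cu_offset with zero bytes.

    debug_info must be a bytearray.  The unit header is left intact so
    that consumers can still skip to the next unit.  Returns the
    (start, end) byte range that was zeroed.
    """
    (unit_length,) = struct.unpack_from("<I", debug_info, cu_offset)
    if unit_length == 0xFFFFFFFF:
        raise ValueError("DWARF64 units are not handled by this sketch")
    (version,) = struct.unpack_from("<H", debug_info, cu_offset + 4)
    if version != 5:
        raise ValueError("only DWARF version 5 headers are handled")
    die_start = cu_offset + DWARF32_V5_CU_HEADER_LEN
    next_unit = cu_offset + 4 + unit_length  # unit_length excludes itself
    debug_info[die_start:next_unit] = bytes(next_unit - die_start)
    return die_start, next_unit
```

The file size and all following unit offsets stay the same, since the bytes are overwritten in place rather than removed.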

Work around llvm-dwarfdump warnings mentioned in SWDEV-320473. Without this, we get:

    executing: /opt/rocm/llvm/bin/llvm-dwarfdump /home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc.device.so | grep -B 5 "AT_producer.*clang" | grep "Compile Unit:"
    ERROR: tcl error sourcing /home/pedro/rocm/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.rocm/dw2-lane-pc.exp.
    ERROR: comp unit range failed: 1, "0x000004bd: Compile Unit: length = 0x00000028, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x01f7, addr_size = 0x08 (next unit at 0x000004e9)
    warning: DWARF unit from offset 0x00000000 incl. to offset 0x0000000b excl. tries to read DIEs at offset 0x0000000b
    warning: DWARF unit from offset 0x000004b2 incl. to offset 0x000004bd excl. tries to read DIEs at offset 0x000004bd"
        while executing
    "error "comp unit range failed: $status, \"$output\"""

There may be an actual bug in the generated DWARF. I haven't investigated deeply whether the warnings truly are correct or not.

Bug: https://ontrack-internal.amd.com/browse/SWDEV-320473

Change-Id: I551e2a4f29218ad121966abd6ec76286be63ff6b
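One way to tolerate such warnings is to pick out only the "Compile Unit:" header lines and ignore everything else in the tool's output. The Python sketch below does that for the line format shown in the error above. It is an assumption about a possible approach, not the actual workaround in this commit, and `comp_unit_ranges` is an invented name.

```python
import re

# Matches header lines such as:
#   0x000004bd: Compile Unit: length = 0x00000028, ... (next unit at 0x000004e9)
CU_RE = re.compile(
    r"^(0x[0-9a-f]+): Compile Unit: .*\(next unit at (0x[0-9a-f]+)\)",
    re.MULTILINE,
)


def comp_unit_ranges(dump_output):
    """Return [(unit_offset, next_unit_offset)] found in dump_output.

    Lines that are not unit headers, including "warning:" lines, are
    simply skipped.
    """
    return [(int(start, 16), int(end, 16))
            for start, end in CU_RE.findall(dump_output)]
```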

Document the $lane_pc user register. The "D" state in "info lanes". Give example of "info lanes" when we have divergence debug info. Document that GDB skips inactive frames. Document "maint set skip-divergent-regions-with-breakpoint". Update restrictions section, since we now make use of debug information describing an inactive lane's logical current PC, if available. Change-Id: If125a118799d0caf159226ee3b96b0bc9e6ae05b

    $ objdump --section=.hip_fatbin -h /home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc
    objdump: Warning: Unrecognized form: 0x23

    /home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc: file format elf64-x86-64

    Sections:
    Idx Name         Size     VMA              LMA              File off Algn
     11 .hip_fatbin  00008108 0000000000201000 0000000000201000 00001000 2**12
                     CONTENTS, ALLOC, LOAD, READONLY, DATA

Change-Id: Idb8c903153473625dcf009fd0a2f1b5eefcdef7f

Change-Id: Ia7a4c3a1bfd542e2127443d572e44d74bc2be630 commit-id: 947d9a36

(cherry picked from commit 5240248 to allow building against current rocm-systems dbgapi)

ROCm-dbgapi version 0.79 adds process_id and wave_id arguments to the amd_dbgapi_address_dependency function.

The amd_dbgapi_address_dependency function is called from amdgpu_address_scope, which receives a ptid_t argument identifying the current thread. It is however possible that this ptid matches a CPU thread and not a GPU wave. When we set a watchpoint from a GPU context on a global variable, that address is valid from both the CPU and the GPU. When we resume execution, we need to insert a watchpoint at that address on both GPU waves and CPU threads, as both contexts could update that memory. This case is however only possible for global addresses, so if the given ptid is not a GPU thread, we can use AMD_DBGAPI_WAVE_NONE when doing the call.

As for the test, a GPU global variable can be made visible to the host, in which case the host can modify it. Add a testcase that checks that when we set a watchpoint on a GPU global, we also have a CPU watchpoint that would trigger if the host side of the program were to modify that memory location.

Change-Id: Ia90b7194af31b873b270bcdb29bb3a3b4aba32a0 commit-id: a90c52f8

Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92

"gid" is in v8 on gfx1030 with current compiler. Used to be in v0 on gfx906 when this was originally written. Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92

Change-Id: Iee30e934299c98d201e30c32dc4bafd26a2abce9 commit-id: c8e3f2c6

- Handle multiple compile units emitted by Clang - Fix zapping range length Change-Id: I51302b57197d260b34668c761b2825b4f80a7d3a commit-id: 2566ca96

dwarf_assemble currently bundles the GPU .o files in a CPU+GPU bundle. For some reason, that is making it so that the .debug_info section in the bundled GPU .o file is not making it into the final linked binary. Might be related to -fgpu-rdc (LTO). Thankfully, linking the GPU .o file directly with -Xoffload-linker works. Change-Id: I79b0da8ee14898a2ced37352dac059edc9303fc5 commit-id: af0ddc38

Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92

With current clang, we have three compilation units. Return the DW_AT_stmt_list of the last one. This needs a better fix. Change-Id: I2b3a2b9ac95bc2fee6d0b00689e658ac7930cb72 commit-id: fb8a315e

palves · 2026-05-13T12:43:58Z

Currently based on amd-staging from 2024. Needs to be properly rebased. I've made it work with current compilers though. Good enough for UI discussions.

This makes GDB's disassembly show:

- leading "=>" if both logical/lane PC and physical PC are the same address.
- leading "L>" for the logical PC if logical PC and physical PC are different.
- leading "P>" for the physical PC if logical PC and physical PC are different.

E.g.:

- "set disassemble-next-line on" when lane is divergent:

    (gdb) set disassemble-next-line on
    (gdb) frame
    #0 lane_pc_test (gid=1, in=..., out=<error reading variable: Cannot access memory at address 0x0>) at ../../../src/gdb/testsuite/gdb.dwarf2/dw2-lane-pc.cc:138
    138 elem = const_array[gid] + 1; /* if_1_then */
    L> 0x00007ffff5808a98 <lane_pc_test+452>: 00 00 50 dc 0d 00 00 00 flat_load_dword v0, v[13:14]
       0x00007ffff5808aa0 <lane_pc_test+460>: 80 00 8f be s_mov_b32 s15, 0
       0x00007ffff5808aa4 <lane_pc_test+464>: 0f 02 02 7e v_mov_b32_e32 v1, s15
       0x00007ffff5808aa8 <lane_pc_test+468>: 82 00 90 be s_mov_b32 s16, 2

- "set disassemble-next-line on" when lane is active:

    (gdb) si
    138 elem = const_array[gid] + 1; /* if_1_then */
    => 0x00007ffff5808a98 <lane_pc_test+452>: 00 00 50 dc 0d 00 00 00 flat_load_dword v0, v[13:14]
       0x00007ffff5808aa0 <lane_pc_test+460>: 80 00 8f be s_mov_b32 s15, 0
       0x00007ffff5808aa4 <lane_pc_test+464>: 0f 02 02 7e v_mov_b32_e32 v1, s15
       0x00007ffff5808aa8 <lane_pc_test+468>: 82 00 90 be s_mov_b32 s16, 2

- disassembling a divergent lane:

    (gdb) disassemble
    Dump of assembler code for function lane_pc_test:
    ....
       0x00007ffff5808a94 <+448>: s_cbranch_execz 91 # 0x7ffff5808c04 <lane_pc_test+816>
    P> 0x00007ffff5808a98 <+452>: flat_load_dword v0, v[13:14]
       0x00007ffff5808aa0 <+460>: s_mov_b32 s15, 0
    ....
       0x00007ffff5808c04 <+816>: s_or_b64 exec, exec, s[4:5]
    L> 0x00007ffff5808c08 <+820>: flat_load_dword v0, v[13:14]
       0x00007ffff5808c10 <+828>: s_mov_b32 s4, 1
    ....
    End of assembler dump.
    (gdb) x /2i $pc
    P> 0x7ffff5808a98 <lane_pc_test+452>: flat_load_dword v0, v[13:14]
       0x7ffff5808aa0 <lane_pc_test+460>: s_mov_b32 s15, 0
    (gdb) x /2i $lane_pc
    L> 0x7ffff5808c08 <lane_pc_test+820>: flat_load_dword v0, v[13:14]
       0x7ffff5808c10 <lane_pc_test+828>: s_mov_b32 s4, 1

- disassembling an active lane:

    (gdb) disassemble
    Dump of assembler code for function lane_pc_test:
    ....
       0x00007ffff5808a94 <+448>: s_cbranch_execz 91 # 0x7ffff5808c04 <lane_pc_test+816>
    => 0x00007ffff5808a98 <+452>: flat_load_dword v0, v[13:14]
       0x00007ffff5808aa0 <+460>: s_mov_b32 s15, 0
    ....
       0x00007ffff5808c04 <+816>: s_or_b64 exec, exec, s[4:5]
       0x00007ffff5808c08 <+820>: flat_load_dword v0, v[13:14]
       0x00007ffff5808c10 <+828>: s_mov_b32 s4, 1
    ....
    End of assembler dump.
    (gdb) x /2i $lane_pc
    => 0x7ffff5808a98 <lane_pc_test+452>: flat_load_dword v0, v[13:14]
       0x7ffff5808aa0 <lane_pc_test+460>: s_mov_b32 s15, 0
    (gdb) x /2i $pc
    => 0x7ffff5808a98 <lane_pc_test+452>: flat_load_dword v0, v[13:14]
       0x7ffff5808aa0 <lane_pc_test+460>: s_mov_b32 s15, 0

Change-Id: I85b303ab48ea920a98626605fa7f6851bd71acac
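The marker rules listed in this comment boil down to a three-way comparison per disassembled address. A Python sketch of that rule follows; `pc_marker` is an invented name, and GDB's actual implementation is in C++ and not reproduced here.

```python
def pc_marker(addr, physical_pc, lane_pc):
    """Return the two-character prefix for the instruction at addr."""
    if addr == physical_pc and addr == lane_pc:
        return "=>"   # lane is active: logical and physical PC coincide
    if addr == lane_pc:
        return "L>"   # logical (lane) PC of a divergent lane
    if addr == physical_pc:
        return "P>"   # physical (wave) PC while the lane is divergent
    return "  "       # any other instruction gets no marker


# Addresses taken from the divergent-lane example in this comment.
PHYSICAL_PC = 0x7ffff5808a98
LANE_PC = 0x7ffff5808c08
markers = [pc_marker(a, PHYSICAL_PC, LANE_PC)
           for a in (0x7ffff5808a94, 0x7ffff5808a98, 0x7ffff5808c08)]
```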

palves and others added 30 commits June 25, 2024 20:46

Make "maint set lane-divergence-support off" disable lane PC support

960df93

Change-Id: I42207b5b90ae50d0e9675c990eeb65987ad9a7e9

Fix assertion

62c6908

Change-Id: Ia7a4c3a1bfd542e2127443d572e44d74bc2be630 commit-id: 947d9a36

Fix DW_AT_LLVM_lane_pc constant

a299f36

Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92

Switch to DW_OP_LLVM_user OPs

e11c7a0

Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92

Fix gid location

ffefe1e

"gid" is in v8 on gfx1030 with current compiler. Used to be in v0 on gfx906 when this was originally written. Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92

Adjust for gfx1030

69a1939

Change-Id: Iee30e934299c98d201e30c32dc4bafd26a2abce9 commit-id: c8e3f2c6

Fix zapping

cecd426

- Handle multiple compile units emitted by Clang - Fix zapping range length Change-Id: I51302b57197d260b34668c761b2825b4f80a7d3a commit-id: 2566ca96

-mwavefrontsize64

70f8306

Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92

Make hip_extract_stmt_list handle multiple CUs

264edf9

With current clang, we have three compilation units. Return the DW_AT_stmt_list of the last one. This needs a better fix. Change-Id: I2b3a2b9ac95bc2fee6d0b00689e658ac7930cb72 commit-id: fb8a315e

palves requested a review from a team as a code owner May 13, 2026 12:42

palves added the ci:skip Skip all pre-commit / CI jobs while the label is up label May 13, 2026

palves wants to merge 32 commits into amd-staging from users/palves/lane-divergence
