WIP: Lane divergence support #124
Open
palves wants to merge 32 commits into
clang-offload-bundler -inputs is deprecated; pass multiple -input (singular) arguments instead.
Fixes gdb.dwarf2/ testcases with the hip board, like:
$ make check RUNTESTFLAGS="--target_board=hip" TESTS="gdb.dwarf2/bad-regnum.exp"
...
builtin_spawn -ignore SIGHUP /opt/rocm/llvm/bin/clang-offload-bundler -type=o -targets=hip-amdgcn-amd-amdhsa-gfx906,host-x86_64-unknown-linux-gnu -outputs=/home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.dwarf2/bad-regnum/bad-regnum1.o -inputs=/home/pedro/rocm/gdb/build/gdb/testsuite/temp/3503025/bad-regnum1.o.tmp.o,/home/pedro/rocm/gdb/build/gdb/testsuite/empty-host.o
/opt/rocm/llvm/bin/clang-offload-bundler: warning: -inputs is deprecated, use -input instead
/opt/rocm/llvm/bin/clang-offload-bundler: warning: -outputs is deprecated, use -output instead
UNTESTED: gdb.dwarf2/bad-regnum.exp: failed to prepare
...
Change-Id: I3768276c2ddea680fdb6243270fcb0390423a17e
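A small Python sketch of the argument-list change. The testsuite code itself is Tcl, and `bundler_command` is a hypothetical helper, not the testsuite's. It emits one `-input=` per file instead of a comma-separated `-inputs=`. Because the log above also warns about `-outputs`, the sketch uses the singular `-output=` as well, assuming the installed bundler accepts both singular forms, as its warnings suggest.

```python
def bundler_command(bundler, targets, output, inputs):
    """Build a clang-offload-bundler argument list (hypothetical helper).

    Uses one -input= flag per file instead of the deprecated
    comma-separated -inputs= form.
    """
    cmd = [bundler, "-type=o", "-targets=" + ",".join(targets)]
    # Singular -output, since the log also warns that -outputs is deprecated.
    cmd.append("-output=" + output)
    # One -input per file, in the same order as the -targets list.
    cmd.extend("-input=" + path for path in inputs)
    return cmd
```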
Currently, passing "hip" to gdb_compile has the effect of injecting -ggdb into the compiler (hipcc) options. We don't want to hardcode -ggdb for all HIP compilations, though: some tests want to exercise GDB without debug info, while others want to compile their binary without debug info, or with -gline-tables-only, and then use the DWARF assembler to generate their own debug info.
The -ggdb option was originally needed because it enables an LLVM option that is essential for debugging (-mllvm -amdgpu-spill-cfi-saved-regs). Nowadays, -g enables that LLVM option too, so we can drop -ggdb from all compilations and instead rely on -g being added by testcases that pass "debug" as a gdb_compile option.
There's one wrinkle, though. Unfortunately, the LLVM option that -g enables (-mllvm -amdgpu-spill-cfi-saved-regs) also affects code generation. Some features of the DWARF assembler machinery rely on the code the compiler generates without -g being exactly the same as the code generated with -g. So, instead of just removing -ggdb, explicitly compile with the LLVM option that -g/-ggdb would enable.
Change-Id: I64a938edf97c85d0ea9fc34b37071daa560de7e7
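A simplified Python model of the resulting flag selection. `hip_compile_flags` is hypothetical and only mirrors the logic described above: "hip" always adds the LLVM option, so generated code is identical with and without -g, and -g itself only comes from "debug".

```python
def hip_compile_flags(options):
    """Map gdb_compile-style options to hipcc flags (simplified, hypothetical model)."""
    flags = []
    if "hip" in options:
        # Always pass the LLVM option that -g would enable, so code built
        # with and without -g is identical (the DWARF assembler relies on it).
        flags += ["-mllvm", "-amdgpu-spill-cfi-saved-regs"]
    if "debug" in options:
        # Debug info only when the testcase asks for it; no hardcoded -ggdb.
        flags.append("-g")
    return flags
```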
After the previous patch that removed the hardcoded -ggdb, a lot of the gdb.dwarf2/ testcases started failing when tested with --target_board=hip, like so:
Thread 5 "data-loc" hit Breakpoint 1, with lane 0, 0x00007ffff5877098 in main () from file:///home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.dwarf2/data-loc/data-loc#offset=8192&size=52360
(gdb) FAIL: gdb.dwarf2/data-loc.exp: running to main in runto
The problem is the "with lane 0" part, which runto doesn't expect. This pattern only matches when there's no debug info; that's why it went unnoticed with most testcases.
Fix it by optionally expecting the "with lane" part, and add a regular HIP testcase that would fail without the fix.
The fix has this effect on gdb.dwarf2/*.exp tests, with --target_board=hip:
-# of expected passes 311
-# of unexpected failures 368
+# of expected passes 704
+# of unexpected failures 469
Note that failures go up because more tests manage to run.
Change-Id: I51253e1f91deec007071201c05973f5c1be75204
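A Python analogue of the relaxed pattern. runto itself is Tcl, and this regexp is illustrative rather than the one in the testsuite: the "with lane N, " part is optional, so the same pattern matches both GPU stops and ordinary host stops.

```python
import re

# "with lane N, " is optional: only stops in GPU code print it.
# The "0x... in " part is optional too (it appears when stopping without line info).
BREAKPOINT_AT_FUNC = r"Breakpoint \d+, (?:with lane \d+, )?(?:0x[0-9a-f]+ in )?{func} \("

def stopped_at(output, func):
    """Return True if GDB's stop output reports a breakpoint hit in func."""
    pattern = BREAKPOINT_AT_FUNC.format(func=re.escape(func))
    return re.search(pattern, output) is not None
```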
Fixes, for example:
$ make check RUNTESTFLAGS="--target_board=hip" TESTS="gdb.dwarf2/void-type.exp"
...
DUPLICATE: gdb.dwarf2/void-type.exp: set print asm-demangle on
DUPLICATE: gdb.dwarf2/void-type.exp: set print demangle on
...
Passing "" as the second argument to gdb_test_no_output no longer stops it from issuing a PASS. Use -nopass instead.
Change-Id: Ia1a41b7d17e62736b3b74e605163800908dc1b90
Read in the DW_AT_LLVM_lane_pc attribute, and store the expressions as dynamic properties, similarly to how static_link is stored.
Expose DW_AT_LLVM_lane_pc as a $lane_pc user register. The implementation finds the current function and evaluates the corresponding DW_AT_LLVM_lane_pc expression.
This also adds a $__lane_pc_array user register that shows the evaluation of DW_AT_LLVM_lane_pc in its array form, with one element per lane.
This also supports DW_AT_LLVM_lane_pc described with either DW_FORM_sec_offset or DW_FORM_loclistx. The latter motivates the attr_to_dynamic_prop change.
Change-Id: If7ed8aef1ffffbe95f632f3dab68cfb6eb43e37f
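A toy Python model of how the two registers relate, with hypothetical names throughout. The DW_AT_LLVM_lane_pc expression is stood in for by a callable that returns one PC per lane, and the lookup finds the function containing the PC, as described above. Returning None when there is no such function or attribute is an assumption of the sketch, not a statement about GDB's behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Function:
    name: str
    low: int    # first address (inclusive)
    high: int   # end address (exclusive)
    # Stand-in for the DW_AT_LLVM_lane_pc expression: one PC per lane.
    lane_pc_expr: Optional[Callable[[], List[int]]] = None

def lane_pc_array(functions, pc):
    """Model of $__lane_pc_array: evaluate lane_pc of the function containing pc."""
    for func in functions:
        if func.low <= pc < func.high:
            return None if func.lane_pc_expr is None else func.lane_pc_expr()
    return None

def lane_pc(functions, pc, lane):
    """Model of $lane_pc: the array element for the current lane."""
    pcs = lane_pc_array(functions, pc)
    return None if pcs is None else pcs[lane]
```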
For "bt", "up", "down". E.g.:
(gdb) bt
warning: lane is divergent, skipping inactive frames
#1 func1 () at kernel.cc:114
#2 0x00007ffff5805c3c in kernel () at kernel.cc:151
(gdb) frame
#1 func1 () at kernel.cc:114
(gdb) down
Bottom (innermost) frame selected; you cannot go down.
It is still possible to force GDB to select an inactive frame using the "frame" command. E.g., with "frame 0".
Change-Id: Ib3c87bbc739d32822b39fc91d445fb27d62a5e60
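A Python model of the frame-skipping rule, using hypothetical helpers rather than GDB's frame code. `frame_is_active[i]` is False for frames the current lane never entered, such as a callee that only other lanes called, and it assumes at least one frame is active.

```python
DIVERGENT_WARNING = "warning: lane is divergent, skipping inactive frames"

def first_active_frame(frame_is_active):
    """Level of the innermost frame the current lane is active in (#0 is innermost).

    Assumes at least one frame is active (raises StopIteration otherwise).
    """
    return next(level for level, active in enumerate(frame_is_active) if active)

def backtrace(frame_is_active):
    """Return (warning or None, list of frame levels that "bt" prints)."""
    start = first_active_frame(frame_is_active)
    warning = DIVERGENT_WARNING if start > 0 else None
    return warning, list(range(start, len(frame_is_active)))

def frame_down(frame_is_active, selected):
    """Model of "down": the innermost *active* frame acts as the bottom."""
    if selected <= first_active_frame(frame_is_active):
        raise RuntimeError("Bottom (innermost) frame selected; you cannot go down.")
    return selected - 1
```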
When displaying the lane's current location, use the logical/lane PC instead of the physical/wave PC. Add and make use of a new get_frame_lane_pc routine that returns the frame's lane PC, as a counterpart to get_frame_pc, which returns the physical PC. Change-Id: I305b738bc524b25f19d3b1385e1fed127240e501
Currently, "info lanes" either shows the wave's current frame or "(inactive)", like so:
(gdb) info lanes
Id State Target Id Frame
* 0 A AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] lane_pc_test (gid=0) at dw2-lane-pc.cc:102
1 I AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0] (inactive)
This patch makes it so that if we have divergence debug info, we print the lane's logical location instead, and show the state as "D" (divergent). For example, here, lane 1 is divergent in an if/then/else:
(gdb) info lanes 0-2
Id State Target Id Frame
* 0 A AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] lane_pc_test (gid=0) at dw2-lane-pc.cc:102
1 D AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0] lane_pc_test (gid=1) at dw2-lane-pc.cc:97
And here, lane 1 called a function while lane 0 is divergent, so lane 0's last active frame is not the wave's current frame:
(gdb) info lanes 0-2
Id State Target Id Frame
0 D AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] lane_pc_test (gid=0) at dw2-lane-pc.cc:114
* 1 A AMDGPU Lane 1:2:1:1/1 (0,0,0)[1,0,0] foo (gid=1) dw2-lane-pc.cc:83
Change-Id: I523b666f77da586512cd9a69f81ec07a60b3fcf2
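The State column can be summarized by this hypothetical Python function. It assumes `exec_mask` is the wave's execution mask, with bit N set when lane N is active, and `has_lane_pc` says whether DW_AT_LLVM_lane_pc describes the lane's logical location.

```python
def lane_state(lane, exec_mask, has_lane_pc):
    """Model of the "State" column of "info lanes" (hypothetical)."""
    if exec_mask & (1 << lane):
        return "A"  # active: executing with the wave
    # Inactive: "D" when divergence debug info gives a logical location,
    # plain "I" (printed as "(inactive)") otherwise.
    return "D" if has_lane_pc else "I"
```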
Change-Id: I42207b5b90ae50d0e9675c990eeb65987ad9a7e9
The lane-stepping support will need lane-specific breakpoints. This adds the functionality to the breakpoints module. It doesn't, however, expose a user-visible way to create such breakpoints. Change-Id: Ib44646f1a30557c3a41f4c82e91a1e39a672236e
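A hypothetical Python model of a lane-specific breakpoint check. The field names, and the choice to test the lane's bit in the wave's execution mask, are assumptions for illustration; the data structures in this series may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LaneBreakpoint:
    """Hypothetical breakpoint location, optionally restricted to one thread and lane."""
    address: int
    thread: Optional[int] = None  # None: any thread (wave)
    lane: Optional[int] = None    # None: any lane

def should_stop(bp: LaneBreakpoint, pc: int, thread: int, exec_mask: int) -> bool:
    """Decide whether a wave that reached pc should report this breakpoint."""
    if pc != bp.address:
        return False
    if bp.thread is not None and bp.thread != thread:
        return False
    if bp.lane is not None:
        # Only report the hit if the requested lane is active at this PC.
        return bool(exec_mask & (1 << bp.lane))
    return True
```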
A later patch in the series will add a new call to thread_info::active_simd_lanes_mask(), and that has the unfortunate consequence of flushing the frame cache, which in turn has the unfortunate consequence of changing the annotations output, causing a regression in gdb.cp/annota2.exp: a "frames-invalid" appears at a spot where it isn't expected.
The frame cache flush is caused by thread_info::active_simd_lanes_mask() calling switch_to_inferior_no_thread. If we call switch_to_thread instead, we avoid the cache flush if THIS thread is already the current thread, thus avoiding the annotation churn. This was already done in thread_info::has_simd_lanes(), but missed here.
That change surprisingly regresses gdb.base/annota1.exp (i.e., to fix annota2.exp we break annota1.exp), but there's a catch: it only regresses it because an earlier lane debugging commit changed the testcase:
commit bab976b
Commit: Pedro Alves <pedro@palves.net>
CommitDate: Fri Aug 20 12:12:05 2021 +0100
Base lane debugging support
The log of that commit says:
"gdb.base/annota1.exp had to be tweaked because the patch has the side effect of changing the order of a frames-invalid and a breakpoints-invalid annotion."
Reverting the gdb.base/annota1.exp hunk from that commit fixes the regression caused by this commit.
Change-Id: I15e857e113195fbddf21173a6471495e435641e6
This teaches execution commands like next/step to handle the case of the current lane being divergent. In such a case, the frame considered for "current source line", etc. should be the first active frame, not the thread's current frame, which may be inactive for the current lane.
Also, don't start range stepping if the current lane is not active, as the current line range of an inactive lane is unrelated to the current line range of active lanes. It's not worth trying to optimize this case, as GDB will immediately start skipping the inactive code region, and after the following patch that will be done by running to a breakpoint.
Change-Id: Idddcc4f3e457ab39321c6c3ef91e8ac848e97a79
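A Python sketch of the two decisions this patch describes, using hypothetical names: which frame defines the "current line", and whether range stepping may be used. `frame_is_active[i]` is False for frames the lane never entered, and `lane_active` is whether the lane's bit is set in the wave's execution mask.

```python
def plan_step(frame_is_active, lane_active, range_stepping_supported=True):
    """Return (level of the frame whose line is "current", use range stepping?)."""
    # The first frame the current lane is active in defines the
    # "current source line", not necessarily frame #0.
    step_frame = next(level for level, active in enumerate(frame_is_active) if active)
    # Range-step only if the lane is active: an inactive lane's line range
    # says nothing about the instructions the wave executes next.
    use_range_stepping = range_stepping_supported and lane_active
    return step_frame, use_range_stepping
```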
GDB currently is able to step/next divergent lanes. When the thread is stepping, if the current lane becomes inactive, then GDB continues stepping until it becomes active, at which point GDB resumes the normal stepping algorithm (check whether we reached a different line). However, single-stepping the whole divergent range can be slow, as there may be many instructions to step, e.g., because the inactive code calls functions. There's a simple way to avoid the instruction single-stepping though -- the logical lane PC is supposed to point at the next instruction the lane will execute when it becomes active, so just set a breakpoint there, and run to it. In case the compiler emits bad DW_AT_LLVM_lane_pc info, add a command to disable the stepping over divergent regions with a breakpoint: "maint set skip-divergent-regions-with-breakpoint on/off". When off, GDB skips the divergent regions by doing the slower single stepping. Change-Id: Id04d1707b31fb015c2c1c26561c7c633b072c8a6
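A simplified Python simulation contrasting the two strategies, with hypothetical names. `trace` stands in for the sequence of (wave PC, exec mask) states the wave goes through, starting with the lane inactive. With a breakpoint at the lane PC, one resume suffices; the fallback needs one single-step per skipped instruction.

```python
def skip_divergent_region(trace, lane, lane_pc, use_breakpoint=True):
    """Return (index in trace where the lane is active again, number of resumes).

    trace: list of (wave_pc, exec_mask) pairs, one per instruction the wave
    executes from the current stop onwards (a stand-in for the real target).
    """
    lane_bit = 1 << lane
    if use_breakpoint:
        # One resume: run to a breakpoint at the lane PC, and only accept
        # a hit where the lane is active again.
        for i, (pc, mask) in enumerate(trace):
            if pc == lane_pc and mask & lane_bit:
                return i, 1
        raise RuntimeError("lane never became active at its lane PC")
    # Fallback ("maint set skip-divergent-regions-with-breakpoint off"):
    # single-step one instruction at a time until the lane is active.
    for i, (_pc, mask) in enumerate(trace):
        if mask & lane_bit:
            return i, i
    raise RuntimeError("lane never became active")
```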
A following patch will change the DWARF assembler's get_func_range
function to wrap function names with single quotes, like,
(gdb) disassemble '$f'
instead of
(gdb) disassemble $f
so that it handles C++ function names with arguments better. That
causes a funny regression in a couple of testcases, however. For example,
with gdb.dwarf2/atomic-type.exp, we go from this:
(gdb) disassemble f
Dump of assembler code for function f:
0x0000000000001129 <+0>: endbr64
0x000000000000112d <+4>: push %rbp
0x000000000000112e <+5>: mov %rsp,%rbp
0x0000000000001131 <+8>: mov %rdi,-0x8(%rbp)
0x0000000000001135 <+12>: mov $0x0,%eax
0x000000000000113a <+17>: pop %rbp
0x000000000000113b <+18>: ret
End of assembler dump.
To this:
(gdb) disassemble 'f'
No function contains specified address.
The reason the latter command doesn't find the function is that
"disassemble"'s argument is an expression, and 'f' is ambiguous -- it
is being interpreted as an 'f' character:
(gdb) p 'f'
$1 = 102 'f'
I don't think there's a way around this ambiguity, so just rename
affected single-character functions to avoid it. This affects
gdb.dwarf2/atomic-type.exp and
gdb.dwarf2/dw2-bad-mips-linkage-name.exp.
Change-Id: I70aae4971313f1afb529eb881aefaea90b5b715c
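A deliberately simplified Python model of the ambiguity. It is not GDB's actual lexer, which also handles escape sequences and more: a single quoted character is a character literal, while longer quoted text is taken as a symbol name. That is why renaming single-character functions sidesteps the problem.

```python
def classify_quoted(token):
    """Classify a single-quoted token, C-expression style (simplified model)."""
    if len(token) < 3 or token[0] != "'" or token[-1] != "'":
        raise ValueError("expected a single-quoted token")
    inner = token[1:-1]
    if len(inner) == 1:
        # 'f' is the character literal 102, not the function f.
        return ("char", ord(inner))
    # Longer quoted text names a symbol, e.g. 'func1' or a C++ signature.
    return ("symbol", inner)
```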
In:
Commit: Pedro Alves <pedro@palves.net>
CommitDate: Sat May 15 18:57:31 2021 +0100
Make DWARF assembler machinery work against the GPU
function_range was tweaked to use a different method to find the
function's ranges when testing for amdgcn. That method involved
running the program under GDB, and then unrelocating the function
addresses, using the starting address of the device code's DSO, as
listed in "info shared".
I've since found a better way to handle this -- extract the device
code DSO bundle out of the main HIP program binary, and load _that_
into GDB, as if it were a program binary. Then whatever function
addresses we get out of GDB are already unrelocated, which is just what we
need.
With this, function_range is again much closer to what the upstream
version looks like.
I'm extracting the device code DSO manually, in Tcl, using the
format described by
https://clang.llvm.org/docs/ClangOffloadBundler.html.
I wrote that extraction code before I knew of
roc-obj-extract/roc-obj-ls. I have yet to try using those instead,
but I think it should work. So TBD.
Change-Id: Ie340b4e9493d097c15bbc97677d910640a12ec5a
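A minimal Python sketch of extracting entries from an uncompressed offload bundle, following the layout described in the ClangOffloadBundler documentation: a magic string, an entry count, then per entry a code object offset, size, ID length and ID. It assumes 64-bit little-endian fields and offsets relative to the start of the bundle. Compressed bundles are not handled, and `parse_offload_bundle` is a hypothetical name, not the Tcl code in this series.

```python
import struct

MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"

def parse_offload_bundle(data):
    """Return {entry_id: code_object_bytes} for an uncompressed offload bundle."""
    if not data.startswith(MAGIC):
        raise ValueError("not a clang offload bundle")
    pos = len(MAGIC)
    (count,) = struct.unpack_from("<Q", data, pos)
    pos += 8
    entries = {}
    for _ in range(count):
        offset, size, id_len = struct.unpack_from("<QQQ", data, pos)
        pos += 24
        entry_id = data[pos:pos + id_len].decode("ascii")
        pos += id_len
        if offset + size > len(data):
            raise ValueError("entry %s extends past the bundle" % entry_id)
        # Offsets are relative to the start of the bundle.
        entries[entry_id] = data[offset:offset + size]
    return entries
```

The device entry's bytes (for example, the one whose ID names an amdgcn target) could then be written to a file and loaded into GDB as a program binary, as the commit describes.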
function_range prints the function's name to figure out its starting
address. If the function's name is a C++ function name with arguments
and qualifiers, the printing fails, like so:
(gdb) p /u &__HIP_BlockDim::operator()(unsigned int) const
No symbol "__HIP_BlockDim" in current context.
We can fix that by wrapping the function name in single quotes, so we
end up with this instead:
(gdb) p /u &'__HIP_BlockDim::operator()(unsigned int) const'
$7 = 5468
Another issue is that the function name includes special regexp
characters, like '(' and ')', so when matching the function name in a
regexp, those characters need to be escaped, otherwise, we get this,
for example:
(gdb) x/2i '__HIP_BlockDim::operator()(unsigned int) const'+272
0x166c <__HIP_BlockDim::operator()(unsigned int) const+272>: s_setpc_b64 s[22:23]
0x1670 <__HIP_Coordinates<__HIP_BlockDim>::__X::operator unsigned int() const>: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
(gdb) FAIL: gdb.rocm/dw2-lane-pc.exp: x/2i '__HIP_BlockDim::operator()(unsigned int) const'+272
Fix that by running the function name through string_to_regexp.
Change-Id: Ia276f977cc930243cf62c6e0cf9bb5cbc2816ff1
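The same two fixes expressed in Python, as a sketch: `re.escape` plays the role of the testsuite's Tcl `string_to_regexp`, and both helper names are hypothetical.

```python
import re

def print_address_command(func):
    """Quote the name so GDB parses a full C++ signature as one symbol."""
    return "p /u &'%s'" % func

def disasm_line_regex(func, offset):
    """Regexp matching '<func+offset>:' with '(' and ')' in func taken literally."""
    return r"<%s\+%d>:" % (re.escape(func), offset)
```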
DWARF::assemble can internally call function_range, and that function compiles the testcase's source file with debug information in order to extract function start/end addresses. That relies on the executable code of that built program being exactly the same as the code built for the program without debug information, which will be linked with the generated DWARF.
HIP programs that use the DWARF::assemble framework are more complicated to build: they need the "hip" option, they need -fgpu-rdc, etc., none of which function_range knows anything about.
To address this, teach DWARF::assemble about a new "-exe TEMPLATE_EXECUTABLE" option that lets you pass down a pre-built executable that we can extract function ranges from.
In addition, add a "-target TARGET" option that lets us pass down the target we're building the DWARF for. This is needed because function_range behaves a little differently depending on which target we're generating DWARF for (e.g., for amdgcn, we don't use function labels to find function ranges), and when building a HIP program, the DejaGnu target is the host.
Change-Id: Ia1fd7f23827af9c8901f96615572ce2cae31deb6
This adds a DW_AT_LLVM_lane_pc testcase that uses the DWARF assembler framework to generate the DWARF. It automatically extracts addresses for DW_AT_LLVM_lane_pc in the if/then/else tests using "info line".
In order to be able to step through the testcase's program, we need line info. Writing the line tables manually would be a lot of work, so we instead rely on Clang's -gline-tables-only. The main trouble is then how to connect the generated line table entry with DW_AT_stmt_list for the compile unit we generate, without hand-editing the compiler-generated DWARF to include some label or some such.
I managed to make it work by compiling the program twice. After compiling the first time, I extract the needed .debug_line offset from the binary, and then re-assemble using that value as the DW_AT_stmt_list offset.
One extra complication is that with -gline-tables-only, Clang still outputs one mostly-empty compilation unit. This compile unit has a ranges list, and this confuses GDB, because block/function lookups by PC may find this bare compilation unit instead of our manually written compilation unit. To address this, we zap away Clang's compilation unit. We do this by locating the embedded/bundled device ELF file in the executable, then finding the relevant compilation unit within it, and overwriting its DIE with 0, the padding byte. This is valid because consumers silently skip over padding bytes.
Some of these helper routines are put in lib/rocm.exp because they'll probably be reused by other HIP testcases that use the DWARF assembler.
The testcase also uses DW_OP_LLVM_push_lane to describe function parameters.
Currently this has only been tested on Vega 20. For other GPUs, we may need to adjust register numbers, but otherwise, the testcase is pretty much AMDGPU-agnostic.
Change-Id: I786bc563348a1f81ed5772c3f3c18f035b48a93a commit-id: 71f64ee0
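A Python sketch of the zapping step for a single DWARF32 unit, under stated assumptions: little-endian data, a DW_UT_compile (or pre-DWARF 5) unit header, and that zapping means zero-filling everything after the unit header up to the unit's end. That leaves the header intact, so later units keep their offsets. `zap_compile_unit` is hypothetical; the testsuite does this in Tcl.

```python
import struct

def zap_compile_unit(debug_info: bytearray, unit_offset: int) -> None:
    """Zero-fill the DIEs of the DWARF32 unit at unit_offset, in place."""
    (unit_length,) = struct.unpack_from("<I", debug_info, unit_offset)
    if unit_length >= 0xFFFFFFF0:
        raise ValueError("DWARF64 or reserved unit_length not handled")
    (version,) = struct.unpack_from("<H", debug_info, unit_offset + 4)
    # DWARF 5 DW_UT_compile header: unit_length(4) version(2) unit_type(1)
    #   address_size(1) debug_abbrev_offset(4) = 12 bytes.
    # DWARF 2-4 header: unit_length(4) version(2) debug_abbrev_offset(4)
    #   address_size(1) = 11 bytes.
    header_size = 12 if version >= 5 else 11
    die_start = unit_offset + header_size
    # unit_length counts the bytes after the 4-byte length field itself.
    unit_end = unit_offset + 4 + unit_length
    debug_info[die_start:unit_end] = bytes(unit_end - die_start)
```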
Work around the llvm-dwarfdump warnings mentioned in SWDEV-320473.
Without this, we get:
executing: /opt/rocm/llvm/bin/llvm-dwarfdump /home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc.device.so | grep -B 5 "AT_producer.*clang" | grep "Compile Unit:"
ERROR: tcl error sourcing /home/pedro/rocm/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.rocm/dw2-lane-pc.exp.
ERROR: comp unit range failed: 1, "0x000004bd: Compile Unit: length = 0x00000028, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x01f7, addr_size = 0x08 (next unit at 0x000004e9)
warning: DWARF unit from offset 0x00000000 incl. to offset 0x0000000b excl. tries to read DIEs at offset 0x0000000b
warning: DWARF unit from offset 0x000004b2 incl. to offset 0x000004bd excl. tries to read DIEs at offset 0x000004bd"
while executing
"error "comp unit range failed: $status, \"$output\"""
There may be an actual bug in the generated DWARF. I haven't
investigated deeply whether the warnings truly are correct or not.
Bug: https://ontrack-internal.amd.com/browse/SWDEV-320473
Change-Id: I551e2a4f29218ad121966abd6ec76286be63ff6b
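One way to make the parsing side tolerant of such output, sketched in Python. `compile_unit_ranges` is hypothetical, and the actual workaround in this commit may differ: it keeps only the "Compile Unit:" header lines, ignores warning lines, and takes each unit's [start, end) range from its offset and the "next unit at" value.

```python
import re

CU_LINE = re.compile(r"^(0x[0-9a-f]+): Compile Unit: .*\(next unit at (0x[0-9a-f]+)\)")

def compile_unit_ranges(dwarfdump_output):
    """Return [(start, end)] for each "Compile Unit:" header, skipping warnings."""
    ranges = []
    for line in dwarfdump_output.splitlines():
        match = CU_LINE.match(line)
        if match:
            ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
    return ranges
```

For the unit above, the result is consistent with its length field: 0x4bd + 4 + 0x28 = 0x4e9.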
Document the $lane_pc user register and the "D" state in "info lanes". Give an example of "info lanes" when we have divergence debug info. Document that GDB skips inactive frames. Document "maint set skip-divergent-regions-with-breakpoint". Update the restrictions section, since we now make use of debug information describing an inactive lane's logical current PC, if available.
Change-Id: If125a118799d0caf159226ee3b96b0bc9e6ae05b
$ objdump --section=.hip_fatbin -h /home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc
objdump: Warning: Unrecognized form: 0x23
/home/pedro/rocm/gdb/build/gdb/testsuite/outputs/gdb.rocm/dw2-lane-pc/dw2-lane-pc: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
11 .hip_fatbin 00008108 0000000000201000 0000000000201000 00001000 2**12
CONTENTS, ALLOC, LOAD, READONLY, DATA
Change-Id: Idb8c903153473625dcf009fd0a2f1b5eefcdef7f
Change-Id: Ia7a4c3a1bfd542e2127443d572e44d74bc2be630 commit-id: 947d9a36
(cherry picked from commit 5240248 to allow building against current rocm-systems dbgapi)
ROCm-dbgapi version 0.79 adds process_id and wave_id arguments to the amd_dbgapi_address_dependency function.
The amd_dbgapi_address_dependency function is called from amdgpu_address_scope, which receives a ptid_t argument identifying the current thread. However, this ptid may match a CPU thread rather than a GPU wave. When we set a watchpoint from a GPU context on a global variable, that address is valid from both the CPU and the GPU. When we resume execution, we need to insert a watchpoint at that address on both GPU waves and CPU threads, as both contexts could update that memory. This case is only possible for global addresses, so if the given ptid is not a GPU thread, we can pass AMD_DBGAPI_WAVE_NONE when doing the call.
As for the test: a GPU global variable can be made visible to the host, in which case the host can modify it. Add a testcase that checks that when we set a watchpoint on a GPU global, we also have a CPU watchpoint that would trigger if the host side of the program were to modify that memory location.
Change-Id: Ia90b7194af31b873b270bcdb29bb3a3b4aba32a0 commit-id: a90c52f8
Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92
"gid" is in v8 on gfx1030 with current compiler. Used to be in v0 on gfx906 when this was originally written. Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92
Change-Id: Iee30e934299c98d201e30c32dc4bafd26a2abce9 commit-id: c8e3f2c6
- Handle multiple compile units emitted by Clang
- Fix zapping range length
Change-Id: I51302b57197d260b34668c761b2825b4f80a7d3a commit-id: 2566ca96
dwarf_assemble currently bundles the GPU .o files in a CPU+GPU bundle. For some reason, that causes the .debug_info section of the bundled GPU .o file to not make it into the final linked binary. This might be related to -fgpu-rdc (LTO). Thankfully, linking the GPU .o file directly with -Xoffload-linker works.
Change-Id: I79b0da8ee14898a2ced37352dac059edc9303fc5 commit-id: af0ddc38
Change-Id: I7a2372c2d5f07c50d806ed38333f3d0078cd48cf commit-id: 2a008d92
With current clang, we have three compilation units. Return the DW_AT_stmt_list of the last one. This needs a better fix. Change-Id: I2b3a2b9ac95bc2fee6d0b00689e658ac7930cb72 commit-id: fb8a315e
Currently based on amd-staging from 2024. Needs to be properly rebased. I've made it work with current compilers though. Good enough for UI discussions.
This makes GDB's disassembly show:
- leading "=>" if both logical/lane PC and physical PC are the same address.
- leading "L>" for the logical PC if logical PC and physical PC are different.
- leading "P>" for the physical PC if logical PC and physical PC are different.
E.g.:
- "set disassemble-next-line on" when lane is divergent:
(gdb) set disassemble-next-line on
(gdb) frame
#0 lane_pc_test (gid=1, in=..., out=<error reading variable: Cannot access memory at address 0x0>) at ../../../src/gdb/testsuite/gdb.dwarf2/dw2-lane-pc.cc:138
138 elem = const_array[gid] + 1; /* if_1_then */
L> 0x00007ffff5808a98 <lane_pc_test+452>: 00 00 50 dc 0d 00 00 00 flat_load_dword v0, v[13:14]
0x00007ffff5808aa0 <lane_pc_test+460>: 80 00 8f be s_mov_b32 s15, 0
0x00007ffff5808aa4 <lane_pc_test+464>: 0f 02 02 7e v_mov_b32_e32 v1, s15
0x00007ffff5808aa8 <lane_pc_test+468>: 82 00 90 be s_mov_b32 s16, 2
- "set disassemble-next-line on" when lane is active:
(gdb) si
138 elem = const_array[gid] + 1; /* if_1_then */
=> 0x00007ffff5808a98 <lane_pc_test+452>: 00 00 50 dc 0d 00 00 00 flat_load_dword v0, v[13:14]
0x00007ffff5808aa0 <lane_pc_test+460>: 80 00 8f be s_mov_b32 s15, 0
0x00007ffff5808aa4 <lane_pc_test+464>: 0f 02 02 7e v_mov_b32_e32 v1, s15
0x00007ffff5808aa8 <lane_pc_test+468>: 82 00 90 be s_mov_b32 s16, 2
- disassembling a divergent lane:
(gdb) disassemble
Dump of assembler code for function lane_pc_test:
....
0x00007ffff5808a94 <+448>: s_cbranch_execz 91 # 0x7ffff5808c04 <lane_pc_test+816>
P> 0x00007ffff5808a98 <+452>: flat_load_dword v0, v[13:14]
0x00007ffff5808aa0 <+460>: s_mov_b32 s15, 0
....
0x00007ffff5808c04 <+816>: s_or_b64 exec, exec, s[4:5]
L> 0x00007ffff5808c08 <+820>: flat_load_dword v0, v[13:14]
0x00007ffff5808c10 <+828>: s_mov_b32 s4, 1
....
End of assembler dump.
(gdb) x /2i $pc
P> 0x7ffff5808a98 <lane_pc_test+452>: flat_load_dword v0, v[13:14]
0x7ffff5808aa0 <lane_pc_test+460>: s_mov_b32 s15, 0
(gdb) x /2i $lane_pc
L> 0x7ffff5808c08 <lane_pc_test+820>: flat_load_dword v0, v[13:14]
0x7ffff5808c10 <lane_pc_test+828>: s_mov_b32 s4, 1
- disassembling an active lane:
(gdb) disassemble
Dump of assembler code for function lane_pc_test:
....
0x00007ffff5808a94 <+448>: s_cbranch_execz 91 # 0x7ffff5808c04 <lane_pc_test+816>
=> 0x00007ffff5808a98 <+452>: flat_load_dword v0, v[13:14]
0x00007ffff5808aa0 <+460>: s_mov_b32 s15, 0
....
0x00007ffff5808c04 <+816>: s_or_b64 exec, exec, s[4:5]
0x00007ffff5808c08 <+820>: flat_load_dword v0, v[13:14]
0x00007ffff5808c10 <+828>: s_mov_b32 s4, 1
....
End of assembler dump.
(gdb) x /2i $lane_pc
=> 0x7ffff5808a98 <lane_pc_test+452>: flat_load_dword v0, v[13:14]
0x7ffff5808aa0 <lane_pc_test+460>: s_mov_b32 s15, 0
(gdb) x /2i $pc
=> 0x7ffff5808a98 <lane_pc_test+452>: flat_load_dword v0, v[13:14]
0x7ffff5808aa0 <lane_pc_test+460>: s_mov_b32 s15, 0
Change-Id: I85b303ab48ea920a98626605fa7f6851bd71acac
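The marker rule above reduces to a small function. This is a hypothetical Python sketch, assuming `lane_pc` equals `pc` when there is no divergence information:

```python
def pc_marker(address, pc, lane_pc):
    """Two-character prefix for a disassembly line (hypothetical model)."""
    if address == pc and address == lane_pc:
        return "=>"  # lane is active: logical and physical PC coincide
    if address == lane_pc:
        return "L>"  # logical (lane) PC of a divergent lane
    if address == pc:
        return "P>"  # physical (wave) PC of a divergent lane
    return "  "
```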
Lane divergence support prototype.
This prototypes support for lane divergence using DW_AT_LLVM_lane_pc from https://llvm.org/docs/AMDGPUUsage.html#dw-at-llvm-lane-pc .
It includes a testcase hand-written using the DWARF assembler from the testsuite. It's actually the first example of a DWARF-assembler-based GPU testcase, showing the contortions we need to go through. Those could/should probably be extracted out, making it possible to write other DWARF-assembler-based tests using that framework.