Commit d0fbeb8
GCN: use byref instead of byval+lower_byval for kernel arguments
On AMDGPU, kernel arguments already reside in the read-only kernarg
segment. The current pipeline adds `byval` attributes, and
`lower_byval` then expands them into first-class aggregates (FCAs). This
forces LLVM to emit an `extractvalue` for every field and to store the
entire struct into scratch memory via an alloca, even when only a few
fields are used. For large structs (e.g. Oceananigans'
ImmersedBoundaryGrid), this produces dozens of dead scratch stores.
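
As a rough illustration of the shape described above (not the exact IR that GPUCompiler emits), a kernel whose struct argument has been lowered to a first-class aggregate might look like the sketch below. The `%Grid` type, the kernel name, and the value names are hypothetical; LLVM would later split the aggregate store into per-field `extractvalue` and store pairs.

```llvm
; Hypothetical struct and kernel, for illustration only.
%Grid = type { i64, i64, double }

define amdgpu_kernel void @kernel(%Grid %grid.val, ptr addrspace(1) %out) {
entry:
  ; The whole aggregate is first spilled to scratch (addrspace(5)).
  %grid.slot = alloca %Grid, align 8, addrspace(5)
  store %Grid %grid.val, ptr addrspace(5) %grid.slot, align 8
  ; The body works on a generic pointer to the scratch copy.
  %grid = addrspacecast ptr addrspace(5) %grid.slot to ptr
  ; Only the third field is used, yet every field was stored above.
  %dx.ptr = getelementptr inbounds %Grid, ptr %grid, i32 0, i32 2
  %dx = load double, ptr %dx.ptr, align 8
  store double %dx, ptr addrspace(1) %out, align 8
  ret void
}
```
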
Using `byref` instead keeps the pointer semantics, allowing LLVM to
generate scalar loads directly from the kernarg segment on demand.
The invariant.load and TBAA metadata that Julia emits remain valid
since the kernarg memory is immutable.
The byref pointer parameters are rewritten to addrspace(4) (AMDGPU
constant/kernarg address space), with addrspacecasts inserted so the
function body can continue using generic pointers.
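
A minimal sketch of what the rewritten signature could look like, assuming a hypothetical `%Grid` struct and kernel name (the real output depends on the Julia kernel being compiled). The struct parameter becomes an `addrspace(4)` pointer marked `byref`, and a single `addrspacecast` at function entry hands the body a generic pointer:

```llvm
; Hypothetical struct and kernel, for illustration only.
%Grid = type { i64, i64, double }

define amdgpu_kernel void @kernel(ptr addrspace(4) byref(%Grid) align 8 %grid.karg,
                                  ptr addrspace(1) %out) {
entry:
  ; Cast to a generic pointer so the existing body can stay unchanged.
  %grid = addrspacecast ptr addrspace(4) %grid.karg to ptr
  ; Only the field that is actually used gets loaded; no scratch copy is made.
  %dx.ptr = getelementptr inbounds %Grid, ptr %grid, i32 0, i32 2
  %dx = load double, ptr %dx.ptr, align 8, !invariant.load !0
  store double %dx, ptr addrspace(1) %out, align 8
  ret void
}

!0 = !{}
```

Because the cast source is a known kernarg pointer, later address-space inference in the backend can in principle fold the generic load back into a constant-address-space load, which is what makes the scalar loads mentioned above possible.
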
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c1b651a commit d0fbeb8
3 files changed
Lines changed: 97 additions & 3 deletions
Diff summary (file names and changed line content were not preserved):

| File | Location | Change |
|---|---|---|
| 1 | after original line 42 | 84 lines added (new lines 43 to 126) |
| 1 | original lines 51 to 52 | replaced by 2 lines (new lines 135 to 136) |
| 2 | after original line 274 | 6 lines added (new lines 275 to 280) |
| 3 | original line 97 | replaced by 5 lines (new lines 97 to 101) |