[CIR] Lower constant block addresses for goto #201644

Open
adams381 wants to merge 2 commits into main from users/adams381/cir-computed-goto-blockaddress
@adams381 adams381 commented Jun 4, 2026

GNU computed-goto code that takes a label's address in a constant
context -- the common static dispatch-table idiom
`static const void *tbl[] = {&&L1, &&L2}; goto *tbl[i];` -- hit
`errorNYI` in `ConstantLValueEmitter::VisitAddrLabelExpr`, and the
follow-on `goto *tbl[i]` then tripped the `indirectGotoBlock`
assertion in `emitIndirectGotoStmt` because the label was never
registered as address-taken.  The runtime form (`void *p = &&L;
goto *p;`) already worked; only the constant form was missing.
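
For readers unfamiliar with the idiom, here is a minimal sketch of the constant form described above, using the GNU "labels as values" extension. The function name `dispatch` and the return values are hypothetical and are not taken from the PR's test file; only the table initializer and the `goto *tbl[i]` shape come from the description.

```c
#include <assert.h>

/* Hypothetical example function: selects a label through a static table. */
int dispatch(int i) {
  /* Label addresses appear in a constant initializer, so the compiler
     needs a constant representation for each block address. */
  static const void *tbl[] = {&&L1, &&L2};

  assert(i >= 0 && i < 2); /* the table has exactly two entries */
  goto *tbl[i];            /* indirect branch through the table */

L1:
  return 10;
L2:
  return 20;
}
```

Calling `dispatch(0)` takes the `L1` path and `dispatch(1)` takes the `L2` path. This needs a compiler that supports the GNU extension, such as GCC or Clang.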

Label addresses had no constant representation: `cir.block_address`
existed only as an operation, which cannot appear inside a
`#cir.const_array` initializer.  Add a `#cir.block_address` constant
attribute and lower it to MLIR's `LLVM::BlockAddressAttr`, reusing the
same `BlockTagOp` resolution the operation form already uses
(threading the pass-owned `LLVMBlockAddressInfo` into the constant
value lowering and the Global/Constant lowering patterns).

CIRGen's `ConstantLValueEmitter::VisitAddrLabelExpr` now emits the
attribute and records the label so `finishIndirectBranch` adds it to
the `cir.indirect_br` successors and the indirect-goto block is
instantiated.  Because such labels have no function-local
`BlockAddressOp`, `GotoSolver` was deleting them as unused; it now
also keeps labels referenced by a constant `#cir.block_address`
anywhere in the module.

The lowered LLVM matches classic codegen (a `[N x ptr]` of
`blockaddress` constants plus an `indirectbr`).  New test
`goto-address-label-table.c` covers CIR, the CIR-lowered LLVM, and
classic OGCG; `label-values.c` (the runtime form) is unchanged.
@llvmorg-github-actions llvmorg-github-actions Bot added the clang (Clang issues not falling into any other category) and ClangIR (Anything related to the ClangIR project) labels Jun 4, 2026

llvmorg-github-actions Bot commented Jun 4, 2026

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clangir

Author: adams381

Changes

[mlir][spirv][tosa] Add remaining TOSA 1.0 SPIR-V TOSA ops (#200383)

Add conversion patterns for additional TOSA 1.0 operations targeting the
SPIR-V TOSA extended instruction set.

This covers pooling and convolution ops, FFT/RFFT, matmul, concat, pad,
rescale, const, const_shape, and identity. Concat is split into
conservative chunks to avoid producing SPIR-V instructions with too many
operands.

Add a multi-result conversion pattern for FFT/RFFT and share the
convolution replacement logic for conv2d, conv3d, and depthwise_conv2d
while keeping transpose_conv2d explicit because it has different
attributes.

Also share constant attribute conversion for const and const_shape,
including integer element type conversions such as index to i32, i4 to
i8, and i48 to i64, and preserve the empty const_shape edge case.

Add conversion tests for the newly covered operations.

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>

[LV] Fix missing MetaData for histogram instructions (#134241)

[libc] Add netinet/udp.h containing struct udphdr (#200839)

This patch adds a generated <netinet/udp.h> containing the udphdr
structure definition.

There are two styles ("linux" and "BSD") of udphdr field names (and both
of them can be found in the wild), so I follow the glibc and bionic
approach of using an anonymous union. (musl uses a #define on the field
names, which doesn't seem that great).
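
As a rough illustration of the anonymous-union approach, the sketch below overlays the two naming styles on the same four 16-bit fields. The struct name `sketch_udphdr` is hypothetical (chosen to avoid clashing with a system header), and the field names follow the glibc convention as I understand it; the header generated by this patch may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct udphdr: both field-name styles
   alias the same storage through a C11 anonymous union. */
struct sketch_udphdr {
  union {
    struct { /* BSD-style names */
      uint16_t uh_sport; /* source port */
      uint16_t uh_dport; /* destination port */
      uint16_t uh_ulen;  /* UDP length */
      uint16_t uh_sum;   /* checksum */
    };
    struct { /* Linux-style names */
      uint16_t source;
      uint16_t dest;
      uint16_t len;
      uint16_t check;
    };
  };
};
```

Code written against either style compiles unchanged, and no macro leaks into the user's namespace, which is the drawback of the `#define` approach.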

I've added the target to include/CMakeLists.txt and registered it
under target lists in headers.txt for the supported Linux platforms
(x86_64, aarch64, and riscv).

To verify layout and alignment correctness, I've added a layout and
field compatibility unit test under test/src/netinet/udp_test.cpp.

Assisted by Gemini.

[lldb][Windows] Use captured error in ConnectionGenericFile::Read (#200803)

Use the captured value on both branches so the reported error matches
the one that was tested against.

[cuda][flang] Diagnose missing CUDA intrinsic modules in Flang semantics (#200509)

  • Replace CUDA intrinsic module CHECKs with actionable diagnostics
    when cudadevice or __cuda_builtins cannot be read.
  • Avoid dereferencing missing CUDA module scopes during implicit CUDA
    symbol import.
  • Add a semantics test covering the missing CUDA intrinsic module
    diagnostic.

[NFC][clang-sycl-linker] Apply LLVM coding standards to ClangSYCLLinker.cpp (#200543)

Bring the file in line with llvm/docs/CodingStandards.rst without
changing behavior:

  • Restore the canonical //===---===// file-header banner.
  • Move free functions out of the anonymous namespace and mark them
    static; keep only types (LinkerOptTable, LinkResult, SplitModule,
    IRSplitMode, EntryPointCategorizer) inside anonymous namespaces.
  • Rename a local OutputFile in createTempFile to Path to stop it
    shadowing the file-scope OutputFile.
  • Rename the inner Err in runCodeGen to MatErr to stop it shadowing
    the surrounding SMDiagnostic Err.
  • Normalize parameter-name comments to the /*Name=*/value form.
  • Strip quotes from Doxygen \param 'Name' directives.

Co-Authored-By: Claude

[CIR] Implement __builtin_astype for vec4 to vec3 (#199374)

Implement __builtin_astype for vec4 to vec3

Issue #192311

[LLDB] Detect cycles during Type resolution (#200304)

I got LLDB crash reports from the Swift plugin where (presumably
malformed) debug info sends lldb_private::Type into an infinite recursion.
Most likely this is a bug in the DWARF parser, however, even malformed
inputs shouldn't crash LLDB so this patch adds cycle detection.

rdar://177856769

Assisted-by: claude

[lldb-dap] Cleanup InstructionBreakpoint (#200228)

Added a mutex, as in other breakpoints (PR). Also removed the
unused m_offset field.

[libc] Add GPU build-only to fullbuild precommit CIs. (#200593)

  • Add build-only CI for AMDGPU.
  • Also pass correct flags to other targets.

[SROA] Canonicalize homogeneous structs to fixed vectors (opt-in, after memcpyopt) (#165159)

SROA sometimes keeps temporary allocas around for homogeneous structs like
{ i32, i32, i32, i32 } because the partition has only memcpy/memset traffic
and no scalar typed users to drive vector promotion. On targets like AMDGPU
these allocas turn into scratch memory and hurt performance. This PR adds a
helper tryCanonicalizeStructToVector that converts such a partition to a
fixed vector type when every non-debug, non-lifetime user is a memory
intrinsic, so the alloca can promote through normal vector load/store paths.
The element-shape rule accepts any homogeneous element count, any integer
width, any FP type, and integral pointer types, as long as the struct is
tightly packed.

Canonicalization is gated behind a new per-pass option
canonicalize-struct-to-vector on SROAOptions, off by default. Only the
late SROA passes in addVectorPasses (new PM, non-LTO and FullLTO) and the
two legacy-PM SROAs in NVPTX enable it, so it always runs after
MemCpyOptPass. Running it earlier can hide memcpy chains that memcpyopt
would otherwise collapse, and can also emit wide stores whose suffix lanes
are undef when only part of a struct was initialized. Both hazards are
covered by new tests struct-to-vector-before-memcpyopt.ll and
struct-to-vector-fp-store-only-tail.ll. The opt-in design and the
"after memcpyopt" placement come from @YonahGoldberg's refactor, which
removed every regression reported in earlier benchmark runs (see
dtcxzyw/llvm-opt-benchmark-nightly#306). AMDGPU inherits the new-PM opt-ins
automatically; other targets keep upstream-main SROA behavior unless they
opt in.

Co-authored-by: Yonah Goldberg <ygoldberg@nvidia.com>



[AMDGPU] Verify data size of load-to-LDS intrinsics (#200587)

An out-of-range size immarg (e.g. 0) produced an illegal i0 memory type
during SelectionDAG building and crashed the backend instead of being
rejected up front.

[DirectX] Implement lowering of llvm.dx.resource.samplebias to the SampleBias DXIL Op (#199745)

Fixes #192548

This PR implements the lowering of the llvm.dx.resource.samplebias
intrinsic to the SampleBias DXIL Op.

Although I reckon that other lowerSample* functions in
DXILOpLowering.cpp will have shared logic, this is the first one to be
implemented. Consolidating common logic between future lowerSample*
functions can be left to a later PR implementing the second or other
lowerSample* function.

Assisted-by: Claude Opus 4.6

[lldb-dap] Use SetTarget for launch and attach commands (#200133)

Without this patch, event listener registration was skipped; as a result,
the Modules view in the UI was not displayed when launching the target via
launchCommands or attachCommands.

[PGO][HIP] Fix profile-only Windows link by gating ROCm interceptor macro (#200859)

PR #200111 stops compiling InstrProfilingPlatformROCm.cpp (which defines the
HIP GPU helper __llvm_profile_hip_collect_device_data) in profile-only builds.
But the compile define -DCOMPILER_RT_BUILD_PROFILE_ROCM=1 was still added
whenever the COMPILER_RT_BUILD_PROFILE_ROCM option was on (the default), so
InstrProfilingFile.c still referenced the helper from __llvm_profile_write_file
even though it was never built.

On ELF the declaration is weak, so the undefined symbol folds to null and the
address-guarded call is skipped. COFF/Windows has no such fallback:

error LNK2019: unresolved external symbol
__llvm_profile_hip_collect_device_data referenced in function
__llvm_profile_write_file

Add the define only when PROFILE_HAS_HIP_INTERCEPTOR is true, i.e. the same
condition that keeps InstrProfilingPlatformROCm.cpp in the archive, so the
macro is defined iff the helper is actually compiled in.

Reported by zmodem:
#200111 (comment)

[offload][LIT] Disable two tests failing on new Intel GPU driver (#200856)

One new consistent failure and one which causes instability.

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>

[LV] Add VPlan printing test wih UDiv SCEV expansion. (NFC) (#200845)

[X86] Fix X86FixupLEAs displacement check for other types of operands (#200705)

This has already bitten us before with
BlockAddresses, but another case popped up recently: under complex
conditions, LEAs in the three-operand case with symbolic displacements
could be miscompiled due to MO_MCSymbol not being handled in a similar
way. To avoid other issues in the future, just be more conservative
about the symbol type and only return false if we know for a fact that
the offset is zero.

Fixes #200707

[LLVM] Nominate Ehsan as a DA maintainer (#200375)

This is related to #200335. I would like to nominate Ehsan as a
maintainer for DependenceAnalysis as I am aware he expressed interest in
that. I am happy that Ryotaro became a maintainer, and if we get one
more maintainer with Ehsan, that is a really good sign of a healthy loop
optimisation community; I think this is a good thing, and support this.


Co-authored-by: Ehsan Amiri <ehsan.amiri@huawei.com>

Reland "[flang][OpenMP] Fix lowering of LINEAR iteration variables (#188851)" (#194623)

Linear iteration variables were being treated as private. This fixes
one of the issues reported in #170784.

The regressions in the OpenMP V&V and Fujitsu testsuites happened
because the users iterator was apparently becoming invalid, after one of
its uses was replaced. This was fixed by making a copy of the list of
users.

[libc++] Applied [[nodiscard]] to optional::iterator (#198489)

Towards #172124

[LoopUnroll] Support parallel reductions for minmax (#182473)

This patch

  • Supports parallel reductions for min/max operations in LoopUnroller.
  • Adds relevant test (including intrinsics).
  • Renames flag -unroll-add-parallel-reduction to
    -unroll-parallel-reduction.
  • Relaxes check in IVDescriptors.cpp (getMinMaxRecurrence) to handle
    out-of-loop uses.

Planning to take support for vector types in the next patch.

[libc][math] Add missing math.yaml entries for acospif and atan2f16 (#199442)

Fixes #199266

This PR adds missing math.yaml entries for acospif and atan2f16.

[LifetimeSafety] Store cleanup expressions for temporaries (#200568)

Now in CFGFullExprCleanup we also store cleanup expressions so that we
can get an accurate location for where destruction happened.

This helps users better understand the lifetime semantics of objects.

Closes #195503

[VPlan] Assert operand correctness at construction. (NFC) (#200686)

Update VPWidenPHIRecipe, VPBlendRecipe and VPReductionRecipe to assert
type correctness at construction.

PR: #200686

[Support] Remove unused argument of DataExtractor constructor (NFC) (#197121)

AddressSize parameter is not used by DataExtractor and will be
removed in the future. See #190519 for more context.

[clang] Fix assertion crash in alloc_size structural equivalence check (#199407) (#199980)

Fixes #199407

Remove ParamIdx from the USE_DEFAULT_EQUALITY default-comparison path
and add an explicit equalAttrArgs<ParamIdx> specialization that safely
handles invalid (unset) values.

[AMDGPU] Use S_MOV_B64_IMM_PSEUDO when moving 64-bit VGPR const to SGPR (#200576)

S_MOV_B64 only encodes a 32-bit literal, so rematerializing a non-inline
64-bit immediate through it silently dropped the high 32 bits.

[AMDGPU] Use fpext to widen sub DWORD FP printf args (#200870)

Widening half/bfloat printf varargs via bitcast+sext corrupted the FP
bit pattern for negative values.

Extend by value-preserving fpext to float instead.

Revert "[LoopUnroll] Support parallel reductions for minmax (#182473)" (#200892)

This reverts commit 1e79ea1.

Make tests added in rd5a24ef work in read-only source filesystems. (#200883)

[flang][OpenMP] Simplify checks for type-parameter inquiry (#198217)

Remove the no longer needed IsDataRefTypeParamInquiry.

[Testing] Allow custom markers in llvm::Annotations (#195570)

The current annotation markers can conflict with several language
constructs. Notably, [[ ]] and ^ collide with C++ attributes (e.g.,
[[nodiscard]]), the C++26 reflection operator (^^int), and
Objective-C blocks (void (^foo)(void)). Similarly, $ can conflict
with identifiers that also use $ with -fdollars-in-identifiers, as
well as with C++26 code that uses $ as raw-string delimiters or as a
preprocessing token
(P2558R2).

Because the markers are currently hardcoded in llvm::Annotations,
existing workarounds have to rely on digraphs or macro substitution via
implicit #defines. These approaches reduce readability and make tests
cumbersome to write. This PR alleviates these issues by adding support
for custom markers. It adds an overloaded Annotations constructor that
accepts a new Annotations::Markers struct. For example, to use ~ for
point, @ for name, and {{/}} for range:

Annotations Example(R"cpp(
  @name(payload){{[[nodiscard]] int foo(int x);}}~
)cpp", {"~", "@", "{{", "}}"});

Alternatively, we can use setters to customize the markers individually:

Annotations Example(R"cpp(
  @name(payload){{[[nodiscard]] int foo(int x);}}~
)cpp", Annotations::Markers().setPoint("~")
                             .setName("@@")
                             .setRangeBegin("{{")
                             .setRangeEnd("}}"));

Using longer markers:

Annotations Example(R"cpp(
  $$name(payload)[[[[[nodiscard]] int foo(int x);]]]^^
)cpp", {"^^", "$$", "[[[", "]]]"});

Using multi-byte characters:

Annotations Example(R"cpp(
  🏷️name(payload)👉[[nodiscard]] int foo(int x);👈🎯
)cpp", {"🎯", "🏷️", "👉", "👈"});

The existing single-argument constructor delegates to the new overload,
preserving backward compatibility.

PS: The original code has a FIXME comment mentioning alternative
approaches, such as escaping and changing the default syntax. While
valid, escaping would increase visual noise, and changing the default
syntax would break existing tests. The approach proposed here provides
the flexibility to choose a syntax that is clean for a specific context
and is backward compatible. See https://reviews.llvm.org/D59814 for
earlier discussion about these alternative design choices.

[NVPTX] Fix fptosi/fptoui to i1. (#200718)

The langref says:

> The 'fptosi' instruction converts its floating-point operand into the
> nearest (rounding towards zero) signed integer value. If the value
> cannot fit in ty2, the result is a poison value.

Previously fptosi to i1 and fptoui to i1 were lowered as x == 0.0,
which is clearly incorrect.

Because the conversion truncates toward zero, the only results that are
not poison are:

  • 0 and -1 for the signed case, and
  • 0 and 1 for the unsigned case.

So the i1 result is fully determined by a single fp compare:

  • fptosi x to i1 == x <= -1.0
  • fptoui x to i1 == x >= 1.0

with any value being acceptable for the poison (out-of-range) inputs.
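
The equivalence can be checked on in-range inputs with a small sketch. The function names below are hypothetical, and `trunc_low_bit` is only a reference model: it truncates toward zero with a C cast and keeps the low bit, which is the i1 bit pattern for results of 0, -1 (signed), or 1 (unsigned). Out-of-range inputs are poison in LLVM and are deliberately not modeled.

```c
#include <stdbool.h>

/* Single-compare forms from the commit message.
   For fptosi, true stands for the i1 value -1; for fptoui, for 1. */
bool fptosi_i1(double x) { return x <= -1.0; }
bool fptoui_i1(double x) { return x >= 1.0; }

/* Reference model (an assumption of this sketch): truncate toward zero,
   then keep the low bit. Only meaningful when the truncated value is
   -1, 0, or 1, i.e. for inputs strictly between -2.0 and 2.0. */
bool trunc_low_bit(double x) { return ((int)x & 1) != 0; }
```

For the signed case the non-poison inputs lie strictly between -2.0 and 1.0, and for the unsigned case strictly between -1.0 and 2.0; on those ranges the single compare and the reference model agree.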

[libc][math] Adding constexpr to tests for Double-type math functions (#200681)

Similar to

  constexpr uint64_t X_COUNT = 123;
  constexpr uint64_t X_START = FPBits(0.25).uintval();
  constexpr uint64_t X_STOP = FPBits(4.0).uintval();
  constexpr uint64_t X_STEP = (X_STOP - X_START) / X_COUNT;

  constexpr uint64_t Y_COUNT = 137;
  constexpr uint64_t Y_START = FPBits(0.25).uintval();
  constexpr uint64_t Y_STOP = FPBits(4.0).uintval();
  constexpr uint64_t Y_STEP = (Y_STOP - Y_START) / Y_COUNT;

in atan2_test.cpp, this PR adds constexpr to the tests (and only the tests)
of all double-type math functions.

Additionally, it replaces

LIBC_NAMESPACE::fputil::FPBits<double>(x).uintval();

with

FPBits(x).uintval();

Assisted using Copilot.

[IR] Reorder checks in isInterposable() (NFC) (#200862)

The isDSOLocal() check is a lot cheaper than getSemanticInterposition(),
so perform it first.

[LV] Add users to header phis in tests (NFC). (#200890)

Make sure the header phis in various tests are actually used, to make
them more robust w.r.t. future simplification changes. Those dead
phis would be cleaned up before LV in the regular pipeline.

[VectorCombine] Don't fold non-idempotent shuffle reductions when shuffle duplicates element (#200778)

This is a small correctness fix for foldShuffleChainsToReduce. For odd /
non-power-of-2 vector sizes the parity-mask scheme duplicates a lane,
which is only sound when the reduction op is idempotent. For
non-idempotent ops (e.g. add, xor) the duplicated lane changes the
result, so I track a HasLaneDuplication flag and bail out of the fold in
that case. Tests cover non-foldable add/xor and a still-foldable
idempotent smax.
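
The soundness condition can be seen in a simplified scalar model. The sketch below does not reproduce the parity-mask shuffles of foldShuffleChainsToReduce; it only shows what happens to a reduction when one lane is counted twice. The helper names are hypothetical.

```c
/* Hypothetical scalar models of two reductions over n lanes (n >= 1). */
int reduce_add(const int *v, int n) {
  int acc = v[0];
  for (int i = 1; i < n; ++i)
    acc += v[i]; /* add is not idempotent: x + x != x in general */
  return acc;
}

int reduce_smax(const int *v, int n) {
  int acc = v[0];
  for (int i = 1; i < n; ++i)
    acc = v[i] > acc ? v[i] : acc; /* max is idempotent: max(x, x) == x */
  return acc;
}
```

With lanes {3, 7, 5} and a copy that repeats the last lane, {3, 7, 5, 5}, `reduce_smax` returns 7 for both, while `reduce_add` returns 15 for the first and 20 for the second. That difference is why the fold has to bail out for add and xor when a lane is duplicated.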

[CIR] Fix ordering of lifetime-extended cleanups (#200874)

We had a bug that was causing any lifetime-extended cleanups that
occurred within a full-expression cleanup scope to be emitted
prematurely when the expression also required deferred conditional
cleanups. This was, in some cases, causing a dangling reference to
temporaries that had already been destructed. Luckily, it was also
causing us to not emit a return at the end of the function in one case,
leading the verifier to draw attention to this problem.

This change introduces new functions in RunCleanupsScope to allow
"ordinary" EH stack cleanups to be force-emitted separately from
lifetime-extended cleanups. Classic codegen doesn't need this capability
because it handles deferred conditional cleanups very differently than
CIR due to its flat/branching approach.

The testing for this fix did uncover a significant issue wherein CIR is
calling destructors in the wrong order even after the fix in this PR.
However, that's a pre-existing issue that will require changes beyond
the scope of this fix, so I'll handle it in a follow-up.

Assisted-by: Cursor / claude-opus-4.7

[AIX] Set the ifunc constructor's priority to 100, to run before any user code (#200893)

Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>

[X86] lowerShuffleAsBitBlend - use getConstVector to create selection mask (#200877)

Avoids wasteful SDValue creation if the shuffle matching fails, handles
any i64 legalisation and makes it easier to add UNDEF element handling
in the future.

Revert "[llvm-objcopy] Strip header from DXContainer's ILDB part during --dump-section" (#200867)

Reverts llvm/llvm-project#198578
Failed build: https://lab.llvm.org/buildbot/#/builders/190/builds/43332

[OFFLOAD][L0] Refactor AsyncQueues (#200650)

This PR introduces a major refactor on how L0 queues are used in the
plugin as the current design is too tied to OpenMP behavior. There are
two major changes:

  • We no longer have a per-thread queue cache, as this resulted in a
    single logical queue backed by multiple L0 queues. We now have a
    per-device cache, which should have a similar level of reuse performance.
  • The AsyncQueueTy type has been largely extended to hide the logic of
    the different queue types (which are now subclasses of AsyncQueueTy).
    This has greatly simplified the L0Device implementation.

As part of this refactor a number of other changes happened:

  • Copy command lists were removed in favor of the
    ZE_COMMAND_QUEUE_FLAG_COPY_OFFLOAD_HINT driver hint.
  • Support for inorder queues was added (can be selected using
    LIBOMPTARGET_LEVEL_ZERO_COMMAND_MODE=inorder).
  • Sync queues now use inorder queues.
  • Queue operations conditionally only use events when necessary (OMPT,
    profiling or queue logic).
  • MemFill now supports asynchronous operations.

There are a few more things that still need to be adjusted and cleaned
up, but as this PR is already big enough as it is, I'll prepare a follow-up
to fix them.

[mlir][Linalg] Enable lowering/decomposing scalable pack ops (#200216)

Enables lowering/decomposing linalg.pack ops with dynamic inner tiles
to a sequence of tensor.pad -> tensor.expand_shape ->
linalg.transpose ops.


Signed-off-by: Ege Beysel <beyselege@gmail.com>

[libc][ci] Use lld for linking in precommit CIs. (#200897)

[clang-tidy] Fix cert-err33-c inheriting CheckedReturnTypes from bugprone-unused-return-value (#200169)

The cert-err33-c alias did not override CheckedReturnTypes, causing it
to inherit the default from bugprone-unused-return-value. This made it
flag any function returning std::error_code, std::expected, etc. That is
outside the scope of CERT ERR33-C (a fixed list of C standard library
functions).
Set CheckedReturnTypes to empty so the alias only checks its intended
function list.


Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>

[analyzer] Normalize sub-array indices in RegionStore initializer resolution (#200044)

After #198346, alpha.unix.cstring.UninitializedRead reports a false
positive when a pointer into a fully-initialized const multidimensional
array is advanced past an inner dimension boundary and used as a source
argument to memcpy. The root cause is in
convertOffsetsFromSvalToUnsigneds in RegionStore, which returned
UndefinedVal for any element index exceeding its sub-array extent,
conflating pointer arithmetic legality with memory initializedness.

This patch separates the two concerns. The RegionStore now normalizes
indices that overflow an inner dimension by carrying into the outer
dimension via divmod, the same way arr[0][5] in int arr[4][3]
denotes the same memory as arr[1][2]. UndefinedVal is returned only
when the computed flat offset exceeds the total array allocation.
Whether cross-subobject pointer arithmetic constitutes undefined
behavior per C/C++ standards is a separate concern for individual
checkers to diagnose. No existing checker flags sub-array boundary
crossing as UB, verified both before and after #198346.
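
The carry described above amounts to a divmod on the flat element index. The following is a minimal sketch for the two-dimensional case only; `normalize_index` and its signature are hypothetical and are not the code in convertOffsetsFromSvalToUnsigneds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: normalize (row, col) for a rows x cols array.
   Returns false when the flat offset falls outside the whole array,
   which corresponds to the case where the patch still yields UndefinedVal. */
bool normalize_index(size_t rows, size_t cols, size_t row, size_t col,
                     size_t *out_row, size_t *out_col) {
  size_t flat = row * cols + col; /* flat element offset */
  if (flat >= rows * cols)
    return false;                 /* past the total allocation */
  *out_row = flat / cols;         /* carry into the outer dimension */
  *out_col = flat % cols;         /* remainder stays in the inner one */
  return true;
}
```

For `int arr[4][3]`, the pair (0, 5) has flat offset 5 and normalizes to (1, 2), matching the example in the description, while (3, 3) has flat offset 12 and is rejected.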

Fixes #199271

[gn build] Port commits (#200910)

3d24f9a
7963f45
7a435ca
866945c
a0ac752

[MLIR][GPU] Support synchronous gpu.alloc and gpu.dealloc in gpu-to-llvm (#191661)

The gpu-to-llvm conversion patterns for gpu.alloc and gpu.dealloc
previously required async tokens for non-host shared operations. This
prevented lowering synchronous device memory allocation and deallocation
to runtime calls.

Changes:

  • gpu.alloc: drop the isAsyncWithOneDependency guard for non-shared
    allocs. Cap the number of async dependencies at one to preserve the
    prior single-dependency invariant. Cast the runtime-returned pointer to
    the memref's address space when they differ, so the descriptor's pointer
    slots type-check for memref<..., N>.
  • gpu.dealloc: drop the async requirement entirely. Use a null stream
    when no async dependencies are present. Use eraseOp instead of replaceOp
    for sync deallocs (which have no results). Cap the number of async
    dependencies at one.
  • Add a unit test for the synchronous alloc/dealloc lowering and a test
    that more than one async dependency leaves the op unconverted.
  • Update lower-launch-func-bare-ptr.mlir, which previously asserted
    gpu.alloc survived the conversion (because the pattern bailed for sync
    non-shared allocs); it now asserts the full lowering through
    mgpuMemAlloc + addrspacecast.

Assisted-by: Claude

[OFFLOAD][L0] Add support for dynamic l0 fallbacks (#200517)

The PR adds support to define fallbacks for DLWRAP routines that are not
found when loading the library.
It implements a fallback for
zeCommandListAppendLaunchKernelWithArguments introduced in #194333 which
might not be available in older drivers.

[openacc] Attach Parallelism Levels to Auto Loops (#200884)

Auto loops are analyzed by the compiler in later compilation stages to
determine whether they can be parallelized. These loops may carry
parallelism levels (this does not guarantee that they are parallelizable;
the compiler should still analyze them). However, if the loop is parallelized,
the parallelism levels specified in the source should be respected. This
change attaches the parallelism level to auto loops, which enables their
propagation through subsequent compilation steps.

[X86] lowerShuffleAsBitMask - use getConstVector to create bitmask (#200889)

Avoids wasteful SDValue creation if the shuffle matching fails, handles
any i64 legalisation, avoids issues with later folds not recognising fp
'allones' masks and makes it easier to add UNDEF element handling in the
future.

[lldb-dap] Mark source deemphasize if path doesn't exist (#194702)

LLDB-DAP has a problem with sanitizers in GCC. When we stop in the
sanitizer's code, lldb-dap sends stack frames with a path (the sanitizer's
build dir path) that doesn't exist on the machine. This leads to problems in
the VS Code UI (see issue below).

Fixes #184789

[gn build] Port 7964b66 (#200914)

[lldb] Skip minidump case in TestDynamicValue.py for arm64e (#200047)

The minidump format does not currently have a way to distinguish arm64e
from arm64.

[mlir][ROCDL][AMDGPU] Add result arguments to buffer atomics (#198596)

Buffer atomic operations were failing LLVM validation because they
weren't declared as having a result (the old value). This commit updates
those operations to fix the error. (Note: if you don't need the result,
you just don't use it, and the compiler backend emits an atomic that
doesn't return the old value.)

[VPlan] Propagate print flags for VPInstructionWithType. (#200838)

Update VPInstructionWithType::print to include printing flags.

[BOLT] Add perf2bolt pre-aggregated profile output

Add a pre-aggregated profile output format (--profile-format=preagg)
so perf.data can be pre-parsed/aggregated and used as input with -pa.

Supports branch (T traces) and basic samples (S records).

Currently only covers main binary, can be extended to cover multi-DSO.

Test Plan: Updated perf_test.test, added perf_brstack.test

Reviewers:
yota9, ayermolo, yozhu, maksfb, yavtuk, paschalis-mpeis, rafaelauler

Reviewed By: paschalis-mpeis

Pull Request: #199465

[OFFLOAD][L0][NFC] Remove Device TLS table (#200923)

After #200650 the Device TLS table is not used anymore so it can be
removed.

[OFFLOAD][L0][NFC] Rename AsyncQueueTy struct to L0QueueTy (#200921)

L0QueueTy is more descriptive after the changes in #200650.

Also renamed the header name and one internal field to be more
descriptive.

[CIR] Spill and reload values across deferred cleanup scopes (#200904)

The valuesToReload handling in our RunCleanupScope::forceCleanup()
function was not taking into account cleanup scopes for deferred
conditional cleanups that get created when we call
forceDeactivation(). This was leading to a CIR verification error in
cases where a deferred cleanup was used in an expression that returns a
value.

This change adds code to spill values ahead of the forceDeactivation()
call when we see that there are cleanups on the deferred stack.

Assisted-by: Cursor / claude-opus-4.7

[lldb] Fix TestBranchIslands.py for arm64e (#200498)

Need to pass CFLAGS to clang when building the asm files, otherwise the
triple isn't used and they're automatically compiled for the host
platform.

[Docs] Update coding standard for TD files (#200848)

This PR proposes an update to the coding standards document to make
explicit that we do not want unnecessary formatting changes to TD files.

This is in response to this merged PR (#199346), which led to this RFC
(https://discourse.llvm.org/t/80-column-limit-for-td-files/90950/).


Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav@gmail.com>

[lldb] Strip objc superclass pointer in trampoline handler (#200490)

The pointer needs to be stripped before being handed off to any objc
runtime functions. Otherwise the utility expression will hit a PAC
exception and the thread plan will fail to execute correctly.

This fixes TestObjCStepping.py on arm64e.

[SelectionDAG] Widen <2 x T> vector types for atomic store (#197618)

Vector types of 2 elements must be widened. This change does this
for vector types of atomic store in SelectionDAG so that it can
translate aligned vectors of >1 size.

Store-side counterpart to #148897. Stacked on top of #197166 and below
#197619.

[LLVM] Fix style issue in Maintainers file (#200917)

AMDGPU/GlobalISel: Switch some tests to -new-reg-bank-select (#200853)

[SelectOpt] Preserve Profile Information (#200680)

If at least one of the SelectLike instructions in the group has profile
metadata, we can propagate it given they all share the same condition.

AMDGPU/GlobalISel: Remove redundant -global-isel from -run-pass MIR tests (NFC) (#200857)

[OFFLOAD][L0][NFC] Add struct for deferred memory operations (#200928)

Improve readability a bit by using a well-defined struct instead of
tuples.

[HotColdSplit] Consolidate pass pipeline (#200941)

CodeGenPrepare was added for more of an E2E test in
f0f68c6. The pipeline was split in
cb5e48d to allow for the removal of the
HotColdSplit legacy pass. Now that CodeGenPrepare has been ported to the
NewPM in f1ec0d1 (which even touched
this test), we can use a single pass pipeline and simplify the run line
a little bit.

[lld] Remove unused DenseMapInfo::getTombstoneKey (#200636)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.

[lldb] Remove unused DenseMapInfo::getTombstoneKey (#200635)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.

[BOLT] Remove unused DenseMapInfo::getTombstoneKey (#200637)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.

[lld][WebAssembly] Refine type used for internal TLS-related symbols. NFC (#200899)

I noticed this while reviewing #200855.

[lldb] Add static_assert to catch increases to size of Symbol (#200919)

[flang] Remove unused DenseMapInfo::getTombstoneKey (#200632)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.

[mlir] Remove unused DenseMapInfo::getTombstoneKey (#200633)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.

[X86] Remove extra MOV after widening atomic store (#197619)

This change adds patterns to optimize out an extra MOV present after
widening the atomic store. Covers <2 x i8> (SSE4.1+), <2 x i16>,
<4 x i8>, <2 x i32>, <2 x float>, <4 x i16>,
<2 x ptr addrspace(270)>.

Store-side counterpart to #148898. Stacked on top of #197618 and below
#197860.

[libc] Add FENV_ACCESS pragma with CMake compiler feature detection (#200268)

Related to #199009

Added compiler feature detection for STDC FENV_ACCESS pragma. It is
used to conditionally add function-scoped #pragma STDC FENV_ACCESS ON
to libc/src/__support/FPUtil/FEnvAccess.h, whenever functions from the
<fenv.h> header are called and the target supports the pragma.

[gn build] Port 7a907089 (#200953)

[AMDGPU] Fix disasm of i16 operands fp inline constants (#200944)

[libc] Renaming Float128 (DyadicFloat<128>) to DFloat128 (#200907)

This frees up the Float128 name for the emulated float128 type.
#200565

[NVPTX][clang] Remove nvvm scoped atomic intrinsics; use atomicrmw/cmpxchg (#200735)

The
llvm.nvvm.atomic.{add,exch,max,min,inc,dec,and,or,xor,cas}.gen.{i,f}.{cta,sys}
intrinsics are redundant; we can use atomicrmw / cmpxchg with a syncscope.

Moreover, the nvvm atomics are problematic because they don't have
unsigned min/max opcodes. Clang uses these intrinsics and currently emits
signed min/max for what should be unsigned operations!

Fix by doing the following.

  • Remove the nvvm intrinsics.
  • Auto-upgrade the removed intrinsics to atomicrmw/cmpxchg.
  • Make Clang emit atomicrmw/cmpxchg directly.

[RISCV] Improve shrinkDemandedConstant. (#196585)

Teach shrinkDemandedConstant to restore a constant that can be
materialized as:
lui a0, hi20
addi(w) a0, a0, lo12
slli a1, a0, 32
add a0, a0, a1

or:
lui a0, hi20
addi(w) a0, a0, lo12
pack a0, a0, a0

This fixes a regression between clang 18 and 19 on this test case
https://godbolt.org/z/Ma746a8xP
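
As a rough illustration only, the two sequences above can be modeled in Python to see which 64-bit values they yield from a 32-bit seed. The helper names (lui_addiw, seq_slli_add, seq_pack) are made up for this sketch and are not part of the patch; the model assumes RV64, the addiw form of the second instruction, and pack as defined by Zbkb.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lui_addiw(hi20, lo12):
    """Model `lui a0, hi20 ; addiw a0, a0, lo12` (sign-extended 32-bit result)."""
    upper = sext((hi20 & 0xFFFFF) << 12, 32)
    return sext(upper + sext(lo12, 12), 32)


def seq_slli_add(v):
    """Model `slli a1, a0, 32 ; add a0, a0, a1`, shown as an unsigned 64-bit value."""
    return (v + (v << 32)) & MASK64


def seq_pack(v):
    """Model `pack a0, a0, a0`: the low 32 bits of a0 land in both halves."""
    lo = v & MASK32
    return lo | (lo << 32)


seed = lui_addiw(0x0F0F1, 0xF0F)
print(hex(seed), hex(seq_slli_add(seed)), hex(seq_pack(seed)))
```

In this model both sequences replicate a non-negative seed into the upper half. For a negative seed they differ, because the add also sees the sign-extended upper bits of a0.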

[Polly] Remove unused DenseMapInfo::getTombstoneKey (#200963)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.

[AMDGPU] Simplify the logic in checkWMMACoexecutionHazards, NFC (#200717)

[lldb/test] Fix variant double-expansion in LLDBTestCaseFactory (#200943)

LLDBTestCaseFactory generates the actual test methods that get run: for
every test* method on a TestBase subclass, it stamps out one copy per
debug-info format (dwarf, dsym, ...) and, if a variant like
swift_clang's clang/noclang is registered, one copy per variant value on
top. When two test methods in the same class have names that share a
prefix and the longer one is declared first in the file, the shorter
method's expansion ends up re-stamping every copy already produced for
the longer one.

The variant-expansion helper decides what to copy by name-prefix match:
when processing test_expr, it grabs test_expr, test_expr_dwarf,
test_expr_dsym, ... and produces a clang/noclang suffix for each one.
The factory was handing it the full running dict of already-synthesized
methods, so by the time test_expr's turn came, the dict also held
test_expr_stripped_dwarf and test_expr_stripped_dsym from
test_expr_stripped. Those names start with test_expr_ too, so they
pick up a second clang/noclang suffix and inherit test_expr's
xfail/skip predicates.

HiddenIvarsTestCase trips this twice in a row (test_expr_stripped before
test_expr; test_frame_variable_stripped before test_frame_variable).
Upstream _test_variants is empty so the bug is latent on mainline, but
any registered variant exposes it.

With swift_clang (values clang/noclang) registered, test_expr_stripped +
test_expr should produce eight methods (4 dwarf/dsym × 2 clang/noclang
per original). Before this patch the factory emits twelve, four of them
junk:

  test_expr_stripped_dsym_clang_clang
  test_expr_stripped_dsym_clang_noclang
  test_expr_stripped_dsym_noclang_clang
  test_expr_stripped_dsym_noclang_noclang
  test_expr_stripped_dwarf_clang_clang
  test_expr_stripped_dwarf_clang_noclang
  test_expr_stripped_dwarf_noclang_clang
  test_expr_stripped_dwarf_noclang_noclang
  test_expr_dsym_clang
  test_expr_dsym_noclang
  test_expr_dwarf_clang
  test_expr_dwarf_noclang

Track each method's expansion in a local dict and merge it back into the
shared dict only once that method is fully processed, so the variant
helper never sees another method's copies.
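
The prefix-match problem and the local-dict fix can be sketched with a toy model. This is not the real LLDBTestCaseFactory code: expand_variants, debug_copies, build_buggy and build_fixed are hypothetical names, the dicts map generated names to the original method name instead of to functions, and the exact counts of this model need not match the real factory.

```python
DEBUG_FORMATS = ("dwarf", "dsym")
VARIANTS = ("clang", "noclang")


def expand_variants(methods, base):
    """Copy every entry whose name matches `base` by prefix, once per variant value."""
    expanded = {}
    for name, origin in methods.items():
        if name == base or name.startswith(base + "_"):
            for value in VARIANTS:
                expanded[f"{name}_{value}"] = origin
    return expanded


def debug_copies(base):
    """One copy of the test method per debug-info format."""
    return {f"{base}_{fmt}": base for fmt in DEBUG_FORMATS}


def build_buggy(originals):
    shared = {}
    for base in originals:
        shared.update(debug_copies(base))
        # Bug: the helper sees every method synthesized so far, so copies
        # made for a longer name with the same prefix get expanded again.
        variants = expand_variants(shared, base)
        for name in debug_copies(base):
            del shared[name]
        shared.update(variants)
    return shared


def build_fixed(originals):
    shared = {}
    for base in originals:
        # Fix: expand only this method's own copies, then merge them back.
        local = debug_copies(base)
        shared.update(expand_variants(local, base))
    return shared


order = ["test_expr_stripped", "test_expr"]  # longer name declared first
buggy = build_buggy(order)
fixed = build_fixed(order)
print(sorted(set(buggy) - set(fixed)))
```

In this model the fixed factory yields exactly the eight expected names, while the buggy one additionally produces names such as test_expr_stripped_dwarf_clang_clang.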

[lldb] Skip libc++ category tests on Darwin when no in-tree libc++ is built (#199262)

canRunLibcxxTests() previously short-circuited with "libc++ always
present" for all Darwin targets, meaning the "libc++" test category was
never skipped on macOS — even when LLDB_HAS_LIBCXX is OFF and no
--libcxx-include-dir / --libcxx-library-dir are passed to dotest.
The tests would silently run against the system libc++ instead of an
in-tree build, producing results inconsistent with what the suite is
designed to validate.

This fixes canRunLibcxxTests() to apply the same libcxx_include_dir
/ libcxx_library_dir guard on Darwin that Linux already uses. When
those dirs are absent (i.e. no in-tree libc++ was built), the function
returns False and checkLibcxxSupport() appends libc++ to
skip_categories — skipping those tests exactly as Linux does.
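
A minimal sketch of the guard described above, assuming a simplified configuration object. The Config class and the snake_case function names are hypothetical stand-ins, not the actual lldbsuite code, and the real functions may return more than a plain bool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Config:
    """Hypothetical stand-in for the dotest configuration."""
    libcxx_include_dir: Optional[str] = None
    libcxx_library_dir: Optional[str] = None
    skip_categories: List[str] = field(default_factory=list)


def can_run_libcxx_tests(config):
    # Same guard on Darwin and Linux: both directories must be provided,
    # which only happens when an in-tree libc++ was built.
    return bool(config.libcxx_include_dir) and bool(config.libcxx_library_dir)


def check_libcxx_support(config):
    # Skip the whole "libc++" category when the guard fails.
    if not can_run_libcxx_tests(config):
        config.skip_categories.append("libc++")


no_libcxx = Config()
check_libcxx_support(no_libcxx)

with_libcxx = Config("/build/include/c++/v1", "/build/lib")
check_libcxx_support(with_libcxx)
print(no_libcxx.skip_categories, with_libcxx.skip_categories)
```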

On the CMake side, the SEND_ERROR for LLDB_HAS_LIBCXX=OFF is
downgraded to a WARNING so downstreams that intentionally skip the
runtimes build can keep LLDB_INCLUDE_TESTS=ON for the tests they
actually want to run. The warning is also gated on the new
LLDB_ENABLE_LIBCXX_TESTS option (default ON): setting it to OFF
acknowledges the deliberate choice and silences the warning without
requiring either an in-tree libc++ build or disabling all tests.

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

[Object][DirectX] Fix UB when parsing a DXContainer from a null buffer (#200865)

Adds a check that Src != nullptr before performing other operations on
it; the missing check caused UB in one test with #200413.

[RISCV] Use sspush/sspopchk mnemonics for shadow stack codegen (#200182)

After PR #178609, SSPUSH/SSPOPCHK/C_SSPUSH became encodable under the
Zimop/Zcmop predicates alone (old Zicfiss requirement was relaxed to
Zimop).
Before that, the hw-shadow-stack codegen path introduced in PR #152251
needed PseudoMOP_* wrappers that expand to the base
MOP_RR_7/MOP_R_28/C_MOP_1, because the real SSPUSH/SSPOPCHK/C_SSPUSH
were gated by Zicfiss while the codegen path only required Zimop.
As a side effect, the assembler printed mop.rr.7 zero, zero, ra / mop.r.28 zero, ra
for hw-shadow-stack functions instead of the
proper sspush ra / sspopchk ra mnemonics from the CFI RISC-V spec
(while the disassembler already printed the proper sspush/sspopchk).

With predicates now aligned, the pseudos are just plain wrappers with
identical predicates to the real instructions, so they became redundant.
This patch removes them and emits SSPUSH/SSPOPCHK/C_SSPUSH directly.

[clang] fix transformation of SubstNonTypeTemplateParmExpr nodes from type alias templates and concepts (#200850)

This makes sure SubstNonTypeTemplateParmExpr nodes produced from
non-specialization decls (type alias templates and concepts) are
correctly transformed.

This makes the SubstNonTypeTemplateParmExpr store the parameter type
directly, and uses that instead of relying on the AssociatedDecl.

Fixes #191738
Fixes #196375

[libc] Add a placeholder for swprintf function (#200895)

Add a declaration and stub implementation for the swprintf function.
Only enable it when LLVM_LIBC_ENABLE_EXPERIMENTAL_ENTRYPOINTS is
specified to clarify that the implementation is not ready yet.

We're singling out swprintf among the other wide-character formatting
functions because it's used in libc++ to implement std::to_wstring for
floating point values:

wstring to_wstring(float val) { return as_string(get_swprintf(), initial_string<wstring>()(), L"%f", val); }

It is also the last remaining piece of functionality preventing us from
turning on wide character support in libc++ built against llvm-libc.
Adding the stub function would allow us to test the compilation with
_LIBCPP_HAS_WIDE_CHARACTERS enabled, and start keeping track of what
tests from libc++ test suite are working / not working yet.

[libc++][type_traits] Applied [[nodiscard]] (#200760)

[[nodiscard]] should be applied to functions where discarding the
return value is most likely a correctness issue.

[NVPTX] Fix sext(shl nsw x, topbit) miscompile. (#200924)

Consider the following IR.

define i64 @f(i32 %x) {
  %s = shl nsw i32 %x, 31
  %e = sext i32 %s to i64
  ret i64 %e
}

combineMulWide (renamed in this patch to combineSZExtToMulWide) rewrites
this into:

mul.wide.s32 %dst, %x, -2147483648

The LLVM IR is only meaningful for %x == 0 or -1; all other inputs
result in poison. Therefore to check whether this rewrite is correct, we
just need to ask if it generates the correct output when %x is 0 and
when %x is -1.

  • When x == 0, IR and PTX both produce 0. OK.
  • When x == -1:
    • IR produces sext(INT32_MIN), whereas
    • PTX produces sext(-1) * sext(INT32_MIN) = -sext(INT32_MIN)

Therefore the transformation is not correct.

This only happens when we're shifting by N-1 bits (where N is the narrow
integer width). In other cases, the multiplier is positive and it works
fine.
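
The argument above can be checked with a small integer model. This is only a sketch: ir_sext_shl, mul_wide_s32 and lowered are hypothetical helper names, and the model assumes mul.wide.s32 multiplies two signed 32-bit operands into a 64-bit result, with the shift amount k turned into the 32-bit immediate 1 << k.

```python
MASK32 = (1 << 32) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def ir_sext_shl(x, k):
    """sext(shl i32 x, k) to i64; only meaningful when the shl does not overflow (nsw)."""
    return to_signed((x << k) & MASK32, 32)


def mul_wide_s32(a, b):
    """64-bit product of two signed 32-bit operands."""
    return to_signed(to_signed(a, 32) * to_signed(b, 32), 64)


def lowered(x, k):
    """The rewritten form: multiply by 1 << k encoded as a signed 32-bit immediate."""
    multiplier = to_signed(1 << k, 32)  # becomes INT32_MIN when k == 31
    return mul_wide_s32(x, multiplier)


for x in (0, -1):
    print(x, ir_sext_shl(x, 31), lowered(x, 31))
```

For k = 31 the two forms agree on x == 0 and disagree on x == -1, while for k = 30 they agree on every input allowed by nsw (-2 through 1).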

[CIR] Split RecordType into StructType/UnionType

Union tail padding was stored as the last element of the members array
with a padded = true flag. Every pass touching RecordType had to
special-case unions: skip the last member when iterating, call
getLargestMember() to find the storage type, check isUnion() before
almost every layout query. This spread union-specific logic across
CIRTypes.cpp, LowerToLLVM.cpp, CXXABILowering.cpp,
CIRGenRecordLayoutBuilder.cpp, and CIRGenExpr.cpp.

This PR moves union tail padding to a dedicated padding field on
RecordTypeStorage and introduces two C++ view classes: UnionType and
StructType, both subclassing RecordType via classof-based
dispatch. mlir::dyn_cast<UnionType> and mlir::isa<StructType> work
naturally.

UnionType owns the union-specific API: getPadding() returns the
tail-padding type (null if none), and getUnionStorageType(DataLayout)
returns the highest-alignment variant. RecordType::getLargestMember()
and the old RecordType::getPadding() (which returned the last member)
are removed.

The CIR text syntax for padded unions changes from union "Name" padded {members..., padType} to union "Name" padded {members...}, padding = {padType}, separating padding from the member list.

CIRGenRecordLayoutBuilder::appendPaddingBytes now routes union tail
padding to a new unionPadding field instead of appending to
fieldTypes. LowerToLLVM and CXXABILowering use the new UnionType
API directly.

[Flang] Fix device-side module lookup (#200863)

When flang is invoked with device offloading (e.g. flang modfile.f90 -fopenmp --offload-arch=gfx90a), it invokes the frontend twice:
once for the host architecture, and a second time for the architecture
specified with --offload-arch. However, both frontend invocations
write modfile.mod (or whatever the module in modfile.f90 is named), so
the gfx90a version from the second invocation is what the file contains
after the driver invocation returns. Until #171515
both versions of the file were identical, but now they use a
different set of builtin modules. Since Flang's mod files store the
checksums of the module files they use, this can result in a checksum
mismatch error. For instance, if modfile.mod is the gfx90a version,
using it to compile with flang modfile.f90 --target=x86_64-linux-gnu will produce a checksum mismatch.

flang -fc1 host x86_64             --> modfile.mod --> lib/clang/23/finclude/flang/x86_64-linux-gnu/iso_fortran_env.mod
                                  /  /            \
flang -fc1 -foffload-device nvptx   /              \   lib/clang/23/finclude/flang/nvptx64-nvidia-cuda/iso_fortran_env.mod
                                   /
flang -fc1 -foffload-device amdgcn                     lib/clang/23/finclude/flang/amdgcn-amd-amdhsa/iso_fortran_env.mod

We fix this by

  1. Not overwriting the --target host module file with the
    --offload-arch module; the auxiliary target is the canonical version
    for its contents; and

  2. Ignoring checksum errors when using an intrinsic module during
    offloading. The device version should be compatible with the host
    version, just with definitions which the .mod file will eventually
    import from the intrinsic module at compile-time.

[LFI][AArch64] Emit .tlsdesccall with the associated blr (#200903)

This PR fixes an issue in the LFI control-flow rewrites if a blr is
marked with .tlsdesccall. The .tlsdesccall should be moved after
rewriting so that it is associated with the rewritten blr, rather than
the guard instruction.

[LLVM] Add flags to crash the opt/codegen pipeline (#200967)

Will be used for testing crash reduction.

[DirectX] Disable DCE and DSE for -O0 (#192520)

These are optimisation passes which are inappropriate to run when the
user has requested no optimisations, and which make it more difficult to
write tests.

Co-authored-by: Andrew Savonichev <andrew.savonichev@gmail.com>

[Instrumentor] Improve the config wizard script (#199108)

This makes the config wizard script more generic as we grow
instrumentation opportunities. Better output, e.g., clear paths, is
also displayed now.

Prepared with Claude (AI) and tested by me afterwards.

[NFC][SPIR-V] Fix unused-variable in SPIRVBuiltins (#200842)

[clang][HeuristicResolver] Handle non-dependent TemplateSpecializationType gracefully (#200714)

Fixes #197716

[clang] fix getTemplateInstantiationArgs (#199528)

This implements a new strategy for collecting the template arguments, by
relying on the qualifiers and template parameter lists to navigate the
template context of out-of-line definitions.

This greatly simplifies the signature of that function, by removing a
bunch of workarounds, and simplifying a couple that weren't removed
yet.

Since this now relies on qualifiers and template parameter lists, this
patch expends most of its effort making sure these are placed,
transformed and propagated to template instantiations.

Also makes the explicit specialization AST nodes stop abusing the
template parameter lists by storing their own template parameter list
in a dedicated field, similar to partial specializations.

Fixes #101330

Revert "[LLVM] Add flags to crash the opt/codegen pipeline" (#200977)

Reverts llvm/llvm-project#200967

Test failing on some buildbots:
https://lab.llvm.org/buildbot/#/builders/11/builds/41237

[clang][clang-tools-extra] Remove unused DenseMapInfo::getTombstoneKey (#200634)

#200595 changed DenseMap to no longer create tombstone buckets, so
DenseMapInfo<T>::getTombstoneKey() is never called. Remove dead
definitions and dead tombstone branches.

[VPlan] Move IV predicate handling to VPlan. (#192876)

[clang-tidy] Fix false positive in bugprone-use-after-move for std::tie (#192895)

std::tie(a, b) = expr reinitializes all variables passed to std::tie
because the tuple assignment operator writes back through the stored
references. The check was not recognizing this pattern, causing a false
positive on the second std::tie assignment in loops like:

std::tie(a, b) = foo(std::move(a), std::move(b));
std::tie(a, b) = foo(std::move(a), std::move(b)); // false positive

Add std::tie assignment as a reinitialization case in
makeReinitMatcher().

Fixes #136105.


AI Disclosure: Claude (Anthropic) was used to assist in diagnosing
the CI test failure and identifying the off-by-one line number in the
CHECK-NOTES.


Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zeyi Xu <mitchell.xu2@gmail.com>

[RISCV] Disable Zilsd CSR-pair generation when push/pop or save-restore is enabled (#200623)

We were generating duplicate/worse code due to the generation of the
Zilsd load/store doubles for handling CSRs when Zcmp/Xqccmp or
Save/Restore Libcalls were enabled.

[clang-tidy] Add bugprone-missing-end-comparison check (#182543)

This PR introduces a new check bugprone-missing-end-comparison.

It detects instances where the result of a standard algorithm is used
directly in a boolean context without being compared against the
corresponding end iterator.

Currently the check can't handle algorithms returning std::pair and
std::ranges::mismatch_result, but it should be a good enough starting
point for future improvements.

As of AI-Usage: Assisted by Gemini CLI (for pre-commit reviewing,
documentation and some code refactor/cleanup)
Closes #178731


Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>

[SandboxVec][LoadStoreVec][AMDGPU] Remove early reject of mixed types (#200523)

Up until now mixing floats and non-floats was disabled in the legality
checks. This patch changes this. We are now eagerly vectorizing mixed
types, but we are also checking the cost model to make sure we don't
regress on targets where this is expensive.

[ORC] Simplify DylibManager::lookupSymbols, remove LookupRequest. (#195954)

DylibManager::lookupSymbols used to take an array of LookupRequests,
where each request specified a handle and list of symbols to look up
within that handle.

This commit replaces the array of lookup requests with a single handle
and list of symbols passed directly to lookupSymbols.

In practice all clients were passing a singleton array anyway, and
simplifying this signature significantly simplifies implementations.

[ORC] Fix header comment. NFC. (#200980)

[clang][bytecode] Improve getType() (#200342)

We previously often fell back to the type of the declaration, which is
wrong if we're pointing e.g. to a nested array.

Add a new unit test to validate this.

[CIR] Fix cir.call_llvm_intrinsic lowering for 0-result ops (#199516)

cir.call_llvm_intrinsic declares Optional<CIR_AnyType>:$result, but
the lowering indexed op->getResultTypes()[0] unconditionally and went
out of bounds on void calls.
Guard with getNumResults() and pick the void overload of
LLVM::CallIntrinsicOp::create in createCallLLVMIntrinsicOp.

[CIR][AMDGPU] Implement lowering for __builtin_amdgcn_dispatch_ptr (#199880)

Port emitAMDGPUDispatchPtr from OGCG. Emits the amdgcn.dispatch.ptr
intrinsic and inserts an address-space cast when the builtin's expected
return type differs.

[llvm-debuginfo-analyzer] Add support for LLVM IR format. (#200603)

llvm-debuginfo-analyzer is a command line tool that processes debug
info contained in a binary file and produces a debug information
format agnostic “Logical View”, which is a high-level semantic
representation of the debug info, independent of the low-level format.

Add support for the LLVM IR format so that logical views can be
generated from it. Both the textual representation (.ll) and the
bitcode (.bc) format are supported.

This relands #135440, which was reverted in #199890.
It includes the fixes for the buildbot problems.

[SelectionDAG] Don't over-claim alignment on vector splice/compress stack MMOs (#200622)

expandVectorSplice and expandVECTOR_COMPRESS allocate their scratch slot
on the stack with getReducedAlign, but the memory accesses they generate
touching this slot use the type's natural alignment, which may be
larger!

[scudo] Return nullptr if a remap fails on linux. (#200537)

Add a check for a fixed-address mmap that doesn't return the expected
address.

Allow a remap call to fail if the mmap fails, returning nullptr to
the caller.

Fix a place in the secondary where a failed remap was silently ignored.
Now it unmaps the original entry on failure.

[LLVM][IR] Make sure that DILabel's line is always printed (#200846)

This commit ensures that the textual IR of the DILabel always contains
the line information. This is required as the absence of a line causes
a parsing failure, i.e., this change fixes the print + parse roundtrip.

[X86] Fold OR of constant splats into GF2P8AFFINEQB (#194330)

Fold OR of a constant byte splat into X86ISD::GF2P8AFFINEQB when the
affine matrix is known at compile time.

For bits forced to 1 by the OR mask, zero the corresponding matrix rows
(in reverse row order within each 64-bit lane) and set the same bits in
the immediate. This turns:

gf2p8affineqb(x, M, imm) | C

into:

gf2p8affineqb(x, M & KeepMask, imm | C)

where KeepMask clears the rows for output bits selected by C.

This removes a separate OR after GF2P8AFFINEQB for constant-matrix cases
and extends the existing GFNI combine coverage beyond XOR folds.
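
The fold can be sanity-checked with a per-byte model of the instruction. This is a sketch under assumptions: affine_byte follows the documented per-byte behaviour (output bit i is the parity of matrix byte 7 - i ANDed with the input, XORed with bit i of the immediate), and fold_or is a hypothetical helper standing in for the "M & KeepMask, imm | C" rewrite, not the code in the patch.

```python
import random


def parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def affine_byte(x, matrix, imm):
    """Model one byte lane; matrix[k] is byte k of the 64-bit matrix qword."""
    out = 0
    for i in range(8):
        bit = parity(matrix[7 - i] & x) ^ ((imm >> i) & 1)
        out |= bit << i
    return out


def fold_or(matrix, imm, c):
    """Zero the matrix rows feeding output bits set in c and set those bits in imm."""
    kept = list(matrix)
    for i in range(8):
        if (c >> i) & 1:
            kept[7 - i] = 0  # reverse row order, as in affine_byte
    return kept, imm | c


rng = random.Random(0)
matrix = [rng.randrange(256) for _ in range(8)]
imm, c = 0x5A, 0x81
folded_matrix, folded_imm = fold_or(matrix, imm, c)

# Compare the original "affine then OR" against the folded form for every input byte.
mismatches = [
    x for x in range(256)
    if (affine_byte(x, matrix, imm) | c) != affine_byte(x, folded_matrix, folded_imm)
]
print(len(mismatches))
```

A zeroed row makes the parity term 0 for every input, so the corresponding output bit is just the immediate bit, which the fold forces to 1.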

Fixes: #191173

workflows/upload-release-artifact: Fix typo introduced in 01cfbaa (#200730)

Revert "[llvm-debuginfo-analyzer] Add support for LLVM IR format. (#200603)" (#201019)

This reverts commit 4cef4ef.

There are link issues with some buildbots.

[RISCV] Ensure LPAD alignment for calls to returns_twice functions (#177515)

When Zicfilp is enabled, all LPAD instructions must be 4-byte aligned,
including those following a returns_twice call. Linker relaxation can
convert calls to c.jal (RV32C) or cm.jalt (Zcmt), causing LPAD
misalignment.

This patch handles calls to returns_twice functions (e.g., setjmp) when
Zicfilp CFI is enabled:

  1. In ISel (LowerCall), detect CLI.CB->hasFnAttr(Attribute::ReturnsTwice)
    and select LPAD_CALL or LPAD_CALL_INDIRECT ISD opcodes. Custom ISel
    in RISCVISelDAGToDAG adds the landing pad label as a pseudo operand.

  2. RISCVAsmPrinter::emitLpadAlignedCall handles assembly output: when
    Zca is enabled, emit .p2align 2 before the call. When linker relaxation is
    also enabled, wrap with .option push/exact/pop to prevent relaxation.
    The LPAD instruction is emitted immediately after the call.

  3. RISCVMCCodeEmitter::expandFunctionCallLpad handles object output:
    expands the pseudo to AUIPC+JALR+LPAD (direct) or JALR+LPAD (indirect),
    without R_RISCV_RELAX relocation.

Note: Both AsmPrinter and MCCodeEmitter need separate expansion logic
because GNU assembler doesn't yet support call.lpad/jalr.lpad syntax.

  4. RISCVIndirectBranchTracking no longer needs to scan for returns_twice
    calls since LPAD is now emitted directly by the pseudo expansion.

The existing lpad.ll test is updated to add cf-protection-branch module
flag, as the test_returns_twice function was intended to test
CFI-enabled behavior. Without this flag, LPAD insertion after setjmp
would not be triggered.

Known issue: When Large Code Model and Zicfilp are both enabled, direct
calls to returns_twice functions use SW_GUARDED_CALL which takes
priority over LPAD_CALL. This combination is not yet handled and will be
addressed in a follow-up patch.

Signed-off-by: Jerry Zhang Jian <jerry.zhangjian@sifive.com>

[CMake][Release] Use llvm-bitcode-strip on Darwin for build with -ffat-lto-objects (#200764)

Building with -ffat-lto-objects was added in #140381 (cff9ae7). On
macOS the tests are failing when building release binaries with many
"The file was not recognized as a valid object file" errors, e.g.:

2026-01-16T12:54:26.0928880Z /Users/runner/work/llvm-project/llvm-project/build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/_CPack_Packages/Darwin/TXZ/LLVM-22.1.0-rc1-macOS-ARM64/bin/llvm-strip: error: '/Users/runner/work/llvm-project/llvm-project/build/tools/clang/stage2-instrumented-bins/tools/clang/stage2-bins/_CPack_Packages/Darwin/TXZ/LLVM-22.1.0-rc1-macOS-ARM64/lib/libLLVMAArch64AsmParser.a(AArch64AsmParser.cpp.o)': The file was not recognized as a valid object file

llvm-strip assumes bitcode is embedded in section .llvm.lto
(llvm/lib/Object/ObjectFile.cpp:80) but on MachO it's in
__LLVM,__bitcode (llvm/lib/Object/MachOObjectFile.cpp:2184).

llvm-bitcode-strip is a driver for the MachO bitcode_strip utility which
handles this. Use this instead.

Fixes #176398.

Assisted-by: codex

[CodeGen] De-type getMinimalPhysRegClass and related APIs (#197495)

Follow-up to #193438 to de-type getMinimalPhysRegClass so it can be replaced with
the precomputed getDefaultMinimalPhysRegClass. Type is also removed from
related APIs.

There are very few uses of getMinimalPhysRegClass with a non-default type
(MVT::Other) and none at all for getCommonMinimalPhysRegClass. Rather than
trying to also handle the type when precomputing at compile-time, it seems
better to remove the type altogether and simplify the APIs. This also improves
compile-time for a number of targets and configurations:

CTMark geomean:

  • stage1-O3: -0.23%
  • stage1-ReleaseThinLTO: -0.19%
  • stage1-ReleaseLTO-g: -0.15%
  • stage1-aarch64-O3: -0.57%
  • stage2-O3: -0.23%
  • clang: -0.13%

https://llvm-compile-time-tracker.com/compare.php?from=eae0b6b2498305ee29dc85a405ede9ccdc10ce7d&to=08c052a2db407d2a21d468001fd2035d3720acf7&stat=instructions%3Au

Assisted-by: codex

[X86] Move Non-VLX handling for VPMADD52 instructions entirely into tablegen (#200800)

Just widen v2i64/v4i64 cases to v8i64

Removes more VLX/NoVLX testing from DAG

[CIR] Add cir.lifetime.start and cir.lifetime.end Op (#199599)

summary

Adds the cir.lifetime.start and cir.lifetime.end ops, which mark the
beginning and end of an alloca's live range. They mirror the LLVM
intrinsics llvm.lifetime.start / llvm.lifetime.end.

[LifetimeSafety] Prevent false-negative lifetimebound verification when origin escapes in an unrelated manner (#200786)

This PR removes a false-negative [[lifetimebound]] verification result
that occurs when the annotated attribute escapes via unrelated origin
escape kinds (escape through global, escape through field).

Change summary:

  • Checker.cpp has function checkAnnotations which checks if an
    escaping Origin was an annotated parameter. Modified the logic there
    to only verify [[lifetimebound]] in the case that the escape was
    through a return statement.

Fixes #200412


Signed-off-by: Abhinav Pradeep <abhinav.pradeep@oracle.com>

[clang][CodeGen] Drop TBAA metadata emission on FP libcalls (#200752)

TBAA annotation on FP libcalls has been superseded by
recently-introduced llvm.errno.tbaa module-level metadata.

[lldb][NFCI] Cleanup AppleObjCClassDescriptorV2::ivar_t API (#201042)

[LoopInterchange] Prevent interchange when memory-accessing calls exist (#200828)

Previously, loop-interchange could be applied even when the loop contained call
instructions that may access memory. The root cause of this problem
is that the implementation didn't match the comment, as shown below:

        // readnone functions do not prevent interchanging.
        if (CI->onlyWritesMemory() || isa<PseudoProbeInst>(CI))
          continue;

However, I think ensuring readnone is insufficient in the first place,
because the LLVM Language Reference states about readnone as follows:

This attribute indicates that the function does not dereference that pointer argument, even though it may read or write the memory that the pointer points to if accessed through other pointers.

So, this patch fixes the issue by verifying that all the calls in the
loop have the memory(none) attribute. We could probably check that all
the arguments of the calls have readnone, but I believe just calling
doesNotAccessMemory is simpler and sufficient in practical cases
(though I haven't actually checked).

Fixes #200796.

[SPIR-V] Fix inverted signed/unsigned opcode for int-to-int convert builtins (#200791)

[clang][bytecode] Reject invalid UETT_OpenMPRequiredSimdAlign nodes (#200997)

[GVN] MemorySSA for GVN: eliminate redundant loads via MemorySSA (#152859)

Introduce the main algorithm performing redundant load elimination via
MemorySSA in GVN. The entry point is findReachingValuesForLoad, which,
given a possibly redundant load L as input, attempts to provide as
output a set of reaching memory values (ReachingMemVal), i.e., the
values (defs or equivalent reads) that can reach L along at least one path
where that memory location is not modified in the meantime (if non-local, PRE
will establish whether the load may be eliminated).

Specifically, a reaching value may be one of the following descriptor kinds
(DepKind):

  • Def: found a new instruction that produces exactly the bits the load
    would read. For example, a must-alias store (which defines the load
    memory location), or a must-alias read (exactly reads the same memory
    location, found, e.g., after a phi-translation fixup);
  • Clobber: found a write that clobbers a superset of the bits the load
    would read. For example, a memset call over a memory region, whose value
    read overlaps such a region (and may be forwarded to the load), or a
    generic call clobbering the load that cannot be reused;
  • Other: a precise instruction could not be found, but the block
    where the dependency exists is known (e.g., a memory location already
    live at function entry).

We start off by looking for the users of the defining memory access of
the given load within the same block, searching for local dependencies.
We may need to iteratively follow the use-def chain of the defining
memory access to find the actual clobbering access, while staying within
the scope of the load. If an actual local def/clobber (or liveOnEntry)
is found, we are done and return that one. Otherwise, we move on to visiting
the predecessors of the load's block, considering the incoming value
from the MemoryPhi as a potential clobbering access. Hence, we search
for dependencies within the predecessor block itself. If the new
potential clobbering access is a MemoryPhi, we continue visiting the
transitive closure of the predecessor, recording its new potential
clobbering access. In this phase, as we are exploring non-local
definitions, the load itself may be considered a must-alias def as well
when being examined from a backedge path.
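
As a rough illustration of the Def and Clobber descriptors described above, here is a toy sketch that classifies a write against the bytes a load reads. Everything in it is an assumption made for illustration: `ByteRange` and `classifyWrite` are hypothetical names, and comparing byte ranges stands in for the alias queries a real analysis would perform. The `Other` kind (no precise instruction found) is declared but not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Descriptor kinds named in the text above.
enum class DepKind { Def, Clobber, Other };

// Hypothetical half-open byte range [begin, end) standing in for a
// memory location.
struct ByteRange {
  std::uint64_t begin;
  std::uint64_t end;
};

// Toy classification: a write of exactly the loaded bytes is a Def, a
// write covering a strict superset is a Clobber, anything else is not a
// dependency in this simplified model.
std::optional<DepKind> classifyWrite(ByteRange load, ByteRange write) {
  assert(load.begin < load.end && write.begin < write.end);
  if (write.begin == load.begin && write.end == load.end)
    return DepKind::Def;
  if (write.begin <= load.begin && load.end <= write.end)
    return DepKind::Clobber;
  return std::nullopt;
}
```

In this model a 4-byte store to the loaded address is a Def, while a memset over a larger region containing it is a Clobber.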

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>

[clang][cmake] Move perf-training out of CLANG_INCLUDE_TESTS (#192163)

perf-training defines the generate-profdata target used by the PGO
bootstrap build.
However, it is currently enabled only when CLANG_INCLUDE_TESTS=ON.
For distribution builds such as Yocto/OE, tests are usually disabled by
setting this to OFF.

But perf-training is a PGO utility, not a test target, and it is
currently gated by that block.
As a result, generate-profdata is unavailable to the PGO bootstrap build
when CLANG_INCLUDE_TESTS=OFF.

Move perf-training out of the CLANG_INCLUDE_TESTS block.

This is safe because utils/perf-training/CMakeLists.txt adds targets
only when LLVM_BUILD_INSTRUMENTED or CLANG_BOLT is enabled, so moving it
out does not add any targets unless PGO or BOLT is actually in use.

Also fix the path in lit.site.cfg.in for standalone builds.
When llvm_src_dir is set from CMAKE_SOURCE_DIR, standalone clang builds
end up using the clang source directory instead of the LLVM source
directory. Set it to the proper LLVM source location instead.

Assisted-by: claude

[LV][RISCV] Add crash test for wide pointer stride in convertToStridedAccesses (#200985)

Pre-commit test for #199640.
The test demonstrates a crash when the pointer stride exceeds the
canonical IV type (i32) range. To be updated by #199647.

[llubi] Global variables with simple initializers and global-address constants (#200547)

This PR implements global variables with simple initializers and
global-address constants. Support for constant expressions, aliases,
etc. is left for future PRs.

[AArch64][GlobalISel] Add handling for scalar sqabs intrinsic (#200222)

sqabs is a NEON intrinsic, so it can only be performed on vector register
banks. To handle the intrinsic properly, coerce register bank selection
to put sqabs on FPR banks.

Remove an invalid FIXME in a MemCpyOpt test (#200809)

Call slot optimization explicitly requires writable and noalias
when the destination is an argument, and sret doesn't imply any of
these.

So the memcpy into the return value pointer cannot be optimized away
in this test.

The test for doing this optimization when the sret argument is also
writable and noalias is here.

Relevant commits:

  • 369c9b7 requires writability in call
    slot optimization and adds the writable attribute in the test linked
    above, which already has sret, so that it optimizes as before. So
    sret does not imply writable.

  • f445e39 updates isWritable used in
    the commit above to check for noalias on arguments before concluding
    "writable".

The FIXME should probably have been removed with one of these commits.

[AMDGPU] Recompute EXEC liveness in SIWholeQuadMode::toExact (#200866)

This is required as stale liveness information can lead to an incorrect
optimization in SIOptimizeExecMaskingPreRA. For example, when hi16
is removed from EXEC, optimizeElseBranch produces an incorrect result
by removing S_AND.

This is caused by the following code:

  SlotIndex StartIdx = LIS->getInstructionIndex(SaveExecMI);
  SlotIndex EndIdx = LIS->getInstructionIndex(*AndExecMI);
  for (MCRegUnit Unit : TRI->regunits(ExecReg)) {
    LiveRange &RegUnit = LIS->getRegUnit(Unit);
    if (RegUnit.find(StartIdx) != std::prev(RegUnit.find(EndIdx)))
      return false;
  }

When hi16 is available there are two RegUnits, one for hi16 and
one for lo16. In the case of wqm.ll test it produces two live
ranges:

0: [320r,320d:3)[368r,368d:2)[736r,736d:4)[832r,832d:1)[944r,944d:0) 0@944r 1@832r 2@368r 3@320r 4@736r
1: [12r,12d:1)[320r,320d:5)[368r,368d:4)[736r,736d:6)[744r,744d:0)[832r,832d:3)[944r,944d:2) 0@744r 1@12r 2@944r 3@832r 4@368r 5@320r 6@736r

When hi16 is removed there is only one range:

0: [320r,320d:3)[368r,368d:2)[736r,736d:4)[832r,832d:1)[944r,944d:0) 0@944r 1@832r 2@368r 3@320r 4@736r

If only the first range is considered the loop will finish without
returning false and continue to remove S_AND. It is because EXEC
register for `S_A


Patch is 25.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/201644.diff

11 Files Affected:

  • (modified) clang/include/clang/CIR/Dialect/IR/CIRAttrs.td (+44)
  • (modified) clang/include/clang/CIR/Dialect/IR/CIROps.td (+11)
  • (modified) clang/lib/CIR/CodeGen/CIRGenExprConstant.cpp (+24-2)
  • (modified) clang/lib/CIR/CodeGen/CIRGenFunction.cpp (+15)
  • (modified) clang/lib/CIR/CodeGen/CIRGenFunction.h (+10)
  • (modified) clang/lib/CIR/Dialect/IR/CIRDialect.cpp (+1-1)
  • (modified) clang/lib/CIR/Dialect/Transforms/GotoSolver.cpp (+40-2)
  • (modified) clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp (+53-15)
  • (modified) clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h (+4-1)
  • (added) clang/test/CIR/CodeGen/goto-address-label-table.c (+91)
  • (added) clang/test/CIR/IR/global-block-address.cir (+8)
diff --git a/clang/include/clang/CIR/Dialect/IR/CIRAttrs.td b/clang/include/clang/CIR/Dialect/IR/CIRAttrs.td
index 4032d8219fff3..4cab0e5aab49f 100644
--- a/clang/include/clang/CIR/Dialect/IR/CIRAttrs.td
+++ b/clang/include/clang/CIR/Dialect/IR/CIRAttrs.td
@@ -1476,6 +1476,50 @@ def CIR_BlockAddrInfoAttr : CIR_Attr<"BlockAddrInfo", "block_addr_info"> {
   let canHaveIllegalCXXABIType = 0;
 }
 
+//===----------------------------------------------------------------------===//
+// CIR_BlockAddressAttr
+//===----------------------------------------------------------------------===//
+
+def CIR_BlockAddressAttr
+    : CIR_ValueLikeAttr<"BlockAddress", "block_address"> {
+  let summary = "Constant address of a label within a function";
+  let description = [{
+    Represents the address of a label inside a function as a constant
+    pointer, mirroring LLVM IR's `blockaddress(@func, %bb)`.  Unlike the
+    `cir.block_address` operation, this attribute is a constant, so it can
+    appear in global initializers and `#cir.const_array` elements.  This is
+    required for the GNU computed-goto dispatch-table idiom:
+
+    ```c
+    static const void *tbl[] = {&&L1, &&L2};
+    goto *tbl[i];
+    ```
+
+    Example:
+
+    ```mlir
+    #cir.block_address<@func, "label"> : !cir.ptr<!cir.void>
+    ```
+  }];
+
+  let parameters = (ins AttributeSelfTypeParameter<"">:$type,
+                        "mlir::FlatSymbolRefAttr":$func,
+                        "mlir::StringAttr":$label);
+
+  let builders = [
+    AttrBuilderWithInferredContext<(ins "mlir::Type":$type,
+                                        "mlir::FlatSymbolRefAttr":$func,
+                                        "mlir::StringAttr":$label), [{
+      return $_get(type.getContext(), type, func, label);
+    }]>
+  ];
+
+  let assemblyFormat = "`<` $func `,` $label `>`";
+
+  // A block address is always a plain void pointer, never a C++ ABI type.
+  let canHaveIllegalCXXABIType = 0;
+}
+
 //===----------------------------------------------------------------------===//
 // Side Effect
 //===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/CIR/Dialect/IR/CIROps.td b/clang/include/clang/CIR/Dialect/IR/CIROps.td
index c4d08d5337031..ec06faf468702 100644
--- a/clang/include/clang/CIR/Dialect/IR/CIROps.td
+++ b/clang/include/clang/CIR/Dialect/IR/CIROps.td
@@ -503,6 +503,11 @@ def CIR_ConstantOp : CIR_Op<"const", [
 
   let hasCXXABILowering = true;
   let isLLVMLoweringRecursive = true;
+
+  // The constant initializer may embed a #cir.block_address attribute, whose
+  // lowering shares the pass-owned block-address bookkeeping.
+  let customLLVMLoweringConstructorDecl =
+    LoweringBuilders<(ins "LLVMBlockAddressInfo &":$blockInfoAddr)>;
 }
 
 //===----------------------------------------------------------------------===//
@@ -2984,6 +2989,12 @@ def CIR_GlobalOp : CIR_Op<"global", [
     mlir::SymbolRefAttr getComdatAttr(cir::GlobalOp &op,
                                       mlir::OpBuilder &builder) const;
   }];
+
+  // A region-initialized global may embed a #cir.block_address attribute
+  // (e.g. a static computed-goto dispatch table), whose lowering shares the
+  // pass-owned block-address bookkeeping.
+  let customLLVMLoweringConstructorDecl =
+    LoweringBuilders<(ins "LLVMBlockAddressInfo &":$blockInfoAddr)>;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/clang/lib/CIR/CodeGen/CIRGenExprConstant.cpp b/clang/lib/CIR/CodeGen/CIRGenExprConstant.cpp
index 5208af44412a3..04dd80fd7b377 100644
--- a/clang/lib/CIR/CodeGen/CIRGenExprConstant.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenExprConstant.cpp
@@ -1234,6 +1234,10 @@ struct ConstantLValue {
       : value(nullptr), hasOffsetApplied(false) {}
   /*implicit*/ ConstantLValue(cir::GlobalViewAttr address)
       : value(address), hasOffsetApplied(false) {}
+  // A label address has no object offset, so mark the offset as already
+  // applied to skip applyOffset (which only knows how to offset globals).
+  /*implicit*/ ConstantLValue(cir::BlockAddressAttr address)
+      : value(address), hasOffsetApplied(true) {}
 
   ConstantLValue() : value(nullptr), hasOffsetApplied(false) {}
 };
@@ -1514,8 +1518,26 @@ ConstantLValueEmitter::VisitPredefinedExpr(const PredefinedExpr *e) {
 
 ConstantLValue
 ConstantLValueEmitter::VisitAddrLabelExpr(const AddrLabelExpr *e) {
-  cgm.errorNYI(e->getSourceRange(), "ConstantLValueEmitter: addr label expr");
-  return {};
+  // A label address taken in a constant context, e.g. a static computed-goto
+  // dispatch table `static void *tbl[] = {&&L1, &&L2}`.  Emit a constant
+  // #cir.block_address and register the label as address-taken so the
+  // enclosing function gets an indirect-goto block with this label among its
+  // successors.  A label is always function-local, so cgf is set here.
+  assert(emitter.cgf &&
+         "label address in a constant requires an enclosing function");
+  assert(value.getLValueOffset().isZero() &&
+         "label address cannot carry an offset");
+  CIRGenFunction &cgf = *const_cast<CIRGenFunction *>(emitter.cgf);
+  mlir::MLIRContext *ctx = cgm.getBuilder().getContext();
+  auto func = cast<cir::FuncOp>(cgf.curFn);
+  mlir::Type ptrTy = cgm.getTypes().convertTypeForMem(destType);
+  assert(mlir::isa<cir::PointerType>(ptrTy) &&
+         "label address in a constant must be a pointer");
+
+  auto info = cir::BlockAddrInfoAttr::get(ctx, func.getSymName(),
+                                          e->getLabel()->getName());
+  cgf.takeAddressOfConstantLabel(info);
+  return cir::BlockAddressAttr::get(ptrTy, info.getFunc(), info.getLabel());
 }
 
 ConstantLValue ConstantLValueEmitter::VisitCallExpr(const CallExpr *e) {
diff --git a/clang/lib/CIR/CodeGen/CIRGenFunction.cpp b/clang/lib/CIR/CodeGen/CIRGenFunction.cpp
index 4ecb47a864146..d8e6e3b04f362 100644
--- a/clang/lib/CIR/CodeGen/CIRGenFunction.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenFunction.cpp
@@ -625,10 +625,20 @@ void CIRGenFunction::finishIndirectBranch() {
     succesors.push_back(labelOp->getBlock());
     rangeOperands.push_back(labelOp->getBlock()->getArguments());
   }
+  // Labels whose address was taken only from a constant initializer have no
+  // function-local BlockAddressOp; add them as successors here.  All labels
+  // are emitted by now, so the lookup resolves.
+  for (cir::BlockAddrInfoAttr info : constBlockAddressLabels) {
+    cir::LabelOp labelOp = cgm.lookupBlockAddressInfo(info);
+    assert(labelOp && "expected cir.label to be emitted for const block addr");
+    succesors.push_back(labelOp->getBlock());
+    rangeOperands.push_back(labelOp->getBlock()->getArguments());
+  }
   cir::IndirectBrOp::create(builder, builder.getUnknownLoc(),
                             indirectGotoBlock->getArgument(0), false,
                             rangeOperands, succesors);
   cgm.blockAddressToLabel.clear();
+  constBlockAddressLabels.clear();
 }
 
 void CIRGenFunction::finishFunction(SourceLocation endLoc) {
@@ -1476,6 +1486,11 @@ void CIRGenFunction::instantiateIndirectGotoBlock() {
                           {builder.getUnknownLoc()});
 }
 
+void CIRGenFunction::takeAddressOfConstantLabel(cir::BlockAddrInfoAttr info) {
+  constBlockAddressLabels.push_back(info);
+  instantiateIndirectGotoBlock();
+}
+
 mlir::Value CIRGenFunction::emitAlignmentAssumption(
     mlir::Value ptrValue, QualType ty, SourceLocation loc,
     SourceLocation assumptionLoc, int64_t alignment, mlir::Value offsetValue) {
diff --git a/clang/lib/CIR/CodeGen/CIRGenFunction.h b/clang/lib/CIR/CodeGen/CIRGenFunction.h
index 1e9be3dc2174e..a359b662cd2b0 100644
--- a/clang/lib/CIR/CodeGen/CIRGenFunction.h
+++ b/clang/lib/CIR/CodeGen/CIRGenFunction.h
@@ -730,6 +730,12 @@ class CIRGenFunction : public CIRGenTypeCache {
   /// been resolved.
   mlir::Block *indirectGotoBlock = nullptr;
 
+  /// Labels whose address is taken in a constant context (e.g. a static
+  /// computed-goto dispatch table).  These have no function-local
+  /// BlockAddressOp, so they are tracked here and added as indirect-goto
+  /// branch successors in finishIndirectBranch.
+  llvm::SmallVector<cir::BlockAddrInfoAttr> constBlockAddressLabels;
+
   void resolveBlockAddresses();
   void finishIndirectBranch();
 
@@ -1705,6 +1711,10 @@ class CIRGenFunction : public CIRGenTypeCache {
 
   void instantiateIndirectGotoBlock();
 
+  /// Record a label whose address is taken from a constant initializer and
+  /// ensure the indirect-goto block exists.
+  void takeAddressOfConstantLabel(cir::BlockAddrInfoAttr info);
+
   /// Emit a simple LLVM intrinsic that takes N scalar arguments and whose
   /// return type matches the type of the first argument. The intrinsic name is
   /// used verbatim; any overload mangling (e.g. `.f32`, `.p1`) must be baked
diff --git a/clang/lib/CIR/Dialect/IR/CIRDialect.cpp b/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
index cf07fc4f0833a..3812a8ba45e10 100644
--- a/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
+++ b/clang/lib/CIR/Dialect/IR/CIRDialect.cpp
@@ -588,7 +588,7 @@ static LogicalResult checkConstantTypes(mlir::Operation *op, mlir::Type opType,
   if (mlir::isa<cir::ConstArrayAttr, cir::ConstVectorAttr,
                 cir::ConstComplexAttr, cir::ConstRecordAttr,
                 cir::GlobalViewAttr, cir::PoisonAttr, cir::TypeInfoAttr,
-                cir::VTableAttr>(attrType))
+                cir::VTableAttr, cir::BlockAddressAttr>(attrType))
     return success();
 
   assert(isa<TypedAttr>(attrType) && "What else could we be looking at here?");
diff --git a/clang/lib/CIR/Dialect/Transforms/GotoSolver.cpp b/clang/lib/CIR/Dialect/Transforms/GotoSolver.cpp
index d590ccce1f540..8b3656c974059 100644
--- a/clang/lib/CIR/Dialect/Transforms/GotoSolver.cpp
+++ b/clang/lib/CIR/Dialect/Transforms/GotoSolver.cpp
@@ -6,8 +6,10 @@
 //
 //===----------------------------------------------------------------------===//
 #include "PassDetail.h"
+#include "mlir/IR/AttrTypeSubElements.h"
 #include "clang/CIR/Dialect/IR/CIRDialect.h"
 #include "clang/CIR/Dialect/Passes.h"
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/Support/TimeProfiler.h"
 #include <memory>
@@ -27,7 +29,8 @@ struct GotoSolverPass : public impl::GotoSolverBase<GotoSolverPass> {
   void runOnOperation() override;
 };
 
-static void process(cir::FuncOp func) {
+static void process(cir::FuncOp func,
+                    const llvm::SmallSet<StringRef, 4> &constBlockAddrLabel) {
   mlir::OpBuilder rewriter(func.getContext());
   llvm::StringMap<Block *> labels;
   llvm::SmallVector<cir::GotoOp, 4> gotos;
@@ -43,6 +46,12 @@ static void process(cir::FuncOp func) {
     }
   });
 
+  // Labels whose address is taken only from a constant #cir.block_address
+  // (e.g. a static computed-goto dispatch table) have no function-local
+  // BlockAddressOp.  Treat them as address-taken so their LabelOp survives.
+  for (StringRef label : constBlockAddrLabel)
+    blockAddrLabel.insert(label);
+
   for (auto &lab : labels) {
     StringRef labelName = lab.getKey();
     Block *block = lab.getValue();
@@ -65,7 +74,36 @@ static void process(cir::FuncOp func) {
 
 void GotoSolverPass::runOnOperation() {
   llvm::TimeTraceScope scope("Goto Solver");
-  getOperation()->walk(&process);
+
+  // Collect labels whose address is taken via a constant #cir.block_address
+  // attribute anywhere in the module (e.g. in a static computed-goto dispatch
+  // table's initializer).  These references are not function-local
+  // BlockAddressOps, so gather them up front, keyed by function symbol, so the
+  // per-function pass does not erase the still-needed LabelOp.
+  llvm::DenseMap<StringRef, llvm::SmallSet<StringRef, 4>> constBlockAddrLabels;
+  // Only the presence of a label makes the per-function erase logic relevant,
+  // so skip the whole-module attribute walk entirely for the common case of a
+  // translation unit with no labels.
+  bool hasLabel = false;
+  getOperation()->walk([&](cir::LabelOp) {
+    hasLabel = true;
+    return mlir::WalkResult::interrupt();
+  });
+  if (hasLabel) {
+    mlir::AttrTypeWalker walker;
+    walker.addWalk([&](cir::BlockAddressAttr ba) {
+      constBlockAddrLabels[ba.getFunc().getValue()].insert(
+          ba.getLabel().getValue());
+    });
+    getOperation()->walk([&](mlir::Operation *op) {
+      for (const mlir::NamedAttribute &na : op->getAttrs())
+        walker.walk(na.getValue());
+    });
+  }
+
+  getOperation()->walk([&](cir::FuncOp func) {
+    process(func, constBlockAddrLabels.lookup(func.getSymName()));
+  });
 }
 
 } // namespace
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
index fdd2228edb06c..163ccc0d453c4 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -285,8 +285,10 @@ class CIRAttrToValue {
 public:
   CIRAttrToValue(mlir::Operation *parentOp,
                  mlir::ConversionPatternRewriter &rewriter,
-                 const mlir::TypeConverter *converter)
-      : parentOp(parentOp), rewriter(rewriter), converter(converter) {}
+                 const mlir::TypeConverter *converter,
+                 LLVMBlockAddressInfo &blockInfoAddr)
+      : parentOp(parentOp), rewriter(rewriter), converter(converter),
+        blockInfoAddr(blockInfoAddr) {}
 
 #define GET_CIR_ATTR_TO_VALUE_VISITOR_DECLS
 #include "clang/CIR/Dialect/IR/CIRLowering.inc"
@@ -296,14 +298,19 @@ class CIRAttrToValue {
   mlir::Operation *parentOp;
   mlir::ConversionPatternRewriter &rewriter;
   const mlir::TypeConverter *converter;
+  // Pass-owned block-address bookkeeping, shared with the LabelOp and
+  // BlockAddressOp lowerings so a block address embedded in a constant
+  // initializer resolves through the same mechanism as the op form.
+  LLVMBlockAddressInfo &blockInfoAddr;
 };
 
 /// Switches on the type of attribute and calls the appropriate conversion.
 mlir::Value lowerCirAttrAsValue(mlir::Operation *parentOp,
                                 const mlir::Attribute attr,
                                 mlir::ConversionPatternRewriter &rewriter,
-                                const mlir::TypeConverter *converter) {
-  CIRAttrToValue valueConverter(parentOp, rewriter, converter);
+                                const mlir::TypeConverter *converter,
+                                LLVMBlockAddressInfo &blockInfoAddr) {
+  CIRAttrToValue valueConverter(parentOp, rewriter, converter, blockInfoAddr);
   mlir::Value value = valueConverter.visit(attr);
   if (!value)
     llvm_unreachable("unhandled attribute type");
@@ -656,6 +663,33 @@ mlir::Value CIRAttrToValue::visitCirAttr(cir::GlobalViewAttr globalAttr) {
   llvm_unreachable("Expecting pointer or integer type for GlobalViewAttr");
 }
 
+// BlockAddressAttr visitor.  Mirrors CIRToLLVMBlockAddressOpLowering so a
+// block address embedded in a constant initializer (e.g. a static
+// computed-goto dispatch table) resolves through the same mechanism as the
+// cir.block_address op.
+mlir::Value CIRAttrToValue::visitCirAttr(cir::BlockAddressAttr attr) {
+  mlir::MLIRContext *ctx = rewriter.getContext();
+  mlir::Location loc = parentOp->getLoc();
+
+  cir::BlockAddrInfoAttr blockInfo =
+      cir::BlockAddrInfoAttr::get(ctx, attr.getFunc(), attr.getLabel());
+
+  mlir::LLVM::BlockTagOp matchLabel = blockInfoAddr.lookupBlockTag(blockInfo);
+  mlir::LLVM::BlockTagAttr tagAttr;
+  // If the BlockTagOp has not been emitted yet, use a placeholder tag.  It is
+  // patched with the correct tag index later in resolveBlockAddressOp.
+  if (matchLabel)
+    tagAttr = matchLabel.getTag();
+
+  auto blkAddr =
+      mlir::LLVM::BlockAddressAttr::get(ctx, attr.getFunc(), tagAttr);
+  auto newOp = mlir::LLVM::BlockAddressOp::create(
+      rewriter, loc, mlir::LLVM::LLVMPointerType::get(ctx), blkAddr);
+  if (!matchLabel)
+    blockInfoAddr.addUnresolvedBlockAddress(newOp, blockInfo);
+  return newOp;
+}
+
 // TypeInfoAttr visitor.
 mlir::Value CIRAttrToValue::visitCirAttr(cir::TypeInfoAttr typeInfoAttr) {
   mlir::Type llvmTy = converter->convertType(typeInfoAttr.getType());
@@ -2006,7 +2040,8 @@ mlir::LogicalResult CIRToLLVMConstantOpLowering::matchAndRewrite(
     }
     // Lower GlobalViewAttr to llvm.mlir.addressof
     if (auto gv = mlir::dyn_cast<cir::GlobalViewAttr>(op.getValue())) {
-      auto newOp = lowerCirAttrAsValue(op, gv, rewriter, getTypeConverter());
+      auto newOp = lowerCirAttrAsValue(op, gv, rewriter, getTypeConverter(),
+                                       blockInfoAddr);
       rewriter.replaceOp(op, newOp);
       return mlir::success();
     }
@@ -2018,32 +2053,34 @@ mlir::LogicalResult CIRToLLVMConstantOpLowering::matchAndRewrite(
 
     std::optional<mlir::Attribute> denseAttr;
     if (constArr && hasTrailingZeros(constArr)) {
-      const mlir::Value newOp =
-          lowerCirAttrAsValue(op, constArr, rewriter, getTypeConverter());
+      const mlir::Value newOp = lowerCirAttrAsValue(
+          op, constArr, rewriter, getTypeConverter(), blockInfoAddr);
       rewriter.replaceOp(op, newOp);
       return mlir::success();
     } else if (constArr &&
                (denseAttr = lowerConstArrayAttr(constArr, typeConverter))) {
       attr = denseAttr.value();
     } else {
-      const mlir::Value initVal =
-          lowerCirAttrAsValue(op, op.getValue(), rewriter, typeConverter);
+      const mlir::Value initVal = lowerCirAttrAsValue(
+          op, op.getValue(), rewriter, typeConverter, blockInfoAddr);
       rewriter.replaceOp(op, initVal);
       return mlir::success();
     }
   } else if (const auto recordAttr =
                  mlir::dyn_cast<cir::ConstRecordAttr>(op.getValue())) {
-    auto initVal = lowerCirAttrAsValue(op, recordAttr, rewriter, typeConverter);
+    auto initVal = lowerCirAttrAsValue(op, recordAttr, rewriter, typeConverter,
+                                       blockInfoAddr);
     rewriter.replaceOp(op, initVal);
     return mlir::success();
   } else if (const auto vecTy = mlir::dyn_cast<cir::VectorType>(op.getType())) {
-    rewriter.replaceOp(op, lowerCirAttrAsValue(op, op.getValue(), rewriter,
-                                               getTypeConverter()));
+    rewriter.replaceOp(op,
+                       lowerCirAttrAsValue(op, op.getValue(), rewriter,
+                                           getTypeConverter(), blockInfoAddr));
     return mlir::success();
   } else if (mlir::isa<cir::RecordType>(op.getType())) {
     if (mlir::isa<cir::ZeroAttr, cir::UndefAttr>(attr)) {
       mlir::Value initVal =
-          lowerCirAttrAsValue(op, attr, rewriter, typeConverter);
+          lowerCirAttrAsValue(op, attr, rewriter, typeConverter, blockInfoAddr);
       rewriter.replaceOp(op, initVal);
       return mlir::success();
     }
@@ -2430,7 +2467,7 @@ CIRToLLVMGlobalOpLowering::matchAndRewriteRegionInitializedGlobal(
   // to the appropriate value.
   const mlir::Location loc = op.getLoc();
   setupRegionInitializedLLVMGlobalOp(op, rewriter);
-  CIRAttrToValue valueConverter(op, rewriter, typeConverter);
+  CIRAttrToValue valueConverter(op, rewriter, typeConverter, blockInfoAddr);
   mlir::Value value = valueConverter.visit(init);
   mlir::LLVM::ReturnOp::create(rewriter, loc, value);
   return mlir::success();
@@ -3715,7 +3752,8 @@ void ConvertCIRToLLVMPass::runOnOperation() {
   /// repeated O(M) module-wide symbol scans for every call site.
   mlir::SymbolTableCollection symbolTables;
   mlir::RewritePatternSet patterns(&getContext());
-  patterns.add<CIRToLLVMBlockAddressOpLowering, CIRToLLVMLabelOpLowering>(
+  patterns.add<CIRToLLVMBlockAddressOpLowering, CIRToLLVMLabelOpLowering,
+               CIRToLLVMConstantOpLowering, CIRToLLVMGlobalOpLowering>(
       converter, patterns.getContext(), dl, blockInfoAddr);
   patterns.add<CIRToLLVMCallOpLowering, CIRToLLVMTryCallOpLowering>(
       converter, patterns.getContext(), dl, symbolTables);
diff --git a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
index c0abb40b7304e..7212a17debfa1 100644
--- a/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
+++ b/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.h
@@ -22,11 +22,14 @@ namespace cir {
 
 namespace direct {
 
+struct LLVMBlockAddressInfo;
+
 /// Convert a CIR attribu...
[truncated]
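
For readers who want to exercise the idiom this patch targets, here is a minimal sketch of a static dispatch table driving a computed goto. It assumes GCC or Clang, since `&&label` and `goto *expr` are GNU extensions. The function name `runProgram` and the three opcodes are made up for illustration; they do not come from the PR's test file.

```cpp
#include <cassert>

// Interprets a tiny bytecode sequence with a static label-address table.
// Hypothetical opcodes: 0 = increment, 1 = double, 2 = halt.
int runProgram(const unsigned char *code) {
  // Label addresses taken in a constant context: the static-initializer
  // form that the PR lowers through #cir.block_address.
  static void *const dispatch[] = {&&opInc, &&opDouble, &&opHalt};

  // Declare all locals before the first jump so that no indirect goto
  // bypasses an initialization.
  int acc = 0;
  const unsigned char *pc = code;

  assert(*pc < 3 && "opcode out of range");
  goto *dispatch[*pc++];

opInc:
  acc += 1;
  assert(*pc < 3 && "opcode out of range");
  goto *dispatch[*pc++];

opDouble:
  acc *= 2;
  assert(*pc < 3 && "opcode out of range");
  goto *dispatch[*pc++];

opHalt:
  return acc;
}
```

Running it on the byte sequence `{0, 1, 1, 2}` executes increment, double, double, halt, so the accumulator goes 0, 1, 2, 4 and the call returns 4.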

@adams381 changed the title from "[mlir][spirv][tosa] Add remaining TOSA 1.0 SPIR-V TOSA ops (#200383)" to "[CIR] Lower constant block addresses for goto" Jun 4, 2026
@adams381 adams381 requested review from erichkeane and lanza June 4, 2026 17:34
// CIR_BlockAddressAttr
//===----------------------------------------------------------------------===//

def CIR_BlockAddressAttr

Contributor

CIR_BlockAddrInfoAttr already does exactly this.

Contributor Author

Switched to embedding BlockAddrInfoAttr as the single parameter, matching how BlockAddressOp already holds it. Custom print/parse keeps the assembly format unchanged.

Contributor Author

The shared part is real -- both carry func+label -- but the two roles can't collapse into one attribute. BlockAddrInfoAttr is the untyped identity: the operand of cir.block_address (the pointer type lives on the op result) and the DenseMap key for label resolution in CIRGen, GotoSolver, and lowering. BlockAddressAttr has to be a ValueLikeAttr carrying a type because the constant form appears in global initializers and #cir.const_array elements, where the initializer whitelist requires a typed value-like attr, and it lowers to an LLVM constant. Making the identity attr value-like instead would force a redundant self-type onto every operand and map-key construction -- the type already lives on the op result and is always !cir.ptr<!cir.void> -- so BlockAddressAttr just wraps the identity attr with that type for the constant positions.
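
The split described in this comment can be pictured with plain C++ types. This is only a sketch under stated assumptions: `LabelIdentity`, `TypedLabelConstant`, `LabelTable`, and `resolveBlockIndex` are hypothetical stand-ins, not the CIR classes, and the type is modeled as a string purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-in for the untyped identity (func + label), usable
// directly as a map key.
struct LabelIdentity {
  std::string func;
  std::string label;
  bool operator<(const LabelIdentity &other) const {
    return std::tie(func, label) < std::tie(other.func, other.label);
  }
};

// Hypothetical stand-in for the typed constant: it wraps the identity
// and adds the type needed in constant positions.
struct TypedLabelConstant {
  LabelIdentity identity;
  std::string type; // e.g. "!cir.ptr<!cir.void>"
};

// Resolution table keyed only by identity; the type plays no part in
// the lookup.
using LabelTable = std::map<LabelIdentity, int>;

// Returns the block index recorded for the constant's label, or -1 when
// the label has not been registered.
int resolveBlockIndex(const LabelTable &table, const TypedLabelConstant &c) {
  assert(!c.type.empty() && "a constant must carry a type");
  auto it = table.find(c.identity);
  return it == table.end() ? -1 : it->second;
}
```

The point of the design is visible in the key type: lookups never construct or compare a type, while every constant still carries one.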

Contributor Author

Reworked CIR_BlockAddressAttr to take a single BlockAddrInfoAttr parameter instead of redeclaring func/label; a custom printer/parser keeps the <@func, "label"> format.

The constant `#cir.block_address` attribute stored its own `func` and
`label` parameters, duplicating `BlockAddrInfoAttr`, which already
bundles exactly that pair and is what the `cir.block_address`
operation takes.  Store a `BlockAddrInfoAttr` instead so the constant
and operation forms share one representation.

The textual format is unchanged (`#cir.block_address<@func, "label">`):
a small custom printer/parser flattens the embedded bundle so the two
forms keep printing the same way the operation does.  Call sites now
reach the function and label through `getBlockAddrInfo()`.
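
To make the "flatten the embedded bundle" idea concrete, here is a small standalone sketch of a printer and parser for the `<@func, "label">` form. It is not the MLIR printer/parser from the patch: `InfoBundle`, `printFlat`, and `parseFlat` are hypothetical names, and the sketch does no escaping, so it assumes labels contain no double quote.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for the embedded func + label bundle.
struct InfoBundle {
  std::string func;
  std::string label;
};

// Prints the bundle flat, e.g. <@main, "L1">, instead of nesting it.
std::string printFlat(const InfoBundle &info) {
  assert(info.label.find('"') == std::string::npos &&
         "this sketch does not escape quotes");
  return "<@" + info.func + ", \"" + info.label + "\">";
}

// Parses the flat form back; returns std::nullopt on malformed input.
std::optional<InfoBundle> parseFlat(const std::string &text) {
  const std::string open = "<@";
  const std::string sep = ", \"";
  const std::string close = "\">";
  if (text.size() < open.size() + sep.size() + close.size())
    return std::nullopt;
  if (text.compare(0, open.size(), open) != 0)
    return std::nullopt;
  if (text.compare(text.size() - close.size(), close.size(), close) != 0)
    return std::nullopt;
  const std::size_t sepPos = text.find(sep, open.size());
  if (sepPos == std::string::npos)
    return std::nullopt;
  const std::size_t labelStart = sepPos + sep.size();
  const std::size_t labelEnd = text.size() - close.size();
  if (labelStart > labelEnd)
    return std::nullopt;
  return InfoBundle{text.substr(open.size(), sepPos - open.size()),
                    text.substr(labelStart, labelEnd - labelStart)};
}
```

Printing and then parsing a bundle yields the same function and label, which is the round-trip property a textual format of this kind needs.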
Comment thread clang/test/CIR/IR/global-block-address.cir

@xlauko xlauko left a comment

Contributor

Oh, I just looked at the other attr that @andykaylor mentioned; do we really need both? Also, if we really do need two of them, I would prefer a solution with a base class for both attributes.

This would allow reusing the assembly format instead of using a custom one.

In CIR_BlockAddressAttr one would just add something like:

let append parameters = (ins AttributeSelfTypeParameter<"">:$type)
let append assemblyFormat = [{ `:` $type }]

to add the type.

@xlauko xlauko self-requested a review June 10, 2026 09:06
@adams381

Contributor Author

@xlauko on whether both attributes are needed: BlockAddrInfoAttr stays untyped because it's the operand of cir.block_address (the type is on the op result) and the func+label DenseMap key for label resolution across CIRGen, GotoSolver, and lowering; BlockAddressAttr is the typed value-like form required in constant initializers and #cir.const_array, which only accept a typed value-like attr. A shared base doesn't compose cleanly -- an MLIR attr can't append parameters to a base, and the base would have to be untyped for the op while the constant needs the self-type, so only the format string is sharable. The custom printer/parser only exists to print the embedded info flat as <@func, "label"> rather than the nested #cir.block_addr_info<...>; the declarative nested form would drop the custom code at the cost of the nested syntax.

@adams381 adams381 requested a review from andykaylor June 10, 2026 21:29

@andykaylor andykaylor left a comment

Contributor

I somehow didn't realize this was handling the same case I was looking at today. See #203644. I think it's a bit cleaner.


Labels

  • clang: Clang issues not falling into any other category
  • ClangIR: Anything related to the ClangIR project
