Upstream merge 2026-04-13 16:28 EDT #2181

Open
lamb-j wants to merge 74 commits into amd-main from upstream_merge_20260413202835

lamb-j (Collaborator) commented Apr 13, 2026

Automated merge of upstream llvm/llvm-project main branch.

This PR was created automatically by the upstream-merge workflow.

zeyi2 and others added 30 commits April 13, 2026 12:43
…ks (llvm#189522)

Add `utils::diagDeprecatedCheckAlias` so checks can detect whether they
are running under a deprecated name without enabling the new names.

This commit also includes an example using the `zircon` module. It was
deprecated in the 22 release, but we didn't provide a note for it before.
…#191797)

These #includes are only needed in the SimpleRemoteEPC.cpp
implementation.
…vm#191622)

They have never existed since the initial public checkin.
For primitive array elements, we would accidentally activate the element
and then immediately deactivate the array root, which is wrong. Ignore
the element from the beginning so that the later check never even
compares with the element.
…ax. (llvm#191799)

Extend test coverage with dedicated epilogue vectorization tests for
dead first-order recurrences and FMinMaxNum reductions.

Add users to FORs in existing tests where the dead FORs appeared
unintentional.
…pe builtins (llvm#190969)

When promoting scalar arguments to vectors for builtins like `ldexp`,
`pown`, and `rootn`, use the correct vector type matching the argument
element type instead of always using the return type: these builtins
take an integer argument but have a floating-point return type.

Fix the `ldexp` test that does not pass spirv-val and add similar tests
for `pown` and `rootn`.

related to llvm#190736
Pass GISelValueTracking* through isKnownNeverNaN and isKnownNeverSNaN so
that the implementation can call computeKnownFPClass to derive NaN
information from value tracking, rather than only looking at flags and
direct constant definitions. Update all callers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lvm#190519)

Most clients don't have a notion of "address" and pass arbitrary values
(including `0` and `sizeof(void *)`) to `DataExtractor` constructors.
This makes address-extraction methods dangerous to use.

Those clients that do have a notion of address can use other methods
like `getUnsigned()` to extract an address, or they can derive from
`DataExtractor` and add convenience methods if extracting an address is
routine. `DWARFDataExtractor` is an example, where the removed methods
were actually moved.

This does not remove `AddressSize` argument of `DataExtractor`
constructors yet, but makes it unused and overloads constructors in
preparation for their deletion. I'll be removing uses of the
to-be-deleted constructors in follow-up patches.
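
The "derive and add convenience methods" option above can be pictured with a toy model. This is a minimal sketch, not the LLVM `DataExtractor` API: `ByteExtractor`, `AddressAwareExtractor`, and their signatures are hypothetical stand-ins that only illustrate keeping the address size in the client that actually has a notion of address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a generic extractor (hypothetical, not llvm::DataExtractor):
// it knows bytes and endianness, but has no notion of "address".
class ByteExtractor {
public:
  ByteExtractor(std::vector<uint8_t> bytes, bool isLittleEndian)
      : bytes_(std::move(bytes)), isLittleEndian_(isLittleEndian) {}

  // Reads a byteSize-wide unsigned integer at `offset` and advances `offset`.
  // Returns 0 and leaves `offset` unchanged if the read is out of bounds.
  uint64_t getUnsigned(size_t &offset, unsigned byteSize) const {
    if (byteSize == 0 || byteSize > 8 || offset > bytes_.size() ||
        bytes_.size() - offset < byteSize)
      return 0;
    uint64_t value = 0;
    for (unsigned i = 0; i < byteSize; ++i) {
      const uint64_t byte = bytes_[offset + i];
      if (isLittleEndian_)
        value |= byte << (8 * i);
      else
        value = (value << 8) | byte;
    }
    offset += byteSize;
    return value;
  }

private:
  std::vector<uint8_t> bytes_;
  bool isLittleEndian_;
};

// Hypothetical client that does have a notion of address: it owns the
// address size and layers a convenience method on top of getUnsigned().
class AddressAwareExtractor : public ByteExtractor {
public:
  AddressAwareExtractor(std::vector<uint8_t> bytes, bool isLittleEndian,
                        unsigned addressSize)
      : ByteExtractor(std::move(bytes), isLittleEndian),
        addressSize_(addressSize) {}

  uint64_t getAddress(size_t &offset) const {
    return getUnsigned(offset, addressSize_);
  }

private:
  unsigned addressSize_;
};
```

With this split, a caller that passes an arbitrary value where an address size was expected has no address-extraction method to misuse, because the base class does not offer one.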
Enables LC_UUID load commands to be added with the addLoadCommand
method.

This will be used in future MachOPlatform changes to add support for
adding UUIDs to MachO JITDylibs.
…#175870)

Replace the manual check in `verifyRemoved()` with `AssertingVH`
instrumentation. For cases where the leader table becomes very large,
this is a cheaper way to verify we don't have dangling entries in the
leader table.

For this change, we must implement a move constructor for `AssertingVH`
so that we can keep the first entry as an inline-allocated node that
will be handled correctly as the table grows.
Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This PR expands four intrinsics before codegen, but doesn't remove the
codegen handling yet as both DAGCombiner and type legalization can
create these nodes.

vp.fneg and vp.fpext are expanded in lockstep with the fma/fmuladd
intrinsics since some test cases for vfmacc etc. also use these
intrinsics, and mixing dynamic and constant vls causes some of the more
complex patterns to be missed.

The fixed-length VP vfmacc, vfmsac, vfnmacc and vfnmsac tests also need
to replace the EVL of the vp.merge/vp.select with an immediate otherwise
the resulting vmerge.vvm can't be folded into them. This only happens
for fixed vector intrinsics with no passthru, since we end up with a
constant vl from the fixed vector and dynamic vl from the vp.merge that
prevents folding.

As far as I'm aware we don't emit fixed length vp.merges in practice,
since we only emit vp.merge in the loop vectorizer, and we only use it
with EVL tail folding which requires a scalable VF.
This patch fixes two problems in the lldb-server argument parser:

1. Start lldb-server with incorrect arguments:

```
./lldb-server platform --listen *:1111--server
```
Current behavior:
 * lldb-server runs in gdbserver mode with port 1111

Expected behavior:
 * fail, as `1111--server` is not a number

2. Start lldb-server with a host:port specification that has no colon:
```
./lldb-server gdbserver 1111 ./test 
Launched './test' as process 186...
lldb-server-local_build
lldb-server: llvm-project/lldb/source/Host/common/TCPSocket.cpp:245: virtual Status lldb_private::TCPSocket::Listen(llvm::StringRef, int): Assertion `error.Fail()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./lldb-server gdbserver 1111 ./test
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  lldb-server     0x0000002ab86d0ca2
1  lldb-server     0x0000002ab86ced06
2  lldb-server     0x0000002ab86d1428
3  linux-vdso.so.1 0x0000003f8e7fd800 __vdso_rt_sigreturn + 0
4  libc.so.6       0x0000003f8e2b264a
5  libc.so.6       0x0000003f8e27b1ac gsignal + 18
6  libc.so.6       0x0000003f8e26c14c abort + 180
7  libc.so.6       0x0000003f8e2760cc
8  libc.so.6       0x0000003f8e27610e __assert_perror_fail + 0
9  lldb-server     0x0000002ab86eb628
10 lldb-server     0x0000002ab86f1010
11 lldb-server     0x0000002ab86eeee0
12 lldb-server     0x0000002ab86eee5c
13 lldb-server     0x0000002ab863ef3a
14 lldb-server     0x0000002ab864067c
15 lldb-server     0x0000002ab86438da
16 libc.so.6       0x0000003f8e26c476
17 libc.so.6       0x0000003f8e26c51e __libc_start_main + 116
18 lldb-server     0x0000002ab863ce64
Aborted
```

We expect to see an error instead of an lldb-server crash in this case.
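
Both failure modes come down to strict validation of the listen specification. The following is a minimal sketch of that idea, not the actual lldb-server code: `parseStrictPort`, `parseListenSpec`, and `ListenSpec` are hypothetical names, and requiring a colon is an assumption made for illustration.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: accept the text only if every character belongs to a
// decimal number in [0, 65535]. Trailing junk such as "--server" is rejected.
std::optional<uint16_t> parseStrictPort(std::string_view text) {
  unsigned value = 0;
  const char *first = text.data();
  const char *last = first + text.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc() || ptr != last || value > 65535)
    return std::nullopt;
  return static_cast<uint16_t>(value);
}

struct ListenSpec {
  std::string host;
  uint16_t port;
};

// Hypothetical helper: split "host:port" at the last colon and report an
// error (nullopt) instead of asserting when the colon or the port is bad.
std::optional<ListenSpec> parseListenSpec(std::string_view spec) {
  const size_t colon = spec.rfind(':');
  if (colon == std::string_view::npos)
    return std::nullopt;
  const auto port = parseStrictPort(spec.substr(colon + 1));
  if (!port)
    return std::nullopt;
  return ListenSpec{std::string(spec.substr(0, colon)), *port};
}
```

Under these assumptions, `parseListenSpec("*:1111--server")` and `parseListenSpec("1111")` both yield an empty result the caller can turn into an error message, while `parseListenSpec("*:1111")` succeeds.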
…#191359)

Both implementations are currently equivalent. This is likely a leftover
from the past, when `llvm::Optional` existed.
Expose the existing internal trylock operation through the POSIX interface.
POSIX.1-2024 only specifies the `EBUSY` error case.

Assisted-by: Codex with gpt-5.4 default fast
…191819)

This change aims to make it easier for MachOPlatform clients to
customize JITDylib MachO headers.

At MachOPlatform construction time clients can now supply a
MachOPlatform::HeaderOptionsBuilder. The supplied callback will be
called by setupJITDylib to create the HeaderOptions for the JITDylib
being set up.

No testcase: Constructing a MachOPlatform instance requires the ORC
runtime, which we can't require for LLVM unit or regression suite tests.
We should look at testing this functionality in the new ORC runtime once
it's ready.
For a loop-nest-generating construct this function returns the number of
loops in the generated loop nest.

A loop-nest-transformation construct can be thought of as replacing N
nested loops with K nested loops, where
  N = GetAffectedNestDepthWithReason
  K = GetGeneratedNestDepthWithReason
This change improves the lifetime safety checker to detect when
constructor parameters escape to class fields and suggest appropriate
`[[clang::lifetimebound]]` annotations.

```cpp
struct A {
  View v;
  A(const MyObj& obj) : v(obj) {} // Now suggests [[clang::lifetimebound]]
};
```
…lvm#191834)

…cess).

6dbf9d1 forward declared the MemoryAccess class in
ExecutorProcessControl.h, breaking some examples that were depending on
the transitive include. (See e.g.
https://lab.llvm.org/buildbot/#/builders/80/builds/21875).

This commit adds the missing #includes to the broken examples.
These are now listed in the asciidoc spec here
https://github.com/riscv/riscv-p-spec

I got some help on this from AI, but I reviewed it. Test cases were
fully generated with AI.
…m#191818)

Add GICv5 `ICH_PPI_HVIR{0,1}_EL2` system registers (Interrupt
Controller PPI Hide Virtual Interrupt Registers). These registers
are added because a hypervisor may want to only expose a subset of the
PPIs to the virtual machine and hide the remaining PPIs.

The only way the hypervisor can do this is by trapping all the PPI ICV
registers which leads to additional code complexity and adds performance
overhead especially for nested virtualization.

These are documented here:

https://developer.arm.com/documentation/111107/latest/AArch64-Registers/ICH-PPI-HVIR-n--EL2--Interrupt-Controller-PPI-Hide-Virtual-Interrupt-Registers
…follow LLVM conventions (llvm#191134)

Follow-up to
llvm#189948 (comment).
Addresses review feedback.

Co-authored-by: padivedi <padivedi@amd.com>
…kernels (llvm#191770)

Don't use the L0 heuristics if all the dimensions are specified by the
user code.
…#190026)

Add translation from the MLIR OpenMP depend clause with iterator
modifier to LLVM IR. `buildDependData` (in OpenMPToLLVMIRTranslation)
allocates a single `kmp_depend_info` array sized to hold both locator
(non-iterated) and iterated entries. Locator dependencies use the
existing static path (a vector of `DependData`), while iterated
dependencies use a dynamically-sized path (`DepArray`, `NumDeps`).

The reason both paths are not unified under the dynamic allocation is
that the existing locator path emits actual `kmp_depend_info` entries
inside OMPIRBuilder methods (`createTask`, `createTarget`), whereas the
iterator path must emit the iterator loop in OpenMPToLLVMIRTranslation
(since the convention is to not pass MLIR ops into the OMPIRBuilder).
Unifying them would require modifying existing depend clause tests.

The `OMPIRBuilder::DependenciesInfo` struct is extended to hold either a
`SmallVector<DependData>` (locator path) or a pre-built `{DepArray,
NumDeps}` pair (iterator path). The single-entry `emitTaskDependency`
helper is made public so the translation layer can fill individual
`kmp_depend_info` entries inside the iterator loop body.

This patch is part of the feature work for llvm#188061.

Assisted with copilot.
This is a set of squashed reverts of recent clang-doc patches, since
they are breaking something on the Darwin builders:
https://lab.llvm.org/buildbot/#/builders/23/builds/19172

Revert "[clang-doc][nfc] Default initialize all StringRef members
(llvm#191641)"

This reverts commit 155b9b3.

Revert "[clang-doc] Initialize StringRef members in Info types
(llvm#191637)"

This reverts commit 489dab3.

Revert "[clang-doc] Initialize member variable (llvm#191570)"

This reverts commit 5d64a44.

Revert "[clang-doc] Merge data into persistent memory (llvm#190056)"

This reverts commit 21e0034.

Revert "[clang-doc] Support deep copy between arenas for merging
(llvm#190055)"

This reverts commit c70dae8.
This PR improves native binary generation by avoiding
`llvm::sys::ExecuteAndWait` call for ocloc and instead
leveraging `oclocInvoke()` that consumes an in-memory SPIR-V string.

Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
usx95 and others added 24 commits April 13, 2026 22:53
…s for any rank (llvm#188983)

The fold for `vector.multi_reduction` only handled the rank-1 case with
no reduction dimensions. For higher-rank vectors (e.g.,
`vector<2x3xf32>`) with empty reduction dims `[]`, the fold returned
null, allowing `ElideUnitDimsInMultiDimReduction` to fire incorrectly.
That canonicalization pattern checks that all *reduced* dims have size
1, but with zero reduction dims the check trivially passes, and the
pattern then computes `acc op source` (e.g., `acc + source`) instead of
the correct no-op result (`source`).

This caused `--canonicalize` to produce a different value than
`--lower-vector-multi-reduction` for the same program:

  vector.mask %m { vector.multi_reduction <add>, %src, %src [] :
vector<3x3xi32> to vector<3x3xi32> } : vector<3x3xi1> -> vector<3x3xi32>

  * Without --lower-vector-multi-reduction: `src + src` (e.g., 2)
  * With    --lower-vector-multi-reduction: `src` (e.g., 1)

Fix the fold to return `source` for any rank when `reduction_dims` is
empty. This makes the empty-dims case consistent: the operation is a
noop regardless of rank, and `ElideUnitDimsInMultiDimReduction` no
longer gets a chance to mishandle it.
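
The vacuous-truth problem described above can be reproduced outside MLIR. This is a minimal sketch with plain vectors; `allReducedDimsAreUnit`, `addAccumulator`, and `foldMultiReduction` are hypothetical names that model the behaviour in the description, not the MLIR implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Models the pattern's precondition: every *reduced* dimension has size 1.
// With no reduction dims, std::all_of over an empty range returns true.
bool allReducedDimsAreUnit(const std::vector<int64_t> &shape,
                           const std::vector<size_t> &reductionDims) {
  return std::all_of(reductionDims.begin(), reductionDims.end(),
                     [&](size_t dim) { return shape[dim] == 1; });
}

// Models what the unit-dim pattern computes for an `add` reduction once it
// fires: `acc op source`, element by element.
std::vector<int> addAccumulator(const std::vector<int> &source,
                                const std::vector<int> &acc) {
  std::vector<int> result(source.size());
  std::transform(source.begin(), source.end(), acc.begin(), result.begin(),
                 [](int s, int a) { return s + a; });
  return result;
}

// Models the fixed fold: with empty reduction dims the op is a no-op for any
// rank, so the source is returned. An empty optional means "no fold".
std::optional<std::vector<int>>
foldMultiReduction(const std::vector<int> &source,
                   const std::vector<size_t> &reductionDims) {
  if (reductionDims.empty())
    return source;
  return std::nullopt;
}
```

For a source of all ones and an empty dims list, the precondition check passes trivially and `addAccumulator(src, src)` gives all twos, while `foldMultiReduction` returns the source unchanged. Folding first is what keeps the pattern from ever seeing this case.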

Fixes llvm#129415

Assisted-by: Claude Code
…llvm#191756)

The inner CONCAT_VECTORS result type was hardcoded to MVT::v8i1, which
is only correct when BitBytes == 1. Otherwise, the inner concat produces
fewer elements than 8, causing an assertion failure:

Assertion `(Ops[0].getValueType().getVectorElementCount() * Ops.size())
  == VT.getVectorElementCount() && "Incorrect element count in vector
  concatenation!"' failed.

Fix by computing the inner vector type dynamically based on BitBytes.
…part 42) (llvm#191751)

Tests converted from test/Lower/Intrinsics: storage_size.f90, sum.f90,
system_clock.f90, trailz.f90, transfer.f90
…lvm#189241)

A new MoveLastSplitAxisPattern class handles the case where the last
grid axis of one tensor dimension is moved to the front of another
tensor dimension's split axes, e.g. [[0, 1], [2]] -> [[0], [1, 2]].

The three bugs fixed are:

1. detectMoveLastSplitAxisInResharding: compared source.back() with
target.back() instead of target.front(), preventing the pattern from
being detected for resharding like [[0,1],[2]] -> [[0],[1,2]].

2. targetShardingInMoveLastAxis: axes were appended with push_back but
should be inserted at the front, producing wrong split_axes order.

3. handlePartialAxesDuringResharding: a copy_if wrote results into the
wrong output variable (addressed structurally by the clean
implementation).
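
The first two fixes can be illustrated on plain nested vectors. This is a minimal sketch under the assumption that split axes are modelled as `std::vector<std::vector<int>>`; `SplitAxes`, `moveLastAxisToFront`, and `detectMoveLastSplitAxis` here are hypothetical helpers, not the MLIR functions named above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// One inner vector of grid axes per tensor dimension (illustrative model).
using SplitAxes = std::vector<std::vector<int>>;

// Moves the last grid axis of dimension `from` to the FRONT of dimension
// `to`. Precondition: axes[from] is not empty.
SplitAxes moveLastAxisToFront(SplitAxes axes, size_t from, size_t to) {
  const int moved = axes[from].back();
  axes[from].pop_back();
  axes[to].insert(axes[to].begin(), moved); // insert at front, not push_back
  return axes;
}

// Returns (from, to) if `target` equals `source` with the last axis of one
// dimension moved to the front of another dimension.
std::optional<std::pair<size_t, size_t>>
detectMoveLastSplitAxis(const SplitAxes &source, const SplitAxes &target) {
  if (source.size() != target.size())
    return std::nullopt;
  for (size_t from = 0; from < source.size(); ++from) {
    if (source[from].empty())
      continue;
    for (size_t to = 0; to < target.size(); ++to) {
      if (to == from || target[to].empty())
        continue;
      // Compare against target.front(): that is where the moved axis lands.
      if (source[from].back() != target[to].front())
        continue;
      if (moveLastAxisToFront(source, from, to) == target)
        return std::make_pair(from, to);
    }
  }
  return std::nullopt;
}
```

For `[[0, 1], [2]] -> [[0], [1, 2]]` the moved axis is `1`, which is the front of the second dimension's axes in the target. Comparing against the back (`2`) would miss the match, and appending instead of inserting at the front would yield `[[0], [2, 1]]`.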

Fixes llvm#136117

Assisted-by: Claude Code
…89000)

When tiling a rank-0 linalg.generic op, tileUsingSCF returns an empty
loops vector (rank-0 ops have no parallel dimensions and produce no
scf.forall). Two call sites unconditionally accessed
tilingResult.loops.front(), causing a crash:

- tileToForallOpImpl: the loop normalization block was entered whenever
mixedNumThreads was empty, regardless of whether any loops exist. Guard
it with !tilingResult.loops.empty().

- TileUsingForallOp::apply: tileOps.push_back was called
unconditionally. Guard it with !tilingResult.loops.empty().

Add regression tests for both the tile_sizes and num_threads paths,
verifying that the linalg.generic is preserved and no scf.forall is
emitted.
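
The guard itself is the usual "check before `front()`" idiom. A minimal sketch, assuming a simplified stand-in for the tiling result (`TilingResult` and `outermostLoop` here are hypothetical, with loops modelled as strings):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the tiling result: only the generated loops
// matter for this illustration.
struct TilingResult {
  std::vector<std::string> loops;
};

// Calling front() on an empty vector is undefined behaviour, so a rank-0 op
// (which generates no loops) must be handled before accessing it.
std::optional<std::string> outermostLoop(const TilingResult &result) {
  if (result.loops.empty())
    return std::nullopt;
  return result.loops.front();
}
```

A caller can then skip loop normalization or the `push_back` when the result is empty, which matches the two guarded call sites described above.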

Fixes llvm#187073

Assisted-by: Claude Code
This patch re-enables unicode tests on Windows by improving the
`Terminal::SupportsUnicode` check.

Checking that the stdout handle is a `FILE_TYPE_CHAR` is a better
heuristic than always returning true, which assumed we were always using
a terminal and never piping the output.
llvm#191835)

An issue was reported with this patch in
llvm#191241. Revert for now and
re-enable later.

This reverts commit e71da01.
This PR fixes a crash due to a failed assertion in the `from_python`
implementations of the type casters. The assertion only triggers if
assertions are enabled, which isn't the case for many Python
installations, *and* if a Python capsule of the wrong type is used, so
this isn't triggered easily. The problem is that the conversion from
Python capsules may set the Python error indicator, but the callers of
the type casters do not expect that. In fact, if there are several
overloads of a function, the first may cause the error indicator to be
set and the second runs into the assertion. The fix is to unset the
error indicator after a failed capsule conversion, which is indicated
by the return value of the function anyway.

An alternative fix would be to unset the error indicator *inside* the
`mlirPythonCapsuleTo*` functions; however, their documentation does say
that the Python error indicator is set, so I assume that some callers
may *want* to see the indicator and that the responsibility to handle it
is on them.

Signed-off-by: Ingo Müller <ingomueller@google.com>
llvm#191773)

The old name was misleading because this function is not specific to
unary ops

suggested in
llvm#189099 (comment)
…#191493)

Follow-up to llvm#188113 per @erichkeane's feedback: `isFundamentalIntType`
and `isFundamental()` should not disagree.

The previous patch added `!isBitInt()` only inside
`IntType::isFundamental()`, leaving the underlying TableGen predicates
(`CIR_AnyFundamentalIntType` etc.) unaware of `_BitInt`. That meant
`isSignedFundamental()` and `isUnsignedFundamental()` were silently
wrong — a `_BitInt(32)` would pass them.

This patch adds a `CIR_IsNotBitIntPred` to the three fundamental-int
constraint defs so everything stays consistent. `isFundamental()` now
just forwards to `isFundamentalIntType()` with no extra logic.

Includes an `invalid-bitint.cir` test that checks a `_BitInt(32)` is
rejected where a fundamental unsigned int is required.

Made with [Cursor](https://cursor.com)
Add CIR-to-LLVM and classic codegen RUN lines to empty.cpp,
c89-implicit-int.c, expressions.cpp, binop.c, forward-enum.c, and
static-vars.c so each test verifies LLVM IR output from both pipelines.

Made with [Cursor](https://cursor.com)
)

PR llvm#181071 caused regressions on Linux on Arm. These are being tracked
in:
- llvm#191855
- llvm#191859

This PR disables the failing tests for now, to fix the broken buildbot.
MachOPlatform::HeaderOptions now includes an optional UUID field. If
set, this will be used to build an LC_UUID load command for the
JITDylib's MachO header.

No testcase: MachOPlatform construction requires the ORC runtime, which
we can't require in LLVM regression or unit tests. In the future we
should test this through the ORC runtime.
…llvm#191875)

Depending on the case, SLP either fails to optimize re-vectorized runtime
strided loads (and uses a gather instead) or produces an incorrect
strided load.
…nclude in `UniqueBBID.h` (llvm#191877)

The modules build of LLVM broke when this patch landed

```
commit 2f422a5
Author: Rahman Lavaee <rahmanl@google.com>
Date:   Fri Apr 10 15:58:16 2026 -0700

    [Codegen, X86] Add prefetch insertion based on Propeller profile (llvm#166324)
```

with an error like:

```
[2026-04-11T10:33:41.699Z] While building module 'LLVM_Utils' imported from /Users/ec2-user/jenkins/workspace/m.org_clang-stage2-Rthinlto_main/llvm-project/llvm/lib/Demangle/Demangle.cpp:13:
[2026-04-11T10:33:41.699Z] In file included from <module-includes>:321:
[2026-04-11T10:33:41.699Z] /Users/ec2-user/jenkins/workspace/m.org_clang-stage2-Rthinlto_main/llvm-project/llvm/include/llvm/Support/UniqueBBID.h:40:3: error: missing '#include "llvm/ADT/StringRef.h"'; 'StringRef' must be declared before it is used
[2026-04-11T10:33:41.699Z]    40 |   StringRef TargetFunction;
[2026-04-11T10:33:41.699Z]       |   ^
[2026-04-11T10:33:41.699Z] /Users/ec2-user/jenkins/workspace/m.org_clang-stage2-Rthinlto_main/llvm-project/llvm/include/llvm/ADT/StringRef.h:55:24: note: declaration here is not visible
[2026-04-11T10:33:41.699Z]    55 | class LLVM_GSL_POINTER StringRef {
[2026-04-11T10:33:41.699Z]       |                        ^
[2026-04-11T10:33:41.699Z] /Users/ec2-user/jenkins/workspace/m.org_clang-stage2-Rthinlto_main/llvm-project/llvm/lib/Demangle/Demangle.cpp:13:10: fatal error: could not build module 'LLVM_Utils'
[2026-04-11T10:33:41.699Z]    13 | #include "llvm/Demangle/Demangle.h"
[2026-04-11T10:33:41.699Z]       |  ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
```


https://ci.swift.org/job/llvm.org/job/clang-stage2-Rthinlto/job/main/150/

This patch tries to fix that by adding the missing include.

rdar://174555346
…t rotations (llvm#157208)

This does exactly what AArch64 does.
…rgets (llvm#187134)

Currently, when `X86_64ABIInfo::classifyRegCallStructTypeImpl`
classifies a struct argument or return value as direct, it leaves the
LLVM IR coerce type unspecified, implicitly relying on
`CodeGenTypes::ConvertType` to eventually construct a default IR type
based on the struct's layout. This conversion is neither stable nor
guaranteed to adhere to the ABI's classification rules.

Instead, rewrite `classifyRegCallStructTypeImpl` to construct an
explicit sequence of coerce types, using the existing field
classification to obtain a coerce type for each member of the struct.
Also, rename the function to `passRegCallStructTypeDirectly` and return
a boolean instead, so that now `classifyRegCallStructType` is the only
place that computes `ABIArgInfo`.

This rewrite also fixes several other issues with the `X86_64ABIInfo`
implementation of `__regcall`:

* Empty structs are now ignored instead of being misclassified as
direct.
* Arrays are now classified specially based on the element type, since
`X86_64ABIInfo::classifyArgumentType` ignores standalone array types.
* SSE registers used for return values are now correctly reused for
arguments, matching the 64-bit Windows behavior.

Since this is an ABI change, it has the potential to cause
incompatibilities with `__regcall` code compiled by earlier versions of
Clang. Specifically:

* Because SSE return registers can now be reused as argument registers,
functions will now pass more floating point arguments in SSE registers.
* `_Complex float` struct fields are now passed in one SSE register
instead of two.

Fixes llvm#62999
Fixes llvm#98635