merge main into amd-main#2147

Open
z1-cciauto wants to merge 143 commits into amd-main from
upstream_merge_202604111206

@z1-cciauto
Collaborator

No description provided.

aengelke and others added 30 commits April 10, 2026 14:29
Apparently required by some older libstdc++ versions.
…dify-Write Sequence, Fix llvm#189183 (llvm#190350)

This patch improves the SystemZ cost model to identify Read-Modify-Write
sequences that can be folded into a single instruction (e.g., ASI, NI,
OI). If a load, a scalar arithmetic operation (ADD, SUB, AND, OR, XOR)
with an immediate, and a store all target the same memory location and
have no external uses, the cost of the arithmetic and store instructions
should be 0. This implementation does not include the
TTI::TCK_RecipThroughput CostKind, as it causes a regression in
non-power-2-subvector-extract.ll.

Fixes llvm#189183 (refer to it for an example).

---------

Co-authored-by: anoopkg6 <anoopkg6@github.com>
Summary:
Naked functions are intended to allow the user to write the entirety of
the function block, so we shouldn't include the `waitcnt` instructions
for them.
…#191208)

This moves the test of whether the iteration variable of an affected DO
loop is marked as threadprivate. This makes the `ordCollapseLevel`
member unnecessary.

Issue: llvm#191249
Added the generate-libc-headers custom target depending on libc-headers.

This allows troubleshooting headers without needing to install them
first.
…vm#191375)

While in this area I also removed unnecessary annotations for wchar_size
and also cleaned up some more function attributes.
…1408)

Failure to read all required fields for msgbuf isn't ObjectFile's fault
but FreeBSD-Kernel-Core plugin specific. Thus this should be logged
through `LLDBLog::Process` rather than `LLDBLog::Object`.

Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
…lvm#186981)

This PR follows suit of the Extensions.md document and provides the same
file for OpenMP API extensions. These have previously been stored in
OpenMPSupport.md. Having a more centralized view and place for these
extensions seems useful.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
llvm#191289)

Also, update the conformance script to look for closed issues when
searching for unlinked issues.
…ne table coverage in isolation (llvm#183790)

Patch 2 of 3 to add to llvm-dwarfdump the ability to measure DWARF
coverage of local variables in terms of source lines, as discussed in
[this
RFC](https://discourse.llvm.org/t/rfc-debug-info-coverage-tool-v2/83266).

This patch adds the ability to compare a variable’s coverage against a
baseline, e.g. an unoptimised compilation of the same code. This is
provided using the optional `--coverage-baseline` argument.

When a baseline is provided, the output also includes a per-variable
measure of the line table’s coverage (`LT`, `LTRatio`), distinct from
the variable’s coverage proper. See section 2.2 of the RFC for details
on this metric.
Reworked libc/docs/gpu/building.rst to match the style of
getting_started.rst:

* Removed mkdir and cd commands.
* Used -S and -B flags for CMake.
* Used -C flag for Ninja.
* Split commands into smaller blocks with brief explanations.

Use the same terminology as elsewhere in the LLVM libc docs and move
away from the deprecated runtime terms.

* Standard runtimes build -> Bootstrap Build
* Runtimes cross build -> Two-stage Cross-compiler Build
In llvm#178306, I made an incorrect assumption that traversing `allproc` in
reverse direction would give incremental pid order based on the fact
that new processes are added at the head of `allproc`. However, this
assumption is false under certain circumstances, such as pid reuse, so
threads were not sorted correctly. Instead of relying on any assumption,
explicitly sort threads based on the pid retrieved from memory.
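
As a rough illustration of sorting explicitly rather than relying on traversal order, here is a minimal sketch. `ThreadInfo` and `sortThreadsByPid` are hypothetical names for this example and are not taken from the LLDB plugin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a thread record read from a kernel core.
struct ThreadInfo {
  int32_t pid; // process id as read from memory
  int32_t tid; // thread id as read from memory
};

// Order threads by pid no matter in which order they were discovered.
// stable_sort keeps the discovery order among threads of the same pid.
inline void sortThreadsByPid(std::vector<ThreadInfo> &threads) {
  std::stable_sort(threads.begin(), threads.end(),
                   [](const ThreadInfo &a, const ThreadInfo &b) {
                     return a.pid < b.pid;
                   });
}
```

The result no longer depends on where a process sits in the list, so a reused pid cannot disturb the order.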

Fixes: 5349c66 (llvm#178306)

---------

Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
llvm#191231)

…ties

Some of the utilities may be used in symbol resolution which is before
the expression analysis is done. In such situations, the typedExpr's
normally stored in parser::Expr may not be available. To be able to
obtain the numeric values of expressions, using the analyzer directly
may be necessary, which requires SemanticsContext to be provided.
…m#191098)

The motivation of this PR is to refactor and expose DSO helper functions
so they can be used by all compiler-rt libraries, including the profile
library, without duplicating dlopen/dlsym (non-Windows) or
LoadLibrary/GetProcAddress (Windows) logic in each runtime.

Implement the helpers in namespace __interception in
interception_linux.cpp for non-Windows targets and interception_win.cpp
for Windows, and use them from the existing Linux interception path for
RTLD_NEXT/RTLD_DEFAULT/dlvsym lookups.

This is NFC for existing libraries that already use interception's
public APIs; sanitizer and interception lit behavior is unchanged.
In some cases the use of *-DAG seemed to confuse the update scripts
because of the clash with FileCheck's built-in -DAG suffix.
Specialize linalg.generic to linalg.mmt4d based on index map
…erage (llvm#187368)

We don't need to run the full exhaustive test for all floating points,
as long as we're testing the radix sort code path (which we are, since
radix sort triggers at 1024 elements).

This reduces the test execution time on my machine from 20s to 12s.

Fixes llvm#187329
Fix iterator misuse in four BOLT passes, caught by _GLIBCXX_DEBUG
(enabled via LLVM_ENABLE_EXPENSIVE_CHECKS=ON).

* AllocCombiner: combineAdjustments() erases instructions while
iterating in reverse via llvm::reverse(BB), invalidating the reverse
iterator. Defer erasures to after the loop using a SmallVector.
* ShrinkWrapping: processDeletions() uses
std::prev(BB.eraseInstruction(II)) which is undefined when II ==
begin(). Restructure to standard forward iteration with erase.
* DataflowAnalysis: run() unconditionally dereferences BB->rbegin(),
which crashes on empty basic blocks (possible after the ShrinkWrapping
fix). Guard with an emptiness check.
* IndirectCallPromotion: rewriteCall() dereferences the end iterator via
&(*IndCallBlock.end()). Replace with &IndCallBlock.back().
* TailDuplication: constantAndCopyPropagate() uses
std::prev(OriginalBB.eraseInstruction(Itr)) which is undefined when Itr
== begin(). Restructure to standard forward iteration with erase.
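
The fixes listed above follow three general patterns, sketched here on a plain `std::list<int>` rather than on BOLT's basic blocks. The function names are hypothetical and the code only illustrates the iterator rules, not the actual patch.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Pattern 1: when walking in reverse, collect the elements to erase and
// erase them after the loop, so the reverse iterator is never invalidated.
inline void eraseNegativesDeferred(std::list<int> &insts) {
  std::vector<std::list<int>::iterator> toErase;
  for (auto it = insts.rbegin(); it != insts.rend(); ++it)
    if (*it < 0)
      toErase.push_back(std::prev(it.base())); // forward iterator to *it
  for (auto it : toErase)
    insts.erase(it); // list::erase only invalidates the erased element
}

// Pattern 2: forward iteration with erase, which is well defined even when
// the erased element is the first one (no std::prev on begin()).
inline void eraseNegativesForward(std::list<int> &insts) {
  for (auto it = insts.begin(); it != insts.end();) {
    if (*it < 0)
      it = insts.erase(it); // erase returns the next valid iterator
    else
      ++it;
  }
}

// Pattern 3: use back() behind an emptiness check instead of dereferencing
// end() or rbegin() unconditionally.
inline const int *lastOrNull(const std::list<int> &insts) {
  return insts.empty() ? nullptr : &insts.back();
}
```
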
…8271)

Example:

    int foo(int a, int b) { return a - 1 + ~b; }

Before, on AArch64:

    mvn w8, w1
    add w8, w0, w8
    sub w0, w8, #1

After (matches gcc):

    sub w0, w0, w1
    sub w0, w0, #2

Proof: https://alive2.llvm.org/ce/z/g_bV01
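
The fold relies on the two's complement identity `~b == -b - 1`, which gives `a - 1 + ~b == a - b - 2`. The following small check (a sketch using unsigned arithmetic so that wraparound is well defined; `before` and `after` are names chosen for this example) exercises the identity at compile time and can be called at run time as well.

```cpp
#include <cassert>
#include <cstdint>

// Original expression shape: a - 1 + ~b.
constexpr uint32_t before(uint32_t a, uint32_t b) { return a - 1 + ~b; }

// Folded shape: (a - b) - 2.
constexpr uint32_t after(uint32_t a, uint32_t b) { return a - b - 2; }

// A few spot checks, including wraparound cases.
static_assert(before(5, 3) == after(5, 3), "simple case");
static_assert(before(0, 0) == after(0, 0), "wraps below zero");
static_assert(before(0, 0xFFFFFFFFu) == after(0, 0xFFFFFFFFu), "b all ones");
```
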
…#191413)

Squelch the stage-2 compile time regression introduced by the variadic
m_Combine(And|Or) matchers, by replacing the std::apply on a std::tuple
with a recursive inheritance.
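
For readers unfamiliar with the technique, the sketch below shows a variadic "match all" combinator built with recursive inheritance instead of a `std::tuple` plus `std::apply`. It is not the PatternMatch code; `CombineAnd`, `combineAnd`, `IsPositive`, and `IsEven` are hypothetical names for this example.

```cpp
#include <cassert>

// Base case: an empty conjunction matches everything.
template <typename... Ms> struct CombineAnd {
  template <typename T> bool match(const T &) const { return true; }
};

// Recursive case: hold the first sub-matcher, inherit the remaining ones.
template <typename M, typename... Ms>
struct CombineAnd<M, Ms...> : CombineAnd<Ms...> {
  M First;
  CombineAnd(const M &first, const Ms &...rest)
      : CombineAnd<Ms...>(rest...), First(first) {}

  template <typename T> bool match(const T &v) const {
    // Short-circuits like a hand-written chain of && expressions.
    return First.match(v) && CombineAnd<Ms...>::match(v);
  }
};

// Convenience factory that deduces the matcher types.
template <typename... Ms> CombineAnd<Ms...> combineAnd(const Ms &...ms) {
  return CombineAnd<Ms...>(ms...);
}

// Two toy sub-matchers used to exercise the combinator.
struct IsPositive {
  bool match(int v) const { return v > 0; }
};
struct IsEven {
  bool match(int v) const { return v % 2 == 0; }
};
```

Each level of the hierarchy instantiates only a small class and one `match` member, which is typically cheaper to compile than instantiating `std::tuple` and `std::apply` for every matcher combination.
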
…ORTED for zOS (llvm#190835)

Tests in `llvm/test/Examples` and `llvm/test/ExecutionEngine` use JIT
which is unsupported for zOS causing the tests to fail.

---------

Co-authored-by: Bahareh Farhadi <bahareh.farhadi@ibm.com>
The default inliner policy changed slightly, which was expected after PR
llvm#190168.
Coro hasn't been fixed up for profcheck yet, so new tests are likely to
fail.

mtune.ll exercises loop vectorizer (not fixed)
When a user calls `omp_control_tool` while a tool is attached and has
registered the `ompt_control_tool` callback, the tool should receive a
callback with the user's arguments.

However, in llvm#112924, it was discovered that this only happens after at
least one host side directive or runtime call calling into
`__kmp_do_middle_initialize` has been executed.

The check for `__kmp_init_middle` in `FTN_CONTROL_TOOL` did not try to
do the middle initialization and instead always returned `-2` (no tool).
A tool therefore received no callback. The user program did not get the
info that there is a tool attached. To fix this, change the explicit
return to a call of `__kmp_middle_initialize()`, as done in several
other places of `libomp`.

Further handling is then done in `__kmp_control_tool`, where the values
`-2` (no tool), `-1` (no callback), or the tool's return value are
returned.
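
The control flow described above can be modeled roughly as follows. This is a simplified sketch, not libomp code: `RuntimeModel`, `middleInitialize`, and `controlTool` are hypothetical stand-ins, and only the return values `-2` and `-1` come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical model of the runtime state relevant to omp_control_tool.
struct RuntimeModel {
  bool MiddleInitialized = false;
  bool ToolAttached = false;
  // Empty when the tool registered no control-tool callback.
  std::function<int(uint64_t command, uint64_t modifier, void *arg)>
      ControlToolCallback;
};

// Stand-in for the lazy middle initialization.
inline void middleInitialize(RuntimeModel &rt) { rt.MiddleInitialized = true; }

inline int controlTool(RuntimeModel &rt, uint64_t command, uint64_t modifier,
                       void *arg) {
  if (!rt.MiddleInitialized)
    middleInitialize(rt); // before the fix, this path returned -2 directly
  if (!rt.ToolAttached)
    return -2; // no tool
  if (!rt.ControlToolCallback)
    return -1; // tool attached, but no callback registered
  return rt.ControlToolCallback(command, modifier, arg); // tool's own value
}
```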

Also expand the tests to introduce checks where no callback is
registered, or `omp_control_tool` is called before any OpenMP directive.

Fixes llvm#112924

CC @jprotze, @hansangbae

Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
…(NFC) (llvm#191430)

CompilationGraph owns all nodes and edges via `unique_ptr`, but exposes
pointers to the underlying objects. Make them non-movable to maintain
stable addresses.
Make them non-copyable since we don't want to copy `Command` objects
they hold or create duplicate root nodes.

Apply full rule-of-five to `CompilationGraph`.
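
A minimal sketch of this ownership pattern, using hypothetical `Node` and `Graph` types rather than the real `CompilationGraph` classes: the owner stores objects through `unique_ptr`, hands out references, and explicitly spells out all five special members.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical node type; neither copyable nor movable, so a reference or
// pointer handed out once stays valid for the node's lifetime.
struct Node {
  explicit Node(std::string name) : Name(std::move(name)) {}
  Node(const Node &) = delete;
  Node &operator=(const Node &) = delete;
  Node(Node &&) = delete;
  Node &operator=(Node &&) = delete;
  std::string Name;
};

// Hypothetical owner with the rule-of-five written out in full.
class Graph {
public:
  Graph() = default;
  ~Graph() = default;
  Graph(const Graph &) = delete;
  Graph &operator=(const Graph &) = delete;
  Graph(Graph &&) = delete;
  Graph &operator=(Graph &&) = delete;

  // The node lives on the heap; growing the vector moves only the
  // unique_ptr, never the Node itself.
  Node &addNode(std::string name) {
    Nodes.push_back(std::make_unique<Node>(std::move(name)));
    return *Nodes.back();
  }

private:
  std::vector<std::unique_ptr<Node>> Nodes;
};
```
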
…m IntegerExpandSetCCOperands. NFC (llvm#191353)

LHSLo and RHSLo must have the same type, we don't need to check both.
Same for LHSHi and RHSHi.
While running in server mode, multiple clients can be connected at the
same time. In LLDBUtils we had a static mutex that can cause other
clients to hang due to the single static lock.

Instead, I adjusted the logic to take the existing SBMutex as a parameter
and lock that mutex during command handling.
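
The difference between a single static lock and a per-client lock can be sketched as below. `std::mutex` stands in for `SBMutex` here, and `ClientSession` and `runCommand` are hypothetical names, so treat this as an illustration of the idea rather than the lldb-dap code.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical per-client state: every client owns its own mutex.
struct ClientSession {
  std::mutex ApiMutex;
};

// The caller passes in the mutex to lock, so commands from different clients
// do not serialize on one function-local static mutex.
inline std::string runCommand(std::mutex &apiMutex,
                              const std::string &command) {
  std::lock_guard<std::mutex> guard(apiMutex);
  return "ran: " + command;
}
```

With this shape, one client holding its own lock for a long time does not stop another client from handling commands.
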
MaskRay and others added 26 commits April 10, 2026 21:21
…vm#191591)

Reverts llvm#191550

Merged without understanding getImplicitAddend and test convention, and
less than 4 hours after a colleague rubber stamping with "I am not ELF
or linker expert but to me looks good."
…eption specs (llvm#190593)

Functions whose exception spec has not yet been evaluated have no body
in the AST. Because the compiler does not generate call sites for these
functions before evaluating their spec, they cannot propagate
exceptions.

Closes llvm#188730
…m#191596)

Now that MCAsmInfo stores the MCTargetOptions pointer (set by
TargetRegistry::createMCAsmInfo llvm#180464), MCContext can retrieve it via
MCAsmInfo. Remove the redundant MCTargetOptions parameter from the
MCContext constructor and update all callers.
…lvm#184032)

https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019/17
The recently-introduced .prefalign only worked when each function was in
its own section (-ffunction-sections), because the section size gave the
function body size needed for the alignment rule.

This led to -ffunction-sections and -fno-function-sections AsmPrinter
differences (llvm#155529), which is rather unusual.

This patch fixes this AsmPrinter difference by extending .prefalign to
accept an end symbol and a required fill operand:

    .prefalign <log2_align>, <end_sym>, nop
    .prefalign <log2_align>, <end_sym>, <fill_byte>

The first operand is a log2 alignment value (e.g. 4 means 16-byte
alignment). The body size (end_sym_offset - start_offset) determines the
alignment:

    body_size < pref_align   => ComputedAlign = std::bit_ceil(body_size)
    body_size >= pref_align  => ComputedAlign = pref_align
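
The rule above can be written as a small function. This is a sketch of the stated formula only: `computedAlign` and `bitCeil` are names chosen for this example, and `bitCeil` is a hand-written equivalent of C++20 `std::bit_ceil` so that the snippet also builds as C++14.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= x; returns 1 for x == 0, as
// std::bit_ceil does.
constexpr uint64_t bitCeil(uint64_t x) {
  uint64_t p = 1;
  while (p < x)
    p <<= 1;
  return p;
}

// bodySize is end_sym_offset - start_offset; log2Align is the first operand
// of .prefalign (e.g. 4 means 16-byte alignment).
constexpr uint64_t computedAlign(uint64_t bodySize, unsigned log2Align) {
  const uint64_t prefAlign = uint64_t{1} << log2Align;
  return bodySize < prefAlign ? bitCeil(bodySize) : prefAlign;
}

static_assert(computedAlign(9, 4) == 16, "9-byte body rounds up to 16");
static_assert(computedAlign(3, 4) == 4, "3-byte body needs only 4");
```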

To also enforce a minimum alignment, emit a .p2align before .prefalign.

The fill operand is required: `nop` generates target-appropriate NOP
instructions via writeNopData, while an integer in [0,255] fills the
padding with that byte value.

Initialize MCSection::CurFragList to nullptr and add a null check
to skip ELFObjectWriter-created sections like .strtab/.symtab
that never receive changeSection calls.

relaxPrefAlign is called in both layoutSection and relaxFragment.
The layoutSection call ensures correct initial padding before
relaxOnce, and is also needed for the post-finishLayout re-layout
where relaxOnce is not used. relaxPrefAlign walks forward to the
end symbol to compute BodySize (summing fragment sizes), avoiding
dependence on stale downstream symbol offsets.
…mpile jobs (llvm#191610)

In `createClangModulePrecompileJob`, the `PrependArg` parameter was not
being passed for the newly created Clang module precompile job.
This causes failures for setups where the clang executable is a wrapper
(e.g., the llvm-driver wrapper).

See
llvm#191258 (comment)
The test has been failing flakily for a while; see PRs llvm#170911, llvm#171469,
llvm#188441.

Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
This removes fixes implemented in
llvm@ea8c637,
llvm@4a58116,
and
llvm@2b957ed.
We don't need them anymore after llvm#130374.

---

A little (unfortunate) winding history, mostly for my mental bookkeeping.
Read the below only if you are curious:

There is a function called `findUnwindDestinations` in
`SelectionDAGBuilder.cpp`.

https://github.com/llvm/llvm-project/blob/c94f79886035a61bb5f3dc992f75fe0c08bdcd4b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L2107-L2164

This function adds unwind successors to BBs with `invoke`s. In case of
Itanium EH, you only add one `landingpad` BB. In WinEH, `catchswitch`
may not catch an exception, so you add all possible unwind
destinations. For example,
```ll
entry:
  invoke void @foo()
          to label %try.cont unwind label %catch.dispatch

catch.dispatch:
  %0 = catchswitch within none [label %catch.start] unwind label %catch.dispatch1

catch.start:
  ...

catch.dispatch1:
  %7 = catchswitch within none [label %catch.start1] unwind to caller

catch.start1:
  ...
```
`catchswitch` BBs are removed in iSel. So in this case, both
`catch.start` and `catch.start1` BBs are added as unwind successors to
`entry`, because an exception may not be caught by `catch.dispatch` and
unwind further to `catch.dispatch1`.

In the beginning of 2019, I added our own `findWasmUnwindDestinations`
in
llvm@d6f4878.
This was when I was implementing [the V2 (pre-legacy)
proposal,](https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/pre-legacy/Exceptions-v2.md)
which had `exnref` and `try`-`catch_all` (it was named `catch`, but
semantically it was `catch_all`). The rationale was, even though we were
using WinEH, we only had one catchpad and `catch` caught everything. So
I figured adding only the first catchpad successor, `catch.start` in the
example above, would simplify things.

By the end of 2020, we changed the proposal to [the V3 (legacy)
proposal](https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/legacy/Exceptions.md),
which removed `exnref` and introduced separate `catch` and `catch_all`
instructions. The previous invariant "`catch` always catches everything"
didn't hold anymore, but I left `findWasmUnwindDestinations` as was with
some updated comments in
llvm@9e4eade.
The comments could be summed up as "there will always be an `invoke`
instruction in the first catchpad that unwinds to the next unwind
destination" (which later turned out to be false).

And in 2021, I tweaked the ExceptionInfo algorithm to fix exception
grouping
(llvm@ea8c637,
llvm@4a58116,
and
llvm@2b957ed).
The bug was, in short: your next unwind destination can be
(accidentally) dominated by your current catchpad, making your unwind
destination a subexception of the current exception. For example:
```cpp
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) {   // unwind destination
}
```
Here the outer `catch` is (accidentally) dominated by the inner `catch`,
because we only added the first catchpad (inner `catch`) as an unwind
successor of the `foo()` BB, and hoped that some `invoke` within the
inner `catch` would unwind to the outer `catch`. But this caused us to
`delegate` to the middle of an inner scope. So I tweaked the algorithm to
take the outer `catch` out to form a separate exception. I didn't
realize then that `findWasmUnwindDestinations` was actually the source of
the problem.

Fast forward to 2025. The 2020 assumption of "There will always be an
`invoke` instruction in the first catchpad" turned out to be false. So I
just removed `findWasmUnwindDestinations` and switched to use the common
`findUnwindDestinations` in llvm#130374, which recently accidentally
discovered another bug (llvm#187302).

While investigating llvm#187302, I realized we don't need those tweaks in
WebAssemblyExceptionInfo anymore, because `findUnwindDestinations` adds
all unwind destinations as successors. (llvm#187302 is actually not related
to this; it was just a trigger to investigate things) So in case of the
little C++ example above, the outer `catch` BB will also be added as an
unwind successor of the `foo()` BB. I actually think we may not even
need WebAssemblyExceptionInfo analysis at all if we only use [the latest
standard (exnref)
proposal](https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md).
But we still need to keep the legacy support, so we need it for now.
`gfx90a` added a set of MFMA instructions that are not available on
prior GFXIPs. The Clang builtins for these were requiring the
`mai-insts` feature, which is incorrect (`gfx908` supports this and does
not support the added MFMAs). This led to opaque bugs where we'd check
with `__has_builtin` for the availability of the builtin, target 908,
and get an ISEL failure.
…vm#183990)

This makes the test cover `fold_left` and `fold_left_with_iter` both with
and without telemetrics, similar to what we do in `check_iterator`.
Previously, getValueType() always returned the compared operand type
(e.g. i32) for CmpInst, which was incorrect for gather cost estimation
and codegen where the result type (i1) is needed. This caused ad-hoc
fixups scattered across getEntryCost, calculateTreeCostAndTrimNonProfitable,
and vectorizeTree that overrode ScalarTy back to i1 for CmpInsts.
Add a LookThroughCmp parameter to getValueType() (default: false) so
callers that need the operand type for vector width calculations can
explicitly opt in. This removes the need for the scattered CmpInst
special cases:
- getEntryCost gather path: remove `if (isa<CmpInst>) ScalarTy = i1`
- calculateTreeCostAndTrimNonProfitable: remove same override
- vectorizeTree: simplify `if (!isa<CmpInst>) ScalarTy = getValueType(V)`
  to just `getValueType(V)`
For the ICmp/FCmp cost case in getEntryCost, add a fallthrough from
ICmp/FCmp to Select that overrides ScalarTy with the compared operand
type via getValueType(VL0, true), since getCmpSelInstrCost expects the
compared type as its first argument. Fix the condition type argument
passed to getCmpSelInstrCost for both scalar and vector paths: use the
actual condition/result type instead of always Builder.getInt1Ty().

Reviewers: hiraditya, RKSimon

Pull Request: llvm#190618
…calar

The LLVM cost model uses integer-valued throughput costs which cannot
represent fractional costs. For 2-element vectors, this rounding can
make vectorization appear profitable when it actually produces more
instructions than the scalar code — the overhead from shuffles, inserts,
extracts, and buildvectors is underestimated.
Add an instruction-count safety check in getTreeCost that estimates
the number of vector instructions (including gathers, shuffles, and
extracts) and compares against the number of scalar instructions.
If the vector code would produce more instructions, reject the tree
regardless of what the cost model says. This catches cases where
fractional cost rounding hides real overhead.

The check is gated behind -slp-inst-count-check (default: on) and
only applies to 2-element root trees where rounding errors matter most.

Reviewers: hiraditya, bababuck, RKSimon

Pull Request: llvm#190414
When SLPReVec is enabled, getValueType returns the vector result type
for InsertElement instructions rather than the scalar element type. This
caused getEntryCost to propagate an incorrect ScalarTy (e.g. <4 x float>
instead of float) into getScalarizationOverhead and getWidenedType,
triggering an assertion failure and producing wrong cost estimates.
Narrow ScalarTy to its element type when costing vectorized
InsertElement entries whose inserted operands are scalars.
Fixes llvm#191175.

Reviewers: 

Pull Request: llvm#191628
Fixes:
```
warning: format specifies type 'long' but the argument has type 'intptr_t' ...
```
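
The commit message does not say how the warning was silenced. One portable option, shown here only as an assumption about what such a fix can look like, is the `PRIdPTR` macro from `<cinttypes>`, which expands to the right length modifier for `intptr_t` on each platform. `formatIntPtr` is a hypothetical helper.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format an intptr_t without assuming it is the same type as long.
inline std::string formatIntPtr(intptr_t value) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%" PRIdPTR, value);
  return buf;
}
```
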
…91299)

After llvm#189372 both minimum
iteration checks for epilogue vectorization are created in VPlan, which
removes the last blocker for unconditionally running
materializeConstantVectorTripCount. This enables additional folds for
plans in the native path, as well as removes some trip count
computations with epilogue vectorization.

PR: llvm#191299
…#191498)

NSSW/NUSW on a wider AddRec does not imply NSSW/NUSW on a narrower
AddRec.
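
To see why a no-wrap guarantee does not transfer to a narrower type, consider the recurrence {0,+,1} run for 200 iterations. The sketch below assumes the flags express "the recurrence never leaves the signed range of its type"; `signedWraps` is a hypothetical helper, not a ScalarEvolution API.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if {start,+,step}, stepped `iters` times, ever leaves the
// signed range [lo, hi] of the type being modeled.
inline bool signedWraps(int64_t start, int64_t step, int iters, int64_t lo,
                        int64_t hi) {
  int64_t v = start;
  for (int i = 0; i < iters; ++i) {
    v += step;
    if (v < lo || v > hi)
      return true;
  }
  return false;
}
```

In 16 bits the recurrence stays well inside the signed range, while the same recurrence in 8 bits passes 127 and wraps, so the property proven for the wider recurrence cannot be copied to the narrower one.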

Fixes llvm#191382.
The output currently contains
```
            "unicode32"
            'u' or "unsigned decimal"
            'p' or
            "pointer"
            "char[]"
            "int8_t[]"
```
The 'p' and "pointer" are supposed to appear on the same line. When
we're about to print "pointer," we check whether it would exceed the
column limit (in which case, we insert a line feed). This check only
checks for spaces as separators, but in this case, "words" may be
separated by newlines as well. Look for them too.
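
One way to make such a column check aware of newlines is to measure the current column from the last newline rather than from the start of the text. The sketch below is an illustration under that assumption, with hypothetical helper names; it is not the LLDB implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Column of the cursor after `text` was emitted: the number of characters
// since the last newline, or the whole length if there is none.
inline std::size_t currentColumn(const std::string &text) {
  const std::size_t nl = text.rfind('\n');
  return nl == std::string::npos ? text.size() : text.size() - nl - 1;
}

// Append `word` after a space, or after a line feed if the word would
// otherwise pass the column limit.
inline void appendWrapped(std::string &out, const std::string &word,
                          std::size_t limit) {
  const std::size_t col = currentColumn(out);
  if (col != 0 && col + 1 + word.size() > limit)
    out += '\n';
  else if (col != 0)
    out += ' ';
  out += word;
}
```
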
…n (NFC) (llvm#189489)

This NFC prepares the scheduler's rematerialization stage for
integration with the target-independent rematerializer. It brings
various small design changes and optimizations to the stage's internal
state to make the not-exactly-NFC rematerializer integration as small as
possible.

The main changes are, in no particular order:

- Sort and pick useful rematerialization candidates by their index in
the vector of candidates instead of directly sorting objects within the
candidate vector. This reduces the amount of data movement and
simplifies the candidate selection logic.
- Move some data members from `PreRARematStage::RematReg` to
`PreRARematStage::ScoredRemat`. This makes the former a simplified
version of the rematerializer's own internal register representation
(`Rematerializer::Reg`), which can be cleanly deleted during
integration.
- Remove an inferable argument to `modifyRegionSchedule`. This allows
the stage to stop tracking the parent block of each region.
- Use a boolean (`RevertAllRegions`) to track scheduling revert decision
post rematerialization instead of clearing `RescheduleRegions`. This
avoids re-computing the latter during rollback.
- Estimate usefulness of rematerialization from `GCNRegPressure` instead
of from `Register` (requires adding a new method variant in
`GCNRPTarget`).
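
The first item in the list, sorting candidates by index instead of moving the candidate objects, can be sketched generically as follows. `Candidate` and `sortedIndices` are hypothetical and far simpler than the scheduler's real types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical candidate; in practice it would carry a larger payload that
// is expensive to move around.
struct Candidate {
  int Score;
};

// Returns candidate indices ordered by decreasing score. The candidate
// vector itself is never reordered, so indices into it remain meaningful.
inline std::vector<std::size_t>
sortedIndices(const std::vector<Candidate> &candidates) {
  std::vector<std::size_t> order(candidates.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return candidates[a].Score > candidates[b].Score;
                   });
  return order;
}
```
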
We had a report of some assertion failures in

llvm#190054 (comment),
and some msan failures in
llvm#190056.

These appear to be due to default-constructed StringRefs being used in
some cases. To address this, we can provide default initializers that
should prevent such cases from causing further problems.
@z1-cciauto z1-cciauto requested a review from a team April 11, 2026 16:07
