merge main into amd-main #2158

Open
z1-cciauto wants to merge 182 commits into amd-main from upstream_merge_202604121206

Conversation

@z1-cciauto
Collaborator

No description provided.

aengelke and others added 30 commits April 10, 2026 14:29
Apparently required by some older libstdc++ versions.
…dify-Write Sequence, Fix llvm#189183 (llvm#190350)

This patch improves the SystemZ cost model to identify Read-Modify-Write
sequences that can be folded into a single instruction (e.g., ASI, NI,
OI). If a load, a scalar arithmetic operation (ADD, SUB, AND, OR, XOR)
with an immediate, and a store all target the same memory location and
have no external uses, the cost of the arithmetic and store instructions
should be 0. This implementation does not include the
TTI::TCK_RecipThroughput CostKind, as it causes a regression in
non-power-2-subvector-extract.ll.

Fixes llvm#189183. (Refer to it for an example.)

---------

Co-authored-by: anoopkg6 <anoopkg6@github.com>
Summary:
Naked functions are intended to allow the user to write the entirety of
the function block, so we shouldn't include the `waitcnt` instructions
for them.
…#191208)

This moves the test of whether the iteration variable of an affected DO
loop is marked as threadprivate. This makes the `ordCollapseLevel`
member unnecessary.

Issue: llvm#191249
Added the generate-libc-headers custom target depending on libc-headers.

This allows troubleshooting headers without needing to install them
first.
…vm#191375)

While in this area I also removed unnecessary annotations for wchar_size
and also cleaned up some more function attributes.
…1408)

Failure to read all required fields for msgbuf isn't ObjectFile's fault
but FreeBSD-Kernel-Core plugin specific. Thus this should be logged
through `LLDBLog::Process` rather than `LLDBLog::Object`.

Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
…lvm#186981)

This PR follows suit of the Extensions.md document and provides the same
file for OpenMP API extensions. These have previously been stored in
OpenMPSupport.md. Having a more centralized view and place for these
extensions seems useful.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
llvm#191289)

Also, update the conformance script to look for closed issues when
searching for unlinked issues.
…ne table coverage in isolation (llvm#183790)

Patch 2 of 3 to add to llvm-dwarfdump the ability to measure DWARF
coverage of local variables in terms of source lines, as discussed in
[this
RFC](https://discourse.llvm.org/t/rfc-debug-info-coverage-tool-v2/83266).

This patch adds the ability to compare a variable’s coverage against a
baseline, e.g. an unoptimised compilation of the same code. This is
provided using the optional `--coverage-baseline` argument.

When a baseline is provided, the output also includes a per-variable
measure of the line table’s coverage (`LT`, `LTRatio`), distinct from
the variable’s coverage proper. See section 2.2 of the RFC for details
on this metric.
Reworked libc/docs/gpu/building.rst to match the style of
getting_started.rst:

* Removed mkdir and cd commands.
* Used -S and -B flags for CMake.
* Used -C flag for Ninja.
* Split commands into smaller blocks with brief explanations.

Use the same terminology as elsewhere in the LLVM libc docs and move
away from the deprecated runtime terms.

* Standard runtimes build -> Bootstrap Build
* Runtimes cross build -> Two-stage Cross-compiler Build
In llvm#178306, I made an incorrect assumption that traversing `allproc` in
reverse direction would give incremental pid order based on the fact
that new processes are added at the head of `allproc`. However, this
assumption is false under certain circumstances, such as pid reuse, so
threads were not sorted correctly. Instead of relying on any
assumption, explicitly sort threads based on the pid retrieved from memory.
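
A minimal sketch of the explicit sort described above. `ThreadInfo`, `PidLess`, and `SortThreadsByPid` are hypothetical names for illustration and are not taken from the LLDB plugin; the tie-break on `tid` is an added assumption to make the order deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a thread record read from kernel memory.
struct ThreadInfo {
  std::int32_t pid; // process id as read from memory
  std::int32_t tid; // thread id
};

// Order by pid first, then by tid so the result is deterministic.
inline bool PidLess(const ThreadInfo &a, const ThreadInfo &b) {
  if (a.pid != b.pid)
    return a.pid < b.pid;
  return a.tid < b.tid;
}

// Sort explicitly instead of relying on the order in which the process
// list happened to be traversed.
inline void SortThreadsByPid(std::vector<ThreadInfo> &threads) {
  std::sort(threads.begin(), threads.end(), PidLess);
  assert(std::is_sorted(threads.begin(), threads.end(), PidLess));
}
```

With this shape, a reused (lower) pid that appears late in the traversal still ends up in its sorted position.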

Fixes: 5349c66 (llvm#178306)

---------

Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
llvm#191231)

…ties

Some of the utilities may be used in symbol resolution which is before
the expression analysis is done. In such situations, the typedExpr's
normally stored in parser::Expr may not be available. To be able to
obtain the numeric values of expressions, using the analyzer directly
may be necessary, which requires SemanticsContext to be provided.
…m#191098)

The motivation of this PR is to refactor and expose DSO helper functions
so they can be used by all compiler-rt libraries, including the profile
library, without duplicating dlopen/dlsym (non-Windows) or
LoadLibrary/GetProcAddress (Windows) logic in each runtime.

Implement the helpers in namespace __interception in
interception_linux.cpp for non-Windows targets and interception_win.cpp
for Windows, and use them from the existing Linux interception path for
RTLD_NEXT/RTLD_DEFAULT/dlvsym lookups.

This is NFC for existing libraries that already use interception's
public APIs; sanitizer and interception lit behavior is unchanged.
In some cases the use of *-DAG seemed to confuse the update scripts
because of the clash with FileCheck's built-in -DAG suffix.
Specialize linalg.generic to linalg.mmt4d based on index map
…erage (llvm#187368)

We don't need to run the full exhaustive test for all floating points,
as long as we're testing the radix sort code path (which we are, since
radix sort triggers at 1024 elements).

This reduces the test execution time on my machine from 20s to 12s.

Fixes llvm#187329
Fix iterator misuse in four BOLT passes, caught by _GLIBCXX_DEBUG
(enabled via LLVM_ENABLE_EXPENSIVE_CHECKS=ON).

* AllocCombiner: combineAdjustments() erases instructions while
iterating in reverse via llvm::reverse(BB), invalidating the reverse
iterator. Defer erasures to after the loop using a SmallVector.
* ShrinkWrapping: processDeletions() uses
std::prev(BB.eraseInstruction(II)) which is undefined when II ==
begin(). Restructure to standard forward iteration with erase.
* DataflowAnalysis: run() unconditionally dereferences BB->rbegin(),
which crashes on empty basic blocks (possible after the ShrinkWrapping
fix). Guard with an emptiness check.
* IndirectCallPromotion: rewriteCall() dereferences the end iterator via
&(*IndCallBlock.end()). Replace with &IndCallBlock.back().
* TailDuplication: constantAndCopyPropagate() uses
std::prev(OriginalBB.eraseInstruction(Itr)) which is undefined when Itr
== begin(). Restructure to standard forward iteration with erase.
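
The iterator patterns named in the list above can be sketched on a standard container. This is an illustration only, not the BOLT code: `Block` is a hypothetical stand-in for a basic block, and `std::vector` is swapped in for the `SmallVector` mentioned above.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical stand-in for a basic block: a list of instruction opcodes.
using Block = std::list<int>;

// Forward iteration with erase: erase() returns the iterator to the next
// element, so std::prev() on the result (undefined at begin()) is not needed.
inline void EraseMatchingForward(Block &bb, int opcode) {
  for (auto it = bb.begin(); it != bb.end();) {
    if (*it == opcode)
      it = bb.erase(it);
    else
      ++it;
  }
}

// Reverse scan with deferred erasure: collect the victims first and erase
// them after the loop, so the reverse iterator is never invalidated.
inline void EraseMatchingDeferred(Block &bb, int opcode) {
  std::vector<Block::iterator> to_erase;
  for (auto rit = bb.rbegin(); rit != bb.rend(); ++rit) {
    if (*rit == opcode)
      to_erase.push_back(std::prev(rit.base())); // iterator to *rit
  }
  for (Block::iterator it : to_erase)
    bb.erase(it); // std::list::erase leaves other iterators valid
}

// Last-element access: back() is valid on a non-empty container, whereas
// dereferencing end() is undefined behavior.
inline int &LastInstruction(Block &bb) {
  assert(!bb.empty() && "an empty block has no last instruction");
  return bb.back();
}
```

The deferred variant relies on `std::list` keeping iterators to other elements valid across `erase`; a container without that guarantee would need a different bookkeeping scheme.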
…8271)

Example:

    int foo(int a, int b) { return a - 1 + ~b; }

Before, on AArch64:

    mvn w8, w1
    add w8, w0, w8
    sub w0, w8, #1

After (matches gcc):

    sub w0, w0, w1
    sub w0, w0, #2

Proof: https://alive2.llvm.org/ce/z/g_bV01
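
The fold follows from the two's complement identity `~b == -b - 1`, which gives `a - 1 + ~b == a - b - 2`. A small sketch that spot-checks this; it uses `std::uint32_t` rather than `int` (an assumption made here so that wrap-around is well defined in C++), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Original form, evaluated in wrapping unsigned arithmetic.
constexpr std::uint32_t Before(std::uint32_t a, std::uint32_t b) {
  return a - 1u + ~b;
}

// Folded form: a - b - 2.
constexpr std::uint32_t After(std::uint32_t a, std::uint32_t b) {
  return a - b - 2u;
}

static_assert(Before(10u, 3u) == After(10u, 3u), "simple case");
static_assert(Before(0u, 0xFFFFFFFFu) == After(0u, 0xFFFFFFFFu),
              "both forms wrap identically");

// Spot-check a small grid of inputs, including values near the wrap-around.
inline void CheckSamples() {
  const std::uint32_t samples[] = {0u,          1u,          2u,
                                   0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (std::uint32_t a : samples)
    for (std::uint32_t b : samples)
      assert(Before(a, b) == After(a, b));
}
```

This is only a sanity check on sample values; the Alive2 link above is the actual proof.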
…#191413)

Squelch the stage-2 compile time regression introduced by the variadic
m_Combine(And|Or) matchers, by replacing the std::apply on a std::tuple
with a recursive inheritance.
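
As a rough illustration of the technique, a variadic "match all" combinator can hold its sub-matchers through recursive inheritance instead of a `std::tuple` folded with `std::apply`. `AllOf`, `MakeAllOf`, and `DemoAllOf` are hypothetical names; this sketch is not the LLVM `PatternMatch` implementation.

```cpp
#include <cassert>

// Base case: an empty conjunction matches everything.
template <typename... Preds> struct AllOf {
  template <typename T> bool match(const T &) const { return true; }
};

// Recursive case: store the first predicate and inherit from the combinator
// for the remaining ones, rather than storing a std::tuple of predicates.
template <typename First, typename... Rest>
struct AllOf<First, Rest...> : AllOf<Rest...> {
  First first;

  explicit AllOf(First f, Rest... rest) : AllOf<Rest...>(rest...), first(f) {}

  template <typename T> bool match(const T &v) const {
    // Short-circuits: later predicates run only if earlier ones matched.
    return first(v) && AllOf<Rest...>::match(v);
  }
};

// Helper that deduces the predicate types.
template <typename... Preds> AllOf<Preds...> MakeAllOf(Preds... preds) {
  return AllOf<Preds...>(preds...);
}

// Usage: combine two predicates and check a few values.
inline void DemoAllOf() {
  auto positive_even = MakeAllOf([](int x) { return x > 0; },
                                 [](int x) { return x % 2 == 0; });
  assert(positive_even.match(4));
  assert(!positive_even.match(3));
}
```

Each level instantiates one small class template rather than the `std::tuple` and `std::apply` machinery, which is the kind of saving the commit describes for compile time.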
…ORTED for zOS (llvm#190835)

Tests in `llvm/test/Examples` and `llvm/test/ExecutionEngine` use JIT
which is unsupported for zOS causing the tests to fail.

---------

Co-authored-by: Bahareh Farhadi <bahareh.farhadi@ibm.com>
The default inliner policy changed slightly, which was expected after PR
llvm#190168.
Coro hasn't yet been fixed up for profcheck, so new tests are likely to
fail.

mtune.ll exercises loop vectorizer (not fixed)
When a user calls `omp_control_tool` while a tool is attached and has
registered the `ompt_control_tool` callback, the tool should receive a
callback with the user's arguments.

However, in llvm#112924, it was discovered that this only happens after at
least one host side directive or runtime call calling into
`__kmp_do_middle_initialize` has been executed.

The check for `__kmp_init_middle` in `FTN_CONTROL_TOOL` did not try to
do the middle initialization and instead always returned `-2` (no tool).
A tool therefore received no callback. The user program did not get the
info that there is a tool attached. To fix this, change the explicit
return to a call of `__kmp_middle_initialize()`, as done in several
other places of `libomp`.

Further handling is then done in `__kmp_control_tool`, where the values
`-2` (no tool), `-1` (no callback), or the tool's return value are
returned.

Also expand the tests to introduce checks where no callback is
registered, or `omp_control_tool` is called before any OpenMP directive.

Fixes llvm#112924

CC @jprotze, @hansangbae

Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
…(NFC) (llvm#191430)

CompilationGraph owns all nodes and edges via `unique_ptr`, but exposes
pointers to the underlying objects. Make them non-movable to maintain
stable addresses.
Make them non-copyable since we don't want to copy `Command` objects
they hold or create duplicate root nodes.

Apply full rule-of-five to `CompilationGraph`.
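
A minimal sketch of the ownership pattern described above, using hypothetical `Node` and `Graph` types rather than the real `CompilationGraph`. It assumes the nodes are the non-copyable, non-movable objects and that the owning graph itself stays movable; the description does not state the latter.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical node: pinned in memory so handed-out pointers stay valid.
struct Node {
  int id;
  explicit Node(int i) : id(i) {}
  ~Node() = default;
  Node(const Node &) = delete;            // no duplicate nodes
  Node &operator=(const Node &) = delete;
  Node(Node &&) = delete;                 // stable address
  Node &operator=(Node &&) = delete;
};

// Hypothetical owner with all five special members spelled out.
class Graph {
public:
  Graph() = default;
  ~Graph() = default;
  Graph(const Graph &) = delete;
  Graph &operator=(const Graph &) = delete;
  Graph(Graph &&) = default;              // assumption: the owner may move
  Graph &operator=(Graph &&) = default;

  // Nodes live on the heap, so the returned pointer survives vector growth.
  Node *addNode(int id) {
    nodes_.push_back(std::make_unique<Node>(id));
    return nodes_.back().get();
  }
  std::size_t size() const { return nodes_.size(); }

private:
  std::vector<std::unique_ptr<Node>> nodes_;
};

static_assert(!std::is_copy_constructible<Node>::value, "Node is pinned");
static_assert(!std::is_move_constructible<Node>::value, "Node is pinned");
static_assert(!std::is_copy_constructible<Graph>::value, "Graph owns nodes");

// Usage: a pointer handed out early stays valid as more nodes are added.
inline void DemoStableAddresses() {
  Graph g;
  Node *first = g.addNode(1);
  for (int i = 2; i <= 64; ++i)
    g.addNode(i);
  assert(first->id == 1);
}
```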
…m IntegerExpandSetCCOperands. NFC (llvm#191353)

LHSLo and RHSLo must have the same type, we don't need to check both.
Same for LHSHi and RHSHi.
While running in server mode, multiple clients can be connected at the
same time. In LLDBUtils we had a static mutex that could cause other
clients to hang, since every client shared that single static lock.

Instead, I adjusted the logic to take the existing SBMutex as a parameter
and hold that mutex during command handling.
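
The difference can be sketched as follows. `Client`, `RunCommands`, and `DemoIndependentClients` are hypothetical, and `std::mutex` stands in for `SBMutex`; this is not the lldb-dap code.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical per-client state with its own lock.
struct Client {
  std::mutex mutex;
  std::vector<std::string> log;
};

// The caller passes in the lock that protects this client's state, so one
// client's command handling never waits on another client's lock. A
// function-local static mutex here would serialize all clients instead.
inline void RunCommands(std::mutex &client_mutex,
                        std::vector<std::string> &log,
                        const std::vector<std::string> &commands) {
  std::lock_guard<std::mutex> guard(client_mutex);
  for (const std::string &cmd : commands)
    log.push_back("ran: " + cmd);
}

// Usage: client A still makes progress while client B's lock is held.
inline void DemoIndependentClients() {
  Client a, b;
  std::lock_guard<std::mutex> hold_b(b.mutex);
  RunCommands(a.mutex, a.log, {"bt", "continue"});
  assert(a.log.size() == 2);
}
```

With a single shared lock, the call in `DemoIndependentClients` would block for as long as the other client held it.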
aengelke and others added 26 commits April 12, 2026 07:45
…1635)

Consequence of llvm#182526.

With PCH used for unit tests (llvm#191402), this breaks now due to matching:

llvm-build/tools/clang/tools/extra/test/clang-tidy/infrastructure/Output/custom-query-check.cpp.tmp/cqc-main.cpp

with:

llvm-build/tools/clang/tools/extra/clangd/unittests/DecisionForestRuntimeTest.cpp
This was marked as xfail earlier for some .prefalign fixes, but is
unexpectedly passing on AArch64 Premerge CI.

Just mark it unsupported for now to get things back to green.
Implements P3936R1

Closes llvm#189594

# References:

- https://llvm.org/PR162236
- https://wg21.link/p3936r1

---------

Co-authored-by: A. Jiang <de34@live.cn>
Co-authored-by: Hristo Hristov <zingam@outlook.com>
Implements P3953R3

- renamed test files (no changes to the contents but the function
names).

Closes llvm#189624

# References:

- https://llvm.org/PR105394
- https://wg21.link/p2918r2
…widening (llvm#191650)

Replace VPMULLQ/VPMAXQ/VPMINQ + var shift custom patterns
…per (llvm#191648)

Move avx512_unary_lowering so we can avoid manually writing the XMM/YMM->ZMM widening for NonVLX targets

Adds some missing comments for instruction classes as well
…#191660)

The changes are only on 5 lines, but now the entire file is invariant
under clang-format.
Some comments in openmp-utils.cpp became outdated after the code had
changed.
…191634)

SymbolStringPtr comparisons should be more efficient than string
comparisons. Fixes a FIXME.
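
The efficiency argument is the usual one for interned strings: if each distinct string is stored once, equality reduces to a pointer comparison. A minimal sketch with a hypothetical `StringPool`, not LLVM's actual pool implementation:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical interning pool: each distinct string is stored exactly once,
// so two handles are equal exactly when their pointers are equal.
class StringPool {
public:
  const std::string *intern(const std::string &s) {
    // insert() returns the existing element if s is already present.
    // Pointers to unordered_set elements stay valid across rehashing.
    return &*pool_.insert(s).first;
  }

private:
  std::unordered_set<std::string> pool_;
};

// Usage: equal contents yield the same pointer, so comparison is O(1).
inline void DemoInterning() {
  StringPool pool;
  const std::string *a = pool.intern("main");
  const std::string *b = pool.intern(std::string("ma") + "in");
  const std::string *c = pool.intern("foo");
  assert(a == b);
  assert(a != c);
}
```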
… stage (llvm#189491)

This makes the scheduler's rematerialization stage use the
target-independent rematerializer. The previously duplicated logic is
deleted, and restrictions are put in place in the stage so that the same
constraints as before apply on rematerializable registers (as the
rematerializer is able to expose many more rematerialization
opportunities than what the stage can track at the moment). Consequently
it is not expected that this change improves performance overall, but it
is a first step toward being able to use the rematerializer's more
advanced capabilities during scheduling.

This is *not* NFC, for two reasons.

- Score equalities between two rematerialization candidates with
otherwise equivalent score are decided by their corresponding register's
index handle in the rematerializer (previously the pointer to their
state object's value). This is determined by the rematerializer's
register collection order, which is different from the stage's old
register collection order. This is the cause of all test changes but
one, and should not be detrimental to performance in real cases.
- To support rollback, the stage now uses the rematerializer's rollback
listener instead of its previous ad-hoc method (setting the opcode of
rematerialized MIs to a DBG_VALUE, and their registers to the sentinel).
This is the source of test changes in
`machine-scheduler-sink-trivial-remats-debug.mir`. The new rollback
mechanism completely removes the behavior tested by
`misched-remat-revert.ll` so the test is deleted.

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
…lvm#189574)

Implements P4052R0.

Also renames:
- the internal names for consistency.
- test files (no changes to the contents but the function names).

Fixes: llvm#189589

---------

Co-authored-by: A. Jiang <de34@live.cn>
Pass Instruction::Load instead of Instruction::GetElementPtr to
getGEPCosts in isMaskedLoadCompress and CheckForShuffledLoads.
These call sites estimate costs for wide contiguous loads and sub-vector
load patterns, not for masked gather pointer vector formation. Using
Instruction::GetElementPtr incorrectly triggered the gather-style cost
path, which computes vector GEP formation costs. Since the call sites
already add scalarization overhead for pointer vector building
separately, this led to double-counting of pointer costs and inaccurate
vectorization decisions.

Reviewers: hiraditya, RKSimon

Pull Request: llvm#191620
…lization stage (llvm#189491)"

This reverts commit be62f27, it breaks
the compilation!!!

Reviewers: 

Pull Request: llvm#191717
…lvm#185028)

This is an alternative approach to
llvm#169769.

We increase the size of the old `Integral<Bits, Signed>` to 24 bytes (on
a 64 bit system) and introduce a new `Char<Signed>` that's used for the
old `PT_Sint8` and `PT_Uint8` primitive types.

The old approach did not work out in the end because we need to be able
to do arithmetic (but essentially just `+` and `-`) on the offsets of
such integers-that-are-actually-pointers.

c-t-t-:

https://llvm-compile-time-tracker.com/compare.php?from=723d5cb11b2a64e4f11032f24967702e52f822bc&to=16dc90efebbf52e381c7655131b2fb74c307cc42&stat=instructions:u
…t non-copyable in another

When a value is treated as a copyable element in one tree entry and as a
non-copyable element in another, both feeding into PHI nodes, the
scheduler could produce vectorized IR where an instruction does not
dominate all its uses. Bail out of scheduling in tryScheduleBundle when
this conflict is detected to prevent generating broken modules.
Fixes llvm#191714

Reviewers: 

Pull Request: llvm#191724
@z1-cciauto z1-cciauto requested a review from a team April 12, 2026 16:06
