merge main into amd-main by z1-cciauto · Pull Request #2163 · ROCm/llvm-project

z1-cciauto · 2026-04-13T04:08:36Z

No description provided.

Apparently required by some older libstdc++ versions.

…dify-Write Sequence, Fix llvm#189183 (llvm#190350) This patch improves the SystemZ cost model to identify Read-Modify-Write sequences that can be folded into a single instruction (e.g., ASI, NI, OI). If a load, a scalar arithmetic operation (ADD, SUB, AND, OR, XOR) with an immediate, and a store all target the same memory location and have no external uses, the cost of the arithmetic and store instructions should be 0. This implementation does not include the TTI::TCK_RecipThroughput CostKind, as it causes a regression in non-power-2-subvector-extract.ll. Fixes llvm#189183. (Refer to it for an example.) --------- Co-authored-by: anoopkg6 <anoopkg6@github.com>

Summary: Naked functions are intended to allow the user to write the entirety of the function block, so we shouldn't include the `waitcnt` instructions for them.

…#191208) This moves the test of whether the iteration variable of an affected DO loop is marked as threadprivate. This makes the `ordCollapseLevel` member unnecessary. Issue: llvm#191249

Added the generate-libc-headers custom target depending on libc-headers. This allows troubleshooting headers without needing to install them first.

…vm#191375) While in this area I also removed unnecessary annotations for wchar_size and also cleaned up some more function attributes.

…1408) Failure to read all required fields for msgbuf isn't ObjectFile's fault but FreeBSD-Kernel-Core plugin specific. Thus this should be logged through `LLDBLog::Process` rather than `LLDBLog::Object`. Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>

…lvm#186981) This PR follows suit of the Extensions.md document and provides the same file for OpenMP API extensions. These have previously been stored in OpenMPSupport.md. Having a more centralized view and place for these extensions seems useful. --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

llvm#191289) Also, update the conformance script to look for closed issues when searching for unlinked issues.

…ne table coverage in isolation (llvm#183790) Patch 2 of 3 to add to llvm-dwarfdump the ability to measure DWARF coverage of local variables in terms of source lines, as discussed in [this RFC](https://discourse.llvm.org/t/rfc-debug-info-coverage-tool-v2/83266). This patch adds the ability to compare a variable’s coverage against a baseline, e.g. an unoptimised compilation of the same code. This is provided using the optional `--coverage-baseline` argument. When a baseline is provided, the output also includes a per-variable measure of the line table’s coverage (`LT`, `LTRatio`), distinct from the variable’s coverage proper. See section 2.2 of the RFC for details on this metric.

Reworked libc/docs/gpu/building.rst to match the style of getting_started.rst:
* Removed mkdir and cd commands.
* Used -S and -B flags for CMake.
* Used -C flag for Ninja.
* Split commands into smaller blocks with brief explanations.

Use the same terminology as elsewhere in the LLVM libc docs and move away from the deprecated runtime terms.
* Standard runtimes build -> Bootstrap Build
* Runtimes cross build -> Two-stage Cross-compiler Build

In llvm#178306, I made an incorrect assumption that traversing `allproc` in the reverse direction would give increasing pid order, based on the fact that new processes are added at the head of allproc. However, this assumption is false under certain circumstances, such as pid reuse, so threads were not sorted correctly. Instead of relying on any assumption, explicitly sort threads based on the pid retrieved from memory. Fixes: 5349c66 (llvm#178306) --------- Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
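The explicit sort described above could look roughly like the following. This is a minimal sketch, not the plugin's code: `ThreadEntry` and `SortThreadsByPid` are hypothetical names, and the real thread objects in LLDB carry far more state.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a thread recovered from a kernel core.
struct ThreadEntry {
  int32_t pid;  // process id as read from target memory
  uint64_t tid; // thread id
};

// Sort by pid explicitly instead of relying on list traversal order.
// std::stable_sort keeps the original relative order of entries that
// share a pid.
inline void SortThreadsByPid(std::vector<ThreadEntry> &threads) {
  std::stable_sort(threads.begin(), threads.end(),
                   [](const ThreadEntry &a, const ThreadEntry &b) {
                     return a.pid < b.pid;
                   });
}
```

A stable sort is used here only so that threads of the same process keep their discovery order; whether the actual patch needs that property is an assumption.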

llvm#191231) …ties Some of the utilities may be used in symbol resolution which is before the expression analysis is done. In such situations, the typedExpr's normally stored in parser::Expr may not be available. To be able to obtain the numeric values of expressions, using the analyzer directly may be necessary, which requires SemanticsContext to be provided.

…m#191098) The motivation of this PR is to refactor and expose DSO helper functions so they can be used by all compiler-rt libraries, including the profile library, without duplicating dlopen/dlsym (non-Windows) or LoadLibrary/GetProcAddress (Windows) logic in each runtime. Implement the helpers in namespace __interception in interception_linux.cpp for non-Windows targets and interception_win.cpp for Windows, and use them from the existing Linux interception path for RTLD_NEXT/RTLD_DEFAULT/dlvsym lookups. This is NFC for existing libraries that already use interception's public APIs; sanitizer and interception lit behavior is unchanged.

In some cases the use of *-DAG seemed to confuse the update scripts because of the clash with FileCheck's built-in -DAG suffix.

…llvm#191397)

Fixed 5531990.

Specialize linalg.generic to linalg.mmt4d based on index map

…erage (llvm#187368) We don't need to run the full exhaustive test for all floating points, as long as we're testing the radix sort code path (which we are, since radix sort triggers at 1024 elements). This reduces the test execution time on my machine from 20s to 12s. Fixes llvm#187329

Fix iterator misuse in five BOLT passes, caught by _GLIBCXX_DEBUG (enabled via LLVM_ENABLE_EXPENSIVE_CHECKS=ON).
* AllocCombiner: combineAdjustments() erases instructions while iterating in reverse via llvm::reverse(BB), invalidating the reverse iterator. Defer erasures to after the loop using a SmallVector.
* ShrinkWrapping: processDeletions() uses std::prev(BB.eraseInstruction(II)), which is undefined when II == begin(). Restructure to standard forward iteration with erase.
* DataflowAnalysis: run() unconditionally dereferences BB->rbegin(), which crashes on empty basic blocks (possible after the ShrinkWrapping fix). Guard with an emptiness check.
* IndirectCallPromotion: rewriteCall() dereferences the end iterator via &(*IndCallBlock.end()). Replace with &IndCallBlock.back().
* TailDuplication: constantAndCopyPropagate() uses std::prev(OriginalBB.eraseInstruction(Itr)), which is undefined when Itr == begin(). Restructure to standard forward iteration with erase.
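The three repair patterns named in this list (forward iteration with erase, deferred erasure, and an emptiness guard) can be sketched on a standard container. This is only an illustration under stated assumptions: it uses `std::list<int>` in place of a BOLT basic block, and all function names are hypothetical.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Forward iteration with erase: erase() returns the iterator to the next
// element, so std::prev() is never applied to a possibly-begin() iterator.
inline void EraseNegativesForward(std::list<int> &insts) {
  for (auto it = insts.begin(); it != insts.end();) {
    if (*it < 0)
      it = insts.erase(it);
    else
      ++it;
  }
}

// Deferred erasure: walk in reverse, remember what to delete, and erase
// only after the loop so the reverse iterator is never invalidated.
inline void EraseNegativesDeferred(std::list<int> &insts) {
  std::vector<std::list<int>::iterator> toErase;
  for (auto rit = insts.rbegin(); rit != insts.rend(); ++rit) {
    if (*rit < 0)
      toErase.push_back(std::prev(rit.base())); // element rit refers to
  }
  for (auto it : toErase)
    insts.erase(it); // list erase leaves the other iterators valid
}

// Emptiness guard: only look at the last element when there is one.
inline int LastOr(const std::list<int> &insts, int fallback) {
  return insts.empty() ? fallback : insts.back();
}
```

Building with `-D_GLIBCXX_DEBUG` on libstdc++ is one way to have the library check iterator validity in code like this at run time.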

…8271) Example:
    int foo(int a, int b) { return a - 1 + ~b; }
Before, on AArch64:
    mvn w8, w1
    add w8, w0, w8
    sub w0, w8, #1
After (matches gcc):
    sub w0, w0, w1
    sub w0, w0, #2
Proof: https://alive2.llvm.org/ce/z/g_bV01
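The rewrite rests on the two's-complement identity `~b == -b - 1`, so `a - 1 + ~b == a - b - 2`. A small sketch that checks this at compile time follows; `Before` and `After` are hypothetical names, and unsigned 32-bit arithmetic is used so that wraparound is well defined.

```cpp
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^32, so neither form has undefined
// behavior for any input.
constexpr uint32_t Before(uint32_t a, uint32_t b) { return a - 1 + ~b; }
constexpr uint32_t After(uint32_t a, uint32_t b) { return a - b - 2; }

static_assert(Before(10, 3) == After(10, 3), "typical values");
static_assert(Before(0, 0) == After(0, 0), "wraps below zero");
static_assert(Before(0, 0xFFFFFFFFu) == After(0, 0xFFFFFFFFu),
              "all-ones operand");
```

The linked Alive2 proof is the authoritative check for the IR-level transform; the assertions above only spot-check the arithmetic identity.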

…#191413) Squelch the stage-2 compile time regression introduced by the variadic m_Combine(And|Or) matchers, by replacing the std::apply on a std::tuple with a recursive inheritance.
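For readers unfamiliar with the technique, a variadic "all of these match" combinator built by recursive inheritance, rather than by storing a `std::tuple` and calling `std::apply`, might look like the sketch below. It is not the LLVM implementation: the matcher interface (`bool match(int) const`) and every name here are assumptions made for illustration.

```cpp
#include <utility>

template <typename... Ms> struct CombineAnd;

// Base case: an empty conjunction matches everything.
template <> struct CombineAnd<> {
  bool match(int) const { return true; }
};

// Recursive case: store the first matcher and inherit from the
// combinator over the remaining ones.
template <typename M, typename... Rest>
struct CombineAnd<M, Rest...> : CombineAnd<Rest...> {
  M head;
  CombineAnd(M m, Rest... rest)
      : CombineAnd<Rest...>(std::move(rest)...), head(std::move(m)) {}
  bool match(int v) const {
    // Short-circuits: later matchers run only if earlier ones succeed.
    return head.match(v) && CombineAnd<Rest...>::match(v);
  }
};

template <typename... Ms> CombineAnd<Ms...> combineAnd(Ms... ms) {
  return CombineAnd<Ms...>(std::move(ms)...);
}

// Toy matchers used only to exercise the combinator.
struct IsPositive { bool match(int v) const { return v > 0; } };
struct IsEven { bool match(int v) const { return v % 2 == 0; } };
struct LessThan { int bound; bool match(int v) const { return v < bound; } };
```

With this shape, `combineAnd(IsPositive{}, IsEven{}, LessThan{10}).match(4)` evaluates each matcher in order without instantiating `std::tuple` or `std::apply`, which is the kind of template work the commit message says it avoids.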

…ORTED for zOS (llvm#190835) Tests in `llvm/test/Examples` and `llvm/test/ExecutionEngine` use JIT which is unsupported for zOS causing the tests to fail. --------- Co-authored-by: Bahareh Farhadi <bahareh.farhadi@ibm.com>

The default inliner policy changed slightly, which was expected after PR llvm#190168.

Coro hasn't yet been fixed up for profcheck, so new tests are likely to fail. mtune.ll exercises the loop vectorizer (not fixed).

When a user calls `omp_control_tool`, a tool is attached, and it has registered the `ompt_control_tool` callback, the tool should receive a callback with the user's arguments. However, in llvm#112924, it was discovered that this only happens after at least one host-side directive or runtime call that calls into `__kmp_do_middle_initialize` has been executed. The check for `__kmp_init_middle` in `FTN_CONTROL_TOOL` did not try to do the middle initialization and instead always returned `-2` (no tool). A tool therefore received no callback, and the user program was not told that a tool is attached. To fix this, change the explicit return to a call of `__kmp_middle_initialize()`, as done in several other places of `libomp`. Further handling is then done in `__kmp_control_tool`, where the values `-2` (no tool), `-1` (no callback), or the tool's return value are returned. Also expand the tests to introduce checks where no callback is registered, or `omp_control_tool` is called before any OpenMP directive. Fixes llvm#112924 CC @jprotze, @hansangbae Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>

…(NFC) (llvm#191430) CompilationGraph owns all nodes and edges via `unique_ptr`, but exposes pointers to the underlying objects. Make them non-movable to maintain stable addresses. Make them non-copyable since we don't want to copy `Command` objects they hold or create duplicate root nodes. Apply full rule-of-five to `CompilationGraph`.
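The ownership pattern described here can be sketched as follows. This is a simplified illustration, not the `CompilationGraph` code: `Node` and `Graph` are hypothetical, and making the graph itself movable but not copyable is an assumption of this sketch, since the message only says the rule of five is applied.

```cpp
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Nodes are handed out by pointer, so they must never be copied or moved.
struct Node {
  explicit Node(int id) : id(id) {}
  Node(const Node &) = delete;
  Node &operator=(const Node &) = delete;
  Node(Node &&) = delete;
  Node &operator=(Node &&) = delete;
  int id;
};

// The graph owns every node through unique_ptr and spells out all five
// special member functions.
class Graph {
public:
  Graph() = default;
  ~Graph() = default;
  Graph(const Graph &) = delete;
  Graph &operator=(const Graph &) = delete;
  Graph(Graph &&) = default;
  Graph &operator=(Graph &&) = default;

  // The returned pointer stays valid while the node is owned by a graph,
  // because the node lives on the heap and the vector stores only pointers.
  Node *addNode(int id) {
    nodes.push_back(std::make_unique<Node>(id));
    return nodes.back().get();
  }

private:
  std::vector<std::unique_ptr<Node>> nodes;
};

static_assert(!std::is_copy_constructible<Node>::value, "no copies");
static_assert(!std::is_move_constructible<Node>::value, "no moves");
```

Because the vector holds `unique_ptr`s, growing it relocates only the pointers, never the nodes, which is what keeps the exposed addresses stable.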

…ion (NFC) (llvm#191441)

…m IntegerExpandSetCCOperands. NFC (llvm#191353) LHSLo and RHSLo must have the same type, we don't need to check both. Same for LHSHi and RHSHi.

While running in server mode, multiple clients can be connected at the same time. In LLDBUtils we had a static mutex that could cause other clients to hang because of the single shared lock. Instead, I adjusted the logic to take the existing SBMutex as a parameter and guard that mutex during command handling.

…-class types (llvm#155169) Fixes llvm#104948 # References - https://wg21.link/range.iota.view - https://wg21.link/range.iota.view#17 - https://wg21.link/LWG3610 --------- Co-authored-by: A. Jiang <de34@live.cn>

Pass Instruction::Load instead of Instruction::GetElementPtr to getGEPCosts in isMaskedLoadCompress and CheckForShuffledLoads. These call sites estimate costs for wide contiguous loads and sub-vector load patterns, not for masked gather pointer vector formation. Using Instruction::GetElementPtr incorrectly triggered the gather-style cost path, which computes vector GEP formation costs. Since the call sites already add scalarization overhead for pointer vector building separately, this led to double-counting of pointer costs and inaccurate vectorization decisions. Reviewers: hiraditya, RKSimon Pull Request: llvm#191620

…lization stage (llvm#189491)" This reverts commit be62f27, it breaks the compilation!!! Reviewers: Pull Request: llvm#191717

…lvm#185028) This is an alternative approach to llvm#169769. We increase the size of the old `Integral<Bits, Signed>` to 24 bytes (on a 64 bit system) and introduce a new `Char<Signed>` that's used for the old `PT_Sint8` and `PT_Uint8` primitive types. The old approach did not work out in the end because we need to be able to do arithmetic (but essentially just `+` and `-`) on the offsets of such integers-that-are-actually-pointers. c-t-t-: https://llvm-compile-time-tracker.com/compare.php?from=723d5cb11b2a64e4f11032f24967702e52f822bc&to=16dc90efebbf52e381c7655131b2fb74c307cc42&stat=instructions:u

…91608)

…t non-copyable in another When a value is treated as a copyable element in one tree entry and as a non-copyable element in another, both feeding into PHI nodes, the scheduler could produce vectorized IR where an instruction does not dominate all its uses. Bail out of scheduling in tryScheduleBundle when this conflict is detected to prevent generating broken modules. Fixes llvm#191714 Reviewers: Pull Request: llvm#191724

…timates" This reverts commit 778c0fb to fix buildbots https://lab.llvm.org/buildbot/#/builders/213/builds/2725, https://lab.llvm.org/buildbot/#/builders/212/builds/2876 Reviewers: Pull Request: llvm#191725

fix : llvm#187648 Fix the missed optimization for `icmp ugt (umax(x, C)), ~x` and `icmp ult (umax(x, C)), ~x` Alive2 proof: https://alive2.llvm.org/ce/z/dDNJ2m https://alive2.llvm.org/ce/z/X633UX

…m#191715) Similar to the DylibManager change in e55fb5d, this removes an unnecessary coupling between ExecutorProcessControl and MemoryAccess, allowing clients to select MemoryAccess implementations independently. To simplify the transition, the ExecutorProcessControl::createDefaultMemoryAccess method will return an instance of whatever MemoryAccess the ExecutorProcessControl implementation had been using previously.

Pass Instruction::Load instead of Instruction::GetElementPtr to getGEPCosts in isMaskedLoadCompress and CheckForShuffledLoads. These call sites estimate costs for wide contiguous loads and sub-vector load patterns, not for masked gather pointer vector formation. Using Instruction::GetElementPtr incorrectly triggered the gather-style cost path, which computes vector GEP formation costs. Since the call sites already add scalarization overhead for pointer vector building separately, this led to double-counting of pointer costs and inaccurate vectorization decisions. Reviewers: hiraditya, RKSimon Pull Request: llvm#191728

…1282)

…copyables, NFC Reviewers: Pull Request: llvm#191730

…efalign (llvm#191675) PR llvm#155529 (only fired with -ffunction-sections, then modified by PR 184032) compared `MF->getAlignment()` (the backend's minimum function alignment) against `MF->getPreferredAlignment()` to decide whether to emit `.prefalign`. This ignored the IR function's own align attribute, which `emitAlignment` picks up later via `getGVAlignment`, so the comparison was against the wrong minimum.

Consequences on x86 (backend min = 1, target pref = 16):
* `[[gnu::aligned(32)]] void g(){}` lowers to `align 32 prefalign(32)`.
      .p2align 5
      .prefalign 5, .Lfunc_end, nop
  The .prefalign is fully redundant: .p2align 5 already forces the desired 32-byte alignment.
* `define void @f4() align 32 prefalign(16)`.
      .p2align 5
      .prefalign 4, .Lfunc_end, nop
  Here .prefalign with a weak alignment is harmless but the assembly output is nonsensical.

This patch updates `emitAlignment` to return the effective alignment it emits and use that as the true minimum in `emitFunctionHeader`.

Fix ASAN warning about unexpected format specifier %llc introduced in commit f149ab6. The 'c' format specifier should not have the 'll' length modifier. Separated the 'c' case to use the correct format without the length modifier, casting to int as required by the standard.
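A minimal sketch of the distinction this fix relies on: `%c` takes an `int` argument and has no `ll` form, while integer conversions such as `%lld` and `%llx` do take the length modifier. `FormatValue` is a hypothetical helper, not the function touched by the patch.

```cpp
#include <cstdio>
#include <string>

// Format a 64-bit value according to a single conversion character.
inline std::string FormatValue(char conv, long long value) {
  char buf[64] = "";
  switch (conv) {
  case 'c':
    // The character conversion is handled separately: no length
    // modifier, and the argument is passed as int.
    std::snprintf(buf, sizeof(buf), "%c", static_cast<int>(value));
    break;
  case 'd':
    std::snprintf(buf, sizeof(buf), "%lld", value);
    break;
  case 'x':
    std::snprintf(buf, sizeof(buf), "%llx",
                  static_cast<unsigned long long>(value));
    break;
  default:
    break; // unsupported conversion: return an empty string
  }
  return buf;
}
```

For instance, `FormatValue('c', 65)` yields `"A"` on an ASCII platform, and `FormatValue('x', 255)` yields `"ff"`.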

Fix 136-byte memory leak introduced in commit 6dc059a. Before that commit, the TextDiagnosticBuffer was passed to DiagnosticsEngine constructor which took ownership and managed its lifetime. After the refactoring, the buffer is no longer passed to DiagnosticsEngine, so it becomes an orphaned allocation that is never freed. Changed to use std::unique_ptr for automatic cleanup.
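The ownership change can be illustrated with a small sketch. `Buffer` is a hypothetical stand-in for the diagnostic buffer, with a live-object counter added purely so the cleanup is observable.

```cpp
#include <memory>

// Hypothetical buffer type that counts how many instances are alive.
struct Buffer {
  static int live;
  Buffer() { ++live; }
  ~Buffer() { --live; }
};
int Buffer::live = 0;

// With std::unique_ptr the buffer is destroyed when `buf` goes out of
// scope, even though nothing else takes ownership of it.
inline int UseBufferOwned() {
  auto buf = std::make_unique<Buffer>();
  return Buffer::live; // the buffer is still alive at this point
}
```

Had the buffer been created with a bare `new` and never handed to an owner, `Buffer::live` would stay at 1 after the call, which is the shape of the leak described above.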

…91217) This allows the lines we want to align to inherit the tabbed indent from the lines we break. Thus, in AlignWithSpaces mode, aligned lines do not get a smaller indent than the lines they are aligned to.

…llvm#190681) Retrieve the called function and check its memory attributes, to determine if a VPInstruction calling a function reads or writes memory. Use it to strengthen assert in areAllLoadsDereferenceable. PR: llvm#190681

…ts (llvm#191732) They might not match the descriptor contents exactly, so just look at the descriptors.

…d of poison (llvm#191729) Assisted-by: claude-4.6-opus

…vm#191674) Rename IRBuilder and LLVM C API function params for overload types to use names to better reflect their meaning.

…ype` (llvm#190260) `DecodeIITType` does a range check each time the next entry from the IIT encoding table is read. This is required to handle IIT encodings that are inlined into the `IIT_Table` entries, since the `IITEntries` array in `getIntrinsicInfoTableEntries` is terminated after the last non-zero nibble is seen in the inlined encoding (but that may not be the actual end). Change this code to instead have the `IITEntries` array for the inlined case point to the full `IITValues` array payload plus an IIT_Done terminator, so that such entries look exactly like they would if they were encoded in the long encoding table, and then remove the range check in `DecodeIITType` to streamline that code a bit. Additionally, change some uses of 0 (in loop conditions and the default-constructed terminator in the IIT long encoding table) to explicitly use IIT_Done to make the code clearer. Also use `consume_front()` in a few places instead of `front()` followed by `slice(1)`.

…1722)

…roup path (llvm#188895) Add a fast path for the common case where the total work-group size is a multiple of the max sub-group size. The fallback path is ported from amdgpu/workitem/clc_get_sub_group_size.cl. The compiler can generate predicated instructions for the fallback path to avoid branches.

…#190587) This avoids a downstream regression where LSR prefers {-1,+1}. When constant zero typically doesn't require preheader initialization (queried via TTI::getIntImmCost), consider it as free in getSetupCost. Three test changes are improvements: amx-across-func.ll, 2011-11-29-postincphi.ll and pr62660-normalization-failure.ll. Other test changes are neutral.

libclc target is now passed in from LLVM_RUNTIME_TARGETS. The old configure flow based on `-DLLVM_ENABLE_RUNTIMES=libclc` is deprecated because libclc no longer has a default target. `-DLLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="<target-triple>"` still works but it is considered legacy.

The new standard build requires: Each target must now be selected explicitly on the CMake command line through the runtimes target-specific cache entry and LLVM_RUNTIME_TARGETS. For example:
    -DRUNTIMES_amdgcn-amd-amdhsa-llvm_LLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="amdgcn-amd-amdhsa-llvm"
    -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="nvptx64-nvidia-cuda"
    -DRUNTIMES_clspv--_LLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="clspv--"
    -DRUNTIMES_clspv64--_LLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="clspv64--"
    -DRUNTIMES_spirv-mesa3d-_LLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="spirv-mesa3d-"
    -DRUNTIMES_spirv64-mesa3d-_LLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="spirv64-mesa3d-"

To build multiple targets, pass them as a semicolon-separated list in `LLVM_RUNTIME_TARGETS` and provide a matching `RUNTIMES_<target-triple>_LLVM_ENABLE_RUNTIMES=libclc` entry for each target. Updated README.md to document the new build flow. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

z1-cciauto · 2026-04-13T04:11:51Z

PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/11/builds/203

aengelke and others added 30 commits April 10, 2026 14:29

[UnitTest][ADT] Add iterator operator== (llvm#191396)

8b37260

Apparently required by some older libstdc++ versions.

[AMDGPU] Do not emit function prologue on naked functions (llvm#191398)

9791929

Summary: Naked functions are intended to allow the user to write the entirety of the function block, so we shouldn't include the `waitcnt` instructions for them.

[flang][OpenMP] Move check for threadprivate iteration variable (llvm…

e267605

…#191208) This moves the test of whether the iteration variable of an affected DO loop is marked as threadprivate. This makes the `ordCollapseLevel` member unnecessary. Issue: llvm#191249

[libc] Add generate-libc-headers custom target (llvm#191160)

a72f7fc

Added the generate-libc-headers custom target depending on libc-headers. This allows troubleshooting headers without needing to install them first.

[LV][NFC] Remove llvm.ident, tbaa and other attributes from tests (ll…

4cd2db4

…vm#191375) While in this area I also removed unnecessary annotations for wchar_size and also cleaned up some more function attributes.

[libc++] Fix incorrect links and broken formatting in CSV status files (

dfbae3e

llvm#191289) Also, update the conformance script to look for closed issues when searching for unlinked issues.

[AMDGPU] Change *-DAG to *-SDAG in check prefixes (llvm#191411)

b35941e

In some cases the use of *-DAG seemed to confuse the update scripts because of the clash with FileCheck's built-in -DAG suffix.

[LLDB][ProcessFreeBSDKernelCore] Log error when creating kernel image (…

8926b3c

…llvm#191397)

[bazel] Fix Bazel build issue with llvm#190862 (llvm#191420)

a7883e5

Fixed 5531990.

[MLIR][Linalg] Specialize linalg.generic to linalg.mmt4d (llvm#189719)

ca80bda

Specialize linalg.generic to linalg.mmt4d based on index map

[PatternMatchHelpers] Improve compile time of m_Combine(And|Or) (llvm…

7cf8207

…#191413) Squelch the stage-2 compile time regression introduced by the variadic m_Combine(And|Or) matchers, by replacing the std::apply on a std::tuple with a recursive inheritance.

Fix ml inliner tests after PR llvm#190168 (llvm#191431)

078c43c

The default inliner policy changed slightly, which was expected after PR llvm#190168.

Add new coro test to profcheck-xfail (llvm#191436)

15e46e2

Coro hasn't yet been fixed up for profcheck, so new tests are likely to fail. mtune.ll exercises the loop vectorizer (not fixed).

[clang][modules-driver] Extract logic to feed jobs back into Compilat…

75ceda1

…ion (NFC) (llvm#191441)

[LegalizeIntegerTypes] Remove some unnecessary isTypeLegal checks fro…

606b2a4

…m IntegerExpandSetCCOperands. NFC (llvm#191353) LHSLo and RHSLo must have the same type, we don't need to check both. Same for LHSHi and RHSHi.

H-G-Hristov and others added 26 commits April 12, 2026 17:25

[libc++][ranges] LWG3610: iota_view::size sometimes rejects integer…

b844cc8

…-class types (llvm#155169) Fixes llvm#104948 # References - https://wg21.link/range.iota.view - https://wg21.link/range.iota.view#17 - https://wg21.link/LWG3610 --------- Co-authored-by: A. Jiang <de34@live.cn>

Revert "[AMDGPU][Scheduler] Use MIR-level rematerializer in remateria…

b444d1d

…lization stage (llvm#189491)" This reverts commit be62f27, it breaks the compilation!!! Reviewers: Pull Request: llvm#191717

[RISCV] Enable vfslide1up/down for bf16 shuffles with Zvfbfa. (llvm#1…

e9cd683

…91608)

Revert "[SLP] Fix GEP cost computation for load vectorization cost es…

90d3515

…timates" This reverts commit 778c0fb to fix buildbots https://lab.llvm.org/buildbot/#/builders/213/builds/2725, https://lab.llvm.org/buildbot/#/builders/212/builds/2876 Reviewers: Pull Request: llvm#191725

[InstCombine] Missed fold: umax(x, C) > ~x -> x < 0 (llvm#189396)

8113b98

fix : llvm#187648 Fix the missed optimization for `icmp ugt (umax(x, C)), ~x` and `icmp ult (umax(x, C)), ~x` Alive2 proof: https://alive2.llvm.org/ce/z/dDNJ2m https://alive2.llvm.org/ce/z/X633UX

ValueTracking: Handle frexp exp in computeKnownConstantRange (llvm#19…

0ab10d1

…1282)

[SLP][NFC]Add a test with the reordering of the RHS/LHS operands for …

7c872e9

…copyables, NFC Reviewers: Pull Request: llvm#191730

[clang-format] treat continuation as indent for aligned lines (llvm#1…

029e5b0

…91217) This allows the lines we want to align to inherit the tabbed indent from the lines we break. Thus, in AlignWithSpaces mode, aligned lines do not get a smaller indent than the lines they are aligned to.

[clang][bytecode] Stop using QualTypes when checking evaluation resul…

c0fbdb2

…ts (llvm#191732) They might not match the descriptor contents exactly, so just look at the descriptors.

[CallGraphUpdater] Replace dead function in metadata with null instea…

d946ac3

…d of poison (llvm#191729) Assisted-by: claude-4.6-opus

[NFC][LLVM] Rename IRBuilder/LLVM C API params for overload types (ll…

e62acf4

…vm#191674) Rename IRBuilder and LLVM C API function params for overload types to use names to better reflect their meaning.

[flang][NFC] Fix typo in comment for multi-image environment (llvm#19…

00328f1

…1722)

merge main into amd-main

bfef0cf

z1-cciauto requested review from fabianmcg and nicolasvasilache as code owners April 13, 2026 04:08

z1-cciauto requested a review from a team April 13, 2026 04:08

z1-cciauto wants to merge 200 commits into amd-main from upstream_merge_202604130008


20 participants
