merge main into amd-main by z1-cciauto · Pull Request #2153 · ROCm/llvm-project

z1-cciauto · 2026-04-12T04:06:27Z

No description provided.

Apparently required by some older libstdc++ versions.

…dify-Write Sequence, Fix llvm#189183 (llvm#190350) This patch improves the SystemZ cost model to identify Read-Modify-Write sequences that can be folded into a single instruction (e.g., ASI, NI, OI). If a load, a scalar arithmetic operation (ADD, SUB, AND, OR, XOR) with an immediate, and a store all target the same memory location and have no external uses, the cost of the arithmetic and store instructions should be 0. This implementation does not include the TTI::TCK_RecipThroughput CostKind, as it causes a regression in non-power-2-subvector-extract.ll. Fixes llvm#189183 (refer to it for an example). --------- Co-authored-by: anoopkg6 <anoopkg6@github.com>
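The folding rule described above can be illustrated on a toy instruction model. This is a sketch under stated assumptions: the `Op` enum, the `Inst` struct, `isFoldableRMW`, and `costOf` are hypothetical names invented for illustration. They do not reflect the SystemZ TTI implementation, which works on LLVM IR instructions.

```cpp
#include <string>

// Hypothetical, simplified instruction model used only to illustrate the rule.
enum class Op { Load, Add, Sub, And, Or, Xor, Store, Other };

struct Inst {
  Op Opcode;
  std::string Addr;   // memory address operand (meaningful for Load/Store)
  bool HasImmOperand; // arithmetic op uses an immediate second operand
  unsigned NumUses;   // number of users of this instruction's result
};

// True if Load -> Arith(imm) -> Store all refer to one address and the
// intermediate values have no other users, so the sequence could become a
// single read-modify-write instruction (e.g. ASI, NI, OI).
bool isFoldableRMW(const Inst &Load, const Inst &Arith, const Inst &Store) {
  if (Load.Opcode != Op::Load || Store.Opcode != Op::Store)
    return false;
  switch (Arith.Opcode) {
  case Op::Add: case Op::Sub: case Op::And: case Op::Or: case Op::Xor:
    break;
  default:
    return false;
  }
  return Arith.HasImmOperand && Load.Addr == Store.Addr &&
         Load.NumUses == 1 && Arith.NumUses == 1;
}

// In this sketch, once a sequence is known to fold, only the load is costed;
// the arithmetic and the store are charged 0.
unsigned costOf(const Inst &I, bool PartOfFoldedRMW, unsigned DefaultCost) {
  if (PartOfFoldedRMW && I.Opcode != Op::Load)
    return 0;
  return DefaultCost;
}
```

A load with a second user, or a store to a different address, makes `isFoldableRMW` return false, so the normal costs apply.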

Summary: Naked functions are intended to allow the user to write the entirety of the function block, so we shouldn't include the `waitcnt` instructions for them.

…#191208) This moves the test of whether the iteration variable of an affected DO loop is marked as threadprivate. This makes the `ordCollapseLevel` member unnecessary. Issue: llvm#191249

Added the generate-libc-headers custom target depending on libc-headers. This allows troubleshooting headers without needing to install them first.

…vm#191375) While in this area I also removed unnecessary annotations for wchar_size and also cleaned up some more function attributes.

…1408) Failure to read all required fields for msgbuf isn't ObjectFile's fault but is specific to the FreeBSD-Kernel-Core plugin. Thus it should be logged through `LLDBLog::Process` rather than `LLDBLog::Object`. Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>

…lvm#186981) This PR follows the example of the Extensions.md document and provides an equivalent file for OpenMP API extensions, which were previously documented in OpenMPSupport.md. Having a single, centralized place for these extensions seems useful. --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

llvm#191289) Also, update the conformance script to look for closed issues when searching for unlinked issues.

…ne table coverage in isolation (llvm#183790) Patch 2 of 3 to add to llvm-dwarfdump the ability to measure DWARF coverage of local variables in terms of source lines, as discussed in [this RFC](https://discourse.llvm.org/t/rfc-debug-info-coverage-tool-v2/83266). This patch adds the ability to compare a variable’s coverage against a baseline, e.g. an unoptimised compilation of the same code. This is provided using the optional `--coverage-baseline` argument. When a baseline is provided, the output also includes a per-variable measure of the line table’s coverage (`LT`, `LTRatio`), distinct from the variable’s coverage proper. See section 2.2 of the RFC for details on this metric.

Reworked libc/docs/gpu/building.rst to match the style of getting_started.rst:

* Removed mkdir and cd commands.
* Used -S and -B flags for CMake.
* Used -C flag for Ninja.
* Split commands into smaller blocks with brief explanations.

Use the same terminology as elsewhere in the LLVM libc docs and move away from the deprecated runtime terms:

* Standard runtimes build -> Bootstrap Build
* Runtimes cross build -> Two-stage Cross-compiler Build

In llvm#178306, I made the incorrect assumption that traversing `allproc` in reverse would yield increasing pid order, based on the fact that new processes are added at the head of allproc. However, this assumption is false in certain circumstances, such as pid reuse, which causes threads to be sorted incorrectly. Instead of relying on any such assumption, explicitly sort threads by the pid read from memory. Fixes: 5349c66 (llvm#178306) --------- Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
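The fix amounts to sorting on an explicit key instead of trusting list order. Here is a minimal sketch, assuming a hypothetical `ThreadRecord` type that does not correspond to the plugin's real data structures. `std::stable_sort` keeps threads of the same process in their original relative order.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record for a thread found while walking the process list.
struct ThreadRecord {
  int32_t Pid; // owning process id, read from memory
  int32_t Tid; // thread id
};

// Order threads by pid explicitly. Walking the list in a particular direction
// cannot guarantee increasing pids, because a reused (small) pid can appear
// anywhere in the list.
void sortThreadsByPid(std::vector<ThreadRecord> &Threads) {
  std::stable_sort(Threads.begin(), Threads.end(),
                   [](const ThreadRecord &A, const ThreadRecord &B) {
                     return A.Pid < B.Pid;
                   });
}
```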

llvm#191231) …ties Some of the utilities may be used during symbol resolution, which happens before expression analysis is done. In such situations, the typedExprs normally stored in parser::Expr may not be available. To obtain the numeric values of expressions, it may be necessary to use the analyzer directly, which requires a SemanticsContext to be provided.

…m#191098) The motivation of this PR is to refactor and expose DSO helper functions so they can be used by all compiler-rt libraries, including the profile library, without duplicating dlopen/dlsym (non-Windows) or LoadLibrary/GetProcAddress (Windows) logic in each runtime. Implement the helpers in namespace __interception in interception_linux.cpp for non-Windows targets and interception_win.cpp for Windows, and use them from the existing Linux interception path for RTLD_NEXT/RTLD_DEFAULT/dlvsym lookups. This is NFC for existing libraries that already use interception's public APIs; sanitizer and interception lit behavior is unchanged.

In some cases the use of *-DAG seemed to confuse the update scripts because of the clash with FileCheck's built-in -DAG suffix.

…llvm#191397)

Fixed 5531990.

Specialize linalg.generic to linalg.mmt4d based on index map

…erage (llvm#187368) We don't need to run the full exhaustive test for all floating points, as long as we're testing the radix sort code path (which we are, since radix sort triggers at 1024 elements). This reduces the test execution time on my machine from 20s to 12s. Fixes llvm#187329

Fix iterator misuse in four BOLT passes, caught by _GLIBCXX_DEBUG (enabled via LLVM_ENABLE_EXPENSIVE_CHECKS=ON).

* AllocCombiner: combineAdjustments() erases instructions while iterating in reverse via llvm::reverse(BB), invalidating the reverse iterator. Defer erasures to after the loop using a SmallVector.
* ShrinkWrapping: processDeletions() uses std::prev(BB.eraseInstruction(II)), which is undefined when II == begin(). Restructure to standard forward iteration with erase.
* DataflowAnalysis: run() unconditionally dereferences BB->rbegin(), which crashes on empty basic blocks (possible after the ShrinkWrapping fix). Guard with an emptiness check.
* IndirectCallPromotion: rewriteCall() dereferences the end iterator via &(*IndCallBlock.end()). Replace with &IndCallBlock.back().
* TailDuplication: constantAndCopyPropagate() uses std::prev(OriginalBB.eraseInstruction(Itr)), which is undefined when Itr == begin(). Restructure to standard forward iteration with erase.
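The safe patterns behind these fixes can be sketched with standard containers. This sketch uses `std::list<int>` as a stand-in for a basic block's instruction list, which is an assumption because BOLT's containers differ. With `std::list`, erasing one element leaves iterators to the other elements valid, and that property is what makes the deferred-erasure pattern correct here. A vector-backed container would need different bookkeeping, such as indices or an erase-remove pass.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Forward iteration with erase: erase() returns the iterator following the
// removed element, so the loop never has to step before begin().
void eraseIfNegative(std::list<int> &BB) {
  for (auto It = BB.begin(); It != BB.end();) {
    if (*It < 0)
      It = BB.erase(It);
    else
      ++It;
  }
}

// Reverse scan with deferred erasure: only record what to erase, then erase
// after the loop so no live reverse iterator is invalidated.
void eraseZerosAfterReverseScan(std::list<int> &BB) {
  std::vector<std::list<int>::iterator> ToErase;
  for (auto RIt = BB.rbegin(); RIt != BB.rend(); ++RIt)
    if (*RIt == 0)
      ToErase.push_back(std::prev(RIt.base())); // forward iterator to *RIt
  for (auto It : ToErase)
    BB.erase(It);
}

// Check for emptiness before touching the last element, and use back()
// rather than dereferencing end().
bool lastIsPositive(const std::list<int> &BB) {
  return !BB.empty() && BB.back() > 0;
}
```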

…8271) Example:

```c
int foo(int a, int b) { return a - 1 + ~b; }
```

Before, on AArch64:

```
mvn w8, w1
add w8, w0, w8
sub w0, w8, #1
```

After (matches gcc):

```
sub w0, w0, w1
sub w0, w0, #2
```

Proof: https://alive2.llvm.org/ce/z/g_bV01
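The rewrite relies on the two's-complement identity `~b == -b - 1`, which gives `a - 1 + ~b == a - 1 + (-b - 1) == a - b - 2`. The snippet below spot-checks this identity over a few boundary values. It uses `uint32_t` because its wrap-around arithmetic mirrors `i32` in the IR. This is only a sanity check and does not replace the Alive2 proof. The function names are hypothetical.

```cpp
#include <cstdint>

// Original expression. Unsigned arithmetic wraps modulo 2^32, which avoids
// signed-overflow undefined behavior in C++.
uint32_t beforeFold(uint32_t a, uint32_t b) { return a - 1 + ~b; }

// Folded form, matching "sub w0, w0, w1; sub w0, w0, #2".
uint32_t afterFold(uint32_t a, uint32_t b) { return a - b - 2; }
```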

…#191413) Squelch the stage-2 compile time regression introduced by the variadic m_Combine(And|Or) matchers, by replacing the std::apply on a std::tuple with a recursive inheritance.
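A variadic matcher built by recursive inheritance can be sketched as follows. This is a simplified illustration under several assumptions: `CombineAnd` and `m_All` are hypothetical names, the predicates are plain callables rather than LLVM pattern matchers, and the real `PatternMatch` code is more involved. Each level of the hierarchy stores one predicate and inherits from the matcher for the remaining predicates, so neither `std::tuple` nor `std::apply` is instantiated.

```cpp
#include <utility>

// Matches only if every predicate matches.
template <typename... Preds> struct CombineAnd;

// Base case: no predicates left, so the match trivially succeeds.
template <> struct CombineAnd<> {
  template <typename T> bool match(const T &) const { return true; }
};

// Recursive case: store the first predicate, inherit the rest.
template <typename P, typename... Rest>
struct CombineAnd<P, Rest...> : CombineAnd<Rest...> {
  P Pred;
  explicit CombineAnd(P Head, Rest... Tail)
      : CombineAnd<Rest...>(std::move(Tail)...), Pred(std::move(Head)) {}
  template <typename T> bool match(const T &V) const {
    return Pred(V) && CombineAnd<Rest...>::match(V);
  }
};

// Convenience factory that deduces the predicate types.
template <typename... Preds> CombineAnd<Preds...> m_All(Preds... Ps) {
  return CombineAnd<Preds...>(std::move(Ps)...);
}
```

Because the predicates are evaluated with `&&`, evaluation stops at the first predicate that fails.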

…ORTED for zOS (llvm#190835) Tests in `llvm/test/Examples` and `llvm/test/ExecutionEngine` use the JIT, which is unsupported on z/OS, causing the tests to fail. --------- Co-authored-by: Bahareh Farhadi <bahareh.farhadi@ibm.com>

The default inliner policy changed slightly, which was expected after PR llvm#190168.

Coro hasn't yet been fixed up for profcheck, so new tests are likely to fail. mtune.ll exercises the loop vectorizer (not fixed).

@jprotze

When a user calls `omp_control_tool` while a tool is attached and has registered the `ompt_control_tool` callback, the tool should receive a callback with the user's arguments. However, in llvm#112924, it was discovered that this only happens after at least one host-side directive or runtime call that reaches `__kmp_do_middle_initialize` has been executed. The check for `__kmp_init_middle` in `FTN_CONTROL_TOOL` did not attempt the middle initialization and instead always returned `-2` (no tool). The tool therefore received no callback, and the user program was not informed that a tool is attached. To fix this, replace the explicit return with a call to `__kmp_middle_initialize()`, as done in several other places in `libomp`. Further handling is then done in `__kmp_control_tool`, which returns `-2` (no tool), `-1` (no callback), or the tool's return value. Also expand the tests to cover the cases where no callback is registered, or where `omp_control_tool` is called before any OpenMP directive. Fixes llvm#112924 CC @jprotze, @hansangbae Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>

…(NFC) (llvm#191430) CompilationGraph owns all nodes and edges via `unique_ptr`, but exposes pointers to the underlying objects. Make them non-movable to maintain stable addresses. Make them non-copyable since we don't want to copy `Command` objects they hold or create duplicate root nodes. Apply full rule-of-five to `CompilationGraph`.
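Here is a minimal sketch of the ownership scheme described above. The hypothetical `Node` and `Graph` types stand in for the real classes. Nodes are owned through `std::unique_ptr` and callers receive raw pointers. Both types delete their copy and move operations, so those pointers cannot be invalidated by an accidental copy or move. The `static_assert`s document this intent at compile time.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical node: the graph hands out raw pointers to nodes, so a node is
// neither copyable nor movable and its address stays stable.
struct Node {
  explicit Node(int Id) : Id(Id) {}
  Node(const Node &) = delete;
  Node &operator=(const Node &) = delete;
  Node(Node &&) = delete;
  Node &operator=(Node &&) = delete;
  int Id;
};

// Hypothetical graph applying the rule of five: it owns its nodes through
// unique_ptr and is itself neither copyable nor movable.
class Graph {
public:
  Graph() = default;
  ~Graph() = default;
  Graph(const Graph &) = delete;
  Graph &operator=(const Graph &) = delete;
  Graph(Graph &&) = delete;
  Graph &operator=(Graph &&) = delete;

  Node *addNode(int Id) {
    Nodes.push_back(std::make_unique<Node>(Id));
    return Nodes.back().get(); // stays valid even if Nodes reallocates
  }
  std::size_t size() const { return Nodes.size(); }

private:
  std::vector<std::unique_ptr<Node>> Nodes;
};

static_assert(!std::is_copy_constructible<Node>::value, "Node is not copyable");
static_assert(!std::is_move_constructible<Node>::value, "Node is not movable");
static_assert(!std::is_copy_assignable<Graph>::value, "Graph is not copyable");
static_assert(!std::is_move_assignable<Graph>::value, "Graph is not movable");
```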

…ion (NFC) (llvm#191441)

…m IntegerExpandSetCCOperands. NFC (llvm#191353) LHSLo and RHSLo must have the same type, so we don't need to check both. The same applies to LHSHi and RHSHi.

While running in server mode, multiple clients can be connected at the same time. In LLDBUtils we had a single static mutex, which could cause other clients to hang. Instead, I adjusted the logic to take the existing SBMutex as a parameter and hold that mutex during command handling.
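The locking change can be sketched as follows. This sketch assumes a hypothetical `ClientSession` type and uses `std::mutex` as a stand-in for the existing `SBMutex`. The caller passes in the lock to take, so each client serializes only its own commands. A function-local static mutex, by contrast, would be shared by every client in the server process.

```cpp
#include <mutex>
#include <string>

// Hypothetical per-client state; each client owns its own mutex.
struct ClientSession {
  std::mutex Mutex;
  std::string LastCommand;
};

// The mutex is a parameter rather than a function-local static, so a busy
// client does not block unrelated clients.
void runCommand(std::mutex &SessionMutex, ClientSession &Session,
                const std::string &Command) {
  std::lock_guard<std::mutex> Lock(SessionMutex);
  Session.LastCommand = Command; // stand-in for real command handling
}
```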

…calar The LLVM cost model uses integer-valued throughput costs which cannot represent fractional costs. For 2-element vectors, this rounding can make vectorization appear profitable when it actually produces more instructions than the scalar code — the overhead from shuffles, inserts, extracts, and buildvectors is underestimated. Add an instruction-count safety check in getTreeCost that estimates the number of vector instructions (including gathers, shuffles, and extracts) and compares against the number of scalar instructions. If the vector code would produce more instructions, reject the tree regardless of what the cost model says. This catches cases where fractional cost rounding hides real overhead. The check is gated behind -slp-inst-count-check (default: on) and only applies to 2-element root trees where rounding errors matter most. Reviewers: hiraditya, bababuck, RKSimon Pull Request: llvm#190414
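The gating logic of the instruction-count check can be sketched as below. The `TreeInstCounts` struct and the `rejectByInstCount` function are hypothetical. The sketch also assumes that each gather, shuffle, or extract counts as one instruction. The actual estimate in `getTreeCost` may weigh these operations differently.

```cpp
// Hypothetical per-tree instruction estimates.
struct TreeInstCounts {
  unsigned VectorOps;   // vector arithmetic/memory instructions
  unsigned Gathers;     // buildvector/insert sequences, counted as one each
  unsigned Shuffles;    // shuffles
  unsigned Extracts;    // extracts for external scalar users
  unsigned ScalarInsts; // instructions in the original scalar code
};

// Reject a 2-element root tree if the vector code would need more
// instructions than the scalar code, whatever the cost model says.
// CheckEnabled models the on-by-default -slp-inst-count-check option.
bool rejectByInstCount(const TreeInstCounts &C, unsigned NumRootElts,
                       bool CheckEnabled) {
  if (!CheckEnabled || NumRootElts != 2)
    return false;
  unsigned VectorTotal = C.VectorOps + C.Gathers + C.Shuffles + C.Extracts;
  return VectorTotal > C.ScalarInsts;
}
```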

…init (llvm#190530) Fixes llvm#152024.

…vm#191627) Fixes llvm#191549. Assisted-by: claude-4.6-opus

When SLPReVec is enabled, getValueType returns the vector result type for InsertElement instructions rather than the scalar element type. This caused getEntryCost to propagate an incorrect ScalarTy (e.g. <4 x float> instead of float) into getScalarizationOverhead and getWidenedType, triggering an assertion failure and producing wrong cost estimates. Narrow ScalarTy to its element type when costing vectorized InsertElement entries whose inserted operands are scalars. Fixes llvm#191175. Reviewers: Pull Request: llvm#191628

Fixes: ``` warning: format specifies type 'long' but the argument has type 'intptr_t' ... ```

…91299) After llvm#189372 both minimum iteration checks for epilogue vectorization are created in VPlan, which removes the last blocker for unconditionally running materializeConstantVectorTripCount. This enables additional folds for plans in the native path, as well as removes some trip count computations with epilogue vectorization. PR: llvm#191299

…#191498) NSSW/NUSW on a wider AddRec does not imply NSSW/NUSW on a narrower AddRec. Fixes llvm#191382.
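The following numeric illustration shows why the implication fails. It is a simplified model rather than SCEV code, and the helper names are hypothetical. The recurrence `{120,+,1}` evaluated at iteration 10 is 130. That value fits in a signed 32-bit integer but overflows a signed 8-bit one. So a no-signed-wrap fact established for the wider recurrence says nothing about the narrower one.

```cpp
#include <cstdint>

// Value of the recurrence {Start,+,Step} at iteration N, computed in 64 bits
// so it does not overflow for the small values used here.
int64_t addRecAt(int64_t Start, int64_t Step, int64_t N) {
  return Start + Step * N;
}

// True if V is representable as a signed 8-bit integer (no signed wrap in i8).
bool fitsInI8(int64_t V) { return V >= INT8_MIN && V <= INT8_MAX; }

// True if V is representable as a signed 32-bit integer (no signed wrap in i32).
bool fitsInI32(int64_t V) { return V >= INT32_MIN && V <= INT32_MAX; }
```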

…`s (llvm#191621)

The output currently contains ``` "unicode32" 'u' or "unsigned decimal" 'p' or "pointer" "char[]" "int8_t[]" ``` The 'p' and "pointer" are supposed to appear on the same line. When we're about to print "pointer," we check whether it would exceed the column limit (in which case, we insert a line feed). This check only checks for spaces as separators, but in this case, "words" may be separated by newlines as well. Look for them too.
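Here is a simplified sketch of the corrected check. It assumes a hypothetical `appendWrapped` helper and is not LLDB's actual output code. The current column is measured from the last newline already written. Only the part of a word before its first embedded newline is counted against the column limit.

```cpp
#include <cstddef>
#include <string>

// Appends Word to Out, inserting a line break first if the word would cross
// ColumnLimit. Newlines, not just spaces, are treated as separators.
void appendWrapped(std::string &Out, const std::string &Word,
                   std::size_t ColumnLimit) {
  // Column = number of characters since the last newline in Out.
  std::size_t LastNL = Out.rfind('\n');
  std::size_t Column =
      (LastNL == std::string::npos) ? Out.size() : Out.size() - LastNL - 1;
  // Only the part of Word before its first newline lands on this line.
  std::size_t FirstLineLen = Word.find('\n');
  if (FirstLineLen == std::string::npos)
    FirstLineLen = Word.size();
  std::size_t Sep = (Column == 0) ? 0 : 1; // one space between words
  if (Column != 0 && Column + Sep + FirstLineLen > ColumnLimit) {
    Out += '\n';
    Sep = 0;
  }
  if (Sep)
    Out += ' ';
  Out += Word;
}
```

For example, if `Out` ends with a long line followed by `\n'p' or`, then `"pointer"` is measured against the short last line. It therefore stays next to `'p' or` instead of being pushed onto a line of its own.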

…n (NFC) (llvm#189489) This NFC prepares the scheduler's rematerialization stage for integration with the target-independent rematerializer. It brings various small design changes and optimizations to the stage's internal state to make the not-exactly-NFC rematerializer integration as small as possible. The main changes are, in no particular order:

- Sort and pick useful rematerialization candidates by their index in the vector of candidates instead of directly sorting objects within the candidate vector. This reduces the amount of data movement and simplifies the candidate selection logic.
- Move some data members from `PreRARematStage::RematReg` to `PreRARematStage::ScoredRemat`. This makes the former a simplified version of the rematerializer's own internal register representation (`Rematerializer::Reg`), which can be cleanly deleted during integration.
- Remove an inferable argument to `modifyRegionSchedule`. This allows the stage to stop tracking the parent block of each region.
- Use a boolean (`RevertAllRegions`) to track the scheduling revert decision after rematerialization instead of clearing `RescheduleRegions`. This avoids re-computing the latter during rollback.
- Estimate usefulness of rematerialization from `GCNRegPressure` instead of from `Register` (requires adding a new method variant in `GCNRPTarget`).
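Sorting candidates by index, as in the first change listed above, can be sketched like this. `Candidate` and `rankByScore` are hypothetical names. Only a vector of indices is permuted, so the candidate objects (which may be large) never move, and indices into the original vector stay valid.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical candidate record; moving such records around is what the
// index-based approach avoids.
struct Candidate {
  std::string Name;
  double Score;
};

// Returns candidate indices ordered by decreasing score. The candidate vector
// itself is left untouched.
std::vector<std::size_t> rankByScore(const std::vector<Candidate> &Cands) {
  std::vector<std::size_t> Order(Cands.size());
  std::iota(Order.begin(), Order.end(), std::size_t{0});
  std::stable_sort(Order.begin(), Order.end(),
                   [&](std::size_t A, std::size_t B) {
                     return Cands[A].Score > Cands[B].Score;
                   });
  return Order;
}
```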

…91578)

We had a report of some assertion failures in llvm#190054 (comment), and some msan failures in llvm#190056. These appear to be caused by default-constructed StringRefs being used in some cases. To address this, we can provide default initializers that should prevent such cases from causing further problems.

…leSpec when checking LoadScriptFromSymFile setting (llvm#191473) We were incorrectly passing the script's `FileSpec` into `GetScriptLoadStyleForModule`. Meaning if a script name wasn't actually the same as the module name, the `target.auto-load-scripts-for-modules` didn't take effect. This patch passes the module's `FileSpec` instead. For `dSYM`s we save the original `FileSpec` because the loop tries to strip extensions until it finds a script. But we still want to use the module's name. **AI Usage**: - Used Claude to write the unit-test skeletons. Then reviewed/adjusted them manually

Ensure all StringRef members are default initialized to avoid potential bugs.

…eter packs (llvm#191484) I believe that is the intent of SubstIndex in AssociatedConstraint, so this patch enforces the check explicitly, in case nested SubstIndexes confuse our poor constraint evaluator. I reverted the previous fix 257cc5a because it was wrong. As a drive-by fix, this also removes a strange assertion and an unnecessary SubstIndex setup in the nested requirement transform. No release note because this is a regression fix. Fixes llvm#188505 Fixes llvm#190169

…SPass.cpp (llvm#191647)

AsmPrinter needs to hold state between doInitialization, runOnMachineFunction, and doFinalization, which are all separate passes in the NewPM. Storing this state externally, somewhere like MachineModuleInfo or a new analysis, is possible but a bit messy, given that some state (particularly EHHandler objects) has backreferences into the AsmPrinter and assumes there is a single AsmPrinter throughout the entire compilation. So instead, store AsmPrinter in an analysis that stays constant throughout compilation, which solves all these problems. This also means we can just let AsmPrinter continue to own the MCStreamer, which means object file emission should work after this as well. This does require passing the ModuleAnalysisManager into buildCodeGenPipeline to register the AsmPrinterAnalysis, but that seems pretty reasonable to do. Reviewers: paperchalice, RKSimon, arsenm Pull Request: llvm#191535

…m#186766)

## Description

When `AMDGPUTargetLowering::performStoreCombine` inserts a synthetic bitcast to convert vector types (e.g. `<1 x float>` → `i32`) for stores, the bitcast inherits the **store's** SDLoc. When `DAGCombiner::visitBITCAST` later folds `bitcast(load)` → `load`, the resulting load loses its original debug location.

## Analysis

The bitcast is **not** present in the initial SelectionDAG; it is inserted during DAGCombine by `AMDGPUTargetLowering::performStoreCombine`. This can be observed with `-debug-only=isel,dagcombine`:

```
Initial selection DAG: no bitcast, load is v1f32 directly used by store

Combining: t17: ch = store ... /tmp/beans.c:6:14
 ... into: t20: ch = store ... /tmp/beans.c:6:14

Combining: t19: i32 = bitcast [ORD=3] # D:1 t13, /tmp/beans.c:6:14
 ... into: t21: i32,ch = load ... /tmp/beans.c:6:14
```

In `performStoreCombine` (`AMDGPUISelLowering.cpp`):

```cpp
SDLoc SL(N); // N = store node → SL has store's DebugLoc
...
SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, NewVT, Val);
// bitcast gets store's DebugLoc, not load's
```

When `visitBITCAST` folds `bitcast(load)` → `load`, it uses `SDLoc(N)` (the bitcast's loc, i.e. the store's loc), so the resulting load loses its original debug location.

```
Before (initial DAG):
  t13: v1f32 = load ...    line 2 ; original load
  t14: ch = store t13, ... line 3 ; store

After performStoreCombine:
  t13: v1f32 = load ...    line 2 ; original load
  t19: i32 = bitcast t13   line 3 ; synthetic bitcast (store's loc!)
  t20: ch = store t19, ... line 3

After visitBITCAST folds (incorrect):
  t21: i32 = load ...      line 0 ; lost debug location

After visitBITCAST folds (expected):
  t21: i32 = load ...      line 2 ; preserves load's location
```

## Fix

Target-specific fix in `AMDGPUISelLowering.cpp` `performStoreCombine`: use `DAG.getBitcast()` instead of `DAG.getNode(ISD::BITCAST, SL, ...)`. `getBitcast()` internally uses `SDLoc(V)` (the value operand's SDLoc), so the synthetic bitcast naturally inherits the load's DebugLoc instead of the store's:

```cpp
// Before:
SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, NewVT, Val);
if (OtherUses) {
  SDValue CastBack = DAG.getNode(ISD::BITCAST, SL, VT, CastVal);

// After:
SDValue CastVal = DAG.getBitcast(NewVT, Val);
if (OtherUses) {
  SDValue CastBack = DAG.getBitcast(VT, CastVal);
```

This is consistent with `performLoadCombine`, where the bitcast also uses the load's `SDLoc`.

…vm#190543) Qt 6.11 added `OVERRIDE` and `VIRTUAL` keywords to the [property system](https://doc.qt.io/qt-6.11/properties.html).

When building the LLVM installer on Windows, fix CRT / dllimport mismatch and unused locals / tautological comparisons in env handling.

We prefer statically linking all library dependencies.

Fixes a few warnings found while building the LLVM installer with `llvm/utils/release/build_llvm_release.bat --x64 --version 23.0.0 --skip-checkout --local-python`.

Implemented in llvm@fc4661a

…romoting. (llvm#191568) The conversion needs to be done by promoting to f32. If we're already at LMUL=8, we need to split before we can promote.

z1-cciauto · 2026-04-12T04:08:52Z

PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/11/builds/198

aengelke and others added 30 commits April 10, 2026 14:29

[UnitTest][ADT] Add iterator operator== (llvm#191396)

8b37260

Apparently required by some older libstdc++ versions.

[AMDGPU] Do not emit function prologue on naked functions (llvm#191398)

9791929

Summary: Naked functions are intended to allow the user to write the entirety of the function block, so we shouldn't include the `waitcnt` instructions for them.

[flang][OpenMP] Move check for threadprivate iteration variable (llvm…

e267605

…#191208) This moves the test of whether the iteration variable of an affected DO loop is marked as threadprivate. This makes the `ordCollapseLevel` member unnecessary. Issue: llvm#191249

[libc] Add generate-libc-headers custom target (llvm#191160)

a72f7fc

Added the generate-libc-headers custom target depending on libc-headers. This allows troubleshooting headers without needing to install them first.

[LV][NFC] Remove llvm.ident, tbaa and other attributes from tests (ll…

4cd2db4

…vm#191375) While in this area I also removed unnecessary annotations for wchar_size and also cleaned up some more function attributes.

[libc++] Fix incorrect links and broken formatting in CSV status files (

dfbae3e

llvm#191289) Also, update the conformance script to look for closed issues when searching for unlinked issues.

[AMDGPU] Change *-DAG to *-SDAG in check prefixes (llvm#191411)

b35941e

In some cases the use of *-DAG seemed to confuse the update scripts because of the clash with FileCheck's built-in -DAG suffix.

[LLDB][ProcessFreeBSDKernelCore] Log error when creating kernel image (…

8926b3c

…llvm#191397)

[bazel] Fix Bazel build issue with llvm#190862 (llvm#191420)

a7883e5

Fixed 5531990.

[MLIR][Linalg] Specialize linalg.generic to linalg.mmt4d (llvm#189719)

ca80bda

Specialize linalg.generic to linalg.mmt4d based on index map

[PatternMatchHelpers] Improve compile time of m_Combine(And|Or) (llvm…

7cf8207

…#191413) Squelch the stage-2 compile time regression introduced by the variadic m_Combine(And|Or) matchers, by replacing the std::apply on a std::tuple with a recursive inheritance.

Fix ml inliner tests after PR llvm#190168 (llvm#191431)

078c43c

The default inliner policy changed slightly, which was expected after PR llvm#190168.

Add new coro test to profcheck-xfail (llvm#191436)

15e46e2

Coro hasn't yet been fixed up for profcheck, so new tests are likely to fail. mtune.ll exercises the loop vectorizer (not fixed).

[clang][modules-driver] Extract logic to feed jobs back into Compilat…

75ceda1

…ion (NFC) (llvm#191441)

[LegalizeIntegerTypes] Remove some unnecessary isTypeLegal checks fro…

606b2a4

…m IntegerExpandSetCCOperands. NFC (llvm#191353) LHSLo and RHSLo must have the same type, we don't need to check both. Same for LHSHi and RHSHi.

alexey-bataev and others added 26 commits April 11, 2026 08:18

[clang-tidy] Add IgnoreMacros option to readability-redundant-member-…

00ccd11

…init (llvm#190530) Fixes llvm#152024.

[OpenMP] Fix nondependent inscan variables in templated functions (ll…

54d4bf2

…vm#191627) Fixes llvm#191549. Assisted-by: claude-4.6-opus

[mlir] Fix warning when building on Windows (llvm#191558)

e2bb91c

Fixes: ``` warning: format specifies type 'long' but the argument has type 'intptr_t' ... ```

[SCEV] Bail out on wider AddRecs in SCEVWrapPredicate::implies. (llvm…

9dd1eb4

…#191498) NSSW/NUSW on a wider AddRec does not imply NSSW/NUSW on a narrower AddRec. Fixes llvm#191382.

[libc++][NFC] Sync <mdspan> synopsis and remove redundant `typename…

eda97dd

…`s (llvm#191621)

[RISCV] Add missing Zvfbfa isel patterns for VFSLIDE1UP/DOWN. (llvm#1…

9ab2c57

…91578)

[RISCV] Consistently use hasVInstructionsF16/BF16(). NFC (llvm#191592)

d35cd21

[clang-doc][nfc] Default initialize all StringRef members (llvm#191641)

155b9b3

Ensure all StringRef members are default initialized to avoid potential bugs.

[NFC][AMDGPU] clang-format llvm/lib/Target/AMDGPU/AMDGPULowerModuleLD…

2a54bf5

…SPass.cpp (llvm#191647)

[clang-format] Update QtPropertyKeywords to Qt 6.11 documentation (ll…

0c0ae37

…vm#190543) Qt 6.11 added `OVERRIDE` and `VIRTUAL` keywords to the [property system](https://doc.qt.io/qt-6.11/properties.html).

[flang-rt] Fix warnings on Windows (llvm#191562)

ecc283a

When building the LLVM installer on Windows, fix CRT / dllimport mismatch and unused locals / tautological comparisons in env handling.

[CMake] Enable static libxml2 for Fuchsia toolchain (llvm#191657)

4fb5b78

We prefer statically linking all library dependencies.

[LLDB] Silence warnings when building on Windows (llvm#191566)

5347264

Fixes a few warnings found while building the LLVM installer with `llvm/utils/release/build_llvm_release.bat --x64 --version 23.0.0 --skip-checkout --local-python`.

[libc++][ranges][NFC] Mark LWG3947 as implemented (llvm#191642)

4b2c155

Implemented in llvm@fc4661a

[RISCV] Split LMUL=8 f16 fixed vector (s/u)ittofp/fpto(s/u)i before p…

af209b6

…romoting. (llvm#191568) The conversion needs to be done by promoting to f32. If we're already at LMUL=8, we need to split before we can promote.

merge main into amd-main

c9e615d

z1-cciauto requested a review from a team April 12, 2026 04:06

z1-cciauto requested review from fabianmcg and nicolasvasilache as code owners April 12, 2026 04:06

z1-cciauto wants to merge 156 commits into `amd-main` from `upstream_merge_202604120006`

20 participants
