merge main into amd-staging by z1-cciauto · Pull Request #2190 · ROCm/llvm-project

z1-cciauto · 2026-04-14T11:06:38Z

No description provided.

…189657)" (llvm#191912) This reverts commit 67c893e due to buildbot breakage (llvm#189657 (comment), llvm#189657 (comment)).

Extend copyMetadata to every call-to-call replacement in AMDGPULowerIntrinsics, not just the single-wave s_barrier → wave_barrier path. This covers:
- s_cluster_barrier → wave_barrier (single-wave)
- s_cluster_barrier → signal_isfirst + wait + signal + wait (multi-wave)
- s_barrier → signal + wait (split barriers)

Add GFX11 and GFX12 RUN lines and test functions for all lowering paths to verify metadata preservation. Made-with: Cursor

…1925) Closes llvm#191910 --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>

…lvm#191745) This reverts commit 4abb927. The code is not needed since 121f5a9 because the C compiler is now always the just-built clang in an in-tree build. In addition, CMAKE_AR is llvm-ar and CMAKE_RANLIB is llvm-ranlib.

…fier (llvm#191849) Currently the signature of `result(..)` is:
```python
result(*, infer_type: bool = False, default_factory: Callable[[], Any] | None = None, kw_only: bool = False) -> Result
```
So when users write `result(infer_type=True)`, type checkers still see `kw_only=False` (from the signature), although `kw_only` should actually be `True` (it should follow the value of `infer_type`). Users can write `result(infer_type=True, kw_only=True)`, but that is unnecessarily verbose. This may introduce an incompatibility once we start to use `dataclass_transform`. It is fine for now because we do not use `dataclass_transform`, but adopting it later may require a breaking change. This PR migrates such uses to a new field specifier named `infer_result()`.

These seemed to have gotten removed here.

…llvm#188189) Upstreaming clangIR PR: llvm/clangir#2092. This PR adds support for emitting llvm.used and llvm.compiler.used global arrays in CIR. It adds addUsedGlobal() and addCompilerUsedGlobal() methods to CIRGenModule and adds __hip_cuid_* to llvm.compiler.used for HIP compilation. It follows the OGCG implementation in clang/lib/CodeGen/CodeGenModule.cpp.

…test (llvm#191936) Using -fopenmp uses the default openmp lib, which defaults to libomp but may be something else. This test only passes with libomp, so it passes when using default, but fails downstream if configured for something else, like libgomp.

… test. (llvm#191941) This fixes https://lab.llvm.org/buildbot/#/builders/187/builds/18954.

) When the linker is specified as ld, the toolchain applies special handling by invoking (triple)-ld instead of resolving ld via the standard PATH lookup. This causes GNU ld installed via the system package manager to take precedence (since (triple)-ld appears earlier in the search path), effectively overriding ld.lld. As a result, we set the default linker on FreeBSD to ld.lld to indicate that we want to use lld by default.

…lvm#190250) A bare `!$omp declare target` could incorrectly mark `_QQmain` as `omp.declare_target` when it appeared in an interface body inside a named main program. That pulled host-only callees into device compilation and caused offload link failures.

Fix this by skipping main programs in the implicit-capture path. Also add a regression test for the named-main interface case and update `real10.f90` to use a valid container for the bare `declare target` form.

This fixes offload link failures where `_QQmain` was incorrectly treated as a device function and pulled in host-only symbols such as Fortran I/O runtime calls.

Minimal reproducer:
```fortran
program named_main
  interface
    subroutine sub_a(x)
      !$omp declare target
      integer, intent(inout) :: x
    end subroutine
  end interface
  integer :: v
  !$omp target
  call sub_a(v)
  !$omp end target
end program
```

…m#191841) `AddressSize` parameter is not used by `DataExtractor` and will be removed in the future. See llvm#190519 for more context. I took the liberty of switching from using the `StringRef` constructor overload to `ArrayRef` where appropriate.

…m#191864) `AddressSize` parameter is not used by `DataExtractor` and will be removed in the future. See llvm#190519 for more context.

)

Updated [TidyFastChecks.inc](https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/TidyFastChecks.inc#L1), which has been stale for a while, using this [script](https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/TidyFastChecks.py), as discussed in llvm#190531. The thread included some discussion of the limitations of doing this manually at every new release (adding the script to the release checklist would definitely help), but it seems like this is the only low-risk solution for now.

In the RVV Clang builtins generator, a new prototype descriptor `d` was added to represent vectors with `2 x LMUL`. The `.ll` tests were generated by an LLM and I have reviewed them. The `.c` tests were generated by riscv-non-isa/riscv-rvv-intrinsic-doc#431.

llvm#191731) Add `UseFact`s for field origins when calling instance methods. Fixes llvm#182945 --------- Co-authored-by: Utkarsh Saxena <usx@google.com>

…191956) This reduces the bytecode output for the copy constructor of a struct such as:
```c++
struct Buffer {
  struct {
    char D[N];
  } V;
  Buffer() = default;
};
```
from
```
Buffer<5>::(unnamed struct)::(unnamed struct at array.cpp:873:3) 0x7d38d2de3f80
frame size: 104
arg size: 96
rvo: 0
this arg: 1
0 GetPtrThisField 16
16 GetParamPtr 0
32 GetPtrFieldPop 16
48 InitScope 0
64 SetLocalPtr 40
80 GetLocalPtr 40
96 ArrayDecay
104 ExpandPtr
112 ConstUint64 0
128 ArrayElemPtrPopUint64
136 LoadPopSint8
144 InitElemSint8 0
160 GetLocalPtr 40
176 ArrayDecay
184 ExpandPtr
192 ConstUint64 1
208 ArrayElemPtrPopUint64
216 LoadPopSint8
224 InitElemSint8 1
240 GetLocalPtr 40
256 ArrayDecay
264 ExpandPtr
272 ConstUint64 2
288 ArrayElemPtrPopUint64
296 LoadPopSint8
304 InitElemSint8 2
320 GetLocalPtr 40
336 ArrayDecay
344 ExpandPtr
352 ConstUint64 3
368 ArrayElemPtrPopUint64
376 LoadPopSint8
384 InitElemSint8 3
400 GetLocalPtr 40
416 ArrayDecay
424 ExpandPtr
432 ConstUint64 4
448 ArrayElemPtrPopUint64
456 LoadPopSint8
464 InitElemSint8 4
480 FinishInitPop
488 Destroy 0
504 Destroy 0
520 RetVoid
```
(where `N = 5`) to:
```
Buffer<5>::(unnamed struct)::(unnamed struct at array.cpp:873:3) 0x7c85b9fe3f80
frame size: 0
arg size: 96
rvo: 0
this arg: 1
0 GetPtrThisField 16
16 GetParamPtr 0
32 GetPtrFieldPop 16
48 CopyArraySint8 0 0 5
80 FinishInitPop
88 RetVoid
```

…llvm#186593) Add new SelectionDAG pattern matchers for funnel shifts:
- m_FShL and m_FShR as ternary wrappers for ISD::FSHL/ISD::FSHR
- m_FShLLike and m_FShRLike to match:
  - direct FSHL/FSHR nodes
  - ROTL/ROTR equivalents (binding both X and Y to the same rotate operand)
  - OR(SHL(X, C), SRL(Y, BW - C)) forms (including commuted OR)

Also add unit tests covering positive and negative cases for:
- direct funnel-shift matching
- rotate equivalence matching
- OR-based funnel-shift-like patterns

Fixes llvm#185880
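The equivalences that the "like" matchers rely on can be checked on plain integers. The following is a minimal sketch for a 32-bit width, not code from the patch; the helper names `fshl32`, `orOfShifts` and `rotl32` are made up for illustration.

```cpp
#include <cstdint>

// Reference funnel shift left: concatenate X (high half) and Y (low half),
// shift the 64-bit value left by C modulo 32, and keep the high 32 bits.
constexpr uint32_t fshl32(uint32_t X, uint32_t Y, unsigned C) {
  C %= 32;
  uint64_t Concat = (uint64_t(X) << 32) | Y;
  return uint32_t((Concat << C) >> 32);
}

// The OR(SHL(X, C), SRL(Y, BW - C)) form with BW = 32.
// Only meaningful for 0 < C < 32, since a shift by 32 is undefined in C++.
constexpr uint32_t orOfShifts(uint32_t X, uint32_t Y, unsigned C) {
  return (X << C) | (Y >> (32 - C));
}

// Rotate left, which is a funnel shift with both inputs equal.
constexpr uint32_t rotl32(uint32_t X, unsigned C) {
  C %= 32;
  return C == 0 ? X : (X << C) | (X >> (32 - C));
}
```

For any `0 < C < 32`, `fshl32(X, Y, C)` and `orOfShifts(X, Y, C)` agree, and `rotl32(X, C)` equals `fshl32(X, X, C)`, which is why a rotate can bind both X and Y to the same operand.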

Fixes llvm#190502

Added an implementation of the helper combineOrWithGF2P8AFFINEQB and wired the logic into combineOrXorWithSETCC. Fold:

(GF2P8AFFINEQB(X, Y, Imm) or_disjoint SplatVal) -> GF2P8AFFINEQB(X, Y, Imm ^ SplatVal)

When the OR is disjoint (no common bits), the splat constant can be folded directly into the GF2P8AFFINEQB immediate via XOR.
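The fold works because a disjoint OR behaves like XOR, and the affine result already ends with an XOR of the immediate. A rough scalar sketch of that argument follows. It models a single byte lane with a simplified bit-matrix layout (an assumption; the real instruction orders matrix rows differently, which does not affect the identity), and `affineByte` and `foldIsValid` are illustrative names, not functions from the patch.

```cpp
#include <cstdint>

// Parity of a byte: 1 if an odd number of bits are set.
inline unsigned parity8(uint8_t V) {
  V ^= V >> 4;
  V ^= V >> 2;
  V ^= V >> 1;
  return V & 1u;
}

// Simplified one-byte model of an affine transform over GF(2):
// result bit I = parity(Rows[I] & X), then the whole byte is XORed with Imm.
inline uint8_t affineByte(const uint8_t (&Rows)[8], uint8_t X, uint8_t Imm) {
  uint8_t R = 0;
  for (int I = 0; I < 8; ++I)
    R |= uint8_t(parity8(Rows[I] & X) << I);
  return R ^ Imm;
}

// Returns true only if, for every input byte, the OR with Splat is disjoint
// and the folded form (Imm ^ Splat) produces the same result.
inline bool foldIsValid(const uint8_t (&Rows)[8], uint8_t Imm, uint8_t Splat) {
  for (unsigned X = 0; X < 256; ++X) {
    uint8_t Affine = affineByte(Rows, uint8_t(X), Imm);
    if (Affine & Splat)
      return false; // common bits: the OR is not disjoint for this input
    if (uint8_t(Affine | Splat) != affineByte(Rows, uint8_t(X), Imm ^ Splat))
      return false;
  }
  return true;
}
```

With a matrix whose upper four rows are zero and an immediate whose upper nibble is clear, a splat confined to the upper nibble is always disjoint, so `foldIsValid` returns true. A splat that overlaps bits the transform can set returns false, which is the case the `or_disjoint` precondition rules out.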

Fixes a problem where tryCompressVPMOVPattern incorrectly folds instructions using extended registers into VEX. Introduced relevant tests in MIR. AI Statement: I used AI to write the tests. Fixes llvm#191304

…mdspan` taking `(data_handle_type, mapping_type, accessor_type)` and the corresponding constructor (llvm#191950) No functional change; this only removes a redundant const qualifier. Fixes: llvm#189860

…190838) Almost all recipes now go through ::computeCost to properly compute their costs using the VPlan-based cost model. There are currently no known cases where the VPlan-based cost model returns an incorrect cost vs the legacy cost model. I checked the remaining open issues with reports of the assertion triggering, and in all cases the VPlan-based cost model is more accurate, which is causing the divergence. There are still some fall-back paths, mostly via precomputeCosts, but those cannot be easily removed without triggering the assert, as the VPlan-based cost model is more accurate for those cases. An example of this is llvm#187056. Fixes llvm#38575. Fixes llvm#149651. Fixes llvm#182646. Fixes llvm#183739. Fixes llvm#187523. PR: llvm#190838

…lvm#190139) For some cores it is preferable to choose a destination predicate register that does not match the governing predicate. The hint is conservative in that it tries not to pick a callee-save register if it's not already used/allocated for other purposes, as that would introduce new spills/fills. Note that this might be preferable if the instruction is executed in a loop, but it might also be less preferable for small functions that have an SVE interface (p4-p15 are caller-preserved). It is enabled for all cores by default, but it can be disabled by adding the `disable-distinct-dst-reg-cmp-match` feature. This feature can also be added to specific cores if this behaviour is undesirable.

Follow up for llvm#191300

Extend the existing NonNarrowingCastsOptimization to also cover casts between the floating-point types f32, f16, bf16, f8E4M3FN and f8E5M2. Avoid introducing direct casts between f8 types since those are not allowed in TOSA. Also expand the set of cases that are considered non-narrowing by only checking whether the cast we're trying to remove is non-narrowing. For example, i16 -> i32 -> i8 would have been rejected before, but it is now safely converted to a single i16 -> i8 tosa.cast, since the behaviour should be identical for the entire input space. Finally, disallow the optimization when the cast that we would remove involves integer types of different signedness. Signed-off-by: Ian Tayler Lessa <ian.taylerlessa@arm.com>
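The i16 -> i32 -> i8 example can be checked exhaustively on plain integers. The sketch below assumes that integer narrowing keeps the low bits (two's complement truncation); the commit message does not state the narrowing rule of tosa.cast, so treat that as a modelling assumption. All helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed narrowing rule: keep the low 8 bits of a 32-bit value and
// interpret them as a signed two's complement byte.
inline int truncI32ToI8(int32_t V) {
  int Low = static_cast<int>(static_cast<uint32_t>(V) & 0xFFu);
  return Low >= 128 ? Low - 256 : Low;
}

// The same assumed rule applied directly to a 16-bit value.
inline int truncI16ToI8(int16_t V) {
  int Low = static_cast<int>(static_cast<uint16_t>(V) & 0xFFu);
  return Low >= 128 ? Low - 256 : Low;
}

// i16 -> i32 with matching signedness is lossless (sign extension).
inline int32_t widenI16ToI32(int16_t V) { return V; }

// Checks every i16 value: going through i32 gives the same i8 result as
// casting directly, so the intermediate cast can be dropped.
inline bool chainMatchesDirectCast() {
  for (int V = INT16_MIN; V <= INT16_MAX; ++V) {
    int16_t In = static_cast<int16_t>(V);
    if (truncI32ToI8(widenI16ToI32(In)) != truncI16ToI8(In))
      return false;
  }
  return true;
}
```

The removed cast is the widening one, which preserves every input value; that is why only the removed cast needs to be non-narrowing. If the widening step changed signedness, negative inputs would no longer be preserved, which matches the restriction on mixed-signedness casts.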

…lvm#191820) SPIR-V cannot encode hidden for now, which leads to quirky errors. For now we deal with this at run time, as part of JIT. Once SPIR-V learns about `hidden` it'll be revisited.

This patch builds on llvm#184659 and llvm#184649 and adds cost modelling for the new dot instruction variants whose codegen was added in those patches.

…#186896) This builds on the MCLFIRewriter infrastructure to add the AArch64-specific LFI rewriter, which rewrites AArch64 instructions for LFI sandboxing during the assembler step. The initial rewriter handles system instructions: system calls, thread pointer accesses, and also rejects modifications to reserved registers.

llvm#191814) Initially such ops were marked Pure wrongly since they could overflow or underflow the accumulator and result in undefined behavior. Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>

This relates to llvm#35980.

…1576) This introduces two macros that do the same as the `UnwindLogMsg()`/`UnwindLogMsgVerbose()` functions, but allow using `formatv()`-style formatting. In addition to the benefits that the `formatv()` function provides, this makes `log enable -F lldb unwind` print the correct names of the methods from which the messages originate (previously, it printed the name of one of those two helper methods). I didn't replace all function calls with macros because there are too many of them for one PR. This only replaces calls whose format string contains no specifiers or only '%s' specifiers.
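Why a macro reports the right function name while a helper function does not can be shown without any LLDB code. This is a standalone sketch with made-up names (`emitLog`, `logViaHelper`, `SKETCH_LOG`); it leaves out the `formatv()`-style formatting and only illustrates where `__func__` is evaluated.

```cpp
#include <string>

// Stand-in log sink: remembers the most recent line instead of printing it.
inline std::string &lastLogLine() {
  static std::string Line;
  return Line;
}

inline void emitLog(const char *Func, const std::string &Msg) {
  lastLogLine() = std::string(Func) + ": " + Msg;
}

// Helper-function style: __func__ is evaluated inside the helper, so every
// message is attributed to "logViaHelper" regardless of the caller.
inline void logViaHelper(const std::string &Msg) { emitLog(__func__, Msg); }

// Macro style: the macro body is expanded at the call site, so __func__
// names the function that is actually logging.
#define SKETCH_LOG(MSG) emitLog(__func__, (MSG))

inline std::string viaHelper() {
  logViaHelper("x");
  return lastLogLine();
}

inline std::string viaMacro() {
  SKETCH_LOG("x");
  return lastLogLine();
}
```

Here `viaHelper()` yields a line attributed to `logViaHelper`, while `viaMacro()` yields one attributed to `viaMacro`, mirroring the difference described for the unwind log output.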

This relates to llvm#35980.

Ret was uint32_t truncating the uint64_t __readlink return, and was compared against the unrelated getdents64 BufSize (1024) instead of sizeof(TargetPath) (NameMax, 4096). A truncated readlink of exactly NameMax bytes also wrote one byte past TargetPath.
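The three problems described above (a truncated return value, a bound taken from the wrong buffer, and a terminator written one past the end) can be avoided with a check along these lines. This is an illustrative sketch, not the BOLT runtime code: `finishReadlink` is a hypothetical helper, and treating a negative 64-bit value as the error encoding of the raw call is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: Ret is the raw 64-bit result of a readlink-style call
// that wrote into Buf, whose capacity is BufSize bytes. On success the buffer
// is NUL-terminated and the length is returned; otherwise -1 is returned and
// the buffer is left untouched.
inline long finishReadlink(char *Buf, size_t BufSize, uint64_t Ret) {
  // Keep all 64 bits; narrowing to 32 bits could turn an error or an
  // oversized value into a small, valid-looking length.
  if (static_cast<int64_t>(Ret) < 0)
    return -1; // assumed error encoding: negative value
  // Compare against the capacity of the buffer actually written. A result
  // equal to the capacity may be truncated and leaves no room for the
  // terminator, so it is rejected too.
  if (Ret >= BufSize)
    return -1;
  Buf[Ret] = '\0';
  return static_cast<long>(Ret);
}
```

A caller would pass `sizeof(TargetPath)` as `BufSize`, so that the bound always matches the buffer being terminated.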

…lvm#192032) Replace calls to `UnwindLogMsg()`/`UnwindLogMsgVerbose()` with `UNWIND_LOG`/`UNWIND_LOG_VERBOSE` macros introduced in 8417922. This replaces calls whose format string contains only '%d' and sometimes '%s' specifiers, the rest will be addressed in a future patch. As a result of this change, the `UnwindLogMsgVerbose()` is no longer used and has been removed.

…#189657)" (llvm#191939) This reverts commit bfff42c.

z1-cciauto · 2026-04-14T11:11:29Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/5124

thurstond and others added 30 commits April 13, 2026 18:58

Revert "[libc++][format] P3953R3: Rename std::runtime_format (llvm#…

bfff42c

…189657)" (llvm#191912) This reverts commit 67c893e due to buildbot breakage (llvm#189657 (comment), llvm#189657 (comment)).

[Offload] Revert part of llvm#187138. Workaround llvm#191910 (llvm#19…

aab5c10

…1925) Closes llvm#191910 --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>

[CIR][CUDA] Global emission for fatbin symbols (llvm#187636)

93871c5

[bazel] Restore MLIR bytecode tests. (llvm#191938)

42ce5c1

These seemed to have gotten removed here.

[NFC] Use stable_sort to fix the basic-block-sections-code-prefetch.l…

c2624b5

… test. (llvm#191941) This fixes https://lab.llvm.org/buildbot/#/builders/187/builds/18954.

[XRay] Remove unused argument of DataExtractor constructor (NFC) (llv…

d9c02ff

…m#191864) `AddressSize` parameter is not used by `DataExtractor` and will be removed in the future. See llvm#190519 for more context.

[NFC] clang-format llvm/lib/CodeGen/InsertCodePrefetch.cpp. (llvm#191959

b8b5962

)

[clang][bytecode] Use qualified name in Function::dump() (llvm#191958)

8c9ce12

[LifetimeSafety] Detect use-after-scope through fields in member calls (

78cd6c9

llvm#191731) Add `UseFact`s for field origins when calling instance methods. Fixes llvm#182945 --------- Co-authored-by: Utkarsh Saxena <usx@google.com>

[X86] Fix VPMOVPattern folding for extended registers (llvm#191760)

dd034ae

Fixes a problem where tryCompressVPMOVPattern incorrectly folds instructions using extended registers into VEX. Introduced relevant tests in MIR. AI Statement: I used AI to write the tests. Fixes llvm#191304

[libc++] LWG4511: Inconsistency between the deduction guide of `std::…

c838322

…mdspan` taking `(data_handle_type, mapping_type, accessor_type)` and the corresponding constructor (llvm#191950) No functional change; this only removes a redundant const qualifier. Fixes: llvm#189860

[mlir][tosa] Create and use utility to print shapes (llvm#191774)

eb9a9b9

Follow up for llvm#191300

[Driver][HIP] Do not default to hidden visibility for AMDGCNSPIRV (l…

8beed11

…lvm#191820) SPIR-V cannot encode hidden for now, which leads to quirky errors. For now we deal with this at run time, as part of JIT. Once SPIR-V learns about `hidden` it'll be revisited.

[AArch64] Add new dot insts. to cost model (llvm#189642)

bdec04f

This patch builds on llvm#184659 and llvm#184649 and adds cost modelling for the new dot instruction variants whose codegen was added in those patches.

zyedidia and others added 21 commits April 14, 2026 02:38

[mlir][spirv] Mark several SPIR-V TOSA Ext Inst ops as NoMemoryEffects (

cabb972

llvm#191814) Initially such ops were marked Pure wrongly since they could overflow or underflow the accumulator and result in undefined behavior. Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>

[llvm][DebugInfo] formatv in DWARFDebugRangeList (llvm#191989)

4c543ac

This relates to llvm#35980.

[llvm][DebugInfo] formatv in DWARFDebugRnglists (llvm#191991)

1e68dcc

This relates to llvm#35980.

[llvm][DebugInfo] formatv in DWARFDie (llvm#191992)

d023386

This relates to llvm#35980.

[llvm][DebugInfo] formatv in DWARFListTable (llvm#191996)

20c2216

This relates to llvm#35980.

[llvm][DebugInfo] formatv in DWARFTypeUnit (llvm#191997)

2e07997

This relates to llvm#35980.

[llvm][DebugInfo] formatv in DWARFUnwindTablePrinter (llvm#191999)

9f1da15

This relates to llvm#35980.

[llvm][DebugInfo] formatv in LVCodeViewVisitor (llvm#192010)

cf3a6c8

This relates to llvm#35980.

[llvm][DebugInfo] formatv in LVBinaryReader (llvm#192009)

34e5b95

This relates to llvm#35980.

[llvm][DebugInfo] formatv in LVScope (llvm#192008)

453d0e2

This relates to llvm#35980.

[llvm][DebugInfo] formatv in LVRange (llvm#192006)

65c462a

This relates to llvm#35980.

[llvm][DebugInfo] formatv in LVObject (llvm#192004)

8a5dc12

This relates to llvm#35980.

[llvm][DebugInfo] formatv in LVLocation (llvm#192003)

941b0f4

This relates to llvm#35980.

[llvm][DebugInfo] formatv in LVElement (llvm#192002)

2b593be

This relates to llvm#35980.

[BOLT][runtime] harden profile file open. (llvm#191669)

297510f

Reapply "[libc++][format] P3953R3: Rename std::runtime_format (llvm…

270e065

…#189657)" (llvm#191939) This reverts commit bfff42c.

merge main into amd-staging

c533cb2

z1-cciauto requested a review from a team April 14, 2026 11:06

z1-cciauto requested review from antiagainst, kuhar, lamb-j and stellaraccident as code owners April 14, 2026 11:06

ronlieb approved these changes Apr 14, 2026


z1-cciauto merged commit c924de4 into amd-staging Apr 14, 2026
49 of 50 checks passed

z1-cciauto deleted the upstream_merge_202604140706 branch April 14, 2026 15:04

z1-cciauto merged 57 commits into amd-staging from upstream_merge_202604140706

20 participants
