
merge main into amd-staging#2190

Merged
z1-cciauto merged 57 commits into amd-staging from upstream_merge_202604140706
Apr 14, 2026

Conversation

@z1-cciauto
Collaborator

No description provided.

thurstond and others added 30 commits April 13, 2026 18:58
Extend copyMetadata to every call-to-call replacement in
AMDGPULowerIntrinsics, not just the single-wave s_barrier →
wave_barrier path. This covers:
- s_cluster_barrier → wave_barrier (single-wave)
- s_cluster_barrier → signal_isfirst + wait + signal + wait (multi-wave)
- s_barrier → signal + wait (split barriers)

Add GFX11 and GFX12 RUN lines and test functions for all lowering
paths to verify metadata preservation.

Made-with: Cursor
…1925)

Closes llvm#191910

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
…lvm#191745)

This reverts commit 4abb927.

The code is not needed since 121f5a9 because the C compiler is now
always the just-built clang in an in-tree build. In addition, CMAKE_AR is
llvm-ar and CMAKE_RANLIB is llvm-ranlib.
…fier (llvm#191849)

Currently the signature of `result(..)` is:
```python
result(*, infer_type: bool = False, default_factory: Callable[[], Any] | None = None, kw_only: bool = False) -> Result
```

So when users write `result(infer_type=True)`, type checkers still
see `kw_only=False` (from the signature), even though `kw_only` should
actually be `True` (it should follow the value of `infer_type`).
Users can write `result(infer_type=True, kw_only=True)`, but that is
unnecessarily verbose.

This may introduce an incompatibility once we start using
`dataclass_transform`. Currently it is fine because we don't use
`dataclass_transform`, but adopting it later could require a breaking
change.

This PR migrates such use to a new field specifier named
`infer_result()`.
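
The mismatch described above can be sketched in plain Python. This is only an illustration: the `Result` class, the body of `result()`, and the whole `infer_result()` signature are hypothetical stand-ins, not the actual bindings code. Only the `result(..)` signature is taken from the description.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for the real field-specifier object."""
    infer_type: bool
    default_factory: Optional[Callable[[], Any]]
    kw_only: bool


def result(*, infer_type: bool = False,
           default_factory: Optional[Callable[[], Any]] = None,
           kw_only: bool = False) -> Result:
    # Assumed runtime behaviour from the description: kw_only follows infer_type.
    return Result(infer_type, default_factory, kw_only or infer_type)


def infer_result(*, default_factory: Optional[Callable[[], Any]] = None,
                 kw_only: bool = True) -> Result:
    # Hypothetical signature: the declared default now matches the runtime value.
    return Result(True, default_factory, kw_only)


def declared_kw_only_default(func) -> bool:
    """What a signature-driven tool assumes when kw_only is omitted."""
    return inspect.signature(func).parameters["kw_only"].default
```

With `result(infer_type=True)` the declared default and the effective value disagree, while a dedicated specifier lets both agree without extra arguments at the call site.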
These seemed to have gotten removed here.
…llvm#188189)

Upstreaming clangIR PR: llvm/clangir#2092

This PR adds support for emitting llvm.used and llvm.compiler.used
global arrays in CIR.

Added addUsedGlobal() and addCompilerUsedGlobal() methods to
CIRGenModule
Adds __hip_cuid_* to llvm.compiler.used for HIP compilation.
Followed OGCG implementation in clang/lib/CodeGen/CodeGenModule.cpp
…test (llvm#191936)

Using -fopenmp uses the default openmp lib, which defaults to libomp but
may be something else. This test only passes with libomp, so it passes
when using default, but fails downstream if configured for something
else, like libgomp.
)

When the linker is specified as ld, toolchain applies special handling
by invoking (triple)-ld instead of resolving ld via standard PATH
lookup. This causes GNU ld installed via the system package manager to
take the precedence (since (triple)-ld appears earlier in the search
path), effectively overriding ld.lld.

As a result, we set the default Linker on FreeBSD to ld.lld to indicate
we want to use lld by default.
…lvm#190250)

A bare `!$omp declare target` could incorrectly mark `_QQmain` as
`omp.declare_target` when it appeared in an interface body inside a
named main program. That pulled host-only callees into device
compilation and caused offload link failures.

Fix this by skipping main programs in the implicit-capture path.
Also add a regression test for the named-main interface case and update
`real10.f90` to use a valid container for the bare `declare target`
form.

This fixes offload link failures where `_QQmain` was incorrectly treated
as a device function and pulled in host-only symbols such as Fortran I/O
runtime calls.

Minimal reproducer:

```fortran
program named_main
  interface
    subroutine sub_a(x)
      !$omp declare target
      integer, intent(inout) :: x
    end subroutine
  end interface
  integer :: v
  !$omp target
    call sub_a(v)
  !$omp end target
end program
```
…m#191841)

`AddressSize` parameter is not used by `DataExtractor` and will be
removed in the future. See llvm#190519 for more context.

I took the liberty of switching from using the `StringRef` constructor
overload to `ArrayRef` where appropriate.
…m#191864)

`AddressSize` parameter is not used by `DataExtractor` and will be
removed in the future. See llvm#190519 for more context.
Updated
[TidyFastChecks.inc](https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/TidyFastChecks.inc#L1)
that has been stale for a while using this
[script](https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/TidyFastChecks.py),
as discussed in llvm#190531. In the thread, there was some conversation on
the limitations of doing this manually at every new release (adding the
script to the release checklist would definitely help) but it seems like
this is the only low-risk solution for now.
In the RVV Clang builtins generator, a new prototype descriptor
`d` was added to represent vectors with `2 x LMUL`.

The `.ll` tests were generated by an LLM and I have reviewed them.

The `.c` tests were generated by
riscv-non-isa/riscv-rvv-intrinsic-doc#431.
llvm#191731)

Add `UseFact`s for field origins when calling instance methods.

Fixes llvm#182945

---------

Co-authored-by: Utkarsh Saxena <usx@google.com>
…191956)

This reduces the bytecode output for the copy constructor of a struct
such as:

```c++
struct Buffer {
  struct {
    char D[N];
  } V;

  Buffer() = default;
};
```
from
```
Buffer<5>::(unnamed struct)::(unnamed struct at array.cpp:873:3) 0x7d38d2de3f80
frame size: 104
arg size:   96
rvo:        0
this arg:   1
0      GetPtrThisField          16
16     GetParamPtr              0
32     GetPtrFieldPop           16
48     InitScope                0
64     SetLocalPtr              40
80     GetLocalPtr              40
96     ArrayDecay
104    ExpandPtr
112    ConstUint64              0
128    ArrayElemPtrPopUint64
136    LoadPopSint8
144    InitElemSint8            0
160    GetLocalPtr              40
176    ArrayDecay
184    ExpandPtr
192    ConstUint64              1
208    ArrayElemPtrPopUint64
216    LoadPopSint8
224    InitElemSint8            1
240    GetLocalPtr              40
256    ArrayDecay
264    ExpandPtr
272    ConstUint64              2
288    ArrayElemPtrPopUint64
296    LoadPopSint8
304    InitElemSint8            2
320    GetLocalPtr              40
336    ArrayDecay
344    ExpandPtr
352    ConstUint64              3
368    ArrayElemPtrPopUint64
376    LoadPopSint8
384    InitElemSint8            3
400    GetLocalPtr              40
416    ArrayDecay
424    ExpandPtr
432    ConstUint64              4
448    ArrayElemPtrPopUint64
456    LoadPopSint8
464    InitElemSint8            4
480    FinishInitPop
488    Destroy                  0
504    Destroy                  0
520    RetVoid
```
(where `N = 5`).

to:
```
Buffer<5>::(unnamed struct)::(unnamed struct at array.cpp:873:3) 0x7c85b9fe3f80
frame size: 0
arg size:   96
rvo:        0
this arg:   1
0     GetPtrThisField    16
16    GetParamPtr        0
32    GetPtrFieldPop     16
48    CopyArraySint8     0 0 5
80    FinishInitPop
88    RetVoid
```
…llvm#186593)

Add new SelectionDAG pattern matchers for funnel shifts:
- m_FShL and m_FShR as ternary wrappers for ISD::FSHL/ISD::FSHR
- m_FShLLike and m_FShRLike to match:
-- direct FSHL/FSHR nodes
-- ROTL/ROTR equivalents (binding both X and Y to the same rotate operand)
-- OR(SHL(X, C), SRL(Y, BW - C)) forms (including commuted OR)

Also add unit tests covering positive and negative cases for:
- direct funnel-shift matching
- rotate equivalence matching
- OR-based funnel-shift-like patterns

Fixes llvm#185880
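
The equivalences these matchers rely on can be checked with a small bit-level model. The sketch below is not the SelectionDAG code. It models 8-bit funnel shifts with plain integers, and all helper names are chosen for illustration only.

```python
BW = 8                 # bit width used for this model
MASK = (1 << BW) - 1


def fshl(x: int, y: int, c: int) -> int:
    """Funnel shift left: high BW bits of (x:y) shifted left by c mod BW."""
    concat = (x << BW) | y
    return ((concat << (c % BW)) >> BW) & MASK


def fshr(x: int, y: int, c: int) -> int:
    """Funnel shift right: low BW bits of (x:y) shifted right by c mod BW."""
    concat = (x << BW) | y
    return (concat >> (c % BW)) & MASK


def rotl(x: int, c: int) -> int:
    c %= BW
    return ((x << c) | (x >> (BW - c))) & MASK


def rotr(x: int, c: int) -> int:
    c %= BW
    return ((x >> c) | (x << (BW - c))) & MASK


def or_of_shifts(x: int, y: int, c: int) -> int:
    """OR(SHL(X, C), SRL(Y, BW - C)), only meaningful for 0 < C < BW."""
    assert 0 < c < BW
    return ((x << c) | (y >> (BW - c))) & MASK
```

In this model a rotate is a funnel shift with both inputs bound to the same value, and the OR-of-shifts form agrees with `fshl` for every shift amount strictly between 0 and `BW`.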
Fixes llvm#190502

Added implementation of helper combineOrWithGF2P8AFFINEQB and wired the logic with combineOrXorWithSETCC:

Fold: (GF2P8AFFINEQB(X, Y, Imm) or_disjoint SplatVal) -> GF2P8AFFINEQB(X, Y, Imm ^ SplatVal)

When OR is disjoint (no common bits), the splat constant can be folded directly into the GF2P8AFFINEQB immediate via XOR.
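
A scalar model of one byte lane shows why the fold is sound. This is a sketch, not the X86 lowering code: `affine_byte` follows the per-byte pseudo-code of the instruction as I understand it (the bit and row ordering is an assumption), and the matrix, immediate, and splat values are picked only so that the OR is disjoint for every input.

```python
def parity(v: int) -> int:
    """XOR of all bits of v."""
    return bin(v).count("1") & 1


def affine_byte(matrix: int, x: int, imm: int) -> int:
    """Model of one byte lane: out.bit[i] = parity(matrix.byte[7-i] & x) ^ imm.bit[i]."""
    out = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF
        out |= (parity(row & x) ^ ((imm >> i) & 1)) << i
    return out


# Illustrative constants. Byte 0 of MATRIX is zero, so in this model bit 7 of
# the result comes only from the immediate, and IMM leaves that bit clear.
MATRIX = 0x0102040810204000
IMM = 0x05
SPLAT = 0x80


def or_with_splat(x: int) -> int:
    """Original form: affine result ORed with the splat byte."""
    return affine_byte(MATRIX, x, IMM) | SPLAT


def folded(x: int) -> int:
    """Folded form: the splat byte is XORed into the immediate instead."""
    return affine_byte(MATRIX, x, IMM ^ SPLAT)
```

Because the affine result is `M(x) ^ Imm` and a disjoint OR behaves like XOR, `(M(x) ^ Imm) | S` equals `M(x) ^ (Imm ^ S)`, which is what the two helpers compute.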
Fixes a problem where tryCompressVPMOVPattern incorrectly folds
instructions using extended registers into VEX. Adds relevant MIR
tests.

AI Statement: I used AI to write the tests.
Fixes llvm#191304
…mdspan` taking `(data_handle_type, mapping_type, accessor_type)` and the corresponding constructor (llvm#191950)

No functional change; this only removes a redundant const qualifier.

Fixes: llvm#189860
…190838)

Almost all recipes now go through ::computeCost to properly compute
their costs using the VPlan-based cost model. There are currently no
known cases where the VPlan-based cost model returns an incorrect cost
vs the legacy cost model. I checked the remaining open issues with reports
of the assertion triggering and in all cases the VPlan-based cost model
is more accurate, which is causing the divergence.

There are still some fall-back paths, mostly via precomputeCosts, but
those cannot be easily removed without triggering the assert, as the
VPlan-based cost model is more accurate for those cases. An example of
this is llvm#187056.

Fixes llvm#38575. 
Fixes llvm#149651. 
Fixes llvm#182646. 
Fixes llvm#183739. 
Fixes llvm#187523.

PR: llvm#190838
…lvm#190139)

For some cores it is preferable to choose a destination predicate
register that does not match the governing predicate.

The hint is conservative in that it tries not to pick a callee-save
register if it's not already used/allocated for other purposes, as that
would introduce new spills/fills. Note that this might be preferable if
the instruction is executed in a loop, but it might also be less
preferable for small functions that have an SVE interface (p4-p15 are
caller-preserved).

It is enabled for all cores by default, but it can be disabled by adding
the `disable-distinct-dst-reg-cmp-match` feature. This feature can also
be added to specific cores if this behaviour is undesirable.
Extend the existing NonNarrowingCastsOptimization to also cover casts
between floating point types f32, f16, bf16, f8E4M3FN and f8E5M2. Avoid
introducing direct casts between f8 types since those are not allowed in
TOSA.

Also expand the set of cases that are considered non-narrowing by only
checking if the cast we're trying to remove is non-narrowing. For example,
i16 -> i32 -> i8 would have been rejected before, but it is now safely
converted to a single i16 -> i8 tosa.cast, since the behaviour should be
identical for the entire input space.

Finally, disallow the optimization in the case when the cast that we
would remove involves integer types of different signedness.

Signed-off-by: Ian Tayler Lessa <ian.taylerlessa@arm.com>
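
The i16 -> i32 -> i8 claim can be checked exhaustively with a small integer model. This is a sketch under one stated assumption: a narrowing signed integer cast is modelled as two's-complement wraparound. The helper names are illustrative and not part of the TOSA dialect.

```python
def wrap(value: int, bits: int) -> int:
    """Reinterpret value as a signed two's-complement integer of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cast_chain(value: int, widths: list) -> int:
    """Apply a sequence of signed integer casts, one per target width."""
    for bits in widths:
        value = wrap(value, bits)
    return value


def i16_values():
    """Every value representable in a signed 16-bit integer."""
    return range(-(1 << 15), 1 << 15)
```

Dropping the widening i16 -> i32 step never changes the result, whereas dropping a narrowing first cast can: `cast_chain(300, [8, 16])` and `cast_chain(300, [16])` differ, which is why only the removed cast needs to be non-narrowing.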
…lvm#191820)

SPIR-V cannot encode `hidden` for now, which leads to quirky errors. For
now we deal with this at run time, as part of JIT. Once SPIR-V learns
about `hidden`, this will be revisited.
This patch builds on llvm#184659 and llvm#184649 and adds cost modelling for
the new dot instruction variants whose codegen was added in those patches.
zyedidia and others added 21 commits April 14, 2026 02:38
…#186896)

This builds on the MCLFIRewriter infrastructure to add the
AArch64-specific LFI rewriter, which rewrites AArch64 instructions for
LFI sandboxing during the assembler step.

The initial rewriter handles system instructions: system calls, thread
pointer accesses, and also rejects modifications to reserved registers.
llvm#191814)

These ops were originally marked Pure, which was wrong because they can
overflow or underflow the accumulator, resulting in undefined behavior.

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
…1576)

This introduces two macros that do the same as the
`UnwindLogMsg()`/`UnwindLogMsgVerbose()` functions, but allow using
`formatv()`-style formatting. In addition to the benefits that the
`formatv()` function provides, this makes `log enable -F lldb unwind`
print the correct names of the methods from which the messages originate
(previously, it printed the name of one of those two helper methods).

I didn't replace all function calls with macros because there are too
many of them for one PR. This only replaces calls whose format string
contains no specifiers or only '%s' specifiers.
Ret was a uint32_t, which truncated the uint64_t return value of
__readlink, and it was compared against the unrelated getdents64 BufSize
(1024) instead of sizeof(TargetPath) (NameMax, 4096). A truncated readlink
of exactly NameMax bytes also wrote one byte past the end of TargetPath.
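
The two arithmetic hazards named here can be illustrated with a tiny model. This is not the patched code: the helper names are hypothetical, and only the 4096-byte buffer size is taken from the description.

```python
NAME_MAX = 4096  # sizeof(TargetPath) per the description


def as_u32(value: int) -> int:
    """Model of storing a 64-bit return value in a uint32_t."""
    return value & 0xFFFF_FFFF


def terminator_fits(length: int, capacity: int = NAME_MAX) -> bool:
    """True when writing buf[length] = 0 stays inside `capacity` bytes."""
    return 0 <= length < capacity
```

A 64-bit value such as `2**32 + 16` becomes `16` after truncation, so a bounds check on the narrowed value can pass for the wrong reason. A length equal to the capacity leaves no room for a terminator, which matches the one-byte overrun described.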
…lvm#192032)

Replace calls to `UnwindLogMsg()`/`UnwindLogMsgVerbose()` with
`UNWIND_LOG`/`UNWIND_LOG_VERBOSE` macros introduced in 8417922.

This replaces calls whose format string contains only '%d' and sometimes
'%s' specifiers, the rest will be addressed in a future patch.

As a result of this change, `UnwindLogMsgVerbose()` is no longer
used and has been removed.
@z1-cciauto z1-cciauto requested a review from a team April 14, 2026 11:06
@z1-cciauto
Collaborator Author

@z1-cciauto z1-cciauto merged commit c924de4 into amd-staging Apr 14, 2026
49 of 50 checks passed
@z1-cciauto z1-cciauto deleted the upstream_merge_202604140706 branch April 14, 2026 15:04