Apparently required by some older libstdc++ versions.
…dify-Write Sequence, Fix llvm#189183 (llvm#190350) This patch improves the SystemZ cost model to identify Read-Modify-Write sequences that can be folded into a single instruction (e.g., ASI, NI, OI). If a load, a scalar arithmetic operation (ADD, SUB, AND, OR, XOR) with an immediate, and a store all target the same memory location and have no external uses, the cost of the arithmetic and store instructions should be 0. This implementation does not include the TTI::TCK_RecipThroughput CostKind, as it causes a regression in non-power-2-subvector-extract.ll. Fixes llvm#189183 (refer to it for an example). --------- Co-authored-by: anoopkg6 <anoopkg6@github.com>
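The folding conditions listed above can be restated as a single predicate. This is a minimal sketch over a hypothetical, simplified description of the load/op/store chain (`RmwCandidate` and `isFoldableRMW` are illustrative names, not the SystemZ TTI code); it assumes "no external uses" means the loaded value feeds only the arithmetic and the arithmetic result feeds only the store.

```cpp
#include <cassert>

// Hypothetical summary of a load -> binop-with-immediate -> store chain.
enum class BinOp { Add, Sub, And, Or, Xor, Mul };

struct RmwCandidate {
  const void *LoadAddr;  // address operand of the load
  const void *StoreAddr; // address operand of the store
  BinOp Op;              // scalar arithmetic operation
  bool OpHasImmediate;   // second operand is a constant
  unsigned LoadUses;     // number of uses of the loaded value
  unsigned OpUses;       // number of uses of the arithmetic result
};

// True if the chain could become one RMW instruction (ASI/NI/OI-style),
// in which case the arithmetic and the store would be costed as 0.
bool isFoldableRMW(const RmwCandidate &C) {
  const bool SupportedOp = C.Op == BinOp::Add || C.Op == BinOp::Sub ||
                           C.Op == BinOp::And || C.Op == BinOp::Or ||
                           C.Op == BinOp::Xor;
  return SupportedOp && C.OpHasImmediate && C.LoadAddr == C.StoreAddr &&
         C.LoadUses == 1 && C.OpUses == 1;
}
```

Encoding the chain as plain fields keeps the sketch independent of LLVM's IR classes; in the real cost model these facts would come from the instructions themselves.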
Summary: Naked functions are intended to allow the user to write the entirety of the function block, so we shouldn't include the `waitcnt` instructions for them.
…#191208) This moves the test of whether the iteration variable of an affected DO loop is marked as threadprivate. This makes the `ordCollapseLevel` member unnecessary. Issue: llvm#191249
Added the generate-libc-headers custom target depending on libc-headers. This allows troubleshooting headers without needing to install them first.
…vm#191375) While in this area I also removed unnecessary annotations for wchar_size and also cleaned up some more function attributes.
…1408) Failure to read all required fields for msgbuf isn't ObjectFile's fault but FreeBSD-Kernel-Core plugin specific. Thus this should be logged through `LLDBLog::Process` rather than `LLDBLog::Object`. Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
…lvm#186981) This PR follows the example of the Extensions.md document and provides the same kind of file for OpenMP API extensions. These have previously been stored in OpenMPSupport.md. Having a more centralized view and place for these extensions seems useful. --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
llvm#191289) Also, update the conformance script to look for closed issues when searching for unlinked issues.
…ne table coverage in isolation (llvm#183790) Patch 2 of 3 to add to llvm-dwarfdump the ability to measure DWARF coverage of local variables in terms of source lines, as discussed in [this RFC](https://discourse.llvm.org/t/rfc-debug-info-coverage-tool-v2/83266). This patch adds the ability to compare a variable’s coverage against a baseline, e.g. an unoptimised compilation of the same code. This is provided using the optional `--coverage-baseline` argument. When a baseline is provided, the output also includes a per-variable measure of the line table’s coverage (`LT`, `LTRatio`), distinct from the variable’s coverage proper. See section 2.2 of the RFC for details on this metric.
Reworked libc/docs/gpu/building.rst to match the style of getting_started.rst:
* Removed mkdir and cd commands.
* Used -S and -B flags for CMake.
* Used -C flag for Ninja.
* Split commands into smaller blocks with brief explanations.

Use the same terminology as elsewhere in the LLVM libc docs and move away from the deprecated runtime terms:
* Standard runtimes build -> Bootstrap Build
* Runtimes cross build -> Two-stage Cross-compiler Build
In llvm#178306, I made the incorrect assumption that traversing `allproc` in reverse would yield increasing pid order, based on the fact that new processes are added at the head of allproc. However, this assumption is false under certain circumstances, such as pid reuse, so threads were not sorted correctly. Instead of relying on any ordering assumption, explicitly sort threads by the pid read from memory. Fixes: 5349c66 (llvm#178306) --------- Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
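Sorting explicitly removes any dependence on list order. A minimal sketch, assuming each thread has already been read into a hypothetical `ThreadRecord` with its pid; breaking ties on thread id is an added assumption so the order is fully deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for a thread recovered from a kernel core dump.
struct ThreadRecord {
  int32_t Pid;  // pid read from the process structure in memory
  uint64_t Tid; // thread id (tie-breaker, an assumption of this sketch)
};

// Order by pid explicitly instead of trusting allproc traversal order,
// which is not monotonic once pids are reused.
void sortThreadsByPid(std::vector<ThreadRecord> &Threads) {
  std::sort(Threads.begin(), Threads.end(),
            [](const ThreadRecord &A, const ThreadRecord &B) {
              if (A.Pid != B.Pid)
                return A.Pid < B.Pid;
              return A.Tid < B.Tid;
            });
}
```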
llvm#191231) …ties Some of the utilities may be used during symbol resolution, which happens before expression analysis is done. In such situations, the typedExprs normally stored in parser::Expr may not be available. To obtain the numeric values of expressions, it may be necessary to use the analyzer directly, which requires a SemanticsContext to be provided.
…m#191098) The motivation of this PR is to refactor and expose DSO helper functions so they can be used by all compiler-rt libraries, including the profile library, without duplicating dlopen/dlsym (non-Windows) or LoadLibrary/GetProcAddress (Windows) logic in each runtime. Implement the helpers in namespace __interception in interception_linux.cpp for non-Windows targets and interception_win.cpp for Windows, and use them from the existing Linux interception path for RTLD_NEXT/RTLD_DEFAULT/dlvsym lookups. This is NFC for existing libraries that already use interception's public APIs; sanitizer and interception lit behavior is unchanged.
In some cases the use of *-DAG seemed to confuse the update scripts because of the clash with FileCheck's built-in -DAG suffix.
Specialize linalg.generic to linalg.mmt4d based on index map
…erage (llvm#187368) We don't need to run the full exhaustive test for all floating-point values, as long as we're testing the radix sort code path (which we are, since radix sort triggers at 1024 elements). This reduces the test execution time on my machine from 20s to 12s. Fixes llvm#187329
Fix iterator misuse in five BOLT passes, caught by _GLIBCXX_DEBUG (enabled via LLVM_ENABLE_EXPENSIVE_CHECKS=ON).
* AllocCombiner: combineAdjustments() erases instructions while iterating in reverse via llvm::reverse(BB), invalidating the reverse iterator. Defer erasures to after the loop using a SmallVector.
* ShrinkWrapping: processDeletions() uses std::prev(BB.eraseInstruction(II)), which is undefined when II == begin(). Restructure to standard forward iteration with erase.
* DataflowAnalysis: run() unconditionally dereferences BB->rbegin(), which crashes on empty basic blocks (possible after the ShrinkWrapping fix). Guard with an emptiness check.
* IndirectCallPromotion: rewriteCall() dereferences the end iterator via &(*IndCallBlock.end()). Replace with &IndCallBlock.back().
* TailDuplication: constantAndCopyPropagate() uses std::prev(OriginalBB.eraseInstruction(Itr)), which is undefined when Itr == begin(). Restructure to standard forward iteration with erase.
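The three safe patterns described above (forward iteration with erase, deferred erasure during a reverse walk, and guarding access to the last element) can be sketched on a `std::list<int>` standing in for a basic block. This is an illustration only; BOLT's own containers and `eraseInstruction` API differ, and with a `std::vector` the deferred iterators would need to be replaced by indices or pointers.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

using Block = std::list<int>; // stand-in for a basic block's instructions

// Forward iteration with erase: erase() returns the next valid iterator,
// so no std::prev() is ever applied to a possibly-begin() iterator.
void eraseIf(Block &BB, bool (*ShouldErase)(int)) {
  for (auto It = BB.begin(); It != BB.end();) {
    if (ShouldErase(*It))
      It = BB.erase(It);
    else
      ++It;
  }
}

// Reverse walk that only records victims; erasure happens after the loop,
// so the reverse iterators are never invalidated mid-walk.
void eraseIfReverse(Block &BB, bool (*ShouldErase)(int)) {
  std::vector<Block::iterator> ToErase;
  for (auto RIt = BB.rbegin(); RIt != BB.rend(); ++RIt)
    if (ShouldErase(*RIt))
      ToErase.push_back(std::prev(RIt.base())); // element RIt refers to
  for (Block::iterator It : ToErase)
    BB.erase(It);
}

// back() / rbegin() on an empty container is undefined; check first.
const int *lastOrNull(const Block &BB) {
  return BB.empty() ? nullptr : &BB.back();
}
```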
…8271) Example:

    int foo(int a, int b) { return a - 1 + ~b; }

Before, on AArch64:

    mvn w8, w1
    add w8, w0, w8
    sub w0, w8, #1

After (matches gcc):

    sub w0, w0, w1
    sub w0, w0, #2

Proof: https://alive2.llvm.org/ce/z/g_bV01
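The fold relies on the two's-complement identity `~b == -b - 1`, so `a - 1 + ~b == a - b - 2`. A quick spot check in C++, using `uint32_t` so that arithmetic wraps modulo 2^32 like LLVM's `i32` (signed overflow would be undefined behavior in C++); the Alive2 link above remains the actual proof.

```cpp
#include <cassert>
#include <cstdint>

// Original expression: a - 1 + ~b (wraps modulo 2^32).
uint32_t beforeFold(uint32_t A, uint32_t B) { return A - 1 + ~B; }

// Folded form: since ~b == -b - 1, the sum equals a - b - 2.
uint32_t afterFold(uint32_t A, uint32_t B) { return A - B - 2; }
```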
…#191413) Squelch the stage-2 compile time regression introduced by the variadic m_Combine(And|Or) matchers, by replacing the std::apply on a std::tuple with a recursive inheritance.
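For readers unfamiliar with the technique, here is a generic sketch of a variadic matcher built by recursive inheritance instead of storing sub-matchers in a `std::tuple` and dispatching with `std::apply`. `AllOf` and `allOf` are hypothetical names; this is not the actual `m_Combine` implementation, and it makes no claim about compile-time numbers.

```cpp
#include <cassert>

// Each level stores one predicate and defers the rest to its base class.
template <typename... Ps> struct AllOf;

// Base case: an empty conjunction matches everything.
template <> struct AllOf<> {
  template <typename T> bool match(const T &) const { return true; }
};

template <typename P, typename... Rest>
struct AllOf<P, Rest...> : AllOf<Rest...> {
  P Head;
  AllOf(P H, Rest... R) : AllOf<Rest...>(R...), Head(H) {}
  template <typename T> bool match(const T &V) const {
    // Short-circuits: later predicates run only if Head matched.
    return Head(V) && AllOf<Rest...>::match(V);
  }
};

template <typename... Ps> AllOf<Ps...> allOf(Ps... Preds) {
  return AllOf<Ps...>(Preds...);
}
```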
…ORTED for zOS (llvm#190835) Tests in `llvm/test/Examples` and `llvm/test/ExecutionEngine` use JIT which is unsupported for zOS causing the tests to fail. --------- Co-authored-by: Bahareh Farhadi <bahareh.farhadi@ibm.com>
The default inliner policy changed slightly, which was expected after PR llvm#190168.
The Coro passes haven't yet been fixed up for profcheck, so new tests are likely to fail. mtune.ll exercises the loop vectorizer (not fixed).
When a user calls `omp_control_tool` while a tool is attached and has registered the `ompt_control_tool` callback, the tool should receive a callback with the user's arguments. However, llvm#112924 found that this only happens after at least one host-side directive or runtime call that invokes `__kmp_do_middle_initialize` has been executed. The check for `__kmp_init_middle` in `FTN_CONTROL_TOOL` did not attempt the middle initialization and instead always returned `-2` (no tool). The tool therefore received no callback, and the user program was not told that a tool is attached. To fix this, replace the explicit return with a call to `__kmp_middle_initialize()`, as done in several other places in `libomp`. Further handling is then done in `__kmp_control_tool`, which returns `-2` (no tool), `-1` (no callback), or the tool's return value. Also expand the tests to check the cases where no callback is registered or where `omp_control_tool` is called before any OpenMP directive. Fixes llvm#112924 CC @jprotze, @hansangbae Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
…(NFC) (llvm#191430) CompilationGraph owns all nodes and edges via `unique_ptr`, but exposes pointers to the underlying objects. Make them non-movable to maintain stable addresses. Make them non-copyable since we don't want to copy `Command` objects they hold or create duplicate root nodes. Apply full rule-of-five to `CompilationGraph`.
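The ownership pattern described above can be sketched with hypothetical `Node` and `Graph` types: nodes are held by `unique_ptr`, raw pointers to them are handed out, so nodes must be neither copyable nor movable. Whether the real `CompilationGraph` permits moves is not stated here; this sketch defaults the graph's move operations (an assumption) because moving a vector of `unique_ptr` leaves node addresses unchanged, and it spells out all five special members.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Nodes are referenced by raw pointer, so their addresses must stay fixed.
struct Node {
  Node() = default;
  Node(const Node &) = delete;
  Node &operator=(const Node &) = delete;
  Node(Node &&) = delete;
  Node &operator=(Node &&) = delete;
  ~Node() = default;
  std::vector<Node *> Successors;
};

// Rule of five spelled out: copying is forbidden; moving (an assumption
// here) only transfers the unique_ptrs, so node addresses stay stable.
class Graph {
public:
  Graph() = default;
  Graph(const Graph &) = delete;
  Graph &operator=(const Graph &) = delete;
  Graph(Graph &&) = default;
  Graph &operator=(Graph &&) = default;
  ~Graph() = default;

  Node *addNode() {
    Nodes.push_back(std::make_unique<Node>());
    return Nodes.back().get(); // stays valid even if Nodes reallocates
  }

private:
  std::vector<std::unique_ptr<Node>> Nodes;
};

static_assert(!std::is_copy_constructible_v<Node>);
static_assert(!std::is_move_constructible_v<Node>);
static_assert(!std::is_copy_constructible_v<Graph>);
```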
…m IntegerExpandSetCCOperands. NFC (llvm#191353) LHSLo and RHSLo must have the same type, we don't need to check both. Same for LHSHi and RHSHi.
While running in server mode, multiple clients can be connected at the same time. LLDBUtils used a single static mutex, which could cause other clients to hang while one client held the lock. Instead, I adjusted the logic to take the existing SBMutex as a parameter and hold that mutex during command handling.
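The change swaps one process-wide lock for a lock supplied by the caller. A minimal sketch with `std::mutex` standing in for SBMutex; `ClientSession` and `runCommandLocked` are hypothetical names, not lldb-dap code. Because each client owns its own mutex, a command running for one client does not block another.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical per-client session owning its own lock
// (std::mutex stands in for the debugger's SBMutex).
struct ClientSession {
  std::mutex APIMutex;
};

// The caller passes in the mutex to hold; there is no function-local
// static mutex shared by every connected client.
template <typename Fn> void runCommandLocked(std::mutex &M, Fn &&Command) {
  std::lock_guard<std::mutex> Guard(M);
  Command();
}
```

With a single static mutex, the `try_lock` in the usage below would fail while the first client's command is running.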
…vm#191591) Reverts llvm#191550, which was merged without understanding getImplicitAddend and the test conventions, less than 4 hours after a colleague rubber-stamped it with "I am not ELF or linker expert but to me looks good."
…eption specs (llvm#190593) Functions whose exception spec has not yet been evaluated have no body in the AST. Because the compiler does not generate call sites for these functions before evaluating their spec, they cannot propagate exceptions. Closes llvm#188730
…m#191596) Now that MCAsmInfo stores the MCTargetOptions pointer (set by TargetRegistry::createMCAsmInfo llvm#180464), MCContext can retrieve it via MCAsmInfo. Remove the redundant MCTargetOptions parameter from the MCContext constructor and update all callers.
…lvm#184032) https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019/17

The recently-introduced .prefalign only worked when each function was in its own section (-ffunction-sections), because the section size gave the function body size needed for the alignment rule. This led to -ffunction-sections and -fno-function-sections AsmPrinter differences (llvm#155529), which is rather unusual.

This patch fixes this AsmPrinter difference by extending .prefalign to accept an end symbol and a required fill operand:

    .prefalign <log2_align>, <end_sym>, nop
    .prefalign <log2_align>, <end_sym>, <fill_byte>

The first operand is a log2 alignment value (e.g. 4 means 16-byte alignment). The body size (end_sym_offset - start_offset) determines the alignment:

    body_size <  pref_align => ComputedAlign = std::bit_ceil(body_size)
    body_size >= pref_align => ComputedAlign = pref_align

To also enforce a minimum alignment, emit a .p2align before .prefalign.

The fill operand is required: `nop` generates target-appropriate NOP instructions via writeNopData, while an integer in [0,255] fills the padding with that byte value.

Initialize MCSection::CurFragList to nullptr and add a null check to skip ELFObjectWriter-created sections like .strtab/.symtab that never receive changeSection calls.

relaxPrefAlign is called in both layoutSection and relaxFragment. The layoutSection call ensures correct initial padding before relaxOnce, and is also needed for the post-finishLayout re-layout where relaxOnce is not used. relaxPrefAlign walks forward to the end symbol to compute BodySize (summing fragment sizes), avoiding dependence on stale downstream symbol offsets.
…mpile jobs (llvm#191610) In `createClangModulePrecompileJob`, the `PrependArg` parameter was not being passed for the newly created Clang module precompile job. This causes failures for setups where the clang executable is a wrapper (e.g., the llvm-driver wrapper). See llvm#191258 (comment)
The test has been failing flakily for a while; see PRs llvm#170911, llvm#171469, llvm#188441. Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
This removes fixes implemented in llvm@ea8c637, llvm@4a58116, and llvm@2b957ed. We don't need them anymore after llvm#130374.

---

A little (unfortunate) winding history, mostly for my mental bookkeeping. Read the below only if you are curious:

There is a function called `findUnwindDestinations` in `SelectionDAGBuilder.cpp`.
https://github.com/llvm/llvm-project/blob/c94f79886035a61bb5f3dc992f75fe0c08bdcd4b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L2107-L2164

This function adds unwind successors to BBs with `invoke`s. In the case of Itanium EH, you only add one `landingpad` BB. In WinEH, a `catchswitch` may not catch an exception, so you add all possible unwind destinations. For example,
```ll
entry:
  invoke void @foo()
          to label %try.cont unwind label %catch.dispatch

catch.dispatch:
  %0 = catchswitch within none [label %catch.start] unwind label %catch.dispatch1

catch.start:
  ...

catch.dispatch1:
  %7 = catchswitch within none [label %catch.start1] unwind to caller

catch.start1:
  ...
```
`catchswitch` BBs are removed in iSel. So in this case, both the `catch.start` and `catch.start1` BBs are added as unwind successors to `entry`, because an exception may not be caught by `catch.dispatch` and may unwind further to `catch.dispatch1`.

In the beginning of 2019, I added our own `findWasmUnwindDestinations` in llvm@d6f4878. This was when I was implementing [the V2 (pre-legacy) proposal,](https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/pre-legacy/Exceptions-v2.md) which had `exnref` and `try`-`catch_all` (it was named `catch`, but semantically it was `catch_all`). The rationale was that even though we were using WinEH, we only had one catchpad and `catch` caught everything. So I figured adding only the first catchpad successor, `catch.start` in the example above, would simplify things.
By the end of 2020, we changed the proposal to [the V3 (legacy) proposal](https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/legacy/Exceptions.md), which removed `exnref` and introduced separate `catch` and `catch_all` instructions. The previous invariant "`catch` always catches everything" didn't hold anymore, but I left `findWasmUnwindDestinations` as it was, with some updated comments, in llvm@9e4eade. The comments could be summed up as "there will always be an `invoke` instruction in the first catchpad that unwinds to the next unwind destination" (which later turned out to be false).

And in 2021, I tweaked the ExceptionInfo algorithm to fix exception grouping (llvm@ea8c637, llvm@4a58116, and llvm@2b957ed). The bug, in tl;dr: "Your next unwind destination can be (accidentally) dominated by your current catchpad, making your unwind destination a subexception of the current exception." For example:
```cpp
try {
  try {
    foo();
  } catch (int) { // EH pad
    ...
  }
} catch (...) { // unwind destination
}
```
Here the outer `catch` is (accidentally) dominated by the inner `catch`, because we only added the first catchpad (the inner `catch`) as an unwind successor of the `foo()` BB, and hoped that some `invoke`s within the inner `catch` would unwind to the outer `catch`. But this caused us to `delegate` to the middle of an inner scope. So I tweaked the algorithm to take the outer `catch` out to form a separate exception. I didn't realize then that `findWasmUnwindDestinations` was actually the source of the problem.

Fast forward to 2025. The 2020 assumption "There will always be an `invoke` instruction in the first catchpad" turned out to be false. So I just removed `findWasmUnwindDestinations` and switched to the common `findUnwindDestinations` in llvm#130374, which recently led to the accidental discovery of another bug (llvm#187302).
While investigating llvm#187302, I realized we don't need those tweaks in WebAssemblyExceptionInfo anymore, because `findUnwindDestinations` adds all unwind destinations as successors. (llvm#187302 is actually not related to this; it was just the trigger for investigating.) So in the case of the little C++ example above, the outer `catch` BB will also be added as an unwind successor of the `foo()` BB. I actually think we may not even need the WebAssemblyExceptionInfo analysis at all if we only use [the latest standard (exnref) proposal](https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md). But we still need to keep the legacy support, so we need it for now.
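The difference between the two strategies can be shown on a toy model of the `catch.dispatch` chain from the IR example above. `CatchSwitch`, `allUnwindDestinations`, and `firstCatchpadOnly` are hypothetical names; this is not LLVM's data structure, and it only models the `catchswitch` case that the text discusses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a WinEH-style dispatch block.
struct CatchSwitch {
  std::vector<std::string> Handlers; // e.g. "catch.start"
  const CatchSwitch *UnwindDest;     // nullptr means "unwind to caller"
};

// Common strategy: any catchswitch may fail to catch, so follow the chain
// and add every reachable handler as an unwind successor.
std::vector<std::string> allUnwindDestinations(const CatchSwitch *CS) {
  std::vector<std::string> Succs;
  for (; CS; CS = CS->UnwindDest)
    Succs.insert(Succs.end(), CS->Handlers.begin(), CS->Handlers.end());
  return Succs;
}

// Old Wasm-only shortcut: only the first catchswitch's handlers.
std::vector<std::string> firstCatchpadOnly(const CatchSwitch *CS) {
  return CS ? CS->Handlers : std::vector<std::string>{};
}
```

With the shortcut, `catch.start1` is reachable from `entry` only through the CFG of `catch.start`, which is how the outer handler ended up dominated by the inner one.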
`gfx90a` added a set of MFMA instructions that are not available on prior GFXIPs. The Clang builtins for these were requiring the `mai-insts` feature, which is incorrect (`gfx908` supports this and does not support the added MFMAs). This led to opaque bugs where we'd check with `__has_builtin` for the availability of the builtin, target 908, and get an ISEL failure.
…vm#183990) This makes the test `fold_left` and `fold_left_with_iter` with and without telemetrics similar to what we do in `check_iterator`.
…use vpopcntb cttz expansion (llvm#191618) Test coverage for llvm#191520
Previously, getValueType() always returned the compared operand type (e.g. i32) for CmpInst, which was incorrect for gather cost estimation and codegen where the result type (i1) is needed. This caused ad-hoc fixups scattered across getEntryCost, calculateTreeCostAndTrimNonProfitable, and vectorizeTree that overrode ScalarTy back to i1 for CmpInsts.

Add a LookThroughCmp parameter to getValueType() (default: false) so callers that need the operand type for vector width calculations can explicitly opt in. This removes the need for the scattered CmpInst special cases:
- getEntryCost gather path: remove `if (isa<CmpInst>) ScalarTy = i1`
- calculateTreeCostAndTrimNonProfitable: remove same override
- vectorizeTree: simplify `if (!isa<CmpInst>) ScalarTy = getValueType(V)` to just `getValueType(V)`

For the ICmp/FCmp cost case in getEntryCost, add a fallthrough from ICmp/FCmp to Select that overrides ScalarTy with the compared operand type via getValueType(VL0, true), since getCmpSelInstrCost expects the compared type as its first argument.

Fix the condition type argument passed to getCmpSelInstrCost for both scalar and vector paths: use the actual condition/result type instead of always Builder.getInt1Ty().

Reviewers: hiraditya, RKSimon
Pull Request: llvm#190618
…calar The LLVM cost model uses integer-valued throughput costs, which cannot represent fractional costs. For 2-element vectors, this rounding can make vectorization appear profitable when it actually produces more instructions than the scalar code: the overhead from shuffles, inserts, extracts, and buildvectors is underestimated.

Add an instruction-count safety check in getTreeCost that estimates the number of vector instructions (including gathers, shuffles, and extracts) and compares against the number of scalar instructions. If the vector code would produce more instructions, reject the tree regardless of what the cost model says. This catches cases where fractional cost rounding hides real overhead.

The check is gated behind -slp-inst-count-check (default: on) and only applies to 2-element root trees where rounding errors matter most.

Reviewers: hiraditya, bababuck, RKSimon
Pull Request: llvm#190414
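The decision rule can be sketched as below. `TreeInstCounts` and `passesInstCountCheck` are hypothetical; how the SLP vectorizer actually counts each category is not described here, so the counts are simply taken as inputs. The sketch only encodes "reject 2-element trees whose estimated vector instruction count exceeds the scalar count".

```cpp
#include <cassert>

// Hypothetical per-tree instruction estimates (not SLP's data structures).
struct TreeInstCounts {
  unsigned VectorOps;   // vector arithmetic/memory instructions
  unsigned Gathers;     // buildvector / insertelement sequences
  unsigned Shuffles;
  unsigned Extracts;    // extractelement for external scalar uses
  unsigned ScalarInsts; // scalar instructions the tree would replace
};

// Reject when the vector code needs more instructions than the scalar code,
// regardless of the integer-rounded throughput cost.
bool passesInstCountCheck(unsigned VF, const TreeInstCounts &C) {
  if (VF != 2)
    return true; // the check only applies to 2-element root trees
  const unsigned VectorTotal = C.VectorOps + C.Gathers + C.Shuffles + C.Extracts;
  return VectorTotal <= C.ScalarInsts;
}
```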
…vm#191627) Fixes llvm#191549. Assisted-by: claude-4.6-opus
When SLPReVec is enabled, getValueType returns the vector result type for InsertElement instructions rather than the scalar element type. This caused getEntryCost to propagate an incorrect ScalarTy (e.g. <4 x float> instead of float) into getScalarizationOverhead and getWidenedType, triggering an assertion failure and producing wrong cost estimates. Narrow ScalarTy to its element type when costing vectorized InsertElement entries whose inserted operands are scalars. Fixes llvm#191175. Reviewers: Pull Request: llvm#191628
Fixes:
```
warning: format specifies type 'long' but the argument has type 'intptr_t'
...
```
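One portable way to print an `intptr_t` without this warning is the `PRIdPTR` macro from `<cinttypes>`, which expands to the correct length modifier for the platform. This is a general sketch (`formatOffset` is a hypothetical helper); the commit may have used a different fix, such as a cast.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// "%ld" is only correct where intptr_t happens to be long; PRIdPTR is
// defined to match intptr_t on every platform.
int formatOffset(char *Buf, std::size_t Size, std::intptr_t Offset) {
  return std::snprintf(Buf, Size, "offset=%" PRIdPTR, Offset);
}
```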
…91299) After llvm#189372 both minimum iteration checks for epilogue vectorization are created in VPlan, which removes the last blocker for unconditionally running materializeConstantVectorTripCount. This enables additional folds for plans in the native path, as well as removes some trip count computations with epilogue vectorization. PR: llvm#191299
…#191498) NSSW/NUSW on a wider AddRec does not imply NSSW/NUSW on a narrower AddRec. Fixes llvm#191382.
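A plain-integer illustration of why the implication fails: the recurrence {100,+,50} has no signed wrap when evaluated in 32 or 64 bits, but its second value (150) is outside the signed 8-bit range, so the truncated i8 recurrence does wrap. The helpers are hypothetical and only model the arithmetic, not SCEV's predicates.

```cpp
#include <cassert>
#include <cstdint>

// Value of {Start,+,Step} at iteration I, computed in a wide type.
int64_t addRecAt(int64_t Start, int64_t Step, int64_t I) {
  return Start + Step * I;
}

// True if V fits in a signed N-bit integer (1 <= N <= 63).
bool fitsSigned(int64_t V, unsigned N) {
  const int64_t Min = -(int64_t(1) << (N - 1));
  const int64_t Max = (int64_t(1) << (N - 1)) - 1;
  return V >= Min && V <= Max;
}
```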
The output currently contains
```
"unicode32"
'u' or "unsigned decimal"
'p' or
"pointer"
"char[]"
"int8_t[]"
```
The 'p' and "pointer" are supposed to appear on the same line. When
we're about to print "pointer," we check whether it would exceed the
column limit (in which case we insert a line feed). This check only
treats spaces as separators, but in this case, "words" may be
separated by newlines as well. Look for them too.
…n (NFC) (llvm#189489) This NFC prepares the scheduler's rematerialization stage for integration with the target-independent rematerializer. It brings various small design changes and optimizations to the stage's internal state to make the not-exactly-NFC rematerializer integration as small as possible. The main changes are, in no particular order:
- Sort and pick useful rematerialization candidates by their index in the vector of candidates instead of directly sorting objects within the candidate vector. This reduces the amount of data movement and simplifies the candidate selection logic.
- Move some data members from `PreRARematStage::RematReg` to `PreRARematStage::ScoredRemat`. This makes the former a simplified version of the rematerializer's own internal register representation (`Rematerializer::Reg`), which can be cleanly deleted during integration.
- Remove an inferable argument to `modifyRegionSchedule`. This allows the stage to stop tracking the parent block of each region.
- Use a boolean (`RevertAllRegions`) to track the scheduling revert decision after rematerialization instead of clearing `RescheduleRegions`. This avoids recomputing the latter during rollback.
- Estimate the usefulness of rematerialization from `GCNRegPressure` instead of from `Register` (requires adding a new method variant in `GCNRPTarget`).
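Sorting candidates by index, as in the first change above, can be sketched generically: sort a vector of indices by each candidate's score and leave the (possibly heavy) candidate objects in place. `Candidate` and `rankByScore` are hypothetical; descending score and stable ordering of ties are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical candidate; Payload stands in for heavier per-candidate state.
struct Candidate {
  double Score;
  std::vector<int> Payload;
};

// Return candidate indices ordered by descending score. Only indices move;
// the candidates themselves are never copied or swapped.
std::vector<std::size_t> rankByScore(const std::vector<Candidate> &Cands) {
  std::vector<std::size_t> Order(Cands.size());
  std::iota(Order.begin(), Order.end(), std::size_t{0});
  std::stable_sort(Order.begin(), Order.end(),
                   [&](std::size_t A, std::size_t B) {
                     return Cands[A].Score > Cands[B].Score;
                   });
  return Order;
}
```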
We had a report of some assertion failures in llvm#190054 (comment), and some msan failures in llvm#190056. These appear to be due to default-constructed StringRefs being used in some cases. To address this, we can provide default initializers that should prevent such cases from causing further problems.
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/11/builds/196