Open
Apparently required by some older libstdc++ versions.
…dify-Write Sequence, Fix llvm#189183 (llvm#190350) This patch improves the SystemZ cost model to identify Read-Modify-Write sequences that can be folded into a single instruction (e.g., ASI, NI, OI). If a load, a scalar arithmetic operation (ADD, SUB, AND, OR, XOR) with an immediate, and a store all target the same memory location and have no external uses, the cost of the arithmetic and store instructions should be 0. This implementation does not include the TTI::TCK_RecipThroughput CostKind, as it causes a regression in non-power-2-subvector-extract.ll. Fixes llvm#189183 (see that issue for an example). --------- Co-authored-by: anoopkg6 <anoopkg6@github.com>
Summary: Naked functions are intended to allow the user to write the entirety of the function block, so we shouldn't include the `waitcnt` instructions for them.
…#191208) This moves the test of whether the iteration variable of an affected DO loop is marked as threadprivate. This makes the `ordCollapseLevel` member unnecessary. Issue: llvm#191249
Added the generate-libc-headers custom target depending on libc-headers. This allows troubleshooting headers without needing to install them first.
…vm#191375) While in this area I also removed unnecessary annotations for wchar_size and also cleaned up some more function attributes.
…1408) Failure to read all required fields for msgbuf isn't ObjectFile's fault but FreeBSD-Kernel-Core plugin specific. Thus this should be logged through `LLDBLog::Process` rather than `LLDBLog::Object`. Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
…lvm#186981) This PR follows suit of the Extensions.md document and provides the same file for OpenMP API extensions. These have previously been stored in OpenMPSupport.md. Having a more centralized view and place for these extensions seems useful. --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
llvm#191289) Also, update the conformance script to look for closed issues when searching for unlinked issues.
…ne table coverage in isolation (llvm#183790) Patch 2 of 3 to add to llvm-dwarfdump the ability to measure DWARF coverage of local variables in terms of source lines, as discussed in [this RFC](https://discourse.llvm.org/t/rfc-debug-info-coverage-tool-v2/83266). This patch adds the ability to compare a variable’s coverage against a baseline, e.g. an unoptimised compilation of the same code. This is provided using the optional `--coverage-baseline` argument. When a baseline is provided, the output also includes a per-variable measure of the line table’s coverage (`LT`, `LTRatio`), distinct from the variable’s coverage proper. See section 2.2 of the RFC for details on this metric.
Reworked libc/docs/gpu/building.rst to match the style of getting_started.rst:
* Removed mkdir and cd commands.
* Used -S and -B flags for CMake.
* Used -C flag for Ninja.
* Split commands into smaller blocks with brief explanations.

Use the same terminology as elsewhere in the LLVM libc docs and move away from the deprecated runtime terms.
* Standard runtimes build -> Bootstrap Build
* Runtimes cross build -> Two-stage Cross-compiler Build
In llvm#178306, I incorrectly assumed that traversing `allproc` in reverse would yield increasing pid order, because new processes are added at the head of `allproc`. This assumption is false under certain circumstances, such as pid reuse, so threads were not sorted correctly. Instead of relying on any such assumption, explicitly sort threads by the pid retrieved from memory. Fixes: 5349c66 (llvm#178306) --------- Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
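A minimal sketch of the explicit sort described above. `ThreadInfo` and `SortThreadsByPid` are hypothetical names for illustration, not the plugin's actual types; the only point is that ordering comes from the pid values themselves rather than from list traversal order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a thread record read from kernel core memory.
struct ThreadInfo {
  int32_t pid; // process id retrieved from memory
  int32_t tid; // thread id
};

// Sort by pid explicitly instead of trusting the order in which the
// process list was walked. stable_sort keeps the relative order of
// threads that belong to the same process.
inline void SortThreadsByPid(std::vector<ThreadInfo> &threads) {
  std::stable_sort(threads.begin(), threads.end(),
                   [](const ThreadInfo &a, const ThreadInfo &b) {
                     return a.pid < b.pid;
                   });
}
```

With this shape, a reused (smaller) pid appearing near the head of the list no longer affects the final order.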
llvm#191231) …ties Some of the utilities may be used in symbol resolution which is before the expression analysis is done. In such situations, the typedExpr's normally stored in parser::Expr may not be available. To be able to obtain the numeric values of expressions, using the analyzer directly may be necessary, which requires SemanticsContext to be provided.
…m#191098) The motivation of this PR is to refactor and expose DSO helper functions so they can be used by all compiler-rt libraries, including the profile library, without duplicating dlopen/dlsym (non-Windows) or LoadLibrary/GetProcAddress (Windows) logic in each runtime. Implement the helpers in namespace __interception in interception_linux.cpp for non-Windows targets and interception_win.cpp for Windows, and use them from the existing Linux interception path for RTLD_NEXT/RTLD_DEFAULT/dlvsym lookups. This is NFC for existing libraries that already use interception's public APIs; sanitizer and interception lit behavior is unchanged.
In some cases the use of *-DAG seemed to confuse the update scripts because of the clash with FileCheck's built-in -DAG suffix.
Specialize linalg.generic to linalg.mmt4d based on index map
…erage (llvm#187368) We don't need to run the full exhaustive test for all floating points, as long as we're testing the radix sort code path (which we are, since radix sort triggers at 1024 elements). This reduces the test execution time on my machine from 20s to 12s. Fixes llvm#187329
Fix iterator misuse in five BOLT passes, caught by _GLIBCXX_DEBUG (enabled via LLVM_ENABLE_EXPENSIVE_CHECKS=ON).
* AllocCombiner: combineAdjustments() erases instructions while iterating in reverse via llvm::reverse(BB), invalidating the reverse iterator. Defer erasures to after the loop using a SmallVector.
* ShrinkWrapping: processDeletions() uses std::prev(BB.eraseInstruction(II)), which is undefined when II == begin(). Restructure to standard forward iteration with erase.
* DataflowAnalysis: run() unconditionally dereferences BB->rbegin(), which crashes on empty basic blocks (possible after the ShrinkWrapping fix). Guard with an emptiness check.
* IndirectCallPromotion: rewriteCall() dereferences the end iterator via &(*IndCallBlock.end()). Replace with &IndCallBlock.back().
* TailDuplication: constantAndCopyPropagate() uses std::prev(OriginalBB.eraseInstruction(Itr)), which is undefined when Itr == begin(). Restructure to standard forward iteration with erase.
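The patterns named in the list above can be sketched on a plain `std::list<int>` standing in for a basic block. The function names are hypothetical and this is not BOLT code; it only illustrates forward iteration with erase, deferred erasure after a reverse walk, and guarding access to the last element.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Forward iteration with erase: erase() returns the iterator that follows
// the removed element, so std::prev() on begin() is never needed.
inline void EraseNegativesForward(std::list<int> &insts) {
  for (auto it = insts.begin(); it != insts.end();) {
    if (*it < 0)
      it = insts.erase(it);
    else
      ++it;
  }
}

// Deferred erasure: only record victims during the reverse walk, then
// erase after the loop so the reverse iterator is never invalidated.
inline void EraseNegativesDeferred(std::list<int> &insts) {
  std::vector<std::list<int>::iterator> toErase;
  for (auto rit = insts.rbegin(); rit != insts.rend(); ++rit)
    if (*rit < 0)
      toErase.push_back(std::prev(rit.base())); // iterator to *rit
  for (auto it : toErase)
    insts.erase(it);
}

// Guarded access to the last element: check for emptiness first and use
// back() rather than dereferencing rbegin() or end().
inline int LastOrDefault(const std::list<int> &insts, int fallback) {
  return insts.empty() ? fallback : insts.back();
}
```

The deferred variant relies on `std::list` keeping iterators to surviving elements valid across erasures; a container with different invalidation rules would need indices or another scheme.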
…8271) Example:

    int foo(int a, int b) { return a - 1 + ~b; }

Before, on AArch64:

    mvn w8, w1
    add w8, w0, w8
    sub w0, w8, #1

After (matches gcc):

    sub w0, w0, w1
    sub w0, w0, #2

Proof: https://alive2.llvm.org/ce/z/g_bV01
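The fold rests on the two's-complement identity `~b == -b - 1`, so `a - 1 + ~b == a - b - 2`. A small check under wrapping (unsigned) arithmetic is sketched below; the function names are illustrative only, and the Alive2 link above remains the actual proof.

```cpp
#include <cassert>
#include <cstdint>

// The expression as written in the example, in wrapping arithmetic.
inline uint32_t Original(uint32_t a, uint32_t b) { return a - 1 + ~b; }

// The folded form that the "after" assembly computes.
inline uint32_t Folded(uint32_t a, uint32_t b) { return a - b - 2; }

// Compare both forms over a few boundary and ordinary values.
inline bool AgreeOnSamples() {
  const uint32_t samples[] = {0u,          1u,          2u,    12345u,
                              0x7fffffffu, 0x80000000u, 0xffffffffu};
  for (uint32_t a : samples)
    for (uint32_t b : samples)
      if (Original(a, b) != Folded(a, b))
        return false;
  return true;
}
```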
…#191413) Squelch the stage-2 compile time regression introduced by the variadic m_Combine(And|Or) matchers, by replacing the std::apply on a std::tuple with a recursive inheritance.
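As a rough illustration of the technique named above, a variadic combinator can store its sub-matchers through recursive inheritance instead of a `std::tuple` walked with `std::apply`. `AllOf` and `allOf` are hypothetical names; this is not the actual `m_Combine(And|Or)` implementation.

```cpp
#include <cassert>

// Base case: a conjunction of zero predicates accepts every value.
template <typename... Preds> struct AllOf {
  template <typename T> bool match(const T &) const { return true; }
};

// Recursive case: store the first predicate and inherit the rest.
template <typename First, typename... Rest>
struct AllOf<First, Rest...> : AllOf<Rest...> {
  First first;

  AllOf(First f, Rest... rest) : AllOf<Rest...>(rest...), first(f) {}

  // Short-circuits: later predicates run only if earlier ones succeed.
  template <typename T> bool match(const T &v) const {
    return first(v) && AllOf<Rest...>::match(v);
  }
};

// Helper that deduces the predicate types.
template <typename... Preds> AllOf<Preds...> allOf(Preds... preds) {
  return AllOf<Preds...>(preds...);
}
```

For example, `allOf(isPositive, isEven).match(4)` evaluates each predicate in order through plain member calls, with no tuple machinery to instantiate.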
…ORTED for zOS (llvm#190835) Tests in `llvm/test/Examples` and `llvm/test/ExecutionEngine` use JIT which is unsupported for zOS causing the tests to fail. --------- Co-authored-by: Bahareh Farhadi <bahareh.farhadi@ibm.com>
The default inliner policy changed slightly, which was expected after PR llvm#190168.
Coro hasn't yet been fixed up for profcheck, so new tests are likely to fail. mtune.ll exercises the loop vectorizer (also not fixed).
When a user calls `omp_control_tool` while a tool is attached and has registered the `ompt_control_tool` callback, the tool should receive a callback with the user's arguments. However, in llvm#112924, it was discovered that this only happens after at least one host-side directive or runtime call that reaches `__kmp_do_middle_initialize` has been executed. The check for `__kmp_init_middle` in `FTN_CONTROL_TOOL` did not attempt the middle initialization and instead always returned `-2` (no tool). A tool therefore received no callback, and the user program was not told that a tool is attached. To fix this, change the explicit return to a call of `__kmp_middle_initialize()`, as done in several other places in `libomp`. Further handling is then done in `__kmp_control_tool`, which returns `-2` (no tool), `-1` (no callback), or the tool's return value. Also expand the tests to cover the cases where no callback is registered, or where `omp_control_tool` is called before any OpenMP directive. Fixes llvm#112924 CC @jprotze, @hansangbae Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
…(NFC) (llvm#191430) CompilationGraph owns all nodes and edges via `unique_ptr`, but exposes pointers to the underlying objects. Make them non-movable to maintain stable addresses. Make them non-copyable since we don't want to copy `Command` objects they hold or create duplicate root nodes. Apply full rule-of-five to `CompilationGraph`.
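A small sketch of the ownership pattern described above, with hypothetical `Node` and `Graph` classes rather than the real `CompilationGraph`. Whether the real graph type is movable is not stated here; this sketch assumes it is.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Nodes are handed out by pointer, so they must never change address:
// delete both copy and move operations.
class Node {
public:
  explicit Node(int id) : id_(id) {}
  Node(const Node &) = delete;
  Node &operator=(const Node &) = delete;
  Node(Node &&) = delete;
  Node &operator=(Node &&) = delete;

  int id() const { return id_; }

private:
  int id_;
};

// The graph owns its nodes through unique_ptr and spells out all five
// special members. Assumption for this sketch: move-only.
class Graph {
public:
  Graph() = default;
  ~Graph() = default;
  Graph(const Graph &) = delete;
  Graph &operator=(const Graph &) = delete;
  Graph(Graph &&) = default;
  Graph &operator=(Graph &&) = default;

  // The returned pointer stays valid while the owning graph is alive,
  // even when the vector of owners reallocates.
  Node *addNode(int id) {
    nodes_.push_back(std::make_unique<Node>(id));
    return nodes_.back().get();
  }

private:
  std::vector<std::unique_ptr<Node>> nodes_;
};

static_assert(!std::is_copy_constructible<Node>::value, "Node is not copyable");
static_assert(!std::is_move_constructible<Node>::value, "Node is not movable");
```

Because each node lives in its own heap allocation, moving the graph transfers ownership without relocating any node.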
…m IntegerExpandSetCCOperands. NFC (llvm#191353) LHSLo and RHSLo must have the same type, we don't need to check both. Same for LHSHi and RHSHi.
While running in server mode, multiple clients can be connected at the same time. In LLDBUtils we had a single static mutex, which could cause other clients to hang while one client held the lock. Instead, I adjusted the logic to take the existing SBMutex as a parameter and to lock that mutex during command handling.
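The change in locking shape can be sketched with `std::mutex` standing in for `SBMutex`. `ClientSession` and `RunCommand` are hypothetical names; the real code lives in LLDBUtils and uses the SB API.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical per-client state. Each client carries its own lock.
struct ClientSession {
  std::mutex api_mutex;
  int commands_run = 0;
};

// The caller passes in the lock to hold, so two clients only contend
// when they genuinely share a mutex. A function-local `static std::mutex`
// here would instead serialize every client in the process.
inline std::string RunCommand(std::mutex &mutex, ClientSession &session,
                              const std::string &command) {
  std::lock_guard<std::mutex> guard(mutex);
  ++session.commands_run;
  return "ran: " + command;
}
```

With this shape, a long-running command on one session does not block command handling on another.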
…calar The LLVM cost model uses integer-valued throughput costs which cannot represent fractional costs. For 2-element vectors, this rounding can make vectorization appear profitable when it actually produces more instructions than the scalar code — the overhead from shuffles, inserts, extracts, and buildvectors is underestimated. Add an instruction-count safety check in getTreeCost that estimates the number of vector instructions (including gathers, shuffles, and extracts) and compares against the number of scalar instructions. If the vector code would produce more instructions, reject the tree regardless of what the cost model says. This catches cases where fractional cost rounding hides real overhead. The check is gated behind -slp-inst-count-check (default: on) and only applies to 2-element root trees where rounding errors matter most. Reviewers: hiraditya, bababuck, RKSimon Pull Request: llvm#190414
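The gating logic described above might look roughly like the following. `TreeEstimate` and `ShouldVectorize` are hypothetical, and treating any negative modeled cost as "profitable" is a simplifying assumption of this sketch, not the SLP vectorizer's actual threshold logic.

```cpp
#include <cassert>

// Hypothetical summary of one vectorization tree.
struct TreeEstimate {
  int modeledCost;      // integer cost delta; negative looks profitable
  unsigned vectorInsts; // estimate incl. gathers, shuffles, extracts
  unsigned scalarInsts; // scalar instructions being replaced
  unsigned rootWidth;   // number of elements at the tree root
};

// Reject a tree that the integer cost model likes when, for 2-element
// roots, the vector form would need more instructions than the scalar one.
inline bool ShouldVectorize(const TreeEstimate &tree,
                            bool instCountCheck = true) {
  if (tree.modeledCost >= 0)
    return false; // not profitable according to the cost model
  if (instCountCheck && tree.rootWidth == 2 &&
      tree.vectorInsts > tree.scalarInsts)
    return false; // rounding likely hid real overhead
  return true;
}
```

The `instCountCheck` parameter plays the role of the `-slp-inst-count-check` option: turning it off restores the pure cost-model decision.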
…vm#191627) Fixes llvm#191549. Assisted-by: claude-4.6-opus
When SLPReVec is enabled, getValueType returns the vector result type for InsertElement instructions rather than the scalar element type. This caused getEntryCost to propagate an incorrect ScalarTy (e.g. <4 x float> instead of float) into getScalarizationOverhead and getWidenedType, triggering an assertion failure and producing wrong cost estimates. Narrow ScalarTy to its element type when costing vectorized InsertElement entries whose inserted operands are scalars. Fixes llvm#191175. Reviewers: Pull Request: llvm#191628
Fixes:
```
warning: format specifies type 'long' but the argument has type 'intptr_t' ...
```
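The text above does not say how the warning was fixed. One common portable approach, shown here only as an assumption, is the `PRIdPTR` macro from `<cinttypes>`; `FormatIntptr` is a hypothetical helper.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format an intptr_t without assuming it has the same width as long.
// PRIdPTR expands to the correct length modifier for the platform.
inline std::string FormatIntptr(intptr_t value) {
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%" PRIdPTR, value);
  return buffer;
}
```

Casting the argument to `long` also silences the warning, but it can truncate on platforms where `long` is narrower than a pointer.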
…91299) After llvm#189372 both minimum iteration checks for epilogue vectorization are created in VPlan, which removes the last blocker for unconditionally running materializeConstantVectorTripCount. This enables additional folds for plans in the native path, as well as removes some trip count computations with epilogue vectorization. PR: llvm#191299
…#191498) NSSW/NUSW on a wider AddRec does not imply NSSW/NUSW on a narrower AddRec. Fixes llvm#191382.
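A simplified numeric model of why the implication fails: the same recurrence `start + step * i` can stay inside the 64-bit signed range while leaving the 32-bit one. The helpers below are hypothetical and evaluate the mathematical sequence directly; they do not model SCEV's truncation rules.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if start + step * i stays within [lo, hi] for every i in
// [0, iters]. Inputs are assumed small enough that the int64_t
// arithmetic here does not itself overflow.
inline bool StaysInRange(int64_t start, int64_t step, int64_t iters,
                         int64_t lo, int64_t hi) {
  for (int64_t i = 0; i <= iters; ++i) {
    const int64_t value = start + step * i;
    if (value < lo || value > hi)
      return false;
  }
  return true;
}

// "No signed wrap" for a recurrence evaluated in integer type T.
template <typename T>
bool NoSignedWrap(int64_t start, int64_t step, int64_t iters) {
  return StaysInRange(start, step, iters, std::numeric_limits<T>::min(),
                      std::numeric_limits<T>::max());
}
```

Starting two below `INT32_MAX` with step 1, five iterations never wrap in 64 bits but do cross the 32-bit signed boundary, so a no-wrap fact about the wide recurrence says nothing about the narrow one.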
The output currently contains
```
"unicode32"
'u' or "unsigned decimal"
'p' or
"pointer"
"char[]"
"int8_t[]"
```
The 'p' and "pointer" are supposed to appear on the same line. When
we're about to print "pointer," we check whether it would exceed the
column limit (in which case, we insert a line feed). This check only
checks for spaces as separators, but in this case, "words" may be
separated by newlines as well. Look for them too.
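The corrected check can be sketched as follows: the current column is measured from the last newline in the output, not only from spaces. `CurrentColumn` and `AppendWrapped` are hypothetical helpers written for illustration, not the actual LLDB code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of characters already on the current line. A newline anywhere
// in the text resets the count; without this, text after an embedded
// line break would be measured from the wrong position.
inline std::size_t CurrentColumn(const std::string &out) {
  const std::size_t newline = out.rfind('\n');
  return newline == std::string::npos ? out.size()
                                      : out.size() - newline - 1;
}

// Append a word, preceded by a space if it fits within the column
// limit and by a line feed otherwise.
inline void AppendWrapped(std::string &out, const std::string &word,
                          std::size_t limit) {
  const std::size_t column = CurrentColumn(out);
  if (column == 0) {
    out += word;
    return;
  }
  out += (column + 1 + word.size() > limit) ? '\n' : ' ';
  out += word;
}
```

Applied to the output shown earlier, the line holding `'p' or` is only six columns wide, so `"pointer"` stays on the same line.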
…n (NFC) (llvm#189489) This NFC prepares the scheduler's rematerialization stage for integration with the target-independent rematerializer. It brings various small design changes and optimizations to the stage's internal state to make the not-exactly-NFC rematerializer integration as small as possible. The main changes are, in no particular order:
- Sort and pick useful rematerialization candidates by their index in the vector of candidates instead of directly sorting objects within the candidate vector. This reduces the amount of data movement and simplifies the candidate selection logic.
- Move some data members from `PreRARematStage::RematReg` to `PreRARematStage::ScoredRemat`. This makes the former a simplified version of the rematerializer's own internal register representation (`Rematerializer::Reg`), which can be cleanly deleted during integration.
- Remove an inferable argument to `modifyRegionSchedule`. This allows the stage to stop tracking the parent block of each region.
- Use a boolean (`RevertAllRegions`) to track the scheduling revert decision after rematerialization instead of clearing `RescheduleRegions`. This avoids re-computing the latter during rollback.
- Estimate the usefulness of rematerialization from `GCNRegPressure` instead of from `Register` (requires adding a new method variant in `GCNRPTarget`).
We had a report of some assertion failures in llvm#190054 (comment), and some msan failures in llvm#190056. These appear to be due to default-constructed StringRefs being used in some cases. To address this, we provide default initializers that should prevent such cases from causing further problems.
…leSpec when checking LoadScriptFromSymFile setting (llvm#191473) We were incorrectly passing the script's `FileSpec` into `GetScriptLoadStyleForModule`. As a result, if a script name wasn't actually the same as the module name, the `target.auto-load-scripts-for-modules` setting didn't take effect. This patch passes the module's `FileSpec` instead. For `dSYM`s we save the original `FileSpec` because the loop tries to strip extensions until it finds a script, but we still want to use the module's name. **AI Usage**: - Used Claude to write the unit-test skeletons, then reviewed and adjusted them manually.
Ensure all StringRef members are default initialized to avoid potential bugs.
…eter packs (llvm#191484) I believe that is the intent of SubstIndex in AssociatedConstraint. So this enforces the checking explicitly, in case nested SubstIndexes confuse our poor constraint evaluator. I reverted the previous fix 257cc5a because that was wrong. As a drive-by fix, this also removes a strange assertion and an unnecessary SubstIndex setup in the nested requirement transform. No release note because this is a regression fix. Fixes llvm#188505 Fixes llvm#190169
AsmPrinter needs to hold state between doInitialization, runOnMachineFunction, and doFinalization, which are all separate passes in the NewPM. Storing this state externally somewhere like MachineModuleInfo or a new analysis is possible, but a bit messy given some state, particularly EHHandler objects, has backreferences into the AsmPrinter and assumes there is a single AsmPrinter throughout the entire compilation. So instead, store AsmPrinter in an analysis that stays constant throughout compilation which solves all these problems. This also means we can also just let AsmPrinter continue to own the MCStreamer, which means object file emission should work after this as well. This does require passing the ModuleAnalysisManager into buildCodeGenPipeline to register the AsmPrinterAnalysis, but that seems pretty reasonable to do. Reviewers: paperchalice, RKSimon, arsenm Pull Request: llvm#191535
…m#186766)

## Description

When `AMDGPUTargetLowering::performStoreCombine` inserts a synthetic bitcast to convert vector types (e.g. `<1 x float>` → `i32`) for stores, the bitcast inherits the **store's** SDLoc. When `DAGCombiner::visitBITCAST` later folds `bitcast(load)` → `load`, the resulting load loses its original debug location.

## Analysis

The bitcast is **not** present in the initial SelectionDAG; it is inserted during DAGCombine by `AMDGPUTargetLowering::performStoreCombine`. This can be observed with `-debug-only=isel,dagcombine`:

```
Initial selection DAG: no bitcast, load is v1f32 directly used by store
Combining: t17: ch = store ... /tmp/beans.c:6:14
 ... into: t20: ch = store ... /tmp/beans.c:6:14
Combining: t19: i32 = bitcast [ORD=3] # D:1 t13, /tmp/beans.c:6:14
 ... into: t21: i32,ch = load ... /tmp/beans.c:6:14
```

In `performStoreCombine` (`AMDGPUISelLowering.cpp`):

```cpp
SDLoc SL(N);  // N = store node → SL has store's DebugLoc
...
SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, NewVT, Val);
// bitcast gets store's DebugLoc, not load's
```

When `visitBITCAST` folds `bitcast(load)` → `load`, it uses `SDLoc(N)` (the bitcast's loc = store's loc), so the resulting load loses its original debug location.

```
Before (initial DAG):
  t13: v1f32 = load ... line 2       ; original load
  t14: ch = store t13, ... line 3    ; store

After performStoreCombine:
  t13: v1f32 = load ... line 2       ; original load
  t19: i32 = bitcast t13 line 3      ; synthetic bitcast (store's loc!)
  t20: ch = store t19, ... line 3

After visitBITCAST folds (incorrect):
  t21: i32 = load ... line 0         ; lost debug location

After visitBITCAST folds (expected):
  t21: i32 = load ... line 2         ; preserves load's location
```

## Fix

Target-specific fix in `AMDGPUISelLowering.cpp` `performStoreCombine`: use `DAG.getBitcast()` instead of `DAG.getNode(ISD::BITCAST, SL, ...)`.

`getBitcast()` internally uses `SDLoc(V)` (the value operand's SDLoc), so the synthetic bitcast naturally inherits the load's DebugLoc instead of the store's:

```cpp
// Before:
SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, NewVT, Val);
if (OtherUses) {
  SDValue CastBack = DAG.getNode(ISD::BITCAST, SL, VT, CastVal);

// After:
SDValue CastVal = DAG.getBitcast(NewVT, Val);
if (OtherUses) {
  SDValue CastBack = DAG.getBitcast(VT, CastVal);
```

This is consistent with `performLoadCombine` where the bitcast also uses the load's `SDLoc`.
…vm#190543) Qt 6.11 added `OVERRIDE` and `VIRTUAL` keywords to the [property system](https://doc.qt.io/qt-6.11/properties.html).
When building the LLVM installer on Windows, fix CRT / dllimport mismatch and unused locals / tautological comparisons in env handling.
We prefer statically linking all library dependencies.
Fixes a few warnings found while building the LLVM installer with `llvm/utils/release/build_llvm_release.bat --x64 --version 23.0.0 --skip-checkout --local-python`.
…romoting. (llvm#191568) The conversion needs to be done by promoting to f32. If we're already at LMUL=8, we need to split before we can promote.
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/11/builds/198
No description provided.