[LLVM][Transforms][Attributor] - Improvements in AAIntraFnReachability::isReachableImpl #2182

Open
bhandarkar-pranav wants to merge 2 commits into amd-staging from amd/dev/bhandarkar-pranav/attributor-dfs_num_visited

@bhandarkar-pranav

Reduce overhead in the Attributor's reachability (intra-function) analysis.

Improve AAIntraFnReachabilityFunction::isReachableImpl: replace the per-query SmallPtrSet<BasicBlock*> visited set with a persistent vector indexed by DominatorTree DFS numbers. Each basic block has a unique DFSNumIn in [0, N), which makes it a perfect dense array index. A monotonically increasing query ID avoids clearing the vector between queries: a block is "visited" iff VisitedMap[DFSNumIn] == CurrentQueryID. This eliminates per-query allocation, pointer hashing, and destruction for the BFS traversal.
Motivated by a representative workload where we were seeing ~381K BFS traversals during LTO with many GPU kernels (e.g., Fortran OpenMP offloading applications).
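
The query-ID stamping idea described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: it uses plain integer block indices as a stand-in for DFSNumIn, and the names `ReachabilityCache`, `markVisited`, and `isReachable` are hypothetical. It also treats a block as trivially reachable from itself, which may differ from the Attributor's semantics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for one function's CFG: block i has successors Succs[i].
// In the real pass the dense index would come from the DominatorTree DFS number.
struct ReachabilityCache {
  std::vector<std::vector<unsigned>> Succs; // adjacency list by dense block index
  std::vector<uint32_t> VisitedMap;         // ID of the last query that visited each block
  uint32_t CurrentQueryID = 0;              // 0 means "never visited"

  explicit ReachabilityCache(std::vector<std::vector<unsigned>> S)
      : Succs(std::move(S)), VisitedMap(Succs.size(), 0) {}

  // Stamps Idx with the current query ID; returns true if it was not yet
  // visited during this query.
  bool markVisited(unsigned Idx) {
    if (VisitedMap[Idx] == CurrentQueryID)
      return false;
    VisitedMap[Idx] = CurrentQueryID;
    return true;
  }

  // BFS from From, looking for To. No per-query set is allocated or cleared.
  bool isReachable(unsigned From, unsigned To) {
    assert(From < Succs.size() && To < Succs.size());
    // Starting a new query is a single increment. If the counter ever wraps
    // around to 0, fall back to one full reset so stale stamps cannot match.
    if (++CurrentQueryID == 0) {
      std::fill(VisitedMap.begin(), VisitedMap.end(), 0);
      CurrentQueryID = 1;
    }
    std::queue<unsigned> Worklist;
    markVisited(From);
    Worklist.push(From);
    while (!Worklist.empty()) {
      unsigned BB = Worklist.front();
      Worklist.pop();
      if (BB == To)
        return true;
      for (unsigned S : Succs[BB])
        if (markVisited(S))
          Worklist.push(S);
    }
    return false;
  }
};
```

Because a stale stamp from an earlier query never equals the current ID, repeated calls to `isReachable` on the same `ReachabilityCache` reuse the vector without clearing it; the only O(N) reset happens on counter wraparound.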

(Cherry-picked from llvm#189132)

  • Approved by Jan upstream
  • Gated by Johannes though, so opening it downstream so we can get this in a drop to the DCGPU apps team.

Reduce overhead in the Attributor's reachability (intra-function)
analysis.

Improve AAIntraFnReachabilityFunction::isReachableImpl -
Replace per-query SmallPtrSet<BasicBlock*> visited set with a persistent
vector<unsigned> indexed by DominatorTree DFS numbers. Each basic block
has a unique DFSNumIn in [0, N), making it a perfect dense array index.
A monotonically increasing query ID avoids clearing the vector between
queries: a block is "visited" iff VisitedMap[DFSNumIn] == CurrentQueryID.
This eliminates per-query allocation, pointer hashing, and destruction
for BFS traversal.
Motivated by a representative workload where we were seeing ~318K
BFS traversals during LTO with many GPU kernels (e.g., Fortran
OpenMP offloading applications).
@z1-cciauto
Collaborator
