
Fix nested branch miscompilation: uninitialized taped predicate when outer enzyme_const guard is false #2782

Draft
Copilot wants to merge 2 commits into main from copilot/fix-enzyme-nested-branch-issue

Copilot AI commented Apr 9, 2026

When a function has an outer branch guarded by an enzyme_const bool and an inner branch on an active predicate, Enzyme's reverse pass could read an uninitialized tape cache for the inner predicate when the outer guard is false—causing wrong gradients or crashes.

Root cause

branchToCorrespondingTarget's 3-target optimization eagerly called lookupM(bi2->getCondition(), ...) (the inner predicate) before establishing a control-flow guard on the outer predicate. When the outer guard is not taken, the inner predicate's tape slot is never written, so reading it yields undef.

; Pattern that triggered the bug:
entry:   br i1 %fan, label %if.fan, label %merge   ; fan = enzyme_const
if.fan:  %cond = fcmp ogt double %a0, 0.0          ; active predicate → taped
         br i1 %cond, label %inner, label %merge
inner:   ...
merge:   ; 3-target merge — bug was here in reverse CFG reconstruction
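
For orientation, the IR pattern above corresponds roughly to the following source-level shape. This is a minimal sketch, not the reproducer from the linked issue: the function names and the body of the inner block (`a0 * a0`) are assumptions made for illustration. The hand-derived gradient gives the value a correct reverse pass should produce, in particular zero whenever `fan` is false.

```cpp
// Hypothetical source-level analogue of the IR pattern; names are illustrative.
// fan plays the role of the enzyme_const outer guard, a0 is the active input.
double nested(bool fan, double a0) {
    double r = 0.0;
    if (fan) {              // outer guard on an inactive (constant) bool
        if (a0 > 0.0) {     // inner predicate on an active value, taped for the reverse pass
            r = a0 * a0;    // assumed stand-in for the elided "inner" block
        }
    }
    return r;
}

// Hand-derived d(nested)/d(a0): the reference a correct reverse pass must match.
double nested_grad(bool fan, double a0) {
    return (fan && a0 > 0.0) ? 2.0 * a0 : 0.0;
}
```

With `fan == false` the inner comparison never runs in the forward pass, which is exactly the case where the taped predicate has no written value.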

Changes

  • GradientUtils.cpp — disable fast-path when replacePHIs != nullptr: The PHI-rewriting path cannot introduce new control flow; the eager cond2 materialization it required was the primary bug source. Falls back to the generic cache-based path.

  • GradientUtils.cpp — defer cond2 into staging block: For the replacePHIs == nullptr path, lookupM(bi2->getCondition(), ...) is now called after BuilderM.SetInsertPoint(staging), so it only executes on the branch where cond1 (outer guard) holds. The staging block is also registered in reverseBlockToPrimal to satisfy lookupM invariants.

  • CacheUtility.cpp — zero-init i1 predicate caches: Boolean branch-predicate caches (sublimits.size() == 0, type is i1) are initialized to false instead of undef. When the outer guard is false the inner cache is never written; false is the correct "not executed" sentinel.

  • Test updates (condtriload.ll, scase.ll): Expected IR updated for new cache-based gradient routing (replacing xor/and/select with icmp eq i8 on the switch cache).

  • insertsort.ll: undef becomes false in the i1 cache phi, reflecting the zero-init fix.

  • New test nested_inactive_outer_active_inner.ll: Regression for the exact pattern—verifies that cond_unwrap is evaluated only inside staging (reachable only when fan=true) and that _cache.0 carries false (not undef) for the outer-guard-false path.
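
The two runtime-facing fixes listed above (deferring the inner predicate lookup into the guarded region, and giving an unwritten i1 cache a defined false value) can be modeled outside LLVM. The sketch below is an assumption-laden toy: `PredicateTape`, `forward`, `reverse_deferred`, and `reverse_zero_init` are hypothetical names, and `std::optional<bool>` stands in for a cache slot that may be unwritten. It is not Enzyme's implementation.

```cpp
#include <optional>

// Hypothetical one-slot predicate tape; an empty optional models an
// unwritten (undef) cache slot. Not Enzyme's actual data structure.
struct PredicateTape {
    std::optional<bool> slot;
};

// Forward pass for r = (fan && a0 > 0) ? a0 * a0 : 0.
// The inner predicate is taped only when the outer guard is taken.
double forward(bool fan, double a0, PredicateTape &tape) {
    double r = 0.0;
    if (fan) {
        bool cond = a0 > 0.0;
        tape.slot = cond;
        if (cond) r = a0 * a0;
    }
    return r;
}

// Deferred lookup: the tape is read only inside the region guarded by fan,
// mirroring a lookup placed in the staging block.
double reverse_deferred(bool fan, double a0, const PredicateTape &tape, double dr) {
    double da0 = 0.0;
    if (fan) {
        bool cond = tape.slot.value();  // slot is always written when fan is true
        if (cond) da0 += 2.0 * a0 * dr;
    }
    return da0;
}

// Zero-init sentinel: an unwritten slot reads as false, so even a read
// performed before the guard has a defined result.
double reverse_zero_init(bool fan, double a0, const PredicateTape &tape, double dr) {
    bool cond = tape.slot.value_or(false);
    return (fan && cond) ? 2.0 * a0 * dr : 0.0;
}
```

In this model either fix alone yields a zero gradient on the outer-guard-false path; the PR applies both, so the lookup is guarded and the cache also has a defined default.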

- GradientUtils.cpp: Disable 3-target fast-path when replacePHIs != nullptr,
  move cond2 lookup inside staging block so it's only evaluated under the
  outer guard, and register staging block in reverseBlockToPrimal.
- CacheUtility.cpp: Force zero-initialization for i1 predicate caches so
  unwritten cache slots default to false instead of undef.
- Update condtriload.ll, insertsort.ll, scase.ll tests for new IR output.
- Add nested_inactive_outer_active_inner.ll regression test for issue #2629.

Agent-Logs-Url: https://github.com/EnzymeAD/Enzyme/sessions/b9b6fadb-dbdb-4364-a768-95b3cd77720e

Co-authored-by: minansys <149007967+minansys@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix enzyme miscompilation of nested branch with outer guard" to "Fix nested branch miscompilation: uninitialized taped predicate when outer enzyme_const guard is false" on Apr 9, 2026
Copilot AI requested a review from minansys April 9, 2026 20:28

Development

Successfully merging this pull request may close these issues.

Enzyme miscompiles nested branch with outer: uninitialized taped predicate causes inactive path to execute
