Fix nested branch miscompilation: uninitialized taped predicate when outer enzyme_const guard is false#2782
Draft
- GradientUtils.cpp: Disable 3-target fast-path when replacePHIs != nullptr, move cond2 lookup inside staging block so it's only evaluated under the outer guard, and register staging block in reverseBlockToPrimal.
- CacheUtility.cpp: Force zero-initialization for i1 predicate caches so unwritten cache slots default to false instead of undef.
- Update condtriload.ll, insertsort.ll, scase.ll tests for new IR output.
- Add nested_inactive_outer_active_inner.ll regression test for issue #2629.

Agent-Logs-Url: https://github.com/EnzymeAD/Enzyme/sessions/b9b6fadb-dbdb-4364-a768-95b3cd77720e
Co-authored-by: minansys <149007967+minansys@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix enzyme miscompilation of nested branch with outer guard" to "Fix nested branch miscompilation: uninitialized taped predicate when outer enzyme_const guard is false" on Apr 9, 2026
When a function has an outer branch guarded by an `enzyme_const` bool and an inner branch on an active predicate, Enzyme's reverse pass could read an uninitialized tape cache for the inner predicate when the outer guard is false, causing wrong gradients or crashes.

Root cause

`branchToCorrespondingTarget`'s 3-target optimization eagerly called `lookupM(bi2->getCondition(), ...)` (the inner predicate) before establishing a control-flow guard on the outer predicate. When the outer guard is not taken, the inner predicate's tape slot is never written; reading it yields `undef`.

Changes

- `GradientUtils.cpp`: disable fast-path when `replacePHIs != nullptr`. The PHI-rewriting path cannot introduce new control flow; the eager `cond2` materialization it required was the primary bug source. Falls back to the generic cache-based path.
- `GradientUtils.cpp`: defer `cond2` into `staging` block. For the `replacePHIs == nullptr` path, `lookupM(bi2->getCondition(), ...)` is now called after `BuilderM.SetInsertPoint(staging)`, so it only executes on the branch where `cond1` (outer guard) holds. The `staging` block is also registered in `reverseBlockToPrimal` to satisfy `lookupM` invariants.
- `CacheUtility.cpp`: zero-init i1 predicate caches. Boolean branch-predicate caches (`sublimits.size() == 0`, type is `i1`) are initialized to `false` instead of `undef`. When the outer guard is false the inner cache is never written; `false` is the correct "not executed" sentinel.
- Test updates (`condtriload.ll`, `scase.ll`): Expected IR updated for new cache-based gradient routing (replacing `xor`/`and`/`select` with `icmp eq i8` on the switch cache). `insertsort.ll`: `undef` → `false` in the i1 cache phi, reflecting the zero-init fix.
- New test `nested_inactive_outer_active_inner.ll`: Regression for the exact pattern. Verifies that `cond_unwrap` is evaluated only inside `staging` (reachable only when `fan=true`) and that `_cache.0` carries `false` (not `undef`) for the outer-guard-false path.
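
For readers who want the shape of the problem without reading IR, here is a minimal hand-written sketch in plain C++. It does not use Enzyme or reproduce its generated code: `Tape`, `forward`, `reverse`, and `gradient` are hypothetical names invented for illustration, and the real cache layout differs. It only models the two properties the fix relies on: the inner predicate slot defaults to `false`, and the reverse pass reads it only after checking the outer guard.

```cpp
// Primal with the shape described above: `outer` plays the role of the
// enzyme_const bool guard; the inner branch tests the active input x.
double primal(double x, bool outer) {
    double y = x;
    if (outer) {
        if (x > 0.0) y = x * x;
        else         y = -3.0 * x;
    }
    return y;
}

// Hypothetical stand-in for the taped inner predicate (not Enzyme's layout).
struct Tape {
    bool inner_taken = false;  // zero-initialized: "not executed" sentinel
};

// Forward sweep: the predicate slot is written only under the outer guard.
double forward(double x, bool outer, Tape &tape) {
    double y = x;
    if (outer) {
        tape.inner_taken = x > 0.0;
        y = tape.inner_taken ? x * x : -3.0 * x;
    }
    return y;
}

// Reverse sweep: returns dy * d(primal)/dx.
double reverse(double x, bool outer, const Tape &tape, double dy) {
    if (!outer) {
        return dy;  // y = x, so the slot is never read on this path
    }
    // Analogue of the staging block: the inner predicate is looked up
    // only here, where the outer guard is known to hold.
    return tape.inner_taken ? 2.0 * x * dy : -3.0 * dy;
}

// Convenience wrapper: run forward, then reverse with seed 1.0.
double gradient(double x, bool outer) {
    Tape tape;
    forward(x, outer, tape);
    return reverse(x, outer, tape, 1.0);
}
```

With these definitions, `gradient(2.0, true)` is `4.0`, `gradient(-1.0, true)` is `-3.0`, and `gradient(2.0, false)` is `1.0`. The buggy ordering corresponds to reading `tape.inner_taken` before the `if (!outer)` check while leaving the slot uninitialized.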