fix: time travel when going back to interrupt node #7498

Open
Sydney Runkle (sydney-runkle) wants to merge 4 commits into main from
sr/time-travel-bug


Sydney Runkle (sydney-runkle) commented Apr 13, 2026

Fix: Create fork checkpoint on subgraph time travel

Problem

When time-traveling to a subgraph checkpoint that has an interrupt and then resuming, the resume loaded the wrong state: it picked up the original execution's latest checkpoint instead of the time-traveled one.

This happened because replaying from a subgraph checkpoint never created a new parent checkpoint. If the replay hit an interrupt before after_tick() ran, no checkpoint was written at all, so the parent's "latest" checkpoint was still the old one from the original execution.
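
The failure mode above can be illustrated with a toy model. This is a minimal sketch, not LangGraph code: `ToyThread`, `Checkpoint`, and `replay_without_fork` are hypothetical names, "latest" is assumed to mean "most recently written", and every step is tagged `"loop"` for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpoint:
    id: str
    parent_id: Optional[str]
    source: str  # simplified: every step in this toy run is tagged "loop"


class ToyThread:
    """Hypothetical in-memory stand-in for a checkpointed thread."""

    def __init__(self) -> None:
        self.checkpoints: List[Checkpoint] = []

    def put(self, checkpoint: Checkpoint) -> None:
        self.checkpoints.append(checkpoint)

    def latest(self) -> Checkpoint:
        # Assumption: "latest" is simply the most recently written checkpoint.
        return self.checkpoints[-1]


def replay_without_fork(thread: ToyThread, from_id: str) -> None:
    """Models the pre-fix replay: it hits the interrupt and writes nothing."""
    return None


def build_original_run() -> ToyThread:
    thread = ToyThread()
    parent = None
    for cid in ["C0", "C1", "C2", "C3", "C4", "C5"]:
        thread.put(Checkpoint(cid, parent, "loop"))
        parent = cid
    return thread


thread = build_original_run()
replay_without_fork(thread, "C2")

# A later resume looks up the latest checkpoint, which is still the old C5.
resumed_from = thread.latest().id
print(resumed_from)
```

Because the replay never writes, nothing distinguishes the thread after time travel from the thread before it, which is why the resume lands on C5.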

Fix

When the loop detects a time-travel replay (not an update_state fork), it now eagerly writes a fork checkpoint at the start of the tick. This ensures:

  1. The parent thread's latest checkpoint points to the replayed state
  2. Subsequent Command(resume=...) calls find the correct checkpoint
  3. Stale INTERRUPT pending writes from the old checkpoint are cleared (they reference old task IDs)

Additionally, the subgraph replay logic now uses the parent checkpoint ID (from prev_checkpoint_config) when resolving subgraph checkpoints during time-travel, matching the existing behavior for update_state forks.
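
The eager fork write can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in `_loop.py`: `start_tick`, the `Checkpoint` dataclass, the `(task_id, channel, value)` tuple layout for pending writes, and the `F<n>` ID scheme are all hypothetical, and the `INTERRUPT` channel name is assumed to be `"__interrupt__"`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INTERRUPT = "__interrupt__"  # assumed channel name for interrupt writes

PendingWrite = Tuple[str, str, object]  # (task_id, channel, value), hypothetical layout


@dataclass
class Checkpoint:
    id: str
    parent_id: Optional[str]
    source: str
    pending_writes: List[PendingWrite] = field(default_factory=list)


def start_tick(
    history: List[Checkpoint], replay_from: Checkpoint, is_time_traveling: bool
) -> Checkpoint:
    """Return the checkpoint the tick should run from, forking if needed."""
    # update_state forks (and existing forks) already have their own checkpoint.
    if not is_time_traveling or replay_from.source in ("update", "fork"):
        return replay_from
    # Drop stale INTERRUPT writes: they reference task IDs from the old run.
    kept = [w for w in replay_from.pending_writes if w[1] != INTERRUPT]
    fork_no = sum(1 for c in history if c.source == "fork") + 1
    fork = Checkpoint(f"F{fork_no}", replay_from.id, "fork", kept)
    history.append(fork)  # written before execution, so it becomes the latest
    return fork


c2 = Checkpoint(
    "C2",
    "C1",
    "loop",
    [("task-old", INTERRUPT, "question?"), ("task-old", "answer", "x")],
)
history = [Checkpoint("C0", None, "loop"), Checkpoint("C1", "C0", "loop"), c2]

active = start_tick(history, c2, is_time_traveling=True)

u1 = Checkpoint("U1", "C2", "update")
unchanged = start_tick(history, u1, is_time_traveling=True)
```

In this sketch the fork is appended before any node runs, so a replay that stops at an interrupt still leaves a new latest checkpoint for `Command(resume=...)` to find.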

Checkpoint flow diagrams

Before fix: time travel leaves no fork

Original execution:

  C0 (start) --> C1 (step_a) --> C2 (ask_1 interrupt) --> C3 (resume) --> C4 (ask_2 interrupt) --> C5 (done)

Time travel to C2 (subgraph config):

  Replay runs... hits interrupt... no new checkpoint written.
  Parent "latest" is still C5.

  Command(resume="new_answer"):
    Loads C5 (wrong!) instead of the replayed C2 state.

After fix: time travel creates a fork

Original execution:

  C0 --> C1 --> C2 --> C3 --> C4 --> C5 (done)

Time travel to C2 (subgraph config):

  C0 --> C1 --> C2 --> C3 --> C4 --> C5
                  \
                   F1 (fork, source="fork")  <-- new latest

  Command(resume="new_answer"):
    Loads F1 (correct!) --> resumes from the right state.

  After full resume:

  C0 --> C1 --> C2 --> C3 --> C4 --> C5
                  \
                   F1 --> F2 (ask_1 result) --> F3 (ask_2 interrupt) --> F4 (done)
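
The tree above can be checked mechanically by walking parent links. The sketch below hard-codes the diagram's checkpoint IDs in a plain dictionary; `lineage` and `branch_point` are hypothetical helpers for illustration, not part of LangGraph or of the tests in this PR.

```python
from typing import Dict, List, Optional

# Parent links copied from the diagram: original run C0..C5, fork branch F1..F4.
PARENTS: Dict[str, Optional[str]] = {
    "C0": None, "C1": "C0", "C2": "C1", "C3": "C2", "C4": "C3", "C5": "C4",
    "F1": "C2", "F2": "F1", "F3": "F2", "F4": "F3",
}


def lineage(checkpoint_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of checkpoint IDs from the root down to checkpoint_id."""
    chain: List[str] = []
    current: Optional[str] = checkpoint_id
    while current is not None:
        chain.append(current)
        current = parents[current]
    return list(reversed(chain))


def branch_point(a: str, b: str, parents: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the last checkpoint shared by the lineages of a and b."""
    shared: Optional[str] = None
    for x, y in zip(lineage(a, parents), lineage(b, parents)):
        if x != y:
            break
        shared = x
    return shared


original = lineage("C5", PARENTS)
forked = lineage("F4", PARENTS)
fork_origin = branch_point("C5", "F4", PARENTS)
```

Here the two lineages share C0 through C2 and diverge afterwards, which matches the claim that the fork branches from the replay point while the original history stays intact.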

Manual fork via update_state (unchanged)

  C0 --> C1 --> C2 --> C3
                  \
                   U1 (source="update")  <-- created by update_state()

  This path already worked. The fix skips update/fork sources
  so existing behavior is preserved.

Changes

  • libs/langgraph/langgraph/pregel/_loop.py:
    • Extract is_time_traveling flag from the existing replay detection logic for reuse
    • Write a fork checkpoint (source="fork") eagerly at the start of a time-travel tick, before execution begins
    • Clear stale INTERRUPT pending writes when creating the fork (they reference old task IDs that won't match the new checkpoint)
    • Unify subgraph replay ID resolution: check source in ("update", "fork") instead of a separate is_time_traveling condition, since the new fork checkpoint now has source="fork"
  • libs/langgraph/tests/test_time_travel.py and test_time_travel_async.py: Added 4 new test cases (sync + async):
    • test_replay_from_before_interrupt_then_resume — replays from a checkpoint before an interrupt, resumes with a new answer, and verifies the full checkpoint history (source, next, values) at each stage
    • test_subgraph_time_travel_resume_from_first_interrupt — time-travels to a subgraph's first interrupt, resumes both interrupts with new answers, and verifies the fork creates a new branch while preserving the original
    • test_subgraph_time_travel_resume_from_second_interrupt — time-travels to a subgraph's second interrupt, resumes with a new answer, and verifies the first interrupt's original answer is preserved
    • test_subgraph_time_travel_checkpoint_pattern — verifies the fork checkpoint branches from the correct replay point and that the full checkpoint tree is correct after resume
  • libs/langgraph/tests/test_pregel.py / test_pregel_async.py: Updated existing test_weather_subgraph_state to account for the new fork checkpoint appearing in history (history length increases by 1)
