
MMU-based epoch interruption #12990

Draft

erikrose wants to merge 13 commits into bytecodealliance:main from erikrose:epoch-mmu

Conversation

erikrose (Contributor) commented Apr 8, 2026

This is an implementation of #1749, specifically @cfallin's roadmap, with the goal of reducing the overhead of checking for the end of epochs.

Paul ran some benchmarks on this (broadly agreeing with our real-world experiments) which tell us:

  • Current compare-against-a-deadline -Wepoch-interruption=y is a 14.4% hit versus doing nothing.
  • Doing only dead loads in function prologues and loop headers (all that was implemented in this patch at the time of the benchmark) brings that down to a 2.8% hit. There will be some additional cost from the signal handler that actually effects the task switch, but that's on the cold path.

The above numbers are from SpiderMonkey, which I deem the most representative benchmark.
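As a back-of-envelope check, those two figures imply the dead-load scheme eliminates roughly four fifths of the epoch-checking overhead:

```python
# Back-of-envelope arithmetic on the SpiderMonkey numbers quoted above:
# deadline-compare checks cost 14.4% over baseline, dead loads cost 2.8%.
deadline_overhead = 0.144
dead_load_overhead = 0.028

# Fraction of the original epoch-checking overhead eliminated:
reduction = 1 - dead_load_overhead / deadline_overhead
```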

Status:

  • Add unmapped-on-interrupt page and pointer to it in vmctx.
  • Add method for embedder to call to bring an epoch to a close.
  • Add DeadLoadWithContext Cranelift instruction, and use it.
  • Have DeadLoadWithContext emit metadata into compiled-artifact tables to tell the signal handler this is an interruption-point load.
  • Add logic to signal handler that, when seeing such a PC, updates state to redirect to the stub, saving the original PC (probably in the scratch register).
  • Add that stub, which saves all register state and invokes a hostcall with the recovered vmctx.
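To make the last two checklist items concrete, here is a toy Python model of the redirect the signal handler performs. None of these names or addresses come from Wasmtime; the PC table, stub address, and context shape are all hypothetical stand-ins:

```python
# Toy model of the signal-handler redirect: if the faulting PC is a recorded
# interruption-point load, stash the original PC in a scratch-register slot
# and resume execution at the interrupt stub instead.
INTERRUPT_POINT_PCS = {0x1040, 0x10A8}  # hypothetical PCs from compiled-artifact metadata
STUB_PC = 0x2000                        # hypothetical stub address

def on_fault(ctx):
    if ctx["pc"] in INTERRUPT_POINT_PCS:
        ctx["scratch_r10"] = ctx["pc"]  # save original PC (the patch reserves r10)
        ctx["pc"] = STUB_PC             # resume in the stub, which makes the hostcall
        return True                     # handled: this was an epoch interruption
    return False                        # not ours: let the normal trap path run

ctx = {"pc": 0x1040, "scratch_r10": 0}
handled = on_fault(ctx)
```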

If the TLB shootdown arising from the frobbing of privs on the "interrupt page" proves too expensive, we can try a more indirect load: rather than changing page privs, we change the address we dead-load from so it points to either a (statically) allowed or a forbidden page. (Chris floated this idea at the 2026-04-08 Cranelift meeting.) Not many of the other mechanics would need to change.
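The "frobbing of privs" boils down to an mprotect call on the interrupt page. Here is a minimal, Linux-only sketch via ctypes (the constants are Linux's; MAP_ANONYMOUS in particular differs on other platforms) of mapping such a page and flipping its access on and off:

```python
import ctypes

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

PAGE = 4096
PROT_NONE, PROT_READ, PROT_WRITE = 0, 1, 2
MAP_PRIVATE, MAP_ANONYMOUS = 0x02, 0x20  # Linux values; MAP_ANONYMOUS varies by OS

# Map the "interrupt page" readable: dead loads from it are harmless.
page = libc.mmap(None, PAGE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)

# To end an epoch, revoke access: the next dead load from `page` faults, and
# the SIGSEGV handler turns that fault into a task switch. This is the
# operation whose TLB-shootdown cost is the concern here.
revoked = libc.mprotect(page, PAGE, PROT_NONE)

# Restore access to re-arm the page for the next epoch.
restored = libc.mprotect(page, PAGE, PROT_READ)
```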

…and a test that shows it makes it through.

TODO: There are other little corners where epoch_interruption comes up in the code where we should mention epoch_interruption_via_mmu_too.
…poch checks to loop headers as well.

Cache the interrupt page ptr in a local for speed, as we did with the epoch deadline.

Here is how I interpret the generated code in epoch-interruption-mmu.wat:
```
;; @001B                               v2 = load.i64 notrap aligned readonly can_move v0+8 ;; Skip over magic number (4b) and alignment (another 4b).
;; @001B                               v3 = load.i64 notrap aligned v2+16 ;; Get interrupt page ptr.
;; @001B                               v4 = load.i32 aligned readonly v3 ;; Read from page ptr.
```
These are all just 8-byte offset increases from the addition of my epoch interrupt page ptr field to `VMStoreContext`. This script helped me show it:

```python
"""Compare runs of - and + blocks of a diff, and assert that the only
differences between them are differences in hex and decimal numbers therein.
Further, assert that those differences are a rise of 8, representing the size of
the field I added.

Output the diff with the proven-correct regions resolved in favor of the +
lines. Any remaining diff lines are suspicious and should be manually examined.
"""

import re
from sys import argv

def is_diff_line(s, plus_or_minus):
    return bool(re.match(r"^ +" + re.escape(plus_or_minus), s))

def is_minus_line(s):
    return is_diff_line(s, "-")

def is_plus_line(s):
    return is_diff_line(s, "+")

def check_line_pairs(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    i = 0
    while i < len(lines):
        if is_minus_line(lines[i]):
            minus_block = []
            while i < len(lines) and is_minus_line(lines[i]):
                minus_block.append(lines[i])
                i += 1

            plus_block = []
            while i < len(lines) and is_plus_line(lines[i]):
                plus_block.append(lines[i])
                i += 1

            if len(minus_block) != len(plus_block):
                print(" + BLOCK LENGTHS DIFFERED.")
                print("".join(minus_block))
                print("".join(plus_block))
                continue

            # Compare the two blocks line by line
            for line1, line2 in zip(minus_block, plus_block):
                # Extract numbers (both decimal and hexadecimal) from both lines
                numbers1 = [int(num, 16) if num.startswith("0x") else int(num)
                            for num in re.findall(r'0x[0-9a-fA-F]+|\d+', line1)]
                numbers2 = [int(num, 16) if num.startswith("0x") else int(num)
                            for num in re.findall(r'0x[0-9a-fA-F]+|\d+', line2)]

                # Check if the numbers differ by 0 or 8
                if len(numbers1) == len(numbers2) and all(n2 - n1 in (0, 8) for n1, n2 in zip(numbers1, numbers2)):
                    # It's just an increment (or nothing), so keep the new line:
                    print(re.sub(r"^( +)\+", r"\1 ", line2), end="")
                else:
                    print(line1, end="")
                    print(line2, end="")
        else:
            print(lines[i], end="")
            i += 1

check_line_pairs(argv[1])
```
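For instance, the script's core numeric rule behaves like this on a representative minus/plus pair (hypothetical CLIF text, with the check condensed from the functions above):

```python
import re

# The same number-extraction and comparison rule the script applies to each
# minus/plus line pair: numbers (hex or decimal) may stay equal or rise by 8.
def numbers(line):
    return [int(n, 16) if n.startswith("0x") else int(n)
            for n in re.findall(r"0x[0-9a-fA-F]+|\d+", line)]

minus = " -;; v3 = load.i64 notrap aligned v2+8"
plus  = " +;; v3 = load.i64 notrap aligned v2+16"

n_minus, n_plus = numbers(minus), numbers(plus)
benign = (len(n_minus) == len(n_plus)
          and all(b - a in (0, 8) for a, b in zip(n_minus, n_plus)))
```

Only the final offset rises (8 → 16, a difference of 8), so the pair is resolved in favor of the plus line rather than flagged as suspicious.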
Now it is initted only if the `epoch-interruption-via-mmu` option is on. And, because the only instantiation of `VMStoreContext` is in the course of instantiating a `StoreOpaque`, a decent place to dispose of it is in `Drop for StoreOpaque`.
Keep the guts of the page-protecting operation on `VMStoreContext` next to where the page is mapped and unmapped.
…with_context` instruction.

This will give us a convenient place to keep track of dead-load instruction locations and help us reserve the particular registers we need.

* Add `mem_flags_aligned_read_only` helper so we can construct aligned-and-read-only `MemFlags` values in ISLE.
* Add a `preg_rdi` constructor so we can refer to RDI in ISLE.

TODO: Reserve a scratch register to hold the return address.
This gives us a place to put regalloc constraints and to gather metadata (specifically, instruction locations).

Add a compile disas test to make sure `dead_load_with_context` is still emitting an acceptable dead load. It is. The only difference is that it's loading into `rdx` rather than `edx` now, probably due to my new regalloc constraints:

```
-      movl    (%rdx), %edx
+      movq    (%rdx), %rdx
```

Also...
* Make the new instruction a `.call()` for consistency with `stack_switch` being one.
* Move the RDI-specificity to the regalloc constraints, which lets us remove the preg_rdi ISLE constructor I had added.
* Reserve r10 as a scratch register.
github-actions bot added labels on Apr 8, 2026: cranelift (Issues related to the Cranelift code generator), cranelift:area:machinst (Issues related to instruction selection and the new MachInst backend), cranelift:area:x64 (Issues related to x64 codegen), cranelift:docs, cranelift:meta (Everything related to the meta-language), wasmtime:api (Related to the API of the `wasmtime` crate itself), wasmtime:config (Issues related to the configuration of Wasmtime)

github-actions bot commented Apr 8, 2026

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

  • If you added a new Config method, you wrote extensive documentation for
    it.

    Details

    Our documentation should be of the following form:

    Short, simple summary sentence.
    
    More details. These details can be multiple paragraphs. There should be
    information about not just the method, but its parameters and results as
    well.
    
    Is this method fallible? If so, when can it return an error?
    
    Can this method panic? If so, when does it panic?
    
    # Example
    
    Optional example here.
    
  • If you added a new Config method, or modified an existing one, you
    ensured that this configuration is exercised by the fuzz targets.

    Details

    For example, if you expose a new strategy for allocating the next instance
    slot inside the pooling allocator, you should ensure that at least one of our
    fuzz targets exercises that new strategy.

    Often, all that is required of you is to ensure that there is a knob for this
    configuration option in wasmtime_fuzzing::Config (or one
    of its nested structs).

    Rarely, this may require authoring a new fuzz target to specifically test this
    configuration. See our docs on fuzzing for more details.

  • If you are enabling a configuration option by default, make sure that it
    has been fuzzed for at least two weeks before turning it on by default.


Details

To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the
.github/label-messager.json configuration file.

Learn more.

