8 changes: 7 additions & 1 deletion chapters/en/chapter1/8.mdx
@@ -63,7 +63,7 @@ The prefill phase is like the preparation stage in cooking - it's where all the
2. **Embedding Conversion**: Transforming these tokens into numerical representations that capture their meaning
3. **Initial Processing**: Running these embeddings through the model's neural networks to create a rich understanding of the context
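
The embedding step can be pictured as a table lookup: each token ID selects one row of a learned matrix. This is a minimal sketch under stated assumptions. The five-word vocabulary, the whitespace split standing in for a real tokenizer, the 4-dimensional width, and the names `VOCAB`, `EMBEDDINGS`, and `embed` are all hypothetical. The table rows are random here rather than learned.

```python
import random

# Hypothetical toy vocabulary. Real tokenizers split text into learned
# subword units and map them to IDs in a much larger vocabulary.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
D_MODEL = 4  # toy embedding width (real models use thousands of dimensions)

random.seed(0)
# One vector per vocabulary entry; random here, learned during training in a real model.
EMBEDDINGS = [[random.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(len(VOCAB))]

def embed(token_ids):
    """Embedding conversion: look up one row of the table per token ID."""
    return [EMBEDDINGS[t] for t in token_ids]

# A whitespace split stands in for a real tokenizer here.
prompt_ids = [VOCAB[word] for word in "the cat sat on the".split()]
vectors = embed(prompt_ids)

print(prompt_ids)                     # [0, 1, 2, 3, 0]
print(len(vectors), len(vectors[0]))  # 5 4
```

Both occurrences of "the" map to the same vector. It is the later processing step, where attention mixes in the surrounding context, that makes their representations context-dependent.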

This phase is computationally intensive because it processes all input tokens at once. It also populates the *KV Cache* with the *Keys* and *Values* of every prompt token, so the decode phase can reuse them instead of recomputing them for each new token. Think of it as reading and understanding an entire paragraph before starting to write a response.
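
Here is a minimal plain-Python sketch of prefill for a single attention layer with one head. The Keys and Values for every prompt token come out of one matrix multiply each and are stored in a cache. The sizes, random weights, and names (`prefill`, `matmul`, `W_Q`, `W_K`, `W_V`) are hypothetical. A real model repeats this in every layer and head, on the GPU.

```python
import math
import random

D = 4  # toy hidden size (hypothetical; real models use thousands of dimensions)
random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """(n x k) @ (k x m) -> (n x m); a plain-Python stand-in for one GPU matrix multiply."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical projection weights for one attention head (random instead of learned).
W_Q, W_K, W_V = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)

def prefill(prompt):
    """Project every prompt token at once and store the Keys/Values in a cache."""
    keys = matmul(prompt, W_K)    # one row per prompt token, computed in a single matmul
    values = matmul(prompt, W_V)
    cache = {"keys": keys, "values": values}
    # In this single-layer toy only the last position's output is needed to choose the
    # first generated token; multi-layer models compute every position because the next
    # layer needs them as input.
    q = matmul([prompt[-1]], W_Q)[0]
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # softmax numerators, shifted for stability
    total = sum(weights)
    out = [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(D)]
    return cache, out

prompt = rand_matrix(5, D)  # hypothetical embeddings for a 5-token prompt
cache, out = prefill(prompt)
print(len(cache["keys"]), len(cache["values"]))  # 5 5
```

The attention output is a softmax-weighted average of the cached Values. As a result, each entry of `out` lies between the smallest and largest cached Value in that dimension.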

You can experiment with different tokenizers in the interactive playground below:

@@ -86,6 +86,12 @@ The decode phase involves several key steps that happen for each new token:

This phase is memory-intensive because, to produce each single new token, the model must read the Keys and Values of all previous tokens from the KV cache, together with its own weights.
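
Below is a minimal plain-Python sketch of decoding with a KV cache for one toy attention layer with a single head. Each step projects only the newest token, appends its Key and Value to the cache, and attends over everything cached so far. The loop checks that this matches re-projecting the whole sequence from scratch, which is exactly the redundant work the cache avoids. The sizes, weights, random input vectors (standing in for the embeddings of generated tokens), and all function names are hypothetical.

```python
import math
import random

D = 4  # toy hidden size (hypothetical)
random.seed(1)

def rand_vector():
    return [random.uniform(-1, 1) for _ in range(D)]

# Hypothetical projection weights (random instead of learned), one row per input dimension.
W_Q = [rand_vector() for _ in range(D)]
W_K = [rand_vector() for _ in range(D)]
W_V = [rand_vector() for _ in range(D)]

def project(w, x):
    """Row-vector convention: x @ w for a single token vector x."""
    return [sum(x[t] * w[t][j] for t in range(D)) for j in range(D)]

def attend(q, keys, values):
    """Softmax(q . k / sqrt(D))-weighted average of the values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(D)]

def decode_step(x_new, cache):
    """With a cache: project only the newest token, append its K/V, attend over all."""
    cache["keys"].append(project(W_K, x_new))
    cache["values"].append(project(W_V, x_new))
    return attend(project(W_Q, x_new), cache["keys"], cache["values"])

def recompute_from_scratch(sequence):
    """Without a cache: re-project every token of the sequence at every step."""
    keys = [project(W_K, x) for x in sequence]
    values = [project(W_V, x) for x in sequence]
    return attend(project(W_Q, sequence[-1]), keys, values)

# Hypothetical vectors for a 5-token prompt, and the cache that prefill would leave behind.
sequence = [rand_vector() for _ in range(5)]
cache = {"keys": [project(W_K, x) for x in sequence],
         "values": [project(W_V, x) for x in sequence]}

for _ in range(3):
    x_new = rand_vector()  # stands in for the embedding of the token just generated
    sequence.append(x_new)
    cached = decode_step(x_new, cache)
    assert all(abs(a - b) < 1e-9 for a, b in zip(cached, recompute_from_scratch(sequence)))

print(len(cache["keys"]))  # 8: 5 prompt tokens + 3 decoded tokens
```

Each decode step still reads every cached Key and Value (and, in a real model, all of the weights) to produce a single token. That is why this phase is limited by memory bandwidth rather than by compute.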

| Phase | Operation | GPU Utilization | Goal |
| :--- | :--- | :--- | :--- |
| **Prefill** | Parallel (All-at-once) | High (Compute-bound) | Build KV Cache + First Token |
| **Decoding** | Sequential (One-by-one) | Low (Memory-bound) | Generate Remaining Tokens |
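
The benefit of building the cache during prefill can be checked with simple arithmetic. This sketch counts how many tokens pass through the Key/Value projections (per layer) when generating `new_tokens` tokens after a `prompt_len`-token prompt. The two helper names are hypothetical, and the count ignores the attention score computation itself.

```python
def projections_with_cache(prompt_len: int, new_tokens: int) -> int:
    """Tokens projected with a KV cache (new_tokens >= 1)."""
    # Prefill processes every prompt token once. Each later step processes only the
    # token generated in the previous step; the last generated token is never fed back.
    return prompt_len + (new_tokens - 1)

def projections_without_cache(prompt_len: int, new_tokens: int) -> int:
    """Tokens projected when every step recomputes the whole sequence (new_tokens >= 1)."""
    # Step t re-processes the prompt plus the t - 1 tokens generated so far.
    return sum(prompt_len + t - 1 for t in range(1, new_tokens + 1))

print(projections_with_cache(5, 3), projections_without_cache(5, 3))          # 7 18
print(projections_with_cache(1000, 100), projections_without_cache(1000, 100))  # 1099 104950
```

With the cache, the count grows linearly in the number of generated tokens. Without it, every step redoes the work for the whole sequence, and the total grows quadratically.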


## Sampling Strategies

Now that we understand how the model generates text, let's explore the various ways we can control this generation process. Just as a writer might choose between being more creative or more precise, we can adjust how the model selects each token.
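
Temperature scaling is one concrete example of such a control and a common knob for trading precision against creativity. The logits are divided by the temperature before the softmax, so low values sharpen the distribution toward the top-scoring token and high values flatten it. This is a minimal sketch. The function name, the three-token logits, and treating a temperature of 0 as greedy selection are illustrative assumptions.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick a token index from raw model scores (logits)."""
    if temperature == 0:
        # Treated here as greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]  # < 1 sharpens, > 1 flattens
    top = max(scaled)
    # Unnormalized softmax (shifted for numerical stability); choices() normalizes the weights.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-token vocabulary
print(sample_next_token(logits, temperature=0))  # 0

rng = random.Random(42)
picks = [sample_next_token(logits, temperature=1.5, rng=rng) for _ in range(1000)]
print({i: picks.count(i) for i in range(3)})  # how often each token was drawn in 1000 samples
```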