Fix missing comma in GRPO equation #1207

Open
ab490 wants to merge 1 commit into huggingface:main from ab490:fix-grpo-equation

ab490 commented Mar 9, 2026

This PR fixes a missing comma in the GRPO objective equation in Chapter 12.

The comma is added after the first $A_i$, separating the two arguments of the $\min(\ldots)$ function: the unclipped term and the clipped term.

The corrected equation is:

$$J_{GRPO}(\theta) = \left[\frac{1}{G} \sum_{i=1}^{G} \min \left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} A_i, \text{clip}\left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}, 1 - \epsilon, 1 + \epsilon \right) A_i \right)\right]- \beta D_{KL}(\pi_{\theta} || \pi_{ref})$$
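
For reviewers who want to sanity-check how the two arguments of the min are read, here is a minimal Python sketch of the bracketed term only. The function names `clip` and `grpo_surrogate` and the default `eps=0.2` are hypothetical, chosen for illustration; they are not taken from the course or any library, and the KL penalty is left out.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def grpo_surrogate(ratios, advantages, eps=0.2):
    """Group mean of min(r_i * A_i, clip(r_i, 1 - eps, 1 + eps) * A_i).

    ratios[i] stands for pi_theta(o_i|q) / pi_theta_old(o_i|q) and
    advantages[i] for A_i. The KL term of the objective is omitted.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        unclipped = r * a                          # first argument of min
        clipped = clip(r, 1 - eps, 1 + eps) * a    # second argument of min
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Two sampled outputs: one with a raised ratio and a positive advantage,
    # one with a lowered ratio and a negative advantage.
    print(grpo_surrogate([1.5, 0.5], [1.0, -1.0], eps=0.2))
```

Without the comma, the first $A_i$ and the clip expression would read as a single product, and the min would appear to have only one argument.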

Fixes #1174

Development

Successfully merging this pull request may close these issues.

Missing comma in the $J_{GRPO}$ equation