From e8747d9907f0f5255aa3c9ed597777c3878b95d9 Mon Sep 17 00:00:00 2001
From: ab490 <anooshkabajaj@gmail.com>
Date: Sun, 8 Mar 2026 22:48:54 -0400
Subject: [PATCH] Fix missing comma in GRPO equation

---
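Reviewer note: the comma matters because `\min` takes two arguments here, the
unclipped term `ratio * A_i` and the clipped term `clip(ratio, 1 - eps, 1 + eps) * A_i`.
Without it the equation reads as a single product. The sketch below illustrates
the corrected reading only. The function names, the `eps=0.2` default, and the
use of one sequence-level log-probability per output are assumptions made for
illustration, and the KL penalty term is left out.

```python
import math


def clip(x, low, high):
    """Clamp x into the closed interval [low, high]."""
    return max(low, min(x, high))


def grpo_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Group mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).

    Hypothetical helper: the -beta * D_KL term from the equation is omitted,
    and eps=0.2 is an assumed value, not one taken from the chapter.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)            # pi_theta / pi_theta_old
        unclipped = ratio * adv                      # first argument of min
        clipped = clip(ratio, 1 - eps, 1 + eps) * adv  # second argument of min
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Two sampled outputs: one with ratio 1.5 and A = +1, one with ratio 0.5 and A = -1.
    value = grpo_surrogate([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0])
    print(value)
```

In the example call, the first output contributes the clipped term and the
second also contributes the clipped term, since in both cases it is the smaller
of the two arguments.
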
 chapters/en/chapter12/3b.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chapters/en/chapter12/3b.mdx b/chapters/en/chapter12/3b.mdx
index a849c0b9d..e4d16a275 100644
--- a/chapters/en/chapter12/3b.mdx
+++ b/chapters/en/chapter12/3b.mdx
@@ -84,7 +84,7 @@ The final step is to use these advantage values to update our model so that it b
 
 The target function for policy update is:
 
-$$J_{GRPO}(\theta) = \left[\frac{1}{G} \sum_{i=1}^{G} \min \left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} A_i \text{clip}\left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}, 1 - \epsilon, 1 + \epsilon \right) A_i \right)\right]- \beta D_{KL}(\pi_{\theta} \|\| \pi_{ref})$$
+$$J_{GRPO}(\theta) = \left[\frac{1}{G} \sum_{i=1}^{G} \min \left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} A_i, \text{clip}\left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}, 1 - \epsilon, 1 + \epsilon \right) A_i \right)\right]- \beta D_{KL}(\pi_{\theta} \|\| \pi_{ref})$$
 
 This formula might look intimidating at first, but it's built from several components that each serve an important purpose. Let's break them down one by one.