[feat] Introduce GlobalOrthogonalRegularizationLoss #3654
tomaarsen merged 8 commits into huggingface:main
Pull request overview
This pull request introduces the GlobalOrthogonalRegularizationLoss, a regularization loss that encourages embeddings to be well-distributed across the embedding space. The loss is designed to improve the quantization robustness of sentence transformer models by penalizing a high mean similarity and a high second moment of similarities between non-matching embeddings.
Changes:
- Adds GlobalOrthogonalRegularizationLoss implementation with configurable mean and second moment terms
- Includes comprehensive documentation and a training example combining GOR with InfoNCE loss
- Updates package documentation to include the new regularization loss category
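As a rough illustration of the idea behind the loss (a hedged sketch of the GOR formulation from Zhang et al., "Learning Spread-out Local Feature Descriptors", not the PR's actual implementation — the function name, arguments, and weight parameters below are hypothetical):

```python
import torch
import torch.nn.functional as F


def gor_term(
    anchors: torch.Tensor,
    negatives: torch.Tensor,
    mean_weight: float = 1.0,
    second_moment_weight: float = 1.0,
) -> torch.Tensor:
    """Global Orthogonal Regularization over non-matching (anchor, negative) pairs."""
    a = F.normalize(anchors, dim=-1)
    n = F.normalize(negatives, dim=-1)
    sims = (a * n).sum(dim=-1)        # cosine similarity per pair
    d = a.size(-1)                    # embedding dimension
    m1 = sims.mean().pow(2)           # squared first moment of similarities
    # The second moment is pushed toward 1/d, its expected value for
    # uniformly spread unit vectors; only the excess above 1/d is penalized.
    m2 = (sims.pow(2).mean() - 1.0 / d).clamp(min=0.0)
    return mean_weight * m1 + second_moment_weight * m2
```

With `mean_weight=0.0` this would reduce to a second-moment-only variant, which matches the configurable-weights design the PR description mentions.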
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sentence_transformers/losses/GlobalOrthogonalRegularizationLoss.py | Core implementation of the GOR loss with configurable weights and aggregation methods |
| sentence_transformers/losses/__init__.py | Adds import and export of GlobalOrthogonalRegularizationLoss |
| examples/sentence_transformer/training/other/training_gooaq_infonce_gor.py | Example training script demonstrating combined InfoNCE + GOR loss on GooAQ dataset |
| docs/sentence_transformer/loss_overview.md | Adds new "Regularization" section documenting the GOR loss |
| docs/package_reference/sentence_transformer/losses.md | Adds reference documentation for GlobalOrthogonalRegularizationLoss |
Using your InfoNCE + GOR loss, we observed performance improvements over the original InfoNCE loss on the NanoBEIR benchmark. On some subsets InfoNCE + GOR outperformed the original InfoNCE, while on others the original InfoNCE was stronger, but on average InfoNCE + GOR came out ahead.
Oh wow, thank you for testing this! I only ran some rudimentary in-domain tests via https://huggingface.co/tomaarsen/mpnet-base-gooaq-infonce-gor, where InfoNCE + GOR performs about the same as pure InfoNCE, and I wanted to get #3655 (QAT & quantization evaluation) merged before really experimenting.
I understand. So that's what you intended when you conducted the experiment. Thank you for your response.
Hello!
Pull Request overview
Details
This PR supersedes #3651 and implements #3622. Specifically, while reviewing #3651 I looked into the actual GOR implementation versus EmbeddingGemma's variant. EmbeddingGemma only uses the second moment term, but it's easiest for us to implement the full GOR and expose weights for each term. During that review I arrived at a simple implementation, which I'm proposing here. My apologies that this might override other work done by contributors.
I adapted a simple training script and trained these models:
I still have to rerun the final script to verify that it's still correct, as I made some changes regarding mean vs. sum aggregation for each dataset column.
Edit: I've trained a new one here: https://huggingface.co/tomaarsen/mpnet-base-gooaq-infonce-gor, and it still seems to work correctly. The benchmarks below use the `-old` model.
Some results between InfoNCE and InfoNCE + GOR on NanoBEIR (MSMARCO + NQ):
Sadly, in this setup there are gains only under binary quantization, but those scores are too weak to be usable either way.
And some similarity values:
As expected, the GOR model has lower similarity values as embeddings are placed further apart.
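This kind of check can be reproduced with a small helper (hypothetical, not part of the PR): compute the off-diagonal cosine similarities within a batch of embeddings and inspect their mean and spread — a well-spread model should report values closer to zero.

```python
import torch
import torch.nn.functional as F


def similarity_stats(embeddings: torch.Tensor) -> tuple[float, float]:
    """Mean and std of cosine similarities between all distinct embedding pairs."""
    e = F.normalize(embeddings, dim=-1)
    sims = e @ e.T
    # drop the diagonal (self-similarity is always 1)
    off_diag = sims[~torch.eye(len(e), dtype=torch.bool)]
    return off_diag.mean().item(), off_diag.std().item()
```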
The core issue now is that this primarily helps with quantization, but evaluation with quantization is lacking in Sentence Transformers. That'll also be required if we're adding QAT (https://arxiv.org/abs/1712.05877).
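To make the quantization angle concrete, here is a minimal, self-contained toy illustration of binary quantization — sign-thresholding each dimension to one bit and comparing vectors by the fraction of agreeing bits. This is only a sketch of the idea, not the Sentence Transformers quantization API; both helper names are hypothetical.

```python
import torch


def binary_quantize(embeddings: torch.Tensor) -> torch.Tensor:
    """One bit per dimension: positive values -> 1, else 0."""
    return (embeddings > 0).to(torch.uint8)


def hamming_similarity(query_code: torch.Tensor, corpus_codes: torch.Tensor) -> torch.Tensor:
    """Fraction of agreeing bits between a query code and each corpus code."""
    return (query_code.unsqueeze(0) == corpus_codes).float().mean(dim=-1)
```

Spreading embeddings apart, as GOR encourages, reduces the chance that distinct vectors collapse onto the same bit pattern after thresholding, which is consistent with gains showing up mainly under binary quantization.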