
[feat] Introduce GlobalOrthogonalRegularizationLoss#3654

Merged
tomaarsen merged 8 commits into huggingface:main from tomaarsen:feat/gor_loss
Feb 5, 2026

Conversation


tomaarsen (Member) commented Feb 3, 2026

Hello!

Pull Request overview

  • Introduce GlobalOrthogonalRegularizationLoss

Details

This PR supersedes #3651 and implements #3622. While reviewing #3651, I compared the original GOR formulation against EmbeddingGemma's variant: EmbeddingGemma uses only the second-moment term, but it's easiest for us to implement the full GOR and make the two terms weighted. In the course of that review, I arrived at a simple implementation, which I'm proposing here. My apologies that this might override work already done by other contributors.
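For reference, the original GOR formulation (Zhang et al., 2017, "Learning Spread-out Local Feature Descriptors") penalizes the first and second moments of the dot products between non-matching pairs. A weighted variant along the lines described above could be sketched as below; the function and parameter names here are illustrative, not the actual API of GlobalOrthogonalRegularizationLoss.

```python
import numpy as np

def gor_loss(query_emb, neg_emb, mean_weight=1.0, second_moment_weight=1.0):
    """Sketch of Global Orthogonal Regularization on L2-normalized
    embeddings of non-matching (query, negative) pairs.

    For random unit vectors in d dimensions, the dot products should have
    mean ~0 and second moment ~1/d; GOR penalizes deviations from both.
    Parameter names are illustrative, not the PR's actual API.
    """
    d = query_emb.shape[1]
    # Dot product between each query and its non-matching passage
    dots = np.sum(query_emb * neg_emb, axis=1)
    m1 = dots.mean()        # first moment, ideally ~0
    m2 = (dots ** 2).mean() # second moment, ideally ~1/d
    return mean_weight * m1 ** 2 + second_moment_weight * max(0.0, m2 - 1.0 / d)
```

With mean_weight=0 this reduces to a second-moment-only penalty, matching the EmbeddingGemma variant mentioned above.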

I adapted a simple training script and trained a few models with it.

I still have to rerun the final script to verify it's still correct, as I made some changes regarding mean vs. sum aggregation for each dataset column.
Edit: I've trained a new model here: https://huggingface.co/tomaarsen/mpnet-base-gooaq-infonce-gor and it still seems to work correctly. The benchmarks below use the -old model.

Some results between InfoNCE and InfoNCE + GOR on NanoBEIR (MSMARCO + NQ):

fp32:
base: 0.5204 NDCG@10
+gor: 0.5148 NDCG@10

int8:
base: 0.5082 NDCG@10
+gor: 0.5004 NDCG@10

binary:
base: 0.3038 NDCG@10
+gor: 0.3249 NDCG@10

There are gains only on binary quantization in this setup, sadly, and those scores are too weak to be usable either way.

And some similarity values:

from sentence_transformers import SentenceTransformer

baseline_model = SentenceTransformer("tomaarsen/mpnet-base-gooaq-mnrl-baseline")
gor_model = SentenceTransformer("tomaarsen/mpnet-base-gooaq-infonce-gor")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
base_query_embedding = baseline_model.encode(query)
base_passage_embeddings = baseline_model.encode(passages)
base_similarities = baseline_model.similarity(base_query_embedding, base_passage_embeddings)
print("Baseline similarities:", base_similarities)
# tensor([[0.3545, 0.7807, 0.5983, 0.6696]])

gor_query_embedding = gor_model.encode(query)
gor_passage_embeddings = gor_model.encode(passages)
gor_similarities = gor_model.similarity(gor_query_embedding, gor_passage_embeddings)
print("GOR similarities:", gor_similarities)
# tensor([[0.2902, 0.7518, 0.5637, 0.6127]])

As expected, the GOR model produces lower similarity values, since its embeddings are pushed further apart.
The core issue now is that GOR primarily helps with quantization, but evaluation under quantization is lacking in Sentence Transformers. That will also be required if we add QAT (https://arxiv.org/abs/1712.05877).

  • Tom Aarsen


Copilot AI left a comment


Pull request overview

This pull request introduces the GlobalOrthogonalRegularizationLoss, a regularization loss function that encourages embeddings to be well-distributed in the embedding space. The loss is designed to improve quantization robustness of sentence transformer models by penalizing high mean similarities and high variance in the embedding space.

Changes:

  • Adds GlobalOrthogonalRegularizationLoss implementation with configurable mean and second moment terms
  • Includes comprehensive documentation and a training example combining GOR with InfoNCE loss
  • Updates package documentation to include the new regularization loss category
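One straightforward way to combine a contrastive loss with a weighted regularizer, in the spirit of the InfoNCE + GOR training example mentioned above, is a small wrapper that sums the two. This is a hypothetical sketch, not the actual example script's code; the class and parameter names are made up for illustration.

```python
class SummedLoss:
    """Hypothetical wrapper that adds a weighted regularization term
    (e.g. GOR) to a main contrastive loss (e.g. InfoNCE). The real
    example script may wire the two losses together differently."""

    def __init__(self, main_loss, regularizer, reg_weight=0.01):
        self.main_loss = main_loss
        self.regularizer = regularizer
        self.reg_weight = reg_weight

    def __call__(self, *args, **kwargs):
        # Total loss = main objective + weighted regularization term
        return self.main_loss(*args, **kwargs) + self.reg_weight * self.regularizer(*args, **kwargs)
```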

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Summary per file:

  • sentence_transformers/losses/GlobalOrthogonalRegularizationLoss.py: Core implementation of the GOR loss with configurable weights and aggregation methods
  • sentence_transformers/losses/__init__.py: Adds import and export of GlobalOrthogonalRegularizationLoss
  • examples/sentence_transformer/training/other/training_gooaq_infonce_gor.py: Example training script demonstrating combined InfoNCE + GOR loss on the GooAQ dataset
  • docs/sentence_transformer/loss_overview.md: Adds new "Regularization" section documenting the GOR loss
  • docs/package_reference/sentence_transformer/losses.md: Adds reference documentation for GlobalOrthogonalRegularizationLoss


@tomaarsen tomaarsen merged commit 2caeddf into huggingface:main Feb 5, 2026
10 of 17 checks passed

daegonYu commented Feb 6, 2026

Using your InfoNCE + GOR loss, we observed significant performance improvements over plain InfoNCE on the NanoBEIR benchmark. On some subsets InfoNCE + GOR won, on others plain InfoNCE did, but on average InfoNCE + GOR came out ahead.
Of course, this will vary with the training data and training configuration, but I wanted to emphasize that the benefit may not be limited to quantization.

tomaarsen (Member, Author) commented

Oh wow, thank you for testing this! I only ran some rudimentary in-domain tests via https://huggingface.co/tomaarsen/mpnet-base-gooaq-infonce-gor, where InfoNCE + GOR performs about the same as pure InfoNCE, and I wanted to get #3655 (QAT & quantized evaluation) merged before really experimenting.

  • Tom Aarsen


daegonYu commented Feb 6, 2026

I understand; so that was the intent behind your experiment. Thank you for your response.



Development

Successfully merging this pull request may close these issues.

Feature Request: Introduce global orthogonal regularizer (GOR) loss for SentenceTransformer models

3 participants