[feat] Introduce GlobalOrthogonalRegularizationLoss #3654
tomaarsen merged 8 commits into huggingface:main
Pull request overview
This pull request introduces the GlobalOrthogonalRegularizationLoss, a regularization loss that encourages embeddings to be well-distributed across the embedding space. The loss is designed to improve the quantization robustness of sentence transformer models by penalizing a high mean similarity and a high second moment of similarities between non-matching embeddings.
Changes:
- Adds GlobalOrthogonalRegularizationLoss implementation with configurable mean and second moment terms
- Includes comprehensive documentation and a training example combining GOR with InfoNCE loss
- Updates package documentation to include the new regularization loss category
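As a rough illustration of the idea behind the loss (a hedged sketch of the GOR formulation from Zhang et al., "Learning Spread-out Local Feature Descriptors", not the PR's actual implementation — the function name, arguments, and weight parameters below are hypothetical):

```python
import torch
import torch.nn.functional as F


def gor_term(
    anchors: torch.Tensor,
    negatives: torch.Tensor,
    mean_weight: float = 1.0,
    second_moment_weight: float = 1.0,
) -> torch.Tensor:
    """Global Orthogonal Regularization over non-matching (anchor, negative) pairs."""
    a = F.normalize(anchors, dim=-1)
    n = F.normalize(negatives, dim=-1)
    sims = (a * n).sum(dim=-1)        # cosine similarity per pair
    d = a.size(-1)                    # embedding dimension
    m1 = sims.mean().pow(2)           # squared first moment of similarities
    # The second moment is pushed toward 1/d, its expected value for
    # uniformly spread unit vectors; only the excess above 1/d is penalized.
    m2 = (sims.pow(2).mean() - 1.0 / d).clamp(min=0.0)
    return mean_weight * m1 + second_moment_weight * m2
```

With `mean_weight=0.0` this would reduce to a second-moment-only variant, which matches the configurable-weights design the PR description mentions.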
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sentence_transformers/losses/GlobalOrthogonalRegularizationLoss.py | Core implementation of the GOR loss with configurable weights and aggregation methods |
| sentence_transformers/losses/__init__.py | Adds import and export of GlobalOrthogonalRegularizationLoss |
| examples/sentence_transformer/training/other/training_gooaq_infonce_gor.py | Example training script demonstrating combined InfoNCE + GOR loss on GooAQ dataset |
| docs/sentence_transformer/loss_overview.md | Adds new "Regularization" section documenting the GOR loss |
| docs/package_reference/sentence_transformer/losses.md | Adds reference documentation for GlobalOrthogonalRegularizationLoss |
Using your InfoNCE + GOR loss, we observed performance improvements over the original InfoNCE loss on the NanoBEIR benchmark. On some subsets InfoNCE + GOR outperformed the original InfoNCE, while on others the original InfoNCE was stronger, but on average InfoNCE + GOR came out ahead.
Oh wow, thank you for testing this! I only ran some rudimentary in-domain tests via https://huggingface.co/tomaarsen/mpnet-base-gooaq-infonce-gor, where InfoNCE + GOR performs about the same as pure InfoNCE, and I wanted to get #3655 (QAT & quantization evaluation) merged before really experimenting.
I understand. So that's what you intended when you conducted the experiment. Thank you for your response.
Hello!
Pull Request overview
Details
This PR supersedes #3651 and implements #3622. Specifically, while reviewing #3651 I looked into the actual GOR implementation versus EmbeddingGemma's variant. EmbeddingGemma only uses the second moment term, but it's easiest for us to implement the full GOR and expose weights for each term. During that review I arrived at a simple implementation, which I'm proposing here. My apologies that this might override other work done by contributors.
I adapted a simple training script and trained these models:
I still have to rerun the final script to verify that it's still correct, as I made some changes regarding mean vs. sum aggregation for each dataset column.
Edit: I've trained a new one here: https://huggingface.co/tomaarsen/mpnet-base-gooaq-infonce-gor, and it still seems to work correctly. The benchmarks below use the `-old` model.
Some results between InfoNCE and InfoNCE + GOR on NanoBEIR (MSMARCO + NQ):
Sadly, in this setup there are gains only under binary quantization, but those scores are too weak to be usable either way.
And some similarity values:
As expected, the GOR model has lower similarity values as embeddings are placed further apart.
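This kind of check can be reproduced with a small helper (hypothetical, not part of the PR): compute the off-diagonal cosine similarities within a batch of embeddings and inspect their mean and spread — a well-spread model should report values closer to zero.

```python
import torch
import torch.nn.functional as F


def similarity_stats(embeddings: torch.Tensor) -> tuple[float, float]:
    """Mean and std of cosine similarities between all distinct embedding pairs."""
    e = F.normalize(embeddings, dim=-1)
    sims = e @ e.T
    # drop the diagonal (self-similarity is always 1)
    off_diag = sims[~torch.eye(len(e), dtype=torch.bool)]
    return off_diag.mean().item(), off_diag.std().item()
```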
The core issue now is that this primarily helps with quantization, but evaluation with quantization is lacking in Sentence Transformers. That'll also be required if we're adding QAT (https://arxiv.org/abs/1712.05877).
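To make the quantization angle concrete, here is a minimal, self-contained toy illustration of binary quantization — sign-thresholding each dimension to one bit and comparing vectors by the fraction of agreeing bits. This is only a sketch of the idea, not the Sentence Transformers quantization API; both helper names are hypothetical.

```python
import torch


def binary_quantize(embeddings: torch.Tensor) -> torch.Tensor:
    """One bit per dimension: positive values -> 1, else 0."""
    return (embeddings > 0).to(torch.uint8)


def hamming_similarity(query_code: torch.Tensor, corpus_codes: torch.Tensor) -> torch.Tensor:
    """Fraction of agreeing bits between a query code and each corpus code."""
    return (query_code.unsqueeze(0) == corpus_codes).float().mean(dim=-1)
```

Spreading embeddings apart, as GOR encourages, reduces the chance that distinct vectors collapse onto the same bit pattern after thresholding, which is consistent with gains showing up mainly under binary quantization.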