[feat] Add MultiVectorEncoder Support (a.k.a late-interaction models or ColBERT-style models) by AymenKallala · Pull Request #3614 · huggingface/sentence-transformers

AymenKallala · 2026-01-24T13:59:10Z

Summary

This PR introduces MultiVectorEncoder, a new model class for ColBERT-style multi-vector encoding in sentence-transformers. Unlike the standard SentenceTransformer, which produces a single embedding per text, MultiVectorEncoder produces one embedding per token and computes similarity via MaxSim (maximum similarity) between query and document token embeddings.
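
As a rough illustration of the MaxSim idea, here is a dependency-free sketch. It assumes the usual ColBERT formulation (for each query token, take the best-matching document token by dot product, then sum over query tokens); the PR's `similarity.py` may differ in details such as padding or masking. The names `dot` and `maxsim` and the toy vectors are illustrative only.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take the best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings (illustrative values, not model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # matches both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # matches only the first one

print(maxsim(query, doc_a))  # 2.0
print(maxsim(query, doc_b))  # 1.0
```

Note that the query and the documents may have different numbers of tokens; only the embedding dimension has to agree.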

Key Features

MultiVectorEncoder class: Extends SentenceTransformer with multi-vector encoding capabilities
LateInteractionPooling module: A pooling layer that preserves token-level embeddings with optional:
- Dimension projection (e.g., 768 → 128)
- Special token masking ([CLS], [SEP])
- L2 normalization per token
MaxSim similarity functions: Implementations of late interaction similarity computation
Query/Document encoding: Dedicated encode_query() and encode_document() methods with automatic prompt handling
Ranking: Built-in rank() method for document ranking
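
To make the pooling options above more concrete, the following is a minimal sketch of what a token-preserving pooling step might do: skip special tokens, optionally project each token vector, and L2-normalize it. This is not the PR's LateInteractionPooling implementation; `late_interaction_pool`, `project`, `l2_normalize`, the `skip_special` flag, and the toy weight matrix are hypothetical names and values chosen for illustration.

```python
import math

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def project(vector, weight):
    """Linear projection; `weight` has one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def late_interaction_pool(tokens, embeddings, weight=None,
                          skip_special=True, normalize=True):
    """Keep one vector per token instead of collapsing to a single vector."""
    pooled = []
    for token, vector in zip(tokens, embeddings):
        if skip_special and token in SPECIAL_TOKENS:
            continue  # drop [CLS] / [SEP] positions
        if weight is not None:
            vector = project(vector, weight)
        if normalize:
            vector = l2_normalize(vector)
        pooled.append(vector)
    return pooled


tokens = ["[CLS]", "hello", "world", "[SEP]"]
embeddings = [[1.0, 1.0, 1.0], [3.0, 4.0, 0.0], [0.0, 2.0, 2.0], [1.0, 0.0, 0.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy 3 -> 2 projection

pooled = late_interaction_pool(tokens, embeddings, weight=weight)
print(len(pooled))  # 2
print(pooled[0])    # [0.6, 0.8]
```

With unit-norm token vectors, the dot products used by MaxSim become cosine similarities, which is the usual reason for normalizing per token.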

Changes

File	Lines	Description
`sentence_transformers/multi_vec_encoder/MultiVectorEncoder.py`	534	Main encoder class extending SentenceTransformer
`sentence_transformers/multi_vec_encoder/LateInteractionPooling.py`	197	Token-preserving pooling layer with projection
`sentence_transformers/multi_vec_encoder/similarity.py`	111	MaxSim similarity functions
`sentence_transformers/multi_vec_encoder/__init__.py`	12	Package exports
`sentence_transformers/__init__.py`	+2	Added MultiVectorEncoder export
`tests/multi_vec_encoder/test_multi_vec_encoder.py`	230	Comprehensive pytest tests

Usage Example

Option 1: Create from a pre-trained transformer

from sentence_transformers import MultiVectorEncoder

# Automatically creates Transformer + LateInteractionPooling pipeline
model = MultiVectorEncoder("bert-base-uncased")

# Encode queries and documents
query_embeddings = model.encode_query(["What is machine learning?"])
doc_embeddings = model.encode_document([
    "Machine learning is a subset of artificial intelligence.",
    "The weather is nice today.",
])

# Each embedding is a 2D tensor: [num_tokens, dim]
print(f"Query shape: {query_embeddings[0].shape}")  # [7, 128]
print(f"Doc shape: {doc_embeddings[0].shape}")      # [11, 128]

# Compute similarity scores using MaxSim
scores = model.similarity(query_embeddings, doc_embeddings)
print(f"Scores: {scores}")  # Shape: [1, 2]

Option 2: Create from custom modules

from sentence_transformers import MultiVectorEncoder
from sentence_transformers.multi_vec_encoder import LateInteractionPooling
from sentence_transformers.models import Transformer

# Create custom pipeline with specific configuration
transformer = Transformer("bert-base-uncased")
pooling = LateInteractionPooling(
    word_embedding_dimension=transformer.get_word_embedding_dimension(),
    output_dimension=128,      # Project to 128 dimensions
    normalize=True,            # L2-normalize each token
    skip_cls_token=False,      # Keep [CLS] token
    skip_sep_token=False,      # Keep [SEP] token
)

model = MultiVectorEncoder(modules=[transformer, pooling])

Document Ranking

from sentence_transformers import MultiVectorEncoder

model = MultiVectorEncoder("bert-base-uncased")

documents = [
    "Machine learning is a subset of artificial intelligence.",
    "The weather is nice today.",
    "Deep learning uses neural networks with many layers.",
]

# Rank documents by relevance to query
results = model.rank(
    query="What is machine learning?",
    documents=documents,
    top_k=2,
    return_documents=True,
)

for result in results:
    print(f"Score: {result['score']:.2f} - {result['text']}")
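
Conceptually, ranking of this kind can be thought of as scoring every document against the query and sorting by score. The sketch below assumes MaxSim scoring over toy token vectors; `rank_by_maxsim` is a hypothetical helper, and the `corpus_id` key is an assumption (only `score` and `text` appear in the snippet above). It is not the PR's `rank()` implementation.

```python
def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best dot product with any document token."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_tokens)
        for q in query_tokens
    )


def rank_by_maxsim(query_tokens, docs_tokens, texts, top_k=None):
    """Score every document against the query and return the best ones first."""
    results = [
        {"corpus_id": i, "score": maxsim(query_tokens, doc), "text": text}
        for i, (doc, text) in enumerate(zip(docs_tokens, texts))
    ]
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]  # top_k=None keeps every document


# Toy token embeddings (illustrative values, not model output).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0], [0.0, 1.0]],    # matches both query tokens
    [[-1.0, 0.0], [0.0, -1.0]],  # matches neither
    [[1.0, 0.0], [0.0, 0.5]],    # partial match
]
texts = ["doc A", "doc B", "doc C"]

for result in rank_by_maxsim(query, docs, texts, top_k=2):
    print(f"Score: {result['score']:.2f} - {result['text']}")
```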

Similarity Scores

# Compute similarity for all pairs (reuses `model` from the examples above)
queries = ["What is AI?", "How's the weather?"]
documents = ["AI is artificial intelligence.", "It's sunny outside."]

q_emb = model.encode_query(queries)
d_emb = model.encode_document(documents)

# score[i,j] = similarity(query[i], doc[j])
similarity_scores = model.similarity(q_emb, d_emb)
print(f"Similarity scores: {similarity_scores}")  # Shape: [2, 2]

Pairwise Similarity

# Compute similarity for corresponding pairs only (reuses `model` from the examples above)
queries = ["What is AI?", "How's the weather?"]
documents = ["AI is artificial intelligence.", "It's sunny outside."]

q_emb = model.encode_query(queries)
d_emb = model.encode_document(documents)

# Pairwise: score[i] = similarity(query[i], doc[i])
pairwise_scores = model.similarity_pairwise(q_emb, d_emb)
print(f"Pairwise scores: {pairwise_scores}")  # Shape: [2]
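
The relationship between the two scoring modes can be sketched without a model: the all-pairs result is a matrix with one row per query and one column per document, and the pairwise result should correspond to its diagonal. The helpers `all_pairs_scores` and `pairwise_scores` below are hypothetical stand-ins that assume MaxSim scoring; they are not the PR's `similarity()` or `similarity_pairwise()` code.

```python
def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best dot product with any document token."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_tokens)
        for q in query_tokens
    )


def all_pairs_scores(queries, docs):
    """scores[i][j] = MaxSim(queries[i], docs[j])."""
    return [[maxsim(q, d) for d in docs] for q in queries]


def pairwise_scores(queries, docs):
    """scores[i] = MaxSim(queries[i], docs[i]); inputs must align one-to-one."""
    if len(queries) != len(docs):
        raise ValueError("pairwise scoring needs equally many queries and documents")
    return [maxsim(q, d) for q, d in zip(queries, docs)]


# Toy inputs with different token counts per text (illustrative values).
queries = [[[1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
docs = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]

matrix = all_pairs_scores(queries, docs)
pairs = pairwise_scores(queries, docs)
print(matrix)  # [[1.0, 0.0], [2.0, 1.0]]
print(pairs)   # [1.0, 1.0]
```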

Future Work

Pre-trained model integration: Load existing ColBERT checkpoints (e.g., colbert-ir/colbertv2.0, Stanford ColBERT weights) directly via MultiVectorEncoder
Model card: Add MultiVectorEncoderModelCardData for proper model documentation
Training support: Add training related losses and evaluations

tomaarsen · 2026-01-30

This is quite solid, quite reminiscent of PyLate. I'm quite interested in this architecture in Sentence Transformers, although I planned it after the #3554 refactor. This refactor introduces new Base... classes (Model, Trainer, DataCollator, etc.), and would simplify new architectures like multi-vector models. You may have already noticed that although subclassing SentenceTransformer is convenient, you also borrow some features that multi-vector models don't outright use (e.g. truncate_dim). The refactor changes that.

I think this is a very strong start though, and I'd be glad to work on top of this after #3554. For context, my current TODO is:

Release v5.3 very soon (in e.g. ~1 week) with some recent useful PRs like [feat] Add NO_DUPLICATES_HASHED: optional hashing for NoDuplicatesBatchSampler #3611
Release v5.4 as soon as possible afterwards with (almost) only [v5.4] Introduce cross-modality and multi-modality support; modularize CrossEncoder class #3554. This is a large refactor, but isn't intended to break anything for the average user.
Release v6.0 afterwards with multi-vector support (with multiple modalities), including training

I think sticking to that order is best for the project, so then I'll likely get back to this after v5.4 is merged. What do you think?

Tom Aarsen

AymenKallala · 2026-02-01T17:16:53Z

Hello!

Sure! Thanks for giving it a first review. I am happy to keep working on it when it becomes more of a priority.

AymenKallala added 15 commits January 23, 2026 11:46

base LateInteractionPooling module

dc4e8f9

init file

26db93e

maxsim and maxsim_pairwise functions

3caf697

rename base folder to multi_vec_encoder

131fa9b

move late interaction pooling into dir

7812639

move late interaction pooling into multi_vec_encoder dir

bb058a9

Base MultiVectorEncoder class

bb50e1e

init files

5424a46

concise docstrings and comments

0576dde

fix circular import bug

ac0cf2e

ruff format and linting

e0b2874

unit tests

b22ad09

bugfix in special_tokens masking.

4a71d25

docstring

9ef2df3

prepare_embeddings_for_similarity improvements

f78f775

AymenKallala marked this pull request as ready for review January 25, 2026 12:50

AymenKallala added 2 commits January 27, 2026 08:57

ruff pre-commit

c5f0395

Merge branch 'main' into feat/late-interaction-models-support

2444940

AymenKallala wants to merge 17 commits into huggingface:main from AymenKallala:feat/late-interaction-models-support