max_seq_length should not be larger than any options #3255
amitport wants to merge 2 commits into huggingface:main
Conversation
When loading an auto-model, max_seq_length is read directly from Hugging Face and it cannot be overwritten easily.
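The priority problem described above can be sketched in isolation. This is a hypothetical helper, not the actual sentence-transformers code: it only illustrates how a value coming from the tokenizer ends up winning over the value the user expects from the model's own configuration.

```python
# Hypothetical sketch of the priority issue (not sentence-transformers internals):
# a max length coming from the tokenizer is treated as "user-provided" and
# therefore takes priority over the value in the model's own config.

def resolve_max_seq_length(tokenizer_model_max_length, config_value):
    """Return the effective max sequence length under tokenizer-first priority."""
    if tokenizer_model_max_length is not None:
        # The tokenizer-supplied value wins, which is what makes it
        # hard to override from the model configuration alone.
        return tokenizer_model_max_length
    return config_value

# Even if the model config asks for 512, a tokenizer-provided 128 wins:
assert resolve_max_seq_length(128, 512) == 128
# Only when the tokenizer provides nothing does the config value apply:
assert resolve_max_seq_length(None, 512) == 512
```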
Hello! This seems to be an issue only for the models where a model_max_length is specified in the tokenizer configuration. This value is indeed seen as "user-provided", which has priority over any values from sentence_bert_config.json. You can avoid this with:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5", tokenizer_kwargs={"model_max_length": 32})
model.max_seq_length = 32
assert model.max_seq_length == 32, f"expected 32, but got {model.max_seq_length=}"
assert model[0].max_seq_length == 32, f"expected 32, but got {model[0].max_seq_length=}"

but that too isn't ideal.
@tomaarsen I used the workaround and it's fine, but the current behavior is still a bug IMHO (and a silent one, which may make models fail unexpectedly for the user).
Fair enough, I'll try to revisit this PR and see if there's a solid solution that doesn't break backwards compatibility, but also fixes this issue.
Hi,
When loading an auto-model, max_seq_length is read directly from Hugging Face and it cannot be overwritten easily.
Example:
This PR ensures that max_seq_length is overwritten, even when it already exists.
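The PR title suggests the intended behavior: the effective max_seq_length should never exceed any of the configured limits. A minimal sketch of that clamping rule, with a hypothetical helper name (this is not the PR's actual implementation):

```python
# Hypothetical sketch of the behavior the PR title describes:
# "max_seq_length should not be larger than any options", i.e. the
# effective value is clamped to the tokenizer's limit when one exists.

def effective_max_seq_length(configured, tokenizer_model_max_length):
    """Clamp the configured max_seq_length to the tokenizer's limit, if any."""
    if tokenizer_model_max_length is None:
        # No tokenizer limit: the configured value stands.
        return configured
    # Never exceed the tokenizer's model_max_length.
    return min(configured, tokenizer_model_max_length)

# A configured 512 is clamped down to a tokenizer limit of 128:
assert effective_max_seq_length(512, 128) == 128
# A configured value below the limit is kept as-is:
assert effective_max_seq_length(64, 128) == 64
```

Under this rule the user's value is still honored whenever it fits, while silently-too-large values can no longer slip through.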