Release Unsupervised Sentence Embedding Learning · UKPLab/sentence-transformers

Unsupervised Sentence Embedding Learning

This release integrates methods that allows to learn sentence embeddings without having labeled data:

TSDAE: TSDAE is using a denoising auto-encoder to learn sentence embeddings. The method has been presented in our recent paper and achieves state-of-the-art performance for several tasks.
GenQ: GenQ uses a pre-trained T5 system to generate queries for a given passage. It was presented in our recent BEIR paper and works well for domain adaptation for (semantic search)[https://www.sbert.net/examples/applications/semantic-search/README.html]

MSMARCO Dot-Product Models: We trained models using the dot-product instead of cosine similarity as similarity function. As shown in our recent BEIR paper, models with cosine-similarity prefer the retrieval of short documents, while models with dot-product prefer retrieval of longer documents. Now you can choose what is most suitable for your task.
MSMARCO MiniLM Models: We uploaded some models based on MiniLM: It uses just 384 dimensions, is faster than previous models and achieves nearly the same performance

MSMARCO Re-ranking-Models v2: We trained new significantly faster and significantly better CrossEncoder re-ranking models on the MSMARCO dataset. It outperforms BERT-large models in terms of accuracy while being 18 times faster. Trainingcode is available

You can now pass to the CrossEncoder class a default_activation_function, that is applied on-top of the output logits generated by the class.
You can now pre-process images for the CLIP Model. Soon I will release a tutorial how to fine-tune the CLIP Model with your data.