Unsupervised Sentence Embedding Learning

@nreimers nreimers released this 21 Apr 13:12
· 857 commits to master since this release

This release integrates methods that allow you to learn sentence embeddings without labeled data:

  • TSDAE: TSDAE uses a denoising auto-encoder to learn sentence embeddings. The method is presented in our recent paper and achieves state-of-the-art performance for several tasks.
  • GenQ: GenQ uses a pre-trained T5 system to generate queries for a given passage. It was presented in our recent BEIR paper and works well for domain adaptation for [semantic search](https://www.sbert.net/examples/applications/semantic-search/README.html).
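The core of TSDAE's unsupervised training is corrupting each input sentence (by randomly deleting tokens) and training the auto-encoder to reconstruct the original. Below is a minimal, dependency-free sketch of that deletion-noise step; the `delete_ratio` of 0.6 follows the default reported in the TSDAE paper, while the actual library implementation lives in `datasets.DenoisingAutoEncoderDataset` and may differ in detail.

```python
import random

def delete_noise(text, delete_ratio=0.6, rng=None):
    """Corrupt a sentence by randomly deleting tokens (TSDAE-style noise).

    The denoising auto-encoder is then trained to reconstruct the
    original sentence from this corrupted input. This is an illustrative
    sketch, not the library's exact implementation.
    """
    rng = rng or random.Random()
    tokens = text.split()
    kept = [t for t in tokens if rng.random() > delete_ratio]
    if not kept:  # always keep at least one token
        kept = [rng.choice(tokens)]
    return " ".join(kept)

sentence = "Unsupervised sentence embeddings can be learned with a denoising auto-encoder"
noisy = delete_noise(sentence, rng=random.Random(0))
print(noisy)  # a shorter, corrupted version of the input sentence
```

The corrupted sentence serves as the encoder input, while the original sentence is the reconstruction target for the decoder.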

New Models - SentenceTransformer

  • MSMARCO Dot-Product Models: We trained models using dot-product instead of cosine similarity as the similarity function. As shown in our recent BEIR paper, models with cosine similarity prefer the retrieval of short documents, while models with dot-product prefer the retrieval of longer documents. Now you can choose whichever is most suitable for your task.
  • MSMARCO MiniLM Models: We uploaded models based on MiniLM: they use just 384 dimensions, are faster than previous models, and achieve nearly the same performance.
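The difference between the two similarity functions can be seen with a toy example: cosine similarity is scale-invariant, while dot-product grows with the document vector's norm, which is one way longer documents can score higher. The vectors below are hypothetical, not real model outputs.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity: dot-product of length-normalized vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors (hypothetical): "long_doc" points in the same direction as
# "short_doc" but has a larger norm, mimicking how longer documents can
# end up with larger embedding norms.
query     = np.array([1.0, 0.0])
short_doc = np.array([0.8, 0.2])
long_doc  = 2.5 * short_doc

# Cosine similarity is scale-invariant: both documents score the same.
print(cos_sim(query, short_doc), cos_sim(query, long_doc))

# Dot-product grows with the norm: the longer document scores higher.
print(float(np.dot(query, short_doc)), float(np.dot(query, long_doc)))
```

In short, a dot-product model lets document length (via vector norm) influence ranking, while a cosine model ranks purely by direction.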

New Models - CrossEncoder

New Features

  • You can now pass a default_activation_function to the CrossEncoder class, which is applied on top of the output logits generated by the class.
  • You can now pre-process images for the CLIP Model. Soon I will release a tutorial on how to fine-tune the CLIP Model with your own data.
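To make the activation-function feature concrete, here is a dependency-light sketch of what applying such a function on top of raw CrossEncoder logits means, using a sigmoid as the example activation. The logit values are hypothetical; in practice you would pass an activation such as `torch.nn.Sigmoid()` to the CrossEncoder constructor rather than apply it manually.

```python
import numpy as np

def sigmoid(x):
    """Map raw logits to scores in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw logits, as a CrossEncoder with a single output
# neuron might produce for three query/passage pairs.
logits = np.array([-2.0, 0.0, 3.5])

# An activation like sigmoid turns logits into bounded relevance
# scores; an identity activation would return the logits unchanged.
scores = sigmoid(logits)
print(scores)  # monotonic in the logits, each score in (0, 1)
```

Choosing a sigmoid gives probability-like scores that are easy to threshold, while keeping the identity preserves the raw logits for downstream calibration.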