V0.1.17 (#51)
* Updated readme

* Updated gitignore

* Travis for all branches

* Fixed setup.py

* Updated gitignore

* blacked

* Updated travis

* Fixed travis branches

* Updated ci for PR checks

* Fixed Typing Issue

* Removed 3.10 build
Oliver Borchers authored Nov 27, 2021
1 parent 616c3dd commit 72008d2
Showing 21 changed files with 9,556 additions and 397 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,6 +7,8 @@
*.o
*.so
*.pyc
*.pyo
*.pyd

# Packages #
############
2 changes: 2 additions & 0 deletions .travis.yml
@@ -1,3 +1,4 @@
if: (type = push AND branch IN (master, develop)) OR (type = pull_request AND NOT branch =~ /no-ci/)
sudo: false

cache:
@@ -12,6 +13,7 @@ python:
- "3.6"
- "3.7"
- "3.8"
- "3.9"

branches:
only:
37 changes: 24 additions & 13 deletions README.md
@@ -12,12 +12,26 @@ Fast Sentence Embeddings (fse)

Fast Sentence Embeddings is a Python library that serves as an addition to Gensim. This library is intended to compute *sentence vectors* for large collections of sentences or documents.

**Disclaimer**: I am currently working full time. Unfortunately, I have yet to find time to add all the features I'd like to see. Especially the API needs some overhaul and we need support for gensim 4.0.0. If you want to support [fse](https://forms.gle/8uSU323fWUVtVwcAA), take a quick survey to improve it :-)
**Disclaimer**: I am working full time. Unfortunately, I have yet to find time to add all the features I'd like to see. Especially the API needs some overhaul and we need support for gensim 4.0.0.

I am looking for active contributors to keep this package alive. Please feel free to ping me at <o.borchers@oxolo.com> if you are interested.

Audience
------------

This package builds upon Gensim and is intended to compute sentence/paragraph vectors for large databases. Use this package if:
- (Sentence) Transformers are too slow
- Your dataset is too large for existing solutions (spacy)
- Using GPUs is not an option.

The average (online) inference time for a well optimized (and batched) sentence-transformer is around 1ms-10ms per sentence.
If that is not enough and you are willing to sacrifice a bit in terms of quality, this is your package.
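
To put the latency figure above in perspective, here is a rough back-of-the-envelope sketch. It treats 1ms-10ms per sentence as a sequential cost (100 to 1,000 sentences/s) and compares it with the 300k-500k sentences/s quoted in the feature list. The corpus size and the helper name `seconds_to_embed` are assumptions made for illustration, not measurements.

```python
def seconds_to_embed(num_sentences, sentences_per_second):
    """Wall-clock seconds needed at a constant throughput."""
    return num_sentences / sentences_per_second


# Hypothetical corpus size, chosen only for illustration.
corpus_size = 10_000_000

# 10 ms and 1 ms per sentence correspond to 100 and 1,000 sentences/s.
transformer_slow = seconds_to_embed(corpus_size, 100)
transformer_fast = seconds_to_embed(corpus_size, 1_000)

# Throughput range quoted in the README for preprocessed data.
fse_slow = seconds_to_embed(corpus_size, 300_000)
fse_fast = seconds_to_embed(corpus_size, 500_000)

print(f"sentence-transformer: {transformer_fast / 3600:.1f} h to {transformer_slow / 3600:.1f} h")
print(f"fse: {fse_fast:.0f} s to {fse_slow:.0f} s")
```

Under these assumptions the gap is hours versus seconds, which is the trade-off against embedding quality described above.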


Features
------------

Find the corresponding blog post(s) here:
Find the corresponding blog post(s) here (code may be outdated):

- [Visualizing 100,000 Amazon Products](https://towardsdatascience.com/vis-amz-83dea6fcb059)
- [Sentence Embeddings. Fast, please!](https://towardsdatascience.com/fse-2b1ffa791cf9)
@@ -57,20 +71,12 @@ Key features of **fse** are:
I regularly observe 300k-500k sentences/s for preprocessed data on my Macbook (2016).
Visit **Tutorial.ipynb** for an example.

Things I will work on next:

**[ ]** MaxPooling / Hierarchical Pooling Embedding

**[ ]** Approximate Nearest Neighbor Search for SentenceVectors




Installation
------------

This software depends on NumPy, Scipy, Scikit-learn, Gensim, and Wordfreq.
You must have them installed prior to installing fse. Required Python version is 3.6.
You must have them installed prior to installing fse.

As with gensim, it is also recommended you install a BLAS library before installing fse.

@@ -157,6 +163,11 @@ Model | [STS Benchmark](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark#Re
Changelog
-------------

0.1.17:
- Fixed dependency issue where you cannot install fse properly
- Updated readme
- Updated travis python versions (3.6, 3.9)

0.1.15 from 0.1.11:
- Fixed major FT Ngram computation bug
- Rewrote the input class. Turns out NamedTuple was pretty slow.
@@ -186,9 +197,9 @@ Proceedings of the 3rd Workshop on Representation Learning for NLP. (Toulon, Fra
Copyright
-------------

Author: Oliver Borchers <borchers@bwl.uni-mannheim.de>
Author: Oliver Borchers

Copyright (C) 2019 Oliver Borchers
Copyright (C) 2021 Oliver Borchers

Citation
-------------
24 changes: 14 additions & 10 deletions fse/__init__.py
@@ -1,20 +1,24 @@
import logging

from fse import models

from .inputs import BaseIndexedList
from .inputs import IndexedList
from .inputs import CIndexedList
from .inputs import SplitIndexedList
from .inputs import SplitCIndexedList
from .inputs import CSplitIndexedList
from .inputs import CSplitCIndexedList
from .inputs import IndexedLineDocument
from .inputs import (
    BaseIndexedList,
    CIndexedList,
    CSplitCIndexedList,
    CSplitIndexedList,
    IndexedLineDocument,
    IndexedList,
    SplitCIndexedList,
    SplitIndexedList,
)

import logging

class NullHandler(logging.Handler):
    def emit(self, record):
        pass

logger = logging.getLogger('fse')

logger = logging.getLogger("fse")
if len(logger.handlers) == 0:  # To ensure reload() doesn't add another one
    logger.addHandler(NullHandler())
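
Because the package only attaches a no-op handler to the `fse` logger, log records stay silent until the calling application adds its own handler. The following is a minimal sketch of how an application might opt in, using only the standard library. `enable_fse_logging` is a hypothetical helper, not part of fse, and the snippet does not import fse so that it runs on its own.

```python
import io
import logging


def enable_fse_logging(stream, level=logging.INFO):
    """Attach a stream handler to the 'fse' logger (hypothetical helper)."""
    logger = logging.getLogger("fse")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler


# Capture output in memory here; an application would usually pass sys.stderr.
buffer = io.StringIO()
enable_fse_logging(buffer)

# Stand-in for a message the library might emit through the same logger name.
logging.getLogger("fse").info("training started")
captured = buffer.getvalue()
print(captured, end="")
```

The no-op handler in the diff and an application-side handler like this one can coexist, since both are simply entries in the same logger's handler list.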