This application runs our concatenated p-means model. It automatically downloads all required resources from the web and starts a webserver that allows researchers to generate sentence embeddings over a simple HTTP API.
- Python 2.7 (also works with Python 3)
- Run `pip install -r requirements.txt` to install all required Python packages.
- You can run the application using `python main.py`. By default it loads the `en-de` model and starts a webserver on port 5000 with an HTTP API that allows you to generate sentence embeddings.
- Run `python main.py --help` to see all available options.
When the application has finished loading all word embeddings, visit http://localhost:5000 in your web browser for further instructions and example Python code.
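As a rough illustration, a sentence embedding could be requested over the HTTP API along the following lines. This is a minimal sketch only: the endpoint path, parameter names, and response format shown here are assumptions, so consult the instructions served at http://localhost:5000 for the actual interface.

```python
import requests

# Hypothetical example -- endpoint path, parameter names, and response format
# are assumptions; see http://localhost:5000 for the real API documentation.
response = requests.get(
    "http://localhost:5000/embed",
    params={"sentence": "A simple example sentence.", "model": "en-de"},
)
response.raise_for_status()
embedding = response.json()  # e.g. a list of floats
print(len(embedding))
```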
Note that, by default, we load a smaller version of fastText containing only the 300k most frequent tokens, which reduces file size and speeds up downloads. To fully reproduce our cross-lingual experiments, use the full fastText files (see the comments in `main.py`).
The application can be extended with further word embeddings, p-means, and other moments.
The file `main.py` defines a variable named `embeddings` that holds a dictionary defining all models. For each model there is a list of word embedding definitions. You can freely extend these lists, or add new models by adding new entries to the dictionary.
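As a sketch of what such an entry might look like (the actual keys and embedding definition format are defined in `main.py` and will differ; the names and fields below are assumptions):

```python
# Hypothetical structure -- the real definitions live in main.py and use the
# project's own embedding classes; names, paths, and fields here are made up.
embeddings = {
    "en-de": [
        {"name": "fasttext-en", "path": "data/fasttext_en_300k.vec", "lowercase": True},
        {"name": "fasttext-de", "path": "data/fasttext_de_300k.vec", "lowercase": True},
    ],
    # A new model is added by adding another entry to this dictionary:
    "en-fr": [
        {"name": "fasttext-en", "path": "data/fasttext_en_300k.vec", "lowercase": True},
        {"name": "fasttext-fr", "path": "data/fasttext_fr_300k.vec", "lowercase": True},
    ],
}
```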
The file `sentence_embeddings.py` defines a variable named `operations` that holds a dictionary specifying all available operations (compression strategies, p-means). You can add new entries here and define arbitrary operations such as additional p-means, moments, or further compression and summarization strategies. All newly defined operations (and word embeddings) will appear in the web interface.
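For instance, an additional power mean could be registered roughly like this. This is a sketch under assumptions: the key name is hypothetical, and the values in the real `operations` dictionary may be project-specific objects rather than plain callables.

```python
import numpy as np

def p_mean(word_vectors, p):
    """Power mean over a (num_words, dim) array of word vectors.

    For even p the inner values are non-negative, so the 1/p root is safe;
    other values of p may require the complex-valued formulation.
    """
    vecs = np.asarray(word_vectors, dtype=np.float64)
    return np.power(np.mean(np.power(vecs, p), axis=0), 1.0 / p)

# Hypothetical registration -- stands in for the dictionary defined in
# sentence_embeddings.py; the real value format may differ.
operations = {}
operations["p_mean_4"] = lambda vectors: p_mean(vectors, 4)
```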
We created the TF-Hub modules (see `readme.md` in the project root) with the `tfhub.py` script. The behavior of the TF-Hub modules differs slightly from the Python version because we do not automatically lowercase input strings when the word embeddings are lowercased.
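In practice this means callers of the TF-Hub module may need to lowercase their input themselves. A minimal sketch of TF1-style usage follows; the module path is an assumption, so use the location given in `readme.md` in the project root.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Hypothetical module location -- replace with the path/URL from readme.md.
module = hub.Module("path/to/exported/pmeans-module")

sentences = ["A Simple Example Sentence."]
# The TF-Hub module does not lowercase for you, so do it explicitly if the
# underlying word embeddings are lowercased.
sentences = [s.lower() for s in sentences]

embeddings_op = module(sentences)
with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(embeddings_op))
```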