Ancient Greek word2vec

This is a latent space for Ancient Greek, trained on 149883 sentences from the First1K project.

The vocabulary was lemmatized (534258 words, plus 96443 words that could not be lemmatized). Latent spaces with 10, 25, 50, 75, 100 and 300 dimensions are provided (gr...vec). For comparison, another model (Nov22_RW), based on this repo, is also provided. Models whose names end in mc3 only take into account words that occur more than 3 times.
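The frequency cutoff behind the mc3 models can be illustrated with a minimal sketch. It assumes the corpus is a list of already lemmatized token lists. The toy sentences and the helper name `filter_min_occurrences` are hypothetical, and the threshold follows the wording above (strictly more than 3 occurrences), not the parameter of any particular library.

```python
from collections import Counter


def filter_min_occurrences(sentences, threshold=3):
    """Drop every token that does not occur strictly more than `threshold` times."""
    counts = Counter(token for sentence in sentences for token in sentence)
    kept = {token for token, n in counts.items() if n > threshold}
    filtered = [[t for t in sentence if t in kept] for sentence in sentences]
    return filtered, kept


# Toy lemmatized corpus, made up for illustration only.
toy_sentences = [
    ["λόγος", "θεός", "λόγος"],
    ["λόγος", "ἄνθρωπος", "θεός"],
    ["λόγος", "θεός", "θεός"],
    ["βασίλισσα", "λόγος"],
]

filtered, vocabulary = filter_min_occurrences(toy_sentences)
print(sorted(vocabulary))  # the two lemmas that occur more than 3 times
print(filtered[3])         # the rare lemma has been dropped from the last sentence
```

With such a cutoff a rare lemma never enters the vocabulary at all, so it cannot be queried in the resulting model.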

Online demonstration

Running a binder instance

Graph Output sample

Display a sample graph: this page is an example of the graph output. Double-click a node to query the dictionary.

Precomputed standalone

A precomputed app is available for the 300-dimension latent space, showing the 10 closest words for each of the 20000 most frequent Greek words. It only displays the graph and does not support distance calculations. Access the app.
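As a rough illustration of what "closest words" means here, the sketch below builds a small neighbour graph from cosine similarity. It is an assumption-laden toy, not the app's actual code: the vectors are invented 2-dimensional values, and the function names `closest_words` and `neighbour_graph` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_words(vectors, word, k=10):
    """Return the k words most similar to `word`, most similar first."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def neighbour_graph(vectors, k=10):
    """Map every word to its k closest words; these pairs are the graph edges."""
    return {
        word: [w for w, _ in closest_words(vectors, word, k)]
        for word in vectors
    }


# Invented toy vectors (not taken from any of the provided models).
toy_vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}

graph = neighbour_graph(toy_vectors, k=1)
print(graph)
```

Precomputing such a mapping once is what allows a standalone page to draw the graph without loading the full model, which is also why distances cannot be computed on demand there.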

Sense addition

The classic example king + woman - man = queen does not work properly with the First1KGreek dataset, possibly because queen (βασίλισσα) appears only 4 times. It does work with Ryder Wishart's dataset (selected automatically for this example).
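The arithmetic behind this example can be sketched with hand-made vectors. Everything below is an assumption for illustration: the 3-dimensional vectors are invented so that the sum works exactly, English labels stand in for the Greek lemmas, and the hypothetical `analogy` function uses a plain sum of vectors, not necessarily the scoring used by the models in this repo.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, positive, negative):
    """Return the word closest to sum(positive) - sum(negative), inputs excluded."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Invented vectors; the axes can be read as (royalty, male, female).
toy_vectors = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "child": [0.0, 0.5, 0.5],
}

result = analogy(toy_vectors, positive=["king", "woman"], negative=["man"])
print(result)
```

In a trained model the vector of a word seen only a handful of times is estimated from very little context, which is consistent with the explanation suggested above for why the analogy fails on this corpus.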

Installation

Dev

Clone this repo and install the environment defined in environment.yml.

Production

You can build the Docker image:

docker build -t yourtag/latentgreek .
or, for a multi-platform build that is also compatible with Apple silicon:

docker buildx build --platform linux/amd64,linux/arm64 -t yourtag/latentgreek --push .

Then run the container:

docker run -p 8888:8888 yourtag/latentgreek

This runs a server with the GUI.

References

Řehůřek, Radim, and Petr Sojka. "Software Framework for Topic Modelling with Large Corpora". In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45-50. Valletta, Malta: ELRA, 2010.

Muellner, Leonard. "The Free First Thousand Years of Greek". Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution, edited by Monica Berti, Berlin, Boston: De Gruyter Saur, 2019, pp. 7-18 https://doi.org/10.1515/9783110599572-002

https://github.com/ryderwishart/ancient-greek-word2vec
