multivac-ml

Pre-trained Apache Spark ML pipelines for NLP, classification, and more.
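A typical way to use one of these pipelines is to load the saved PipelineModel with Spark ML and apply it to a DataFrame. The sketch below is an illustration under assumptions, not a documented entry point of this repository: the model path (models/pos/english-ewt) and the input column name (text) are placeholders.

```scala
// Hypothetical sketch: loading a pre-trained Spark ML pipeline and applying it.
// The model path and column names below are assumed placeholders.
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("multivac-ml-example")
  .master("local[*]")
  .getOrCreate()

// Load a saved Spark ML PipelineModel from disk (placeholder path)
val pipeline = PipelineModel.load("models/pos/english-ewt")

// Apply the pipeline to a DataFrame that holds the raw text
val df = spark.createDataFrame(Seq(
  (1, "Multivac provides pre-trained Spark ML pipelines.")
)).toDF("id", "text")

val annotated = pipeline.transform(df)
annotated.show(truncate = false)
```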

Project Structure

Facts and Figures

POS Tagger models

English POS tagger model (UD_English-EWT). Only the en_ewt-ud-train.conllu file was used to train the model:

Precision, Recall and F1-Score against the test dataset en_ewt-ud-test.conllu

| Tokens | Precision | Recall | F1-Score |
|--------|-----------|--------|----------|
| 25831  | 0.93      | 0.91   | 0.92     |

Precision, Recall and F1-Score against the training dataset en_ewt-ud-train.conllu

| Tokens | Precision | Recall | F1-Score |
|--------|-----------|--------|----------|
| 63785  | 0.98      | 0.98   | 0.98     |
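One way to reproduce token-level scores like these is to compare predicted tags against the gold tags of the CoNLL-U test file and feed the (prediction, label) pairs to Spark's MulticlassMetrics. This is a minimal sketch, not the repository's actual evaluation script; the tag-to-index mapping and the toy pairs are assumptions.

```scala
// Minimal sketch: token-level precision/recall/F1 with Spark's MulticlassMetrics.
// The (prediction, label) pairs and tag indices below are illustrative only.
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("pos-eval-sketch")
  .master("local[*]")
  .getOrCreate()

// One (predicted tag index, gold tag index) pair per token
val predictionAndLabels = spark.sparkContext.parallelize(Seq(
  (17.0, 17.0), // NOUN tagged as NOUN
  (5.0, 5.0),   // VERB tagged as VERB
  (17.0, 5.0)   // VERB mis-tagged as NOUN
))

val metrics = new MulticlassMetrics(predictionAndLabels)
println(f"Precision: ${metrics.weightedPrecision}%.2f")
println(f"Recall:    ${metrics.weightedRecall}%.2f")
println(f"F1-Score:  ${metrics.weightedFMeasure}%.2f")
```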

Precision is "how useful the POS results are" and recall is "how complete the results are": precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. See https://en.wikipedia.org/wiki/Precision_and_recall

The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 (perfect precision and recall) and its worst at 0. See https://en.wikipedia.org/wiki/F1_score

Precision = (true positives) / (true positives + false positives)

Recall = (true positives) / (true positives + false negatives)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
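As a quick check against the test-set table above, the reported F1 follows directly from the reported precision and recall:

$$F_1 = 2 \cdot \frac{0.93 \times 0.91}{0.93 + 0.91} \approx 0.92$$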

Read more on evaluation of the models

Open Data

Multivac ML data: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/WSWU7K

Multivac Open Data: https://dataverse.harvard.edu/dataverse/multivac

Dataset Citation

Panahi, Maziyar; Chavalarias, David, 2018, "Multivac Machine Learning Models", https://doi.org/10.7910/DVN/WSWU7K, Harvard Dataverse, V2

Code of Conduct

This project, and all github.com/multivacplatform projects, are governed by the Multivac Platform Open Source Code of Conduct. Additionally, see the Typelevel Code of Conduct for specific examples of harassing behavior that are not tolerated.

Copyright and License

Code and documentation copyright (c) 2018-2019 ISCPIF - CNRS. Code released under the MIT license.