This is a series of scripts dedicated to analyzing Internet comments about an undisclosed Canadian company.
- Run lda.py on a .csv/.tsv/etc. file. The input table must be in the following format:
insert_date | text | language | ... |
---|---|---|---|
2019-05-27 21:06:48 | keep up the good work | en | ... |
2019-05-27 21:06:48 | Vendez un Bar-B-Q déjà toute monté | fr | ... |
... | ... | ... | ... |
- Afterwards, you can use:
- keyword_analysis.py: corpus comparison based on the K-means algorithm, keyword search among comments using logical expressions or SpaCy rule-based matching;
- similarity_model.py: word2vec model training on the resulting corpus, and keyword search based on the trained model.
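Before running lda.py, it can help to sanity-check that the input table matches the format above. A minimal sketch using only the standard library; the column names come from the table above, while load_comments is a hypothetical helper, not part of these scripts:

```python
import csv
import io

# Hypothetical sample in the expected format (tab-separated here).
SAMPLE = (
    "insert_date\ttext\tlanguage\n"
    "2019-05-27 21:06:48\tkeep up the good work\ten\n"
    "2019-05-27 21:06:48\tVendez un Bar-B-Q déjà toute monté\tfr\n"
)

def load_comments(handle, language=None):
    """Read the comment table, optionally keeping a single language."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    if language is not None:
        rows = [row for row in rows if row["language"] == language]
    return rows

english = load_comments(io.StringIO(SAMPLE), language="en")
```

The same function works on a real file handle opened with `newline=""`; only the delimiter would need to change for a `.csv` input.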
New entities for the entity_recognition.py script can be added here. This file uses SpaCy pattern keys; for more information, see the available token attributes and available labels.
Unicode symbols (such as è) must be written using their respective escape code (in this case, \u00e8). Do not add any whitespace, as it will break the script.
Finally, if you wish to exclude a specific entity from SpaCy lemmatization (for example, you want "food_basics" to always stay plural), add "/l-excluded" to its id (see the file itself for some examples).
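Putting the two rules above together, an entry might look like the following sketch. The label, token attributes, and id value here are invented for illustration (the real entries live in the file itself); the escape \u00e8 stands for è, and the "/l-excluded" suffix marks the entity as exempt from lemmatization:

```python
import json

# Hypothetical entity entry following SpaCy's pattern-key conventions.
# "cr\u00e8me" is the escaped form of "crème"; note there is no
# whitespace inside the escape sequence.
ENTRY = r'''
{"label": "PRODUCT",
 "pattern": [{"LOWER": "cr\u00e8me"}, {"LOWER": "glac\u00e9e"}],
 "id": "creme_glacee/l-excluded"}
'''

entry = json.loads(ENTRY)
```

Parsing the entry with json confirms that the escape decodes back to the accented character, which is exactly what SpaCy will see at load time.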
This dictionary is based on a British-American one from this site, and, though not comprehensive, it gets the job done.
New entries can be added in the non-canadian-spelling\tcanadian-spelling
format, i.e. one tab-separated pair per line (see the file).
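The format above can be parsed into a lookup table in a few lines. A minimal sketch; the two sample pairs and the load_spelling_map helper are illustrative, not taken from the actual file:

```python
# Hypothetical excerpt in the described format:
# non-canadian spelling, a tab, then the Canadian spelling.
SAMPLE = "color\tcolour\ncenter\tcentre\n"

def load_spelling_map(text):
    """Parse tab-separated spelling pairs into a dict."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, canadian = line.split("\t")
        mapping[source] = canadian
    return mapping

spellings = load_spelling_map(SAMPLE)
```

Looking a word up in the resulting dict normalizes it to the Canadian spelling before further processing.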