Introduction

Deep learning is driving rapid advances in natural language processing (NLP). However, smaller companies can find it difficult to integrate these new technologies without spending substantial resources on research and development. In this project, we developed a Chat Log Language Analysis Suite that enables companies to readily understand the linguistic patterns of their user base. Many applications integrate social chat systems, including video games, dating apps, and social media. With our product, even verbose chatters with thousands of logged chat messages can be summarized, evaluated, and compared with other users at a glance. Downstream use cases include flagging, suspending, or banning toxic users, recommending advertisements or posts to users, and even using a user's “virtual” chatbot counterpart to predict how they would respond to new inputs.

A presentation for this project can be found here.

Documentation & Testing Framework

Our documentation is located under "Docs/_build/html". Note that contents under nlp_suite/chatbot/style_transfer_paraphrase are neither tested nor documented, as we did not develop that code. For more information, please visit that project.

All tests can be run by:
pytest test_chat_log_suite.py

If you don't have a GPU, you can exclude the tests that require one:
pytest -m "not chatbot_gpu" test_chat_log_suite.py

When running coverage with all tests, using coverage run --source=. -m pytest test_chat_log_suite.py, we currently achieve 84% code coverage.

One can also run coverage without the GPU tests, but this will result in lower coverage:
coverage run --source=. -m pytest -m "not chatbot_gpu" test_chat_log_suite.py
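
The four invocations above differ only in whether coverage wraps pytest and whether the chatbot_gpu marker is excluded. The following is a minimal sketch of a helper that assembles the right command. The function names are hypothetical and not part of this repo, and detecting a GPU by looking for nvidia-smi on the PATH is an assumption, not something the test suite itself does.

```python
import shutil
from typing import List

TEST_FILE = "test_chat_log_suite.py"
GPU_MARKER = "chatbot_gpu"  # marker name taken from the commands above


def build_test_command(with_coverage: bool = False, include_gpu: bool = True) -> List[str]:
    """Return the argument list for one of the documented test invocations."""
    if with_coverage:
        cmd = ["coverage", "run", "--source=.", "-m", "pytest"]
    else:
        cmd = ["pytest"]
    if not include_gpu:
        # Deselect every test carrying the GPU marker.
        cmd += ["-m", f"not {GPU_MARKER}"]
    cmd.append(TEST_FILE)
    return cmd


def gpu_probably_available() -> bool:
    """Assumption: an NVIDIA driver install puts nvidia-smi on the PATH."""
    return shutil.which("nvidia-smi") is not None
```

The returned list can be passed directly to subprocess.run, for example subprocess.run(build_test_command(include_gpu=gpu_probably_available()), check=True).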

Data

Full Dataset

The full dataset that we used is located at:
https://www.kaggle.com/jef1056/discord-data

This dataset was generated by scraping chat logs from many public Discord servers and contains about 110 million chat messages.

Example Inputs

The following are some example inputs containing chat logs that can be uploaded to our NLP suite dashboard.

User "muffins", Discord server #298954459172700181
https://nlpsuite.s3-us-west-2.amazonaws.com/inputs/muffins.zip
User "Saysora", Discord server #691542050578890802
https://nlpsuite.s3-us-west-2.amazonaws.com/inputs/Saysora.zip
User "circus", Discord server #731254148678549595
https://nlpsuite.s3-us-west-2.amazonaws.com/inputs/circus.zip
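
If you prefer to fetch the example inputs from a script rather than a browser, the sketch below downloads an archive to a local folder using only the standard library. The function name and the example_inputs destination folder are hypothetical. The archives are left zipped here, since whether the dashboard expects the zip file itself or its extracted contents is not stated above.

```python
import urllib.request
from pathlib import Path

# URLs copied from the list above.
EXAMPLE_INPUT_URLS = {
    "muffins": "https://nlpsuite.s3-us-west-2.amazonaws.com/inputs/muffins.zip",
    "Saysora": "https://nlpsuite.s3-us-west-2.amazonaws.com/inputs/Saysora.zip",
    "circus": "https://nlpsuite.s3-us-west-2.amazonaws.com/inputs/circus.zip",
}


def fetch_archive(url: str, dest_dir: Path) -> Path:
    """Download url into dest_dir, skipping files that already exist."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Use the last URL path component as the local file name.
    archive_path = dest_dir / url.rsplit("/", 1)[-1]
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path
```

For example, fetch_archive(EXAMPLE_INPUT_URLS["muffins"], Path("example_inputs")) would save muffins.zip under example_inputs/.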

Cached Data

The following are pretrained style transfer transformer weights for the chatbot. They can be placed in the "cached_user_data" directory.

User "muffins", Discord server #298954459172700181
https://nlpsuite.s3-us-west-2.amazonaws.com/muffins.zip
User "Saysora", Discord server #691542050578890802
https://nlpsuite.s3-us-west-2.amazonaws.com/Saysora.zip
User "circus", Discord server #731254148678549595
https://nlpsuite.s3-us-west-2.amazonaws.com/circus.zip
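
As a rough sketch of placing downloaded weights into "cached_user_data", the helper below unpacks a zip archive into that directory. It assumes the archives are meant to be extracted and that their internal layout already matches what the suite expects. Neither is confirmed above, so inspect the returned file list before relying on it. The function name is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def unpack_cached_weights(archive_path, cache_dir="cached_user_data") -> List[str]:
    """Extract a weights archive into cache_dir and return the member names."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(Path(archive_path)) as archive:
        # Assumption: the archive's own folder structure is the expected one.
        archive.extractall(cache_dir)
        return archive.namelist()
```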

Deployment Setup

With Chatbot

The NLP suite, with chatbot functionality, requires at least 6 GB of GPU memory. It can be set up as follows.

Launch an AWS EC2 instance (we used a p2.xlarge instance with the "Ubuntu Server 18.04 LTS (HVM), SSD Volume Type" AMI), and SSH into it.
Install NVIDIA drivers
Install Anaconda
Clone this repo, and create a conda environment with the required dependencies:
conda env create --file deployment/nlp_suite_conda_env.yaml
Under the created conda environment, run a Jupyter Notebook, or alternatively, Voila:
jupyter notebook --no-browser --port=5666 --NotebookApp.token='' --NotebookApp.password='' --ip='0.0.0.0' --allow-root
or
voila dashboard.ipynb --no-browser --port 5666
Forward the port from your EC2 instance to your local PC, in a Linux or WSL terminal:
ssh -i path/to/private_key.pem -N -L 8081:localhost:5666 ubuntu@instance_ip_address
Now, you can view the NLP suite at http://localhost:8081/ on your local PC.
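
If the page does not load, it can help to confirm that the SSH tunnel is actually listening before debugging the notebook server. The sketch below is a simple TCP reachability check. The function name is hypothetical, and the default port 8081 simply mirrors the local port in the ssh command above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8081, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A False result from port_is_open() on your local PC suggests the ssh -L command is not running or is using a different local port. It does not by itself tell you whether Jupyter or Voila is up on the instance.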

Without Chatbot

The NLP suite can also be run on an instance without a GPU (say, t2.micro), but there will be no chatbot functionality. To do this, simply follow the instructions in the "With Chatbot" section above, but skip installing the NVIDIA drivers.
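
To decide which of the two setups applies to a given machine, one option is to ask nvidia-smi for the total memory of each GPU and compare it against the 6 GB requirement. This is a sketch under stated assumptions: "6 GB" is interpreted as 6 * 1024 MiB, the function names are hypothetical, and the suite itself may detect GPUs differently.

```python
import subprocess

REQUIRED_MIB = 6 * 1024  # assumption: "6 GB" read as 6144 MiB

# Prints one line per GPU containing its total memory in MiB.
NVIDIA_SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total",
    "--format=csv,noheader,nounits",
]


def max_gpu_memory_mib(smi_output: str) -> int:
    """Parse the query output and return the largest GPU memory, or 0 if none."""
    values = [int(line) for line in smi_output.split() if line.isdigit()]
    return max(values, default=0)


def chatbot_supported() -> bool:
    """Return True if some GPU reports at least REQUIRED_MIB of memory."""
    try:
        result = subprocess.run(
            NVIDIA_SMI_QUERY, capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No driver installed, or the driver could not talk to a GPU.
        return False
    return max_gpu_memory_mib(result.stdout) >= REQUIRED_MIB
```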

b7leung/Chat-Log-Statistical-Linguistic-Analysis
