
Python CLI tool to generate customized word clouds from documents, especially large documents such as dissertations and master's and bachelor's theses


lasupernova/Thesis-and-Whatsapp-Chat-Word-Cloud-Generator


Word Cloud Generator

Python CLI tool to generate customized word clouds from documents, especially large documents such as dissertations and master's and bachelor's theses (or export your WhatsApp chats and use them on your conversations with different people!). Depends on wordcloud and nltk.
Don't want to work with the command line? Use the Jupyter notebook instead (see the instructions and examples below).


(Image: example word cloud, low height)

Example text file for practice:

It is saved as example.txt and contains the text of the book "The Count of Monte Cristo".

Usage:

python generate_cloud.py

By default, the text is read from a file called "doc.txt", so copy your thesis into your working directory and rename it to "doc.txt". Alternatively, use the -file_path command line argument to specify a different input file.
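A word cloud sizes words by how often they occur. The function below is a rough sketch of that first step under stated assumptions, not the script's own code: it reads the input file (defaulting to doc.txt, as described above) and counts words. `GENERIC_STOPWORDS` is a tiny hypothetical stand-in for the nltk stopword list the tool depends on. `extra_stopwords` mimics the documented -sw behavior, which adds to the generic stopwords rather than replacing them.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical stand-in for a real stopword list (e.g. nltk's English stopwords).
GENERIC_STOPWORDS = {"the", "and", "a", "of", "to", "in", "is", "it"}


def count_words(path="doc.txt", extra_stopwords=()):
    """Read a text file and count words, ignoring generic plus extra stopwords."""
    text = Path(path).read_text(encoding="utf-8")
    # Extra stopwords are added to the generic ones, never replacing them.
    stopwords = GENERIC_STOPWORDS | {w.lower() for w in extra_stopwords}
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)
```

For example, `count_words("example.txt", extra_stopwords=["chapter"]).most_common(10)` would list the ten most frequent remaining words.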

Usage - Word Cloud from WhatsApp Chat Exports:

python generate_cloud.py -whatsapp

This pre-processes the WhatsApp chat export file to exclude dates and other text that WhatsApp adds when generating the export (e.g. the "Media omitted" text inserted in place of media that was sent).
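Export formats differ between platforms and locales. The sketch below therefore assumes one Android-style layout (`12/31/20, 9:15 PM - Alice: message`) and a "<Media omitted>" placeholder. It illustrates the kind of pre-processing described above and does not reproduce the script's actual rules. Lines without a date are treated as continuations of multi-line messages. Dated lines without a sender (system notices) are dropped.

```python
import re

# Assumed Android-style export line: "12/31/20, 9:15 PM - Alice: Happy new year!"
# Real exports vary by platform and locale; adjust the patterns to your file.
DATE_PREFIX = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?:\s?[AP]M)? - ")
SENDER_PREFIX = re.compile(r"^[^:]+: ")
MEDIA_PLACEHOLDER = "<Media omitted>"  # assumed placeholder text


def clean_whatsapp_export(raw: str) -> str:
    """Return only the message text, without dates, senders or media placeholders."""
    kept = []
    for line in raw.splitlines():
        date = DATE_PREFIX.match(line)
        if date:
            rest = line[date.end():]
            sender = SENDER_PREFIX.match(rest)
            if sender is None:
                continue  # dated line without "Name: " is a system notice
            body = rest[sender.end():]
        else:
            body = line  # continuation of a multi-line message
        body = body.strip()
        if body and body != MEDIA_PLACEHOLDER:
            kept.append(body)
    return "\n".join(kept)
```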


Customization:

The following parameters can be customized:

Parameter                                Command line argument   Type
Name of input file                       -file_path              string
Text color (hue)                         -hue                    integer
Stopwords (1)                            -sw                     list
Background color                         -bg                     string
Image width (pixels)                     -w                      integer
Image height (pixels)                    -height                 integer
Maximum number of words to display       -maxwords               integer
Ratio of words displayed horizontally    -h_ratio                float (from 0 to 1)
Saturation                               -s                      integer (from 0 to 100)
Lightness                                -l                      integer (from 0 to 100)
File name for the output image (2)       -o                      string
Words to replace in the text (3)         -x1                     string(s)
Substitutes for the words passed to -x1  -x2                     string(s)
WhatsApp export file as input (4)        -whatsapp               flag: just add "-whatsapp"
Matrix effect (5)                        -matrix                 flag: just add "-matrix"

Notes:
(1) These stopwords do not replace the generic stopwords; they are added to them.
(2) The file name should end with '.png'.
(3) -x1 and -x2 both accept multiple strings and must always be used together.
(4) Use when the input text is a WhatsApp chat export file.
(5) The program then automatically sets all parameters for a matrix-like word cloud (see below for an example).
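One way to declare these options is with Python's argparse. The parser below is a hypothetical reconstruction, not the parser in generate_cloud.py. Only the doc.txt default is documented; every other default is left as None. The "-f" alias reflects the short form used in the examples below. `hue_arg` is an illustrative helper that turns "-hue None" into Python's `None`.

```python
import argparse


def hue_arg(value):
    """Accept an integer hue, or the literal "None" for random word colors."""
    # Note: in this sketch an omitted -hue is also None.
    return None if value == "None" else int(value)


def build_parser():
    # Hypothetical reconstruction of the documented flags; the real script may differ.
    p = argparse.ArgumentParser(description="Generate a word cloud from a text file.")
    p.add_argument("-file_path", "-f", default="doc.txt", help="input text file")
    p.add_argument("-hue", type=hue_arg, help='text hue, or "None" for random colors')
    p.add_argument("-sw", nargs="+", default=[], help="extra stopwords")
    p.add_argument("-bg", help="background color")
    p.add_argument("-w", type=int, help="image width in pixels")
    p.add_argument("-height", type=int, help="image height in pixels")
    p.add_argument("-maxwords", type=int, help="maximum number of words")
    p.add_argument("-h_ratio", type=float, help="share of horizontal words (0-1)")
    p.add_argument("-s", type=int, help="saturation (0-100)")
    p.add_argument("-l", type=int, help="lightness (0-100)")
    p.add_argument("-o", help="output file name, ending in .png")
    p.add_argument("-x1", nargs="+", help="words to replace")
    p.add_argument("-x2", nargs="+", help="substitutes for the -x1 words")
    p.add_argument("-whatsapp", action="store_true", help="input is a WhatsApp export")
    p.add_argument("-matrix", action="store_true", help="matrix-style preset")
    return p


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # -x1 and -x2 only make sense together, with one substitute per word.
    if (args.x1 is None) != (args.x2 is None) or (args.x1 and len(args.x1) != len(args.x2)):
        parser.error("-x1 and -x2 must be used together with the same number of values")
    return args
```

Single-dash, multi-character option names such as `-file_path` are valid in argparse, so the documented spellings can be kept as-is.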

Example:
python generate_cloud.py -file_path my_thesis_final_version.txt -bg black -h_ratio 0.6 -o wordcloud_thesis.png

  • This example takes a text file named 'my_thesis_final_version.txt' and saves the word cloud to 'wordcloud_thesis.png'. The word cloud has a black background, and 60% of the words are displayed horizontally (40% vertically).
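Several of these options correspond to keyword arguments of the wordcloud library's `WordCloud` class: `background_color`, `width`, `height`, `max_words`, `prefer_horizontal` and `stopwords`. The sketch below assumes such a mapping; it is not taken from generate_cloud.py, and `wordcloud_kwargs` is a hypothetical helper. `prefer_horizontal` sets how often horizontal placement is tried, and a word that does not fit may be rotated. A ratio of 0.6 therefore gives roughly, not exactly, 60% horizontal words.

```python
def wordcloud_kwargs(bg=None, w=None, height=None, maxwords=None, h_ratio=None, stopwords=None):
    """Map CLI-style option values onto wordcloud.WordCloud keyword arguments.

    The option-to-keyword mapping is an assumption for illustration. Options
    left as None are omitted so that WordCloud's own defaults apply.
    """
    mapping = {
        "background_color": bg,
        "width": w,
        "height": height,
        "max_words": maxwords,
        "prefer_horizontal": h_ratio,
        # WordCloud uses this set instead of its built-in STOPWORDS, so it
        # should already contain the generic stopwords plus any -sw extras.
        "stopwords": set(stopwords) if stopwords is not None else None,
    }
    return {key: value for key, value in mapping.items() if value is not None}
```

With wordcloud installed, the result could be used as `WordCloud(**wordcloud_kwargs(bg="black", h_ratio=0.6)).generate(text).to_file("wordcloud_thesis.png")`.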


Alternative: Jupyter Notebook:

If you don't want to use the command line, you can use the Jupyter Notebook instead:

  • Install Jupyter Notebook
  • Download the GitHub repository
  • Open the notebook
  • Replace example.txt with the name of your text file / thesis (in the notebook), or save your file in the same folder as the notebook and rename it to example.txt
  • Go to Cell and click Run All
  • Check your working directory: the word cloud image should now be saved there under a name similar to wc_Size1500_1000_hslColorH322 (unless you changed the output parameter)


Examples:

A few examples of different custom settings and the results:

  • Regular usage: python generate_cloud.py


    Let's change 'count' to 'Simon Hastings' (...looking at you, Bridgerton...) and use a black background.

  • Custom usage: python generate_cloud.py -x1 count -x2 Simon_Hastings -f example.txt -o bridgerton2.png -bg black

    I only replaced one word (count -> Simon Hastings), but multiple words can be replaced at the same time.
    For example, -x1 count Monte_Cristo -x2 simon_hastings London changes "count" to "simon hastings" and "Monte Cristo" to "London".
    Note that words that belong together, such as "Monte Cristo", should be joined with an underscore.


  • Matrix usage: python generate_cloud.py -matrix

    Automatically creates a word cloud with a matrix-like style. This specific word cloud was generated with the "-whatsapp" option from a WhatsApp chat export file, and I used -x1/-x2 to censor names and addresses. You can still specify "-whatsapp" as well as the input (-f) and output (-o) files.


    Custom usage:
    * Left (saturation and lightness adjusted): python generate_cloud.py -s 25 -l 90

    * Right (allow for random word colors): python generate_cloud.py -hue None
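The -x1/-x2 word replacement described in the examples could work roughly as follows. This is a sketch, not the script's code, and `replace_words` is a hypothetical helper. Whole-word, case-insensitive matching is an assumption. Underscores are turned into spaces so that "Monte_Cristo" matches "Monte Cristo". Replacements run in order, so a substitute could itself be matched by a later -x1 word.

```python
import re


def replace_words(text, targets, substitutes):
    """Replace each target with its substitute; underscores stand for spaces.

    Assumptions: matching is case-insensitive and limited to whole words.
    """
    if len(targets) != len(substitutes):
        raise ValueError("-x1 and -x2 need the same number of values")
    for target, substitute in zip(targets, substitutes):
        pattern = r"\b" + re.escape(target.replace("_", " ")) + r"\b"
        replacement = substitute.replace("_", " ")
        # A function as the replacement avoids re.sub treating backslashes specially.
        text = re.sub(pattern, lambda _match: replacement, text, flags=re.IGNORECASE)
    return text
```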
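The hue, saturation and lightness options point to HSL coloring, and the notebook's default output name contains "hslColor". Below is a sketch of a color function with the signature the wordcloud library expects for `color_func`. `make_hsl_color_func` is a hypothetical helper. Treating `hue=None` as a random hue per word mirrors "-hue None". The default saturation and lightness values are assumptions, not the tool's defaults.

```python
import random


def make_hsl_color_func(hue=None, saturation=100, lightness=50):
    """Build a color function in the shape wordcloud expects for color_func.

    hue=None mimics "-hue None": every word gets a random hue. The default
    saturation/lightness values here are assumptions, not the tool's defaults.
    """
    def color_func(word, font_size, position, orientation, random_state=None, **kwargs):
        # wordcloud passes a random.Random instance as random_state.
        rng = random_state if random_state is not None else random.Random()
        h = rng.randint(0, 359) if hue is None else hue
        return f"hsl({h}, {saturation}%, {lightness}%)"

    return color_func
```

For instance, the "Left" example (-s 25 -l 90) would roughly correspond to `WordCloud(color_func=make_hsl_color_func(hue=322, saturation=25, lightness=90))`.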
