A simple Python script to generate a square wordcloud from one (or more) text file(s). It supports both Python 2 and 3 (2.7+ and 3.4+).
Based on the great word_cloud module by @amueller.
1. Requirements
The usual module matplotlib is needed for the plotting, docopt is needed for the command line interface, and word_cloud is needed for the actual work (generating the cloud of words after reading the files).
The required Python (2 or 3) modules can be installed with pip, either directly:
# Directly:
sudo pip install matplotlib docopt wordcloud
Or with the requirements.txt file:
sudo pip install -r requirements.txt
Note: if ansicolortags is available, it will be used to print nice colors in the help and during the generation of word clouds.
Clone the repository, then copy the script (generate-word-cloud.py) somewhere in your PATH (e.g., ~/.local/bin/).
You can also just download the script itself:
$ wget https://raw.githubusercontent.com/Naereen/generate-word-cloud.py/master/generate-word-cloud.py
$ cp generate-word-cloud.py /path/to/a/directory/in/your/PATH/
Note: The script is also available from PyPI: pypi.python.org/pypi/generatewordcloud. You can install it using pip:
$ pip install generatewordcloud
$ # Or maybe you need sudo rights:
$ sudo pip install generatewordcloud
$ generate-word-cloud.py --help
Generate a wordcloud from two txt files in the current directory, and save it to wordcloud_txt.png:
$ generate-word-cloud.py -o ./wordcloud_txt.png ./file1.txt ./file2.txt
Generate a wordcloud from the text file hamlet.txt (~ 8000 lines), saving it to hamlet.png:
$ generate-word-cloud.py -o ./hamlet.png ./hamlet.txt
(It should work on pretty big text files without any issue.)
Generate a wordcloud from the README.md and generate-word-cloud.py files of this very project, and save it to wordcloud_meta.png!
$ generate-word-cloud.py -o ./wordcloud_meta.png ./*.md ./*.py
- Supports one or more input file(s), and cleanly skips any file it fails to find or fails to read,
- Custom output file, which won't be overwritten (except with the -f flag),
- Nice command line interface (originally argparse powered; I switched to docopt after realizing how awesome it is!),
- Has a command line option for every important parameter (max number of words, width, height, etc.).
- Input filenames with spaces in their name (e.g. this file.txt) were seen as several files: FIXED with the switch to docopt.
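The file-skipping and no-overwrite behaviours listed above can be sketched as follows. This is only a minimal illustration using the standard library, not the script's actual code: the helper names `read_input_files` and `may_write` are hypothetical, and the real script may report errors or ask for confirmation differently.

```python
import os
import sys


def read_input_files(paths):
    """Return the concatenated text of every readable file in `paths`.

    Hypothetical helper: files that are missing or cannot be read are
    reported on stderr and skipped, instead of aborting the whole run.
    """
    chunks = []
    for path in paths:
        try:
            with open(path, "r") as handle:
                chunks.append(handle.read())
        except (IOError, OSError, UnicodeDecodeError) as error:
            sys.stderr.write("Skipping '{}': {}\n".format(path, error))
    return "\n".join(chunks)


def may_write(outfile, force=False):
    """Return True if `outfile` can be written without clobbering a file.

    Hypothetical helper: with force=True (the -f flag) writing is always
    allowed; otherwise only when nothing exists at that path yet.
    """
    return force or not os.path.exists(outfile)
```

When `may_write` returns False, a caller could ask the user for confirmation before overwriting, which matches the default behaviour described for the `-f` option.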
$ generate-word-cloud.py -h | --help
Usage:
generate-word-cloud.py [-s | --show] [-f | --force] [-o OUTFILE | --outfile=OUTFILE]
[-t TITLE | --title=TITLE] [-m MAX | --max=MAX]
[-w WIDTH | --width=WIDTH] [-H HEIGHT | --height=HEIGHT]
INFILE...
generate-word-cloud.py (-h | --help)
generate-word-cloud.py (-v | --version)
Options:
-h --help Show this help message and exit.
-v --version Show program's version number and exit.
-s --show Show the image but do not save it [default False].
-f --force Force to write the image, even if present (default is to ask before overwriting an existing file) [default False].
-o OUTFILE --outfile=OUTFILE
Filename for the generated image [default 'wordcloud.png'].
-t TITLE --title=TITLE
Title for the image [default None].
-m MAX --max MAX
Max number of words to display on the cloud word [default 150].
-w WIDTH --width WIDTH
Width of the generate image [default 400].
-H HEIGHT --height HEIGHT
Height of the generate image [default 300].
INFILE A text file to read.
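docopt hands the parsed command line back as a plain dictionary keyed by option name (for example `'--max'` or `'INFILE'`), with option arguments as strings and missing values as `None`. The sketch below shows one way such a dictionary could be turned into typed settings, using the defaults listed in the help text above. The function name `typed_options` and the keys of the returned dictionary are assumptions made for illustration; they are not taken from the script.

```python
def typed_options(args):
    """Convert a docopt-style dictionary of raw values into typed settings.

    Hypothetical helper: `args` is assumed to look like docopt's result,
    e.g. {"--max": "50", "--force": True, "INFILE": ["a.txt"]}.
    """
    def as_int(key, default):
        # Option arguments arrive as strings, or None when not given.
        value = args.get(key)
        return default if value is None else int(value)

    return {
        "outfile": args.get("--outfile") or "wordcloud.png",
        "title": args.get("--title"),
        "max_words": as_int("--max", 150),
        "width": as_int("--width", 400),
        "height": as_int("--height", 300),
        "show": bool(args.get("--show")),
        "force": bool(args.get("--force")),
        "infiles": list(args.get("INFILE") or []),
    }
```

Because `INFILE...` is collected as a list, a filename containing spaces stays a single entry as long as the shell passes it as one argument (e.g. quoted as `"this file.txt"`).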
- Start it, from this example,
- Run it on some interesting examples, embed them here (as images),
- Check on weird encodings? (i.e., not UTF-8). It works fine!
- Test it against VERY large files (millions of lines)? It works fine, slowly but fine.
- Test it against LOTS of files (several thousands)? It works fine, slowly but fine.
- Publish it on PyPI: it is available at pypi.python.org/pypi/generatewordcloud/
- Write a small article about it for my blog.
- Only tested on (X)Ubuntu (15.10), but it should work on other GNU/Linux distributions and Mac OS X (and probably Windows), as long as both docopt and word_cloud are installed.
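For the encoding check mentioned in the list above, here is a minimal sketch of one common way to read files that are not UTF-8: try a list of encodings in order and keep the first one that decodes cleanly. The function name `read_text_with_fallback` and the choice of fallback encodings are assumptions for illustration; the script itself may handle encodings differently.

```python
import io


def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Decode `path` with the first encoding in `encodings` that works.

    Hypothetical helper: "latin-1" maps every byte to a character, so
    with the default list the last attempt never raises a decoding error.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            # io.open accepts an encoding on both Python 2.7 and Python 3.
            with io.open(path, "r", encoding=encoding) as handle:
                return handle.read()
        except UnicodeDecodeError as error:
            last_error = error
    raise last_error
```

The trade-off is that a wrong fallback can decode without error yet produce the wrong characters, so the order of the encodings matters.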
Use the issue tracker to notify me of a bug!
There are already a lot of good word cloud generators online, e.g. wordle.net.
- I wanted a way to visualize the major keywords of Bash and Python (my two favorite programming languages) and of Markdown/Strapdown, reStructuredText and LaTeX (my favorite document typesetting systems),
- The original project word_cloud seemed cool. And it is. Great job @amueller!
- Clouds of words are interesting! And Python is awesome!
This script is published under the terms of the GPLv3 License (file LICENSE), © Lilian Besson, 2016.