Skip to content

Simple set of Python tools for handling Twitter data in the JSONL (JSON Lines text) format.

License

Notifications You must be signed in to change notification settings

derekgreene/twitter-jsonl-tools

Repository files navigation

twitter-jsonl-tools

Simple set of Python tools for handling Twitter data in the JSONL (JSON Lines text) format, also called newline-delimited JSON. See http://jsonlines.org for more details on the format.

Dependencies

Tested with Python 3.6, and requiring the following packages, which are available via PIP:

For faster JSON decoding, the scripts optionally support the use of the UltraJSON package.

Basic Usage: Tweets

The tools used to process tweets expect one or more JSONL files as inputs, where each line contains a JSON-formatted tweet as retrieved from the Twitter API.

To get basic summary statistics for a JSONL file containing tweets:

python jsonl-tweet-stats.py sample/sample-tweets-500.jsonl

To get list of the most frequently-tweeting users for a JSONL file containing tweets:

python jsonl-tweet-authors.py sample/sample-tweets-500.jsonl 

To get list of the most frequently-mentioned users for a JSONL file containing tweets:

python jsonl-tweet-mentions.py sample/sample-tweets-500.jsonl 

To get list of the most frequently-appearing hashtags for a JSONL file containing tweets:

python jsonl-tweet-hashtags.py sample/sample-tweets-500.jsonl

To export all tweets to a simple CSV (comma-separated) format:

python jsonl-tweet-export.py sample/sample-tweets-500.jsonl -o sample/sample-tweets.csv

To generate a tab-separated CSV file containing pairwise hashtag cooccurrence frequencies for one or more JSONL files:

python jsonl-hashtag-cooccur.py sample/sample-tweets-500.jsonl -o hashtag-cooccurrences.csv

Basic Usage: Users

The tools used to process user data expect one or more JSONL files as inputs, where each line contains a JSON-formatted user profile data as retrieved from the Twitter API.

To export all user metadata to a simple CSV (comma-separated) format:

python jsonl-user-export.py sample/sample-users-50.jsonl -o sample/sample-users.csv

About

Simple set of Python tools for handling Twitter data in the JSONL (JSON Lines text) format.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages