This repository is deprecated. The most recent fork is maintained by researchers at the University of Arkansas.
University of Pittsburgh Center for Research on Media, Technology, and Health
This is the documentation for using the RITHM software framework to work with real-time Twitter data for public health research. This code is provided as-is, with no expectation that it will work exactly as you want it to. However, we will do our best to be responsive to reasonable questions/issues posted here, make updates, and provide additional documentation. This is an ongoing project and the repository (and documentation) will be updated with new developments.
The RITHM code has been tested on Windows and Linux systems. It is designed to run in a Python 3.x environment. To set up a RITHM implementation, you will need access to the Twitter API (to connect and collect tweets) and Git (to copy the RITHM code from this repository to the machine that will run it). Please see our Getting Started Guide for additional details.
The streamer uses the Twython package to interface with the Twitter Streaming API. Relying on Twython makes it easier to integrate technical updates if Twitter's API changes in the future. To access the Twitter API, you need to (1) have a Twitter account and (2) register a new application associated with that account. Please see our Getting Started Guide for additional details.
Once that is set up, please see the documentation for running the RITHM streamer in the ./streamer/ folder.
The parser reformats raw Twitter data into a human-readable format for coding and analysis. It includes features for in-depth search and retrieval from raw data files, recoding emoji, and sub-sampling data. Its documentation is in the ./parser/ folder.
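The general idea of turning raw streamed tweets into analyzable rows and drawing a sub-sample can be sketched as below. This is not the RITHM parser: it is a minimal, hypothetical illustration that assumes raw files hold one JSON tweet object per line (as delivered by the v1.1 Streaming API), and the function names (`tweet_text`, `parse_raw_lines`, `subsample`) and the chosen output columns are assumptions for illustration only. Emoji recoding and search are not shown.

```python
import json
import random


def tweet_text(tweet):
    """Return the tweet text, preferring the extended (up to 280-character) form if present."""
    extended = tweet.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]
    return tweet.get("full_text", tweet.get("text", ""))


def parse_raw_lines(lines):
    """Hypothetical sketch: convert raw JSON lines (one tweet per line) into tab-separated rows.

    Blank lines, malformed JSON, and non-tweet messages (e.g. delete or limit notices,
    which have no top-level "id_str") are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "id_str" not in tweet:
            continue
        # Collapse tabs/newlines so each tweet stays on a single TSV row.
        text = " ".join(tweet_text(tweet).split())
        user = tweet.get("user", {}).get("screen_name", "")
        rows.append("\t".join([tweet["id_str"], tweet.get("created_at", ""), user, text]))
    return rows


def subsample(rows, k, seed=0):
    """Draw a reproducible random sub-sample of k rows (or all rows if there are fewer)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))
```

A fixed seed makes the sub-sample reproducible, which matters when the same sample must be shared among human coders. Preferring `extended_tweet.full_text` avoids truncated text for tweets longer than 140 characters.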
If you use the RITHM software or resources for your research, please cite our development paper:
Colditz JB, Chu K, Emery SL, Larkin CR, James AE, Welling J, Primack BA. Toward real-time infoveillance of Twitter health messages. American Journal of Public Health. 2018;108(8):1009-1014. doi: 10.2105/AJPH.2018.304497
01 Sep. 2018
- We received a new research allocation to use the Bridges infrastructure at Pittsburgh Supercomputing Center (XRAC-SBE180005; PI: Primack). Future publications from our group should acknowledge: "Technical infrastructure was supported through NSF awards ACI-1548562 & 1445606, at the Pittsburgh Supercomputing Center (PSC)."
24 Jul. 2018
- New Twitter Developer rules will require additional documentation and review of applications for Twitter API access.
04 Apr. 2018
- Our research paper related to RITHM development has been accepted for publication in the American Journal of Public Health. This paper provides practical considerations for enhancing the validity and reproducibility of Twitter content analysis.
01 Mar. 2018
- Our ongoing research is now funded through a grant from the US National Cancer Institute (R01-CA225773; PI: Primack).
07 Nov. 2017
- The Twitter platform has extended the maximum length of tweets from 140 to 280 characters. This has not affected the fidelity of RITHM data collection and makes a negligible difference to overall raw data file sizes. However, it may affect the validity of analyses conducted across the transitional period.
01 Apr. 2016
- We received an initial start-up allocation to use the Bridges infrastructure at Pittsburgh Supercomputing Center (XRAC-TG-DBS160002; PI: Primack). Future publications from our group should acknowledge that this work "is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC)."