Available data sources

Sal Hagen edited this page Aug 20, 2024 · 16 revisions

This page lists the scripts for the available data sources. Some are fully functional; others are deprecated. Let us know if you have a new data source to add.

For datasource-specific information, check the README files in the folder of the respective data source.

| Name | Source | Active | Objects | Local (Continuous scraper) | Notes |
| --- | --- | --- | --- | --- | --- |
| 4chan | 4chan API | Yes | Comments + OPs | Yes | The helper-scripts folder contains several scripts for importing data from 4chan archives, e.g. this script to import CSV dumps from 4plebs. |
| 8chan | 4chan API | No (Archives only) | Comments + OPs | Yes | 8chan is now defunct. We scraped live data while it was still online. Let us know if you are interested in a database copy. |
| 8kun | 8chan API | Yes | Comments + OPs | Yes | Similar to the 4chan data source. |
| 9gag | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Bitchute | Scraping | No (issue) | Videos + comments | No | Uses BitChute's web search endpoint and scrapes data from the live website. |
| Douban | Scraping | Yes | Comments + OPs | No | Small datasets can be collected; due to rate-limiting, large searches may not complete properly. |
| Douyin | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Imgur | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Import from tool | Files from other tools | Yes | - | No | Used to import files from tools like CrowdTangle. |
| Instagram | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| LinkedIn | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Parler | Parler API | No | Posts | No | Uses Parler's unofficial web API; requires a valid Parler login. |
| Reddit | Pushshift API | No (Archive only) | Comments + OPs | No | Data retrieved via Pushshift. Unavailable since Reddit increased its API prices in July 2023. |
| Telegram | Telegram API | Yes | Messages in open groups | No | Requires a personal API key, which can be obtained by anyone with a Telegram account here. |
| TikTok | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Tumblr | Tumblr API | Yes | Posts + reblogs | No | Requires API keys, which you can obtain here. |
| X/Twitter | Twitter API & Zeeschuimer | Yes | Tweets | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Usenet | - | | Comments + OPs | Yes | Requires a local, static Usenet database. |
| VK |