Youtube Annotation Archive

YouTube annotations were removed around 15:00 UTC on January 15th, 2019. The tracker was taken down about a day later, so workers are no longer needed and the URL they connect to no longer works. See here for more information about the future of this project.

For cloudrac3r's work, see README.md in the node folder.

Provides scripts for archiving YouTube annotations. See the wiki for information about how it works.

Annotations on every YouTube video will be deleted forever on the 15th of January. The purpose of this project is to archive as much annotation data as possible before that happens.

The current process is to scrape as many channel IDs as possible, then to scrape video IDs from those channels, then to download annotation data for those videos.
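As a rough sketch of that three-stage flow, each stage can be modeled as a function whose output feeds the next. This is an illustration in Python, not the project's Crystal or Node.js code: `archive`, `find_channel_ids`, `scrape_video_ids`, and `download_annotations` are hypothetical names, and the callables are passed in so the sketch runs offline instead of talking to YouTube or the tracker.

```python
from typing import Callable, Dict, Iterable


def archive(
    find_channel_ids: Callable[[], Iterable[str]],
    scrape_video_ids: Callable[[str], Iterable[str]],
    download_annotations: Callable[[str], str],
) -> Dict[str, str]:
    """Run the three stages in order and return {video_id: annotation_data}.

    The three callables are hypothetical placeholders for the real
    network code; passing them in keeps the sketch runnable offline.
    """
    # Stage 1: collect channel IDs, dropping duplicates
    # (dict.fromkeys keeps the first occurrence of each key, in order).
    channel_ids = list(dict.fromkeys(find_channel_ids()))

    # Stage 2: collect the video IDs from each channel, deduplicated
    # in case a scraper reports the same ID more than once.
    video_ids: Dict[str, None] = {}
    for channel_id in channel_ids:
        for video_id in scrape_video_ids(channel_id):
            video_ids.setdefault(video_id, None)

    # Stage 3: download the annotation data for every video found.
    return {video_id: download_annotations(video_id) for video_id in video_ids}


if __name__ == "__main__":
    # Fake data standing in for real scraping results.
    uploads = {"UC_a": ["v1", "v2"], "UC_b": ["v2", "v3"]}
    result = archive(
        lambda: ["UC_a", "UC_b", "UC_a"],
        lambda channel_id: uploads[channel_id],
        lambda video_id: f"<annotations for {video_id}>",
    )
    print(list(result))  # ['v1', 'v2', 'v3']
```

Deduplicating between stages keeps a repeated channel or video ID from costing an extra request later in the pipeline.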

If you would like to make sure specific channels are archived before the 15th, you can use this tool.

Usage

Installing and running a worker (Node.js):

With Docker:

Download the Dockerfile from the /docker folder:

$ wget https://github.com/omarroth/archive/raw/master/docker/Dockerfile

Then in the same directory run the following command to build the image:

$ docker build -t archive .

Use the following commands to create a container with the image and run it to begin the archiving process:

$ docker create --name=archive-worker archive:latest
$ docker container start archive-worker

On Ubuntu:

# Install dependencies
$ sudo apt-get install curl python-software-properties
$ curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
$ sudo apt-get install nodejs gcc g++ make

$ git clone https://github.com/omarroth/archive
$ cd archive/node
$ npm install
$ cd worker
$ node index.js

With Heroku:

Create a Heroku account, then create a new Heroku app from https://github.com/omarroth/archive using the "heroku" branch and trigger a manual deploy. You can do this by visiting this link: https://dashboard.heroku.com/new?template=https://github.com/omarroth/archive/tree/heroku.

Enable automatic deploys to receive the latest updates automatically.

The web server is only a placeholder; open the app's logs to see what the worker is currently doing.

Installing and running a worker (Crystal):

On Ubuntu:

# Install dependencies
$ curl -sSL https://dist.crystal-lang.org/apt/setup.sh | sudo bash
$ sudo apt-get update
$ sudo apt-get install crystal libssl-dev libxml2-dev libyaml-dev libgmp-dev libreadline-dev librsvg2-dev

$ git clone https://github.com/omarroth/archive
$ cd archive
$ shards
$ crystal build src/worker.cr --release
$ ./worker -u https://archive.omar.yt -t 20

$ ./worker -h
    -u URL, --batch-url=URL          Master server URL
    -t THREADS, --max-threads=THREADS
                                     Number of threads for downloading annotations
    -h, --help                       Show this help
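The `-t` option sets how many threads download annotations at once. As a rough illustration only (Python rather than the worker's Crystal, and not its actual code), a bounded thread pool behaves like this. `process_batch` and the `download` callable are hypothetical names, and fetching a batch from the master server given by `-u` is left out because its protocol isn't described here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_batch(
    video_ids: List[str],
    download: Callable[[str], str],
    max_threads: int = 20,
) -> Dict[str, str]:
    """Download annotations for one batch using at most `max_threads` threads.

    `download` is a hypothetical stand-in for the worker's HTTP request.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() runs the downloads concurrently but yields results in input order.
        results = list(pool.map(download, video_ids))
    return dict(zip(video_ids, results))


if __name__ == "__main__":
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_download(video_id: str) -> str:
        # Record how many downloads overlap, then pretend to do network I/O.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.05)
        with lock:
            state["active"] -= 1
        return f"<annotations for {video_id}>"

    batch = [f"video{i}" for i in range(10)]
    annotations = process_batch(batch, fake_download, max_threads=3)
    print(len(annotations), "videos, peak concurrency:", state["peak"])
```

Because the work is mostly waiting on the network, threads help even in Python; the pool size caps how many requests are in flight at once, which is the trade-off `-t` exposes.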

Contributors

Omar Roth - creator and maintainer
cloudrac3r - JavaScript developer
