github-statistics is a workflow repository that pulls data from the GitHub Repositories API and GitHub Users API on a regular schedule to generate distribution statistics for a subset of early GitHub repositories and users.
https://docs.google.com/spreadsheets/d/1HBSwxr0jkUoMulQxyVTC81YHN2mr2lUZbc8kkdmnQWY/edit?usp=sharing
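GitHub documents GET /users as listing users in the order they signed up and GET /repositories as listing public repositories in the order they were created. Both are paged with a `since` ID rather than page numbers. Below is a minimal sketch of one step of such a crawl. It assumes curl, jq, and a token in `GITHUB_TOKEN`. The function names `page_url` and `next_since` are hypothetical and are not taken from this repository's workflow.

```shell
# Sketch only: hypothetical helpers, not this repository's workflow code.
# Both list endpoints return records whose ID is greater than `since`;
# the cursor for the next page is the largest `id` in the current page.

# Print the URL for one page. $1 = users | repositories, $2 = since (an ID).
page_url() {
  case $1 in
    users)        printf 'https://api.github.com/users?since=%s&per_page=100\n' "$2" ;;
    repositories) printf 'https://api.github.com/repositories?since=%s\n' "$2" ;;
    *)            echo "unknown endpoint: $1" >&2; return 1 ;;
  esac
}

# Fetch one page and print the next cursor (requires curl, jq, GITHUB_TOKEN).
next_since() {
  url=$(page_url "$1" "$2") || return 1
  curl -fsSL \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    "$url" | jq 'map(.id) | max'
}
```

Start from `since=0` and feed each result back into the next call. An empty page makes jq print `null`, which marks the end of the listing.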
As of 2021, GitHub has over 73 million registered users. The github-users.db
SQLite database in this repository contains the first 1.5 million registered
users. It reflects 15 CI runs of 100,000 users each and is compressed with
Zstandard, the same compression algorithm GitHub uses for actions/cache@v3.
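The figures above multiply out as follows. This assumes each run pages through GET /users at its maximum page size of 100; the README does not state the page size, so that part is an assumption. `crawl_plan` is a hypothetical helper.

```shell
# Hypothetical helper: totals for a crawl of fixed-size CI runs.
# $1 = users per run, $2 = number of runs, $3 = users per API page.
crawl_plan() {
  users_per_run=$1 runs=$2 per_page=$3
  total=$((users_per_run * runs))
  pages=$(( (users_per_run + per_page - 1) / per_page ))  # ceiling division
  printf 'total_users=%s pages_per_run=%s\n' "$total" "$pages"
}

crawl_plan 100000 15 100   # prints total_users=1500000 pages_per_run=1000
```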
Studies planned for this repository will stay within GitHub's repository limits, following the recommendations in the Managing large files article. 1.5 million users is the maximum number of users that fits in a full series of 100,000-user inserts once compressed with Zstandard.
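GitHub's documentation states that it blocks files larger than 100 MiB. That per-file limit is the kind of constraint the 1.5 million cap is meant to respect. Below is a minimal check to run before committing a compressed archive. `fits_limit` is a hypothetical helper. Its 100 MiB default is GitHub's documented limit, not a value taken from this repository.

```shell
# GitHub's documented per-file limit: files larger than 100 MiB are blocked.
GITHUB_FILE_LIMIT=$((100 * 1024 * 1024))

# Succeed if file $1 is at most $2 bytes (default: GitHub's per-file limit).
fits_limit() {
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 2; }
  size=$(( $(wc -c < "$1") ))   # arithmetic expansion strips wc's padding
  [ "$size" -le "${2:-$GITHUB_FILE_LIMIT}" ]
}
```

For example, `fits_limit github-users.tzst || echo "archive too large to commit"`.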
As of June 17, 2022, github-statistics also collects repositories.
Note: Do not use Git LFS. It is not possible to remove Git LFS objects from a repository without deleting and recreating the repository.
github-repositories.db
  repositories (NEW): GitHub repositories as listed by GET /repositories
  repositories_stargazers (NEW): GitHub repositories from the repositories table and their stargazer counts

github-users.db
  users: GitHub users as listed by GET /users
  users_followers: GitHub users from the users table and their follower counts
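The count tables are the raw material for distribution statistics. Below is a minimal sketch of one such statistic, a nearest-rank percentile computed with sort and awk. It assumes the counts arrive one integer per line on stdin. The helper name `percentile` is hypothetical, and the README does not document the column names.

```shell
# Hypothetical helper: nearest-rank percentile of integer counts on stdin.
# $1 = integer percentile in 1..100. Exits non-zero on empty input.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p/100 * N) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

For example, with the sqlite3 CLI: `sqlite3 github-users.db 'SELECT followers FROM users_followers;' | percentile 90`. The `followers` column name in that query is a guess.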
To extract the users database in two steps:
    zstd -d github-users.tzst
    tar xf github-users.tar
Or in a single step:
    tar --use-compress-program zstd -xf github-users.tzst
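The reverse direction, packing a database into a `.tzst` archive, can be sketched with the same `tar --use-compress-program zstd` form. The helper names are hypothetical. The sketch assumes GNU tar or bsdtar and the zstd CLI on PATH. `zstd -t` checks the compressed stream's integrity without writing any output.

```shell
# Hypothetical helpers; assume GNU tar or bsdtar plus the zstd CLI.

# Pack a SQLite file into a Zstandard-compressed tarball next to it,
# e.g. github-users.db -> github-users.tzst.
pack_db() {
  tar --use-compress-program zstd -cf "${1%.db}.tzst" "$1"
}

# Verify the compressed stream without extracting it.
check_archive() {
  zstd -q -t "$1"
}

# Extract in one step, as in the single-step command above.
unpack_db() {
  tar --use-compress-program zstd -xf "$1"
}
```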
GNU General Public License v2.0