Web Scraping GitHub Topics

This is a Python web scraping project that extracts information from the GitHub Topics page (https://github.com/topics). The script retrieves a list of topics and, for each topic, gathers the topic title, topic page URL, and topic description. Additionally, for each topic, it collects the top 25 repositories within the topic from the topic page, and for each repository, it grabs the repository name, username, stars, and repository URL.

Installation

1.Ensure you have Python 3.x installed on your system.

2.Clone this repository to your local machine:

git clone https://github.com/rohit-s-s/scrapping-githhub-topic-repository

3.Navigate to the project directory:

cd scrapping_github_topics_repositories.ipynb

4.Install the required dependencies by running:

pip install requests
pip install beautifulsoup4
pip install pandas

Usage

run the web scraping script

The script will start scraping the GitHub Topics page and collect the required information for each topic and repository. The data will be stored in CSV files, with one CSV file for each topic. The CSV files will be saved in the data directory within the project folder.

CSV Format

The CSV files will follow the format as shown below:

Topic Title,Topic URL,Topic Description,Repository Name,Username,Stars,Repository URL

The CSV files will contain data for the top 25 repositories within each topic, along with their corresponding information.

Note GitHub's website structure may change over time, which could affect the web scraping process. The script may require adjustments if there are any significant changes to the website's layout or class names.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
scrapping_github_topics_repositories.ipynb		scrapping_github_topics_repositories.ipynb
topic.csv		topic.csv
topic_repo.csv		topic_repo.csv
webpage.html		webpage.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Web Scraping GitHub Topics

Installation

Usage

CSV Format

About

Releases

Packages

Languages

rohit-s-s/scrapping-githhub-topic-repository

Folders and files

Latest commit

History

Repository files navigation

Web Scraping GitHub Topics

Installation

Usage

CSV Format

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages