This repository contains a Jupyter Notebook (youtube_data_download.ipynb) that demonstrates how to download data from YouTube using the YouTube Data API.
This Python script performs the following tasks:
- Retrieves channel information (name, ID, handle) for specified YouTube channels.
- Collects video data from these channels, including views, likes, comments, and other statistics.
- Processes and formats the data for SQL compatibility.
- Stores the collected data in a MySQL database.
The primary goal of this project is to serve as a portfolio piece and a learning exercise in working with APIs, data processing, and database management.
- Python 3.10.9
- MySQL 8.0.37
- YouTube Data API v3
Before you begin, ensure you have met the following requirements:
- You have installed Python 3.x.
- You have installed Jupyter Notebook.
- You have a Google account to access the YouTube Data API.
For those who want to get up and running quickly:
- Clone the repository:
git clone https://github.com/RockManRK/YouTubeDataCollector
- Install dependencies:
pip install -r requirements.txt
- Set up your MySQL database:
  - Open the schema.sql file and replace {DATABASE_NAME} with your desired database name.
  - Run the SQL script in your MySQL environment:
mysql -u your_username -p < schema.sql
- Update the .env file with your database credentials and YouTube API key.
- Run the Jupyter notebook:
jupyter notebook youtube_data_download.ipynb
- Execute all cells in the notebook.
For more detailed instructions, see the full installation and usage sections below.
YouTubeDataCollector/
│
├── .env.example # Example environment variable file
├── .gitignore # Git ignore rules
├── LICENSE # License file (MIT)
├── requirements.txt # Python dependencies
├── schema.sql # SQL script to set up the initial database
├── youtube_data_download.ipynb # Main Jupyter notebook with the code
└── README.md # This file
- Clone the repository:
git clone https://github.com/RockManRK/YouTubeDataCollector
- Navigate to the project directory:
cd YouTubeDataCollector
- Install the required Python packages:
pip install -r requirements.txt
- In the project directory, you'll find a file named .env.example. This file contains template environment variables.
- Create a copy of this file and name it .env. You can do this in several ways:
  - On Unix-like systems (Linux, macOS), you can use the terminal:
cp .env.example .env
  - On Windows, you can use the command prompt:
copy .env.example .env
  - Alternatively, you can simply create a new file named .env and copy the contents of .env.example into it using any text editor.
- Open the newly created .env file in a text editor.
- Update the values in the .env file with your specific details:
  - Replace YOUR_YOUTUBE_API_KEY with the API key you obtained from the Google Developers Console.
  - Update the database connection details (DB_HOST, DB_USER, DB_PASSWORD, DB_NAME) with your MySQL database information.
- Save the .env file.
This configuration file will be used by the script to access your YouTube API key and connect to your database.
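To check the .env file before running the notebook, a small standard-library sketch like the one below can read it and report which database settings are still empty. This is only an illustration: the notebook may load the file differently (for example with a package listed in requirements.txt), and the helper names `load_env` and `missing_keys` are hypothetical.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without a KEY=VALUE shape.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values, required=("DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME")):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(load_env())` from the project directory returns an empty list once all four database values are filled in.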
If you haven't obtained a YouTube API key yet, follow these steps:
- Go to the Google Developers Console.
- Create a new project or select an existing one.
- Enable the YouTube Data API v3 for your project.
- Create credentials (API Key) for the YouTube Data API.
- Copy the API Key and add it to your .env file as described above.
For detailed instructions, refer to the YouTube Data API documentation.
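Once you have a key, one way to confirm it works is to request a single channel from the `channels` endpoint of the YouTube Data API v3. The sketch below uses only the standard library and is an assumption about how a request could be built, not the notebook's own code (which may use a client library). The function names are hypothetical, and `forHandle` is the query parameter the API documents for looking up a channel by handle.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://www.googleapis.com/youtube/v3"


def build_channels_url(handle, api_key):
    """Build a channels.list request URL for a channel handle such as '@name'."""
    params = {"part": "snippet,statistics", "forHandle": handle, "key": api_key}
    return f"{API_BASE}/channels?{urlencode(params)}"


def fetch_json(url):
    """Send the GET request and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)
```

A call such as `fetch_json(build_channels_url("@some_handle", api_key))` should return a JSON object if the key is valid and the API is enabled for your project; an invalid key produces an HTTP error instead.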
This project requires a MySQL database. Follow these steps to set up the required database structure:
- Open the schema.sql file in a text editor.
- Replace all occurrences of {DATABASE_NAME} with your desired database name.
- Run the SQL script in your MySQL environment. You can do this in several ways:
  a. Via command line:
mysql -u your_username -p < schema.sql
  Replace your_username with your MySQL username. You'll be prompted to enter your password.
  b. Or, if you prefer to enter your password directly in the command:
mysql -u your_username -pyour_password < schema.sql
  Replace your_username and your_password with your MySQL credentials. Note that there is no space between -p and your password. MySQL warns that a password given on the command line can be insecure, so prefer option a where possible.
  c. Alternatively, you can use a MySQL client like MySQL Workbench:
    - Open MySQL Workbench and connect to your server
    - Open the schema.sql file
    - Execute the script
This will create the necessary database and tables for the YouTubeDataCollector to function.
Note: Ensure that your MySQL server is running before executing these commands. If you encounter any permission issues, you may need to use sudo (on Unix-like systems) or run your command prompt as an administrator (on Windows).
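If you would rather not edit schema.sql by hand, the placeholder replacement can be scripted. The following is a minimal sketch, assuming the file uses the literal token {DATABASE_NAME} as described above. The function name `render_schema` and the output filename in the comment are hypothetical.

```python
import re


def render_schema(template_sql, database_name):
    """Replace every {DATABASE_NAME} placeholder with the given name."""
    # Accept only letters, digits, and underscores so the name is safe
    # to paste into SQL without quoting.
    if not re.fullmatch(r"[A-Za-z0-9_]+", database_name):
        raise ValueError(f"unsupported database name: {database_name!r}")
    return template_sql.replace("{DATABASE_NAME}", database_name)


# Example use from the project directory (filenames are illustrative):
#   from pathlib import Path
#   sql = render_schema(Path("schema.sql").read_text(encoding="utf-8"), "youtube_data")
#   Path("schema_rendered.sql").write_text(sql, encoding="utf-8")
```

The rendered file can then be passed to the mysql command in place of schema.sql, leaving the original template untouched.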
- Open the Jupyter Notebook:
jupyter notebook youtube_data_download.ipynb
- Run the cells in order to collect data from the specified YouTube channels and store it in your MySQL database.
When you run the notebook, you can expect the following:
- The script will start by executing a function to fetch details of one or more YouTube channels. You'll need to specify the names of the channels you want to collect data from. This function will retrieve the ID, Name, and Handle for each channel.
- Using the retrieved channel information, the script will then authenticate with the YouTube API using your provided key.
- For each channel, it will collect data on recent videos (views, likes, comments, etc.).
- The collected data will be processed and formatted for SQL compatibility.
- Finally, the data will be stored in your configured MySQL database.
Note: The process may take some time depending on the number of channels and videos being processed.
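To illustrate what "formatted for SQL compatibility" can involve, the sketch below converts one video item, shaped like a `videos.list` response from the YouTube Data API, into a tuple suitable for a parameterized INSERT. It is not taken from the notebook: the function names and the column order are assumptions, and the actual table layout is defined in schema.sql. The API returns counters as strings and may omit some of them (for example when likes are hidden), so missing values become None, which MySQL stores as NULL.

```python
from datetime import datetime


def to_int(value):
    """Convert a numeric string from the API to int, keeping None as None."""
    return int(value) if value is not None else None


def video_to_row(item):
    """Flatten one videos.list item into (id, title, published, views, likes, comments)."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    # publishedAt is ISO 8601 in UTC, e.g. "2024-01-15T08:30:00Z".
    published = datetime.fromisoformat(snippet["publishedAt"].replace("Z", "+00:00"))
    return (
        item["id"],
        snippet.get("title"),
        published.strftime("%Y-%m-%d %H:%M:%S"),  # MySQL DATETIME literal format
        to_int(stats.get("viewCount")),
        to_int(stats.get("likeCount")),
        to_int(stats.get("commentCount")),
    )
```

Passing such tuples to a database cursor with placeholder parameters, rather than building SQL strings by hand, also avoids quoting problems with video titles.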
While not part of this repository, the collected data is visualized using Power BI. You can view the dashboard here: YouTube Channel Analytics Dashboard
This project is licensed under the MIT License - see the LICENSE file for details.
Davi Prata
- GitHub: RockManRK
- Email: rockmanrk@hotmail.com
If you have any questions, please open an issue or contact me.
- requirements.txt: This file contains a list of Python packages required to run the project. Ensure you install these packages using the command provided in the Installation section.
- .env.example: This file serves as a template for your environment variables. Copy this file to .env and update it with your specific configuration details.
- YouTube Data API documentation
- MySQL Connector/Python documentation