I built this to automate the process of scraping job listings from Indeed.com, making it easier to collect and analyze data on job postings in a specific location. This project leverages web scraping techniques using Selenium and JSON parsing with Python.
Indeed Job Scraper is designed to fetch job listings from Indeed.com based on specified criteria (e.g., sponsorship, Chicago, IL), then parse the extracted data into a more structured format (JSON) for further analysis. The tool includes rate limiting to prevent overloading the website and ensure smooth operation.
- Web Scraping: Utilizes Selenium to fetch job listings from Indeed.com.
- Rate Limiting: Includes a retry mechanism with delays to avoid overwhelming the website.
- JSON Output: Exports extracted data in JSON format for further processing.
- CSV Conversion: Optionally, parses JSON output into a CSV file.
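
The rate-limiting feature above can be pictured roughly as follows. This is a minimal sketch, not the code in this repository: the function name `fetch_with_retry`, its parameters, and the doubling delay with random jitter are all assumptions chosen for illustration. The README only states that the scraper retries with delays.

```python
import random
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, pausing between failed attempts.

    Hypothetical helper: the delay doubles after each failure and gets up
    to one second of random jitter so requests are not sent in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)
```

Here `fetch` would be any zero-argument callable, for example a function that loads one results page with Selenium. Passing `sleep` as a parameter makes the delays easy to replace in tests.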
- Python 3.x (preferably 3.9 or later)
- Selenium WebDriver (ChromeDriver)
- json and csv modules (included in the Python standard library, so no separate install is needed)
- Clone this repository using Git.
- Install required libraries using pip:
pip install selenium
- Download the ChromeDriver from here and add it to your system's PATH.
- Execute the `job_scraper_with_rate_limiting.py` script.
- The tool will fetch job listings based on the specified criteria (sponsorship, Chicago, IL).
- It will parse the extracted data into JSON format and save it to a file named `log_{timestamp}.json`.
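
As a rough illustration of the steps above, the sketch below builds a search URL from the criteria and writes results to a timestamped file. The helper names `build_search_url` and `save_jobs`, the `q`/`l` query parameters, and the `%Y%m%d_%H%M%S` timestamp format are assumptions made for this example; the actual script may differ.

```python
import json
from datetime import datetime
from pathlib import Path
from urllib.parse import urlencode


def build_search_url(query, location):
    # Assumed URL shape: a search page taking `q` (keywords) and `l` (location).
    return "https://www.indeed.com/jobs?" + urlencode({"q": query, "l": location})


def save_jobs(jobs, out_dir=".", now=None):
    """Write job records to log_{timestamp}.json and return the file path."""
    # The timestamp format is an assumption; the README only shows {timestamp}.
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"log_{timestamp}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(jobs, f, indent=2, ensure_ascii=False)
    return path
```

For example, `build_search_url("sponsorship", "Chicago, IL")` URL-encodes the comma and space in the location, so the criteria can be passed as plain strings.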
- After running the scraper, execute the `parse_json_file_to_csv.py` script.
- This will convert the JSON output from the previous step into a CSV file named `job_data_extended.csv`.
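
The conversion step might look something like the sketch below. It assumes the JSON file holds a list of flat dictionaries, one per job; the function name `json_to_csv` and the field names in the usage note are hypothetical and not taken from `parse_json_file_to_csv.py`.

```python
import csv
import json


def json_to_csv(json_path, csv_path="job_data_extended.csv"):
    """Convert a JSON list of job dicts to CSV; returns the number of rows."""
    with open(json_path, encoding="utf-8") as f:
        jobs = json.load(f)

    # Collect every key seen across records, in first-seen order,
    # so records with missing fields still fit the same header.
    fieldnames = []
    for job in jobs:
        for key in job:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(jobs)
    return len(jobs)
```

A record that lacks a field, say a posting with no `location`, gets an empty cell in that column rather than raising an error.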
Contributions are welcome! If you'd like to enhance this project or add new features, please follow these steps:
- Fork this repository on GitHub.
- Make your changes in a new branch (e.g., `feature/new-feature`).
- Commit your changes with descriptive commit messages.
- Submit a pull request for review.
Indeed Job Scraper is released under the MIT License.
Indeed, web scraping, Selenium, rate limiting, JSON parsing, CSV conversion