IT Job TopCV Crawler

Overview

This repository crawls IT job postings from TopCV.vn (IT Jobs category).
Data can be crawled from a single page or from a range of consecutive pages.

Requirements:

requests
beautifulsoup4

How to run

Custom run

In a bash shell, run python3 crawler.py a b, where a and b are page indices.
This command crawls the consecutive pages from page a to page b.
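As an illustration of how the two page indices might be read, here is a minimal sketch using the standard argparse module. It is not the code in crawler.py: the function name parse_page_range is hypothetical, and it assumes page b is included in the crawl.

```python
import argparse


def parse_page_range(argv=None):
    """Hypothetical helper: parse `a b` from the command line into a page range.

    Assumes both bounds are inclusive, i.e. `python3 crawler.py 3 5`
    yields pages 3, 4 and 5.
    """
    parser = argparse.ArgumentParser(
        description="Crawl TopCV IT job listing pages a..b (sketch)"
    )
    parser.add_argument("a", type=int, help="index of the first page")
    parser.add_argument("b", type=int, help="index of the last page (inclusive)")
    args = parser.parse_args(argv)
    if args.a > args.b:
        # parser.error prints the usage message and exits with status 2.
        parser.error("a must not be greater than b")
    return range(args.a, args.b + 1)
```

Calling parse_page_range() with no arguments reads sys.argv, so a script built this way would loop over the returned range and fetch each page in turn.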

Default Run

Use run.sh to start crawling.
This bash script runs 14 threads simultaneously.
Each thread crawls 10 consecutive pages (1-9, 10-19, 20-29, ...) and saves the results to a file named recruit_a_b.json, so 14 files are produced in total.
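The scheduling described above can be sketched in Python as follows. This is not run.sh (which is a bash script). It is a hypothetical equivalent that reproduces the stated page blocks and file names. The names page_blocks, run_block and run_all are illustrative, the crawl argument stands in for whatever per-page scraping crawler.py does, and each file is assumed to hold a JSON list of records.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

NUM_BLOCKS = 14       # number of parallel jobs, as stated for run.sh
PAGES_PER_BLOCK = 10  # block width from the pattern 1-9, 10-19, 20-29, ...


def page_blocks(num_blocks=NUM_BLOCKS, size=PAGES_PER_BLOCK):
    """Return (a, b) ranges following the README pattern 1-9, 10-19, 20-29, ..."""
    blocks = []
    for i in range(num_blocks):
        start = max(1, i * size)  # pages start at 1, so the first block is 1-9
        end = i * size + size - 1
        blocks.append((start, end))
    return blocks


def run_block(a, b, crawl, out_dir):
    """Crawl pages a..b (inclusive) and write them to recruit_a_b.json."""
    records = [record for page in range(a, b + 1) for record in crawl(page)]
    path = Path(out_dir) / f"recruit_{a}_{b}.json"
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def run_all(crawl, out_dir="."):
    """Run every block in its own worker thread and return the output paths."""
    with ThreadPoolExecutor(max_workers=NUM_BLOCKS) as pool:
        futures = [pool.submit(run_block, a, b, crawl, out_dir) for a, b in page_blocks()]
        return [future.result() for future in futures]
```

Threads are a reasonable fit here because crawling is dominated by waiting on network responses rather than by CPU work.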

Result

The crawled data is stored in this repository.
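To work with the output as a single dataset, the per-block files can be merged. Below is a minimal sketch, assuming each recruit_a_b.json file contains a JSON list of job records; the function name load_results is hypothetical.

```python
import json
from pathlib import Path


def load_results(data_dir="."):
    """Hypothetical helper: merge all recruit_a_b.json files, ordered by start page."""

    def start_page(path):
        # "recruit_10_19" -> ["recruit", "10", "19"]; sort numerically on a,
        # because plain string sorting would put recruit_10_19 before recruit_1_9.
        return int(path.stem.split("_")[1])

    records = []
    for path in sorted(Path(data_dir).glob("recruit_*_*.json"), key=start_page):
        records.extend(json.loads(path.read_text(encoding="utf-8")))
    return records
```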

tienlonghungson/IT-Jobs-TopCV-Crawler