
Scraping, processing and analyzing job offers to help job seekers on their journey. Technologies used: Selenium, SQL, Word2Vec/Doc2vec, Google Cloud, Docker, FastAPI, Streamlit. Capstone project for Le Wagon Data Science Bootcamp.


mizzle-toe/find-your-dream-job


How to run locally

Clone the repository into a new folder, create a new virtual environment, and install the package:

git clone https://github.com/mizzle-toe/find-your-dream-job.git
cd find-your-dream-job
pyenv virtualenv fydjob-local
pyenv activate fydjob-local
pip install .

Start the Docker service and run:

docker run -e PORT=8000 -p 4050:8000 vladeng/find-your-dream-job:final

Finally:

streamlit run fydjob/FindYourDreamJob.py

If you can't run Streamlit, try deleting the .streamlit folder in your home directory.


Install package

Install the package in development mode:

pip install -e .

The package name is fydjob. Example imports:

import fydjob.utils as utils 
from fydjob.utils import tokenize_text_field

How to get the data locally (final version)

  1. Pull master and merge it into your branch.
  2. Download jobs.db from Google_Drive/database.
  3. Place the file in find-your-dream-job/fydjob/database.
  4. cd into the main folder, find-your-dream-job.
  5. Run python short-pipeline-run.

To load the data:

from fydjob.NLPFrame import NLPFrame
df = NLPFrame().df
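The available columns are not documented in this README; a quick inspection (assuming df is a standard pandas DataFrame) shows what the frame added:

print(df.shape)          # number of job offers and columns
print(list(df.columns))  # inspect which NLP columns were added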

Pipeline

The long pipeline (which will be supported by our package) works like this (a code sketch follows the list):

  1. IndeedScraper. Scrape jobs from Indeed.
  2. IndeedProcessor. Load scraped jobs and Kaggle data. Integrate, remove job offers with identical text, and export as a dataframe.
  3. Database. Populate the SQLite database, ensuring no duplicates are added.
  4. Database. Do a whole pass through the database, removing duplicates according to our set-similarity measure (a long process, up to 30 minutes).
  5. NLPFrame. Export the database to a dataframe (ndf) and add NLP processing columns (such as tokenized fields).
  6. Apply NLP algorithms to dataframe, export results.
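Put together, the stages map onto the package roughly like this. The IndeedProcessor and NLPFrame calls mirror the examples elsewhere in this README; the scraper and Database steps are shown only as comments, since their Python APIs are not documented here:

# Stage 1: scrape (run separately: python -m fydjob.IndeedScraper)
from fydjob.IndeedProcessor import IndeedProcessor
from fydjob.NLPFrame import NLPFrame

IndeedProcessor()    # stage 2: integrate scrapes + Kaggle data, drop identical texts
# Stages 3-4: the Database class populates SQLite and runs the similarity sweep.
df = NLPFrame().df   # stage 5: export the database to a dataframe with NLP columns
# Stage 6: apply NLP algorithms to df and export the results.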

The short pipeline starts at stage 5: we will deploy our current database to the backend, and stages 5-6 will run on the server.

Newly scraped job offers should be processed, inserted into the database, and followed by a fresh similarity sweep.
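For the duplicate guard at insertion time (stage 3), one common pattern is a uniqueness constraint on a hash of the offer text. The sketch below uses the standard library's sqlite3 and a hypothetical table layout, not necessarily the schema the Database class actually uses:

import hashlib
import sqlite3

con = sqlite3.connect('fydjob/database/jobs.db')  # path from the setup steps above
con.execute("""
    CREATE TABLE IF NOT EXISTS jobs (
        text_hash TEXT PRIMARY KEY,  -- hash of the offer text blocks exact duplicates
        text TEXT
    )
""")

def insert_offer(text):
    # INSERT OR IGNORE silently skips offers whose hash is already present.
    h = hashlib.sha256(text.encode('utf-8')).hexdigest()
    con.execute('INSERT OR IGNORE INTO jobs (text_hash, text) VALUES (?, ?)', (h, text))
    con.commit()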

Indeed Scraper

Scrapes job offers. To use it, download chromedriver from the Google Drive folders and place it in drivers/.

The scraper supports Indeed API parameters. When not specified, the defaults are:

start = 0 #the job offer at which to start
filter = 1 #the API tries to filter out duplicate postings
sort = 'date' #get the newest job offers (alternative is 'relevant')

To run the scraper:

pip install -r requirements.txt
python -m fydjob.IndeedScraper

You will be prompted for a job title, a location, and a limit on the number of job offers to extract.

Output is saved in fydjob/output/indeed_scrapes/. The filename format is jobtitle_location_date_limit (for example, data_scientist_london_<date>_100).
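To peek at a saved scrape without going through the processor (this assumes each file holds a JSON list of offers, which this README does not guarantee):

import json
from pathlib import Path

scrape_dir = Path('fydjob/output/indeed_scrapes')
latest = max(scrape_dir.glob('*.json'), key=lambda p: p.stat().st_mtime)  # newest scrape
with open(latest) as f:
    jobs = json.load(f)
print(len(jobs), 'offers in', latest.name)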

Preprocessor

Loads JSON files from fydjob/output/indeed_scrapes and the Kaggle file from fydjob/output/kaggle, joins the dataframes, and applies basic preprocessing. To run as a script:

python -m fydjob.IndeedProcessor 

To use the class directly:

from fydjob.IndeedProcessor import IndeedProcessor
ip = IndeedProcessor()

Output is saved in fydjob/output/indeed_proc.

Skills dictionary

The skills dictionary is assembled in a shared spreadsheet. Download the spreadsheet as an Excel file and place it at fydjob/data/dicts/skills_dict.xlsx. Then:

from fydjob import utils
utils.save_skills()     #extracts skills and saves them in JSON
utils.load_skills()     #loads the skills from JSON file

This is a one-time setup. If you haven't changed the pipeline, just call utils.load_skills() to get the skills.
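As a quick illustration of putting the loaded skills to use (assuming load_skills returns an iterable of skill strings, which this README does not specify), a naive match against a tokenized job text:

from fydjob import utils

skills = utils.load_skills()  # assumed: an iterable of skill strings

text = 'we need python, sql and docker experience'
tokens = set(text.replace(',', ' ').split())

# Naive sketch: keep single-word skills that literally appear in the text.
matched = [s for s in skills if s.lower() in tokens]
print(matched)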
