This project predicts whether an employee will leave their current job for a new company. For this, two models will be created using Azure ML:
- Using AutoML to get the best algorithm
- Using Logistic Regression and tuning its parameters using HyperDrive. The best model will then be deployed using Azure Container Instance, which can later be consumed via a REST API.
The project requires access to AzureML Studio.
Steps to be followed:
1. Using the dataset provided in this repository, create a new dataset in Azure ML studio in the default Blob Storage.
2. Create a new compute target.
3. Import the notebooks attached in this repository into the Notebooks section of Azure ML studio.
4. Run the AutoML and HyperDrive notebooks using the details given in the notebooks.
5. Run the endpoint.py file to consume the created endpoint and get back the predicted results.
The dataset is taken from Kaggle: https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists.
The task is to predict whether the employee will leave their current job or not, based on the following factors:
- enrollee_id : Unique ID for candidate
- city: City code
- city_development_index: Development index of the city (scaled)
- gender: Gender of candidate
- relevent_experience: Relevant experience of candidate
- enrolled_university: Type of university course enrolled in, if any
- education_level: Education level of candidate
- major_discipline: Education major discipline of candidate
- experience: Candidate's total experience in years
- company_size: Number of employees in the current employer's company
- company_type: Type of current employer
- last_new_job: Difference in years between previous job and current job
- training_hours: Training hours completed
- target: 0 – Not looking for job change, 1 – Looking for a job change
The data can be accessed by downloading it to the local machine and then uploading it to the Datasets section of Azure ML studio, as sketched below.
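A minimal sketch of registering the downloaded CSV as an Azure ML tabular dataset (SDK v1); the local file name, target path, and dataset name here are assumptions, not the exact repository code.

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Upload the local CSV (file name assumed from the Kaggle download) to default Blob Storage.
datastore.upload_files(
    files=["./aug_train.csv"],
    target_path="hr-data/",
    overwrite=True,
)

# Create a TabularDataset from the uploaded file and register it under an assumed name.
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "hr-data/aug_train.csv"))
dataset = dataset.register(workspace=ws, name="hr-job-change", create_new_version=True)
```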
Automated Machine Learning is the process of automating the time-consuming, iterative tasks of ML model development. It allows building models at scale with high efficiency and productivity while sustaining model quality. For a classification problem, many models such as XGBoost, Random Forest, Stack Ensemble, and Voting Ensemble are compared.
AutoML configuration used for this project (a configuration sketch follows this list):
- The task is binary classification, hence 'accuracy' is used as the primary metric.
- Cross-validation with 6 folds is chosen, as it gave better accuracy than 3 or 4 folds.
- Iterations are processed concurrently to speed up training.
- Early stopping is enabled to prevent overfitting.
- Experiment timeout is set to be 30 minutes.
- The featurization parameter is set to "auto" for automatic feature scaling and encoding.
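A minimal sketch of an `AutoMLConfig` reflecting the settings above (SDK v1); the dataset and compute-target names, and the concurrency value, are assumptions rather than the exact notebook code.

```python
from azureml.core import Workspace, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_ds = Dataset.get_by_name(ws, "hr-job-change")   # dataset name assumed (see above)
compute_target = ws.compute_targets["cpu-cluster"]    # compute name is an assumption

automl_config = AutoMLConfig(
    task="classification",            # binary classification task
    primary_metric="accuracy",        # metric used to rank candidate models
    training_data=train_ds,
    label_column_name="target",       # 0 = not looking, 1 = looking for a job change
    n_cross_validations=6,            # 6-fold cross-validation, as noted above
    max_concurrent_iterations=4,      # assumed value; concurrency speeds up training
    enable_early_stopping=True,       # stop poorly performing iterations early
    experiment_timeout_minutes=30,    # overall experiment budget
    featurization="auto",             # automatic feature scaling/encoding
    compute_target=compute_target,
)
```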
After comparing 37 algorithms, the best model obtained is Voting Ensemble with an accuracy of 80.17%.
The screenshot of the details of various algorithms is shown below:
Best run model ID and accuracy, along with other parameters:
The model can be improved by increasing the number of iterations or trying various cross-validation folds. Deep learning/neural network based classification can also be used for better results.
Since the problem involves binary classification, Logistic Regression has been used: it is simple to train and performs well compared to more complex algorithms. Two parameters are selected for tuning with HyperDrive: '--C' (inverse of regularization strength; smaller values cause stronger regularization) and '--max_iter' (maximum number of iterations to converge).
- The choices used for --C are (0.001, 0.01, 0.1, 1, 10, 100, 200), while those for --max_iter are (50, 100, 150, 200, 250, 300); a configuration sketch follows.
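A minimal sketch of the HyperDrive setup described above (SDK v1); the random sampling method, the Bandit early-termination policy, the training-script wrapper, and the logged metric name are assumptions, since they are not stated here.

```python
from azureml.core import Environment, ScriptRunConfig, Workspace
from azureml.train.hyperdrive import (
    BanditPolicy, HyperDriveConfig, PrimaryMetricGoal, RandomParameterSampling, choice,
)

ws = Workspace.from_config()

# Wrap the training script; script, compute, and environment names are assumptions.
script_run_config = ScriptRunConfig(
    source_directory=".",
    script="train.py",
    compute_target=ws.compute_targets["cpu-cluster"],
    environment=Environment.get(ws, name="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu"),
)

# Search space matching the choices listed above.
param_sampling = RandomParameterSampling({
    "--C": choice(0.001, 0.01, 0.1, 1, 10, 100, 200),
    "--max_iter": choice(50, 100, 150, 200, 250, 300),
})

hyperdrive_config = HyperDriveConfig(
    run_config=script_run_config,
    hyperparameter_sampling=param_sampling,
    policy=BanditPolicy(evaluation_interval=2, slack_factor=0.1),  # assumed policy
    primary_metric_name="Accuracy",                 # metric logged by train.py (assumed)
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,                              # assumed run budget
    max_concurrent_runs=4,
)
```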
Here's a screenshot of the best results and optimized parameters obtained using HyperDrive (RunDetails widget):
The best model obtained with AutoML is chosen for deployment. Azure Container Instance (ACI) is used to deploy the model as a web service. The details of the deployment method can be found in automl.ipynb under the Model Deployment section.
The number of CPU cores and the memory for the web service have been set to 1 and 1 GB respectively; a deployment sketch is shown below.
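A minimal sketch of the ACI deployment described above (SDK v1); the registered model name, entry script, environment, and service name are assumptions, not the exact notebook code.

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="automl-best-model")   # registered model name is an assumption

# Entry script and environment are assumptions; in practice they come from the AutoML run.
inference_config = InferenceConfig(
    entry_script="score.py",
    environment=Environment.get(ws, name="AzureML-AutoML"),
)

# 1 CPU core and 1 GB of memory, as noted above.
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(
    workspace=ws,
    name="hr-job-change-service",             # assumed service name
    models=[model],
    inference_config=inference_config,
    deployment_config=deployment_config,
)
service.wait_for_deployment(show_output=True)
print(service.state)          # "Healthy" once deployment succeeds
print(service.scoring_uri)    # REST endpoint used for consumption below
```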
We can see the deployment state set to Healthy below, indicating that the model was deployed successfully. The model can now be consumed through its REST API by sending HTTP requests to it.
Now we can consume the endpoint using the scoring URL generated after deployment. A sample request with one input record is sketched below; the actual sample input can be found in the endpoint.py file of the repository.
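A minimal sketch of what endpoint.py does, assuming key-based authentication on the service; the scoring URI, key, and feature values below are placeholders, not real data.

```python
import json
import requests

scoring_uri = "<scoring-uri-from-deployment>"   # placeholder
key = "<primary-key>"                           # placeholder; only needed if auth is enabled

# One sample record using the dataset's feature columns (values are illustrative).
data = {"data": [{
    "enrollee_id": 8949,
    "city": "city_103",
    "city_development_index": 0.92,
    "gender": "Male",
    "relevent_experience": "Has relevent experience",
    "enrolled_university": "no_enrollment",
    "education_level": "Graduate",
    "major_discipline": "STEM",
    "experience": ">20",
    "company_size": "50-99",
    "company_type": "Pvt Ltd",
    "last_new_job": "1",
    "training_hours": 36,
}]}

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())   # e.g. [1] -> looking for a job change
```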
Here's a link to a screencast demonstrating consumption of the deployed model: https://1drv.ms/u/s!Avt8pJRrCCqEhmNaZOPPxJfcpQlh?e=8Y7BYf