In this project we use Microsoft Azure to configure a cloud-based Machine Learning model. We also explore deploying the model as an HTTP REST API endpoint, generating Swagger API documentation, benchmarking the deployed endpoint with Apache Bench, and consuming the endpoint by sending JSON documents as HTTP POST requests. Finally, we create, publish and consume a pipeline using the Azure SDK.
The dataset used for this project is the Bank Marketing Dataset.
This data relates to the direct marketing campaigns (based on phone calls) of a Portuguese banking institution, and the goal is to predict whether the person will subscribe to a bank term deposit. It contains input attributes such as `age`, `job`, `marital status`, `education`, etc., and an output attribute `y` with the values `yes` or `no`, which states whether the client has subscribed to a term deposit.
The following diagram shows the overall architecture of the system.
- In this step we enable authentication, as it is crucial for the continuous flow of operations. The authentication types supported by Azure are key-based, token-based and interactive.
- We will also create a Service Principal (SP) and allow the SP to access the workspace we will be working with. Using a service principal is a great way to allow authentication while reducing the scope of permissions, which enhances security.
NOTE: As we are using the Azure account provided by Udacity, we do not need to perform this step.
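For reference, authenticating with a service principal from the Python SDK looks roughly like the sketch below; all identifiers and the secret are placeholders, not values from this project.

```python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Placeholder IDs/secret, as produced when the service principal is created
sp_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",
    service_principal_id="<application-id>",
    service_principal_password="<client-secret>",
)

# Authenticate to the workspace non-interactively using the service principal
ws = Workspace.get(
    name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    auth=sp_auth,
)
print(ws.name, ws.location)
```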
In this step, we create an experiment using Automated ML, configure a compute cluster, and use that cluster to run the experiment.
- First, we upload the provided Bank Marketing dataset (`bankmarketing_train.csv`) to Azure Machine Learning Studio.
Registered Datasets
Bank Marketing Dataset Uploaded
- Next, we set up a new Automated ML run by selecting the dataset we uploaded and creating a new compute cluster. Then we run the experiment using Classification.
Auto ML Run Completed - The run takes about 20 minutes to complete.
Auto ML Run Details
Auto ML Run Models - The dataset is run through various algorithms to find the model that gives the highest accuracy. After the experiment run completes, a summary of all the models and their metrics, including explanations, is shown.
Auto ML Run Best Model - Voting Ensemble was the best model with 91.92% Accuracy.
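Although the run was configured through the studio UI, an equivalent setup with the Python SDK might look like the sketch below; the registered dataset name, cluster name, timeout and concurrency values are illustrative assumptions, not the exact settings used here.

```python
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="bankmarketing")  # registered dataset name is an assumption

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=dataset,
    label_column_name="y",
    compute_target="automl-cluster",   # illustrative cluster name
    experiment_timeout_hours=0.5,      # illustrative limits
    max_concurrent_iterations=4,
    enable_early_stopping=True,
)

experiment = Experiment(ws, "automl-bankmarketing")
run = experiment.submit(automl_config, show_output=True)
run.wait_for_completion()
```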
In this step we deploy the best model using Azure Container Instance (ACI), as this allows us to interact with the HTTP API service and with the model by sending data over POST requests.
Deployed Model Details
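The deployment itself was done from the studio UI; a rough SDK equivalent, assuming the best AutoML model has been registered and a scoring script and environment are available, could look like this:

```python
from azureml.core import Workspace, Model, Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="automl-best-model")   # registered model name is an assumption

# score.py and the environment are assumed to come from the AutoML run's outputs
inference_config = InferenceConfig(
    entry_script="score.py",
    environment=Environment.get(ws, name="AzureML-AutoML"),
)

aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,           # key-based authentication for the endpoint
    enable_app_insights=False,   # enabled later via logs.py
)

service = Model.deploy(ws, "bankmarketing-aci", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```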
Now that the Best Model is deployed, we enable Application Insights and retrieve logging output to monitor the health of the deployed service.
To do so, we run the `logs.py` script, which dynamically authenticates to Azure, enables Application Insights, and displays the logs for the deployed model.
The logs generated after running the script are -
`Application Insights Enabled` is now set to `true`.
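A minimal sketch of what `logs.py` does is shown below, assuming the service name used at deployment:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

# Requires the workspace config.json in the working directory
ws = Workspace.from_config()

# Name of the deployed ACI service (assumed)
service = Webservice(workspace=ws, name="bankmarketing-aci")

# Turn on Application Insights for the deployed endpoint
service.update(enable_app_insights=True)

# Print the container logs to verify the service is healthy
print(service.get_logs())
```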
In this step, we use Swagger to consume the deployed model and interact with its HTTP REST API documentation.
For this, we first download the `swagger.json` file that Azure provides for deployed models. Next, we run the `swagger.sh` script to pull the latest swagger-ui Docker image and run it on port 9000. Finally, we run the `serve.py` script to serve `swagger.json` for our model on an HTTP server.
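A simple `serve.py` that makes `swagger.json` available on port 8000, with CORS headers so the Swagger UI container can fetch it, might look like this sketch:

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

PORT = 8000  # swagger.json must sit in the directory this script is run from


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that allows cross-origin requests from the Swagger UI."""

    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", PORT), CORSRequestHandler).serve_forever()
```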
To interact with the deployed web service's API resources, we go to localhost:9000 and change the URL in the Swagger UI to http://localhost:8000/swagger.json, as shown below -
HTTP API GET Operation -
HTTP API POST Operation -
HTTP API Response for POST Operation -
Models -
Consume Model Endpoints
Now that the model is deployed, we use the provided `endpoint.py` script to interact with the trained model. In this step, we run the script after modifying both the `scoring_uri` and the `key` to match the URI that was generated after deployment and the authentication key of our service.
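A condensed sketch of what such a script does is shown below; the URI, key and the abbreviated payload fields are placeholders, and the real script sends every input column the model expects.

```python
import json
import requests

scoring_uri = "http://<deployment-ip>/score"   # replace with the URI generated after deployment
key = "<primary-key>"                          # replace with the service's authentication key

# Abbreviated sample record; the real payload contains all input columns
data = {
    "data": [
        {
            "age": 35,
            "job": "blue-collar",
            "marital": "married",
            "education": "university.degree",
            # ... remaining feature columns ...
        }
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())
```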
The response received from running the script is as follows -
Benchmark
In this step we benchmark our model using Apache Bench to load-test the deployed endpoint. For this we run the `benchmark.sh` script and receive the following output -
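`benchmark.sh` wraps the Apache Bench (`ab`) command line tool. As a rough illustration of what the load test measures, a simple Python stand-in that fires repeated POST requests and reports latency could look like this (the URI, key and payload file are placeholders):

```python
import json
import time
import requests

scoring_uri = "http://<deployment-ip>/score"   # placeholder endpoint
key = "<primary-key>"                          # placeholder key
payload = json.load(open("data.json"))         # JSON document with sample records (assumed)

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}

latencies, failures = [], 0
for _ in range(10):  # small request count, similar in spirit to the ab run
    start = time.perf_counter()
    resp = requests.post(scoring_uri, json=payload, headers=headers)
    latencies.append(time.perf_counter() - start)
    failures += resp.status_code != 200

print(f"requests: {len(latencies)}, failed: {failures}")
print(f"mean latency: {sum(latencies) / len(latencies) * 1000:.1f} ms")
```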
For this step, we use the `aml-pipelines-with-automated-machine-learning-step.ipynb` notebook to create, publish and consume a pipeline. Publishing a pipeline allows us to interact with it over HTTP.
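The notebook builds the pipeline around an AutoML training step, publishes it, and then triggers it over REST. A condensed sketch of the publish-and-consume part, with the run id and pipeline name as placeholders, is shown below:

```python
import requests
from azureml.core import Workspace, Experiment
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.pipeline.core import PipelineRun

ws = Workspace.from_config()
experiment = Experiment(ws, "new-experiment")

# Reference the completed pipeline run (run id is a placeholder)
pipeline_run = PipelineRun(experiment, run_id="<pipeline-run-id>")

# Publish the pipeline so it is reachable over HTTP
published = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",          # illustrative pipeline name
    description="Training bankmarketing pipeline",
    version="1.0",
)

# Trigger the published pipeline through its REST endpoint
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    published.endpoint,
    headers=auth_header,
    json={"ExperimentName": "pipeline-rest-endpoint"},
)
print("Submitted run id:", response.json().get("Id"))
```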
Pipeline Runs
Pipeline Endpoints
`new-experiment` Running
`pipeline-rest-endpoint` Running
`new-experiment` Run Completed
`new-experiment` Run Detail Widget
`pipeline-rest-endpoint` Run Completed
`pipeline-rest-endpoint` Run Detail Widget
Published Pipeline Overview showing the REST endpoint and the status Active
A screen recording explaining the steps followed in the project can be viewed using this Link.
- Class balancing - The dataset used for training the model was imbalanced, with the smallest class containing only 3692 of 32950 training samples. This imbalance can lead to a falsely perceived positive effect on a model's accuracy, because the input data is biased towards one class. The training data could therefore be made more balanced to remove this bias; a sketch of one possible remedy follows this list.
- Using deep learning to get better accuracy.
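As a sketch of the class-balancing remedy mentioned above (assuming scikit-learn and the `y` label column described earlier), class weights could be computed and passed to the training algorithm so that the minority class is penalised more heavily when misclassified:

```python
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

# Load the training data and inspect the label distribution
df = pd.read_csv("bankmarketing_train.csv")
labels = df["y"]
print(labels.value_counts())  # the 'yes' class is expected to be heavily under-represented

# Weight each class inversely to its frequency
classes = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
print(dict(zip(classes, weights)))
```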