In this project, we demonstrate how to use the Azure ML Python SDK to train a model to predict mortality due to heart failure using Azure AutoML and Hyperdrive services. After training, we are going to deploy the best model and evaluate the model endpoint by consuming it.
This trained and deployed predictive model can potentially impact clinical practice, becoming a new supporting tool for physicians when assessing the increased risk of mortality among heart failure patients.
- Project Set Up and Installation
- Dataset
- Automated ML
- Hyperparameter Tuning
- Automated ML and Hyperparameter Tuning Comparison
- Model Deployment
- Screen Recording
- Future Improvements
- Standout Suggestions
- Citation
To set up this project, we require access to Azure ML Studio. The application flow for the project design is as follows:
- Create an Azure ML workspace with a compute instance.
- Create an Azure ML compute cluster.
- Upload the Heart Failure prediction dataset to Azure ML Studio from this repository.
- Import the notebooks and scripts attached in this repository to the Notebooks section in Azure ML Studio.
- All instructions to run the cells are detailed in the notebooks.
The Heart Failure Prediction dataset is used for assessing the severity of patients with heart failure. It contains the medical records of 299 heart failure patients collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan) during April–December 2015. The patients, who are aged 40 years and above, comprise 105 women and 194 men, all of whom had previously experienced heart failure.
The dataset contains 13 features, which report clinical, body, and lifestyle information, and is used as the training data for predicting heart failure risks. Regarding the dataset imbalance, 203 patients survived (death event = 0), while 96 patients died (death event = 1).
Additional information about this dataset can be found in the original dataset curators' publication.
The task here is to predict mortality due to heart failure. Heart failure is a common event caused by cardiovascular diseases (CVDs), and it occurs when the heart cannot pump enough blood to meet the needs of the body. The main causes of heart failure include diabetes, high blood pressure, and other heart conditions or diseases. By applying machine learning to this analysis, we obtain a predictive model that assesses the severity of patients with heart failure.
The objective of the task is to train a binary classification model that predicts the target column DEATH_EVENT, which indicates whether a heart failure patient will survive until the end of the follow-up period. This prediction is based on the information provided by the 11 clinical features (or risk factors). The time feature is dropped before training since we cannot obtain a time value for new patients after deployment. The predictor variables are as follows:
- Age: age of patient (years)
- Anaemia: Decrease of red blood cells or hemoglobin. It has a value of 1 or 0, with 1 indicating that the patient has this condition
- Creatinine Phosphokinase: Level of the CPK enzyme in the blood (mcg/L)
- Diabetes: Is a 1 or 0 - whether the patient suffers from diabetes or not
- Ejection Fraction: Percentage of blood leaving the heart at each contraction (percentage)
- High Blood Pressure: Is a 1 or 0 - If the patient has hypertension
- Platelets: Platelets in the blood (kiloplatelets/mL)
- Serum Creatinine: Level of serum creatinine in the blood (mg/dL)
- Serum Sodium: Level of serum sodium in the blood (mEq/L)
- Sex: Woman or man (binary)
- Smoking: If the patient smokes or not
- Time: Follow-up period (days)
Target variable - Death Event: If the patient died during the follow-up period (Death Event = 1 for patients who died, Death Event = 0 for patients who survived)
The data for this project can be accessed in our workspace through the following steps:
- Download the data from the UCI Machine Learning Repository or use the dataset uploaded in this GitHub repository.
- Register the dataset either using the Azure ML SDK or Azure ML Studio, from a web URL or from local files.
- For this project, we registered the dataset in our workspace from a web URL using the Azure ML SDK and retrieved the data from the CSV file using the TabularDatasetFactory class (a minimal sketch is shown below).
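The following is a minimal sketch of that registration step, assuming SDK v1; the dataset name and the web URL are illustrative placeholders, not the exact values used in the project:

```python
from azureml.core import Workspace
from azureml.data.dataset_factory import TabularDatasetFactory

ws = Workspace.from_config()  # assumes a config.json for the workspace is present

# Placeholder URL for the heart failure CSV hosted in this repository
data_url = "https://raw.githubusercontent.com/<user>/<repo>/main/heart_failure_clinical_records_dataset.csv"

# Create a TabularDataset from the web URL and register it in the workspace
dataset = TabularDatasetFactory.from_delimited_files(path=data_url)
dataset = dataset.register(workspace=ws,
                           name="heart-failure-prediction",
                           description="Heart failure clinical records dataset")
```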
We used the following configuration for AutoML:
automl_settings = {
"experiment_timeout_minutes": 30,
"max_concurrent_iterations": 5,
"primary_metric" : 'AUC_weighted'
}
automl_config = AutoMLConfig(
compute_target=compute_target,
task="classification",
training_data=dataset,
label_column_name="DEATH_EVENT",
n_cross_validations=5,
debug_log="automl_errors.log",
**automl_settings
)
As shown in the code snippet above, the AutoML settings are:
- The task for this machine learning problem is classification
- The primary_metric used is AUC weighted, which is more appropriate than accuracy since the dataset is moderately imbalanced (67.89% negative elements and 32.11% positive elements).
- n_cross_validations of 5 folds rather than 3 is used to obtain a more reliable estimate of model performance.
- An experiment_timeout_minutes of 30 is specified to constrain usage.
- The max_concurrent_iterations to be executed in parallel during training is set to 5 so the process is completed faster.
The best model is VotingEnsemble, with an AUC value of 0.9229042081949059.
The hyperparameters of the VotingEnsemble model are described in the tables below:
StandardScalerWrapper

| Parameters | Values |
|---|---|
| class_name | StandardScaler |
| copy | True |
| module_name | sklearn.preprocessing._data |
| with_mean | True |
| with_std | False |
GradientBoostingClassifier

| Parameters | Values |
|---|---|
| ccp_alpha | 0.0 |
| criterion | mse |
| init | None |
| learning_rate | 0.021544346900318822 |
| loss | deviance |
| max_depth | 8 |
| max_features | 0.5 |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| min_impurity_split | None |
| min_samples_leaf | 0.01 |
| min_samples_split | 0.38473684210526315 |
| min_weight_fraction_leaf | 0.0 |
| n_estimators | 400 |
| n_iter_no_change | None |
| presort | deprecated |
| random_state | None |
| subsample | 0.43157894736842106 |
| tol | 0.0001 |
| validation_fraction | 0.1 |
| verbose | 0 |
| warm_start | False |
- Increase experiment timeout to allow for model experimentation.
- Remove features from our dataset that are collinear or not important for the prediction.
AutoML Run Widget provides information about logs recorded in Run
AutoML experiment in Completed state with some model details
Best Model is VotingEnsemble with an AUC value of 0.92290
We use scikit-learn's built-in Support Vector Machine (SVM) classifier since it can generate non-linear decision boundaries and achieve high accuracy. It is also more robust to outliers than Logistic Regression. This algorithm is used with the Azure ML HyperDrive service for hyperparameter tuning.
The hyperparameters tuned are the inverse regularization strength C and the kernel type. The search space for C is loguniform(0.5, 1.0), i.e., the natural logarithm of C is drawn uniformly from [0.5, 1.0], and the kernel is chosen from [linear, rbf, poly, sigmoid]. We used the Random Parameter Sampling method, which samples over the discrete kernel types and returns a C value whose logarithm is uniformly distributed. Random sampling can serve as a benchmark for refining the search space to improve results.
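For reference, a hedged sketch of how the training script consumed by HyperDrive could parse these sampled arguments is shown below; the script structure and metric logging are assumptions based on the "--kernel" and "--C" arguments, not a verbatim copy of the project's training script:

```python
import argparse

from azureml.core.run import Run
from sklearn.svm import SVC

parser = argparse.ArgumentParser()
parser.add_argument("--kernel", type=str, default="rbf",
                    help="Kernel type used by the SVM")
parser.add_argument("--C", type=float, default=1.0,
                    help="Inverse regularization strength")
args = parser.parse_args()

run = Run.get_context()
run.log("Kernel", args.kernel)
run.log("C", args.C)

model = SVC(kernel=args.kernel, C=args.C, probability=True)
# model.fit(x_train, y_train) and logging of the primary metric, e.g.
# run.log("AUC_weighted", roc_auc_score(y_test, model.predict_proba(x_test)[:, 1])),
# would follow once the train/test split is prepared.
```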
Parameter search space and Hyperdrive configuration.
param_sampling = RandomParameterSampling( {
"--kernel": choice('linear', 'rbf', 'poly', 'sigmoid'),
"--C": loguniform(0.5, 1.0)
})
hyperdrive_run_config = HyperDriveConfig(
run_config=estimator,
hyperparameter_sampling=param_sampling,
policy=early_termination_policy,
primary_metric_name='AUC_weighted',
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
max_total_runs=20,
max_concurrent_runs=5
)
We applied a Bandit early termination policy to evaluate our benchmark metric (AUC_weighted). The policy is defined by a slack factor, avoids premature termination of the first 5 runs, and then terminates runs whose primary metric falls outside the top 10%. This stops runs whose AUC_weighted is no longer competitive as the iteration count increases, thereby improving computational efficiency.
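As a hedged illustration, the early_termination_policy referenced in the HyperDriveConfig above could be defined as follows; delay_evaluation=5 and slack_factor=0.1 mirror the description above, while evaluation_interval=1 is an assumption:

```python
from azureml.train.hyperdrive import BanditPolicy

# Skip evaluation for the first 5 runs, then terminate any run whose
# AUC_weighted falls outside the 10% slack relative to the best run so far.
early_termination_policy = BanditPolicy(evaluation_interval=1,
                                        slack_factor=0.1,
                                        delay_evaluation=5)
```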
The SVM model achieved an AUC value of 0.8333333333333334 with the following parameters:
| Hyperparameter | Value |
|---|---|
| Regularization Strength (C) | 2.521868105479297 |
| Kernel | sigmoid |
- We could improve this model by performing more feature engineering during the data preparation phase.
- Adding more hyperparameters to be tuned could increase the model performance.
- Increasing max_total_runs would try many more combinations of hyperparameters, though this could have an impact on cost and training duration.
Hyperdrive Run Widget provides information about logs recorded in the Run
Hyperdrive experiment in Completed state with AUC value for each iteration
Best model: After successfully running the experiment, we have the best model with kernel type as Sigmoid and C value of 2.521
| Key | AutoML | Hyperdrive |
|---|---|---|
| AUC_weighted | 0.92290 | 0.83333 |
| Best Model | VotingEnsemble | SVM |
| Duration | 39.16 minutes | 91.21 minutes |
As shown in the table above, the VotingEnsemble model from AutoML performed better, with an AUC value of 0.92290 compared to 0.83333 for the Support Vector Machine tuned through HyperDrive, so we deploy the AutoML model.
The following steps are required to deploy a model using Azure SDK:
- Register the dataset using SDK
- Find the best model using AutoML
- Use the environment of the AutoML best_run or create a custom environment
- Use the score.py file generated when the model is trained for deployment and evaluation. The scoring script describes the input data the model endpoint accepts.
- Deploy the model using any of the deployment choices - ACI, AKS or local. For our project, we deploy the model as a webservice using Azure Container Instances with cpu_cores = 1, memory_gb = 1 and Application Insights enabled (a minimal sketch of this step is shown after this list).
- For inferencing, pass the sample test data in JSON format to the model endpoint to test the webservice. The request is processed by the score.py file to make a successful REST API call.
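A minimal sketch of the deployment step is shown below, assuming best_run comes from the completed AutoML experiment and score.py sits next to the notebook; the model path and the model and service names are illustrative:

```python
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Register the best AutoML model (the path inside the run's outputs is an assumption)
model = best_run.register_model(model_name="heart-failure-automl",
                                model_path="outputs/model.pkl")

# Reuse the environment of the best run together with the generated scoring script
inference_config = InferenceConfig(entry_script="score.py",
                                   environment=best_run.get_environment())

# ACI deployment with 1 CPU core, 1 GB memory and Application Insights enabled
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       enable_app_insights=True)

service = Model.deploy(workspace=ws,
                       name="heart-failure-service",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)
```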
Successful model deployment using ACI (Azure Container Instance) and Application Insights enabled
Sample input data to query the endpoint
data = {
"data":
[
{
'Age':75,
'anaemia':0,
'creatinine_phosphokinase':582,
'diabetes':0,
'ejection_fraction':20,
'high_blood_pressure':1,
'platelets':265000,
'serum_creatinine':1.9,
'serum_sodium':130,
'sex':1,
'smoking':0
}
]
}
Response from webservice: When we make an API call to our endpoint with sample data, we will see the inference output of the model
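As a hedged sketch, the endpoint could be queried with the sample payload above as follows; the scoring URI comes from the deployed service, and the exact response format depends on the generated score.py:

```python
import json

import requests

scoring_uri = service.scoring_uri  # or copy the REST endpoint from Azure ML Studio
headers = {"Content-Type": "application/json"}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())  # e.g. a list with the predicted DEATH_EVENT for each record
```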
- A better performing AutoML model could be found by increasing the experiment timeout.
- Addressing the dataset imbalance by applying the Synthetic Minority Oversampling Technique (SMOTE) could improve the performance of the HyperDrive model.
- Converting the model into platform-supported formats such as ONNX or TFLite would help optimize inference (model scoring) and achieve scalability.
We enabled Application Insights during model deployment in order to log useful data about the requests being sent to the webservice.
Davide Chicco, Giuseppe Jurman: "Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone". BMC Medical Informatics and Decision Making 20, 16 (2020) Article.