This document explains how to run inference for an MNIST model using Apache MXNet Model Server (MMS) on Amazon EKS. MMS is a flexible and easy-to-use tool for serving deep learning models trained with MXNet.
Create an EKS cluster with GPU nodes.

In order to run MNIST inference on EKS, we need a Docker image and a Kubernetes manifest that creates an inference service backed by a deployment.
- You can either create a Docker image from the file `samples/mnist/inference/mxnet/Dockerfile` or use the existing image `rgaut/deeplearning-mxnet:inference`. The MXNet model is bundled with the Docker image.
- Create deployment and service for inference:

  ```
  kubectl create -f samples/mnist/inference/mxnet/mxnet_eks.yaml
  ```

  Check for the deployment to run:

  ```
  kubectl get pods --selector=app=mnist-service -w
  NAME                             READY     STATUS              RESTARTS   AGE
  mnist-service-7df4759f74-xhj5x   0/1       ContainerCreating   0          29s
  mnist-service-7df4759f74-xhj5x   1/1       Running             0          46s
  ```
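If you prefer to script the wait instead of watching `kubectl get pods -w`, the check can be sketched in Python. This is a minimal sketch assuming the `app=mnist-service` selector from the manifest above; `pod_ready` and `wait_for_pod` are hypothetical helper names, not part of the sample.

```python
import subprocess
import time

def pod_ready(line):
    """Return True for a `kubectl get pods` line such as
    'mnist-service-...   1/1   Running   0   46s'."""
    fields = line.split()
    return len(fields) >= 3 and fields[1] == "1/1" and fields[2] == "Running"

def wait_for_pod(selector="app=mnist-service", timeout=300):
    # Poll kubectl until the first matching pod reports Running.
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "pods", "--selector", selector, "--no-headers"],
            capture_output=True, text=True,
        ).stdout
        if any(pod_ready(line) for line in out.splitlines()):
            return True
        time.sleep(5)
    return False
```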
- The service is exposed as a `clusterIP`. Use port forwarding so that the service can be accessed locally:

  ```
  kubectl port-forward \
    `kubectl get pods --selector=app=mnist-service -o jsonpath='{.items[0].metadata.name}'` \
    8080:8080 &
  ```
- Run the inference:

  ```
  curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/9.png
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100  8042  100    56  100  7986   3105   432k --:--:-- --:--:-- --:--:--  458k
  Prediction is [9] with probability of 92.52161979675293%
  ```

  Run another inference:

  ```
  curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/7.jpg
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100   608  100    52  100   556    568   6081 --:--:-- --:--:-- --:--:--  6109
  Prediction is [7] with probability of 99.9999761581%
  ```
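The same request can be issued from Python with only the standard library. This is a sketch, not part of the sample: `predict` and `parse_prediction` are illustrative names, and the code assumes the endpoint shown above and MMS's plain-text response format.

```python
import re
import urllib.request

def parse_prediction(text):
    """Extract (digit, probability) from a response such as
    'Prediction is [9] with probability of 92.52161979675293%'."""
    m = re.search(r"Prediction is \[(\d+)\] with probability of ([\d.]+)%", text)
    if m is None:
        raise ValueError("unexpected response: %r" % text)
    return int(m.group(1)), float(m.group(2))

def predict(image_path, url="http://localhost:8080/predictions/mnist"):
    # POST the raw image bytes, as `curl -T` does above.
    with open(image_path, "rb") as f:
        req = urllib.request.Request(url, data=f.read(), method="POST")
    with urllib.request.urlopen(req) as resp:
        return parse_prediction(resp.read().decode())
```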
- Install Java:

  ```
  brew tap caskroom/versions
  brew update
  brew cask install java8
  ```
- Set up a virtual environment:

  ```
  pip install virtualenv --user
  export PATH=~/Library/Python/2.7/bin:$PATH
  # create a Python 2.7 virtual environment
  virtualenv -p /usr/bin/python /tmp/pyenv2
  # enter this virtual environment
  source /tmp/pyenv2/bin/activate
  ```

  The location of the `virtualenv` binary may be different. It can be found using the `pip show virtualenv` command.
- Install MXNet for CPU inference:

  ```
  pip install mxnet-mkl
  ```
- Install MXNet Model Server:

  ```
  pip install mxnet-model-server
  ```
A Model Archive is an artifact that MMS can consume natively. This archive can be easily created from the trained artifacts. A copy of this archive is available at `samples/mnist/inference/archived_model/mnist_cnn.mar`.

Skip the rest of this section if you are using the pre-generated archive. The steps below explain how to generate an MMS archive from the artifacts produced by model training.
- Two artifacts were generated at the end of training: a symbols file (`mnist_cnn-symbol.json`) and a params file (`mnist_cnn-0000.params`). These artifacts are provided in the `saved_model` directory. Copy them to the `/tmp/models` directory:

  ```
  mkdir /tmp/models
  cp samples/mnist/training/mxnet/saved_model/mnist_cnn-* /tmp/models
  ```
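As a sanity check before archiving, you can verify that the checkpoint pair follows MXNet's `<prefix>-symbol.json` / `<prefix>-0000.params` naming convention and that the symbol file is valid JSON. `check_checkpoint` is a hypothetical helper written for this sketch, not part of the sample.

```python
import json
import os

def check_checkpoint(model_dir, prefix="mnist_cnn", epoch=0):
    """Return a list of missing checkpoint files (empty if the pair is present
    and the symbol file parses as JSON)."""
    symbol = os.path.join(model_dir, "%s-symbol.json" % prefix)
    params = os.path.join(model_dir, "%s-%04d.params" % (prefix, epoch))
    missing = [p for p in (symbol, params) if not os.path.exists(p)]
    if missing:
        return missing
    with open(symbol) as f:
        json.load(f)  # raises ValueError if the symbol file is corrupt
    return []
```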
- The `model-archiver` tool is installed as part of the MMS installation. It can also be installed manually:

  ```
  pip install model-archiver
  ```
- Create a `model-store` location under `/tmp`:

  ```
  mkdir /tmp/model-store
  ```
- Copy `samples/mnist/inference/mxnet/mnist_cnn_inference.py` to the `/tmp/models` directory:

  ```
  cp samples/mnist/inference/mxnet/mnist_cnn_inference.py /tmp/models
  ```
- Generate the model archive:

  ```
  model-archiver \
    --model-name mnist_cnn \
    --model-path /tmp/models \
    --export-path /tmp/model-store \
    --handler mnist_cnn_inference:handle -f
  ```

  This command creates a model archive called `mnist_cnn.mar` under `/tmp/model-store`.
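For reference, an MMS handler module exposes a `handle(data, context)` entry point, which MMS first calls with `data=None` so the service can initialize. The real logic lives in `mnist_cnn_inference.py`; the sketch below is only a hypothetical skeleton with a placeholder response instead of an actual MXNet forward pass.

```python
# Hypothetical sketch of an MMS-style handler module; the real handler is
# samples/mnist/inference/mxnet/mnist_cnn_inference.py.

_initialized = False

def _initialize(context):
    # In a real handler, load the MXNet symbol/params from
    # context.system_properties here.
    global _initialized
    _initialized = True

def handle(data, context):
    # MMS calls handle() once with data=None so the service can initialize.
    if not _initialized:
        _initialize(context)
    if data is None:
        return None
    # For each request, a real handler decodes the image bytes and runs the
    # network; here we only echo a placeholder response per request.
    return ["Prediction is [0] with probability of 0.0%" for _ in data]
```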
- Update `~/.keras/keras.json` so that it looks like:

  ```
  {
    "epsilon": 1e-07,
    "floatx": "float32",
    "image_data_format": "channels_last",
    "backend": "mxnet"
  }
  ```

  This ensures that the `backend` is `mxnet` and `image_data_format` is `channels_last`.
- Run MXNet Model Server:

  ```
  mxnet-model-server \
    --start \
    --model-store samples/mnist/inference/mxnet/archived_model \
    --models mnist=mnist_cnn.mar
  ```

  The above command creates an endpoint called `mnist`. If you generated your own archive at `/tmp/model-store`, make sure to specify that directory as the `--model-store` parameter.
- In a new terminal, run the inference:

  ```
  curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/9.png
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100  8042  100    56  100  7986   3105   432k --:--:-- --:--:-- --:--:--  458k
  Prediction is [9] with probability of 92.52161979675293%
  ```

  Run another inference:

  ```
  curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/7.jpg
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100   608  100    52  100   556    568   6081 --:--:-- --:--:-- --:--:--  6109
  Prediction is [7] with probability of 99.9999761581%
  ```
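Before sending predictions, you can also confirm that the server is up. The sketch below assumes MMS's `/ping` health route and its JSON response body; `server_healthy` and `parse_health` are illustrative names, not part of the sample.

```python
import json
import urllib.request

def parse_health(body):
    """True if a ping response body reports a healthy server."""
    try:
        return json.loads(body).get("status") == "Healthy"
    except ValueError:
        return False

def server_healthy(url="http://localhost:8080/ping"):
    # The /ping route is an assumption based on the MMS REST API.
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return parse_health(resp.read().decode())
    except OSError:
        return False
```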