This document explains how to run inference for an MNIST model using Apache MXNet Model Server (MMS) on Amazon EKS. MMS is a flexible and easy-to-use tool for serving deep learning models trained with MXNet.
Create an EKS cluster with GPU nodes.

In order to run MNIST inference on EKS, we need a Docker image and a Kubernetes manifest that creates an inference service backed by a deployment.
- You can either create a Docker image from the file `samples/mnist/inference/mxnet/Dockerfile` or use the existing image `rgaut/deeplearning-mxnet:inference`. The MXNet model is bundled with the Docker image.
- Create deployment and service for inference:

  ```
  kubectl create -f samples/mnist/inference/mxnet/mxnet_eks.yaml
  ```

  Check for the deployment to run:

  ```
  kubectl get pods --selector=app=mnist-service -w
  NAME                             READY     STATUS              RESTARTS   AGE
  mnist-service-7df4759f74-xhj5x   0/1       ContainerCreating   0          29s
  mnist-service-7df4759f74-xhj5x   1/1       Running             0          46s
  ```
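If you prefer to script the wait instead of watching `kubectl get pods -w`, the check can be sketched in Python. This is a minimal sketch assuming the `app=mnist-service` selector from the manifest above; `pod_ready` and `wait_for_pod` are hypothetical helper names, not part of the sample.

```python
import subprocess
import time

def pod_ready(line):
    """Return True for a `kubectl get pods` line such as
    'mnist-service-...   1/1   Running   0   46s'."""
    fields = line.split()
    return len(fields) >= 3 and fields[1] == "1/1" and fields[2] == "Running"

def wait_for_pod(selector="app=mnist-service", timeout=300):
    # Poll kubectl until the first matching pod reports Running.
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "pods", "--selector", selector, "--no-headers"],
            capture_output=True, text=True,
        ).stdout
        if any(pod_ready(line) for line in out.splitlines()):
            return True
        time.sleep(5)
    return False
```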
- The service is exposed as a `clusterIP`. Use port forwarding so that the service can be accessed locally:

  ```
  kubectl port-forward \
    `kubectl get pods --selector=app=mnist-service -o jsonpath='{.items[0].metadata.name}'` \
    8080:8080 &
  ```
- Run the inference:

  ```
  curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/9.png
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100  8042  100    56  100  7986   3105   432k --:--:-- --:--:-- --:--:--  458k
  Prediction is [9] with probability of 92.52161979675293%
  ```

  Run another inference:

  ```
  curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/7.jpg
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100   608  100    52  100   556    568   6081 --:--:-- --:--:-- --:--:--  6109
  Prediction is [7] with probability of 99.9999761581%
  ```
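The same request can be issued from Python with only the standard library. This is a sketch, not part of the sample: `predict` and `parse_prediction` are illustrative names, and the code assumes the endpoint shown above and MMS's plain-text response format.

```python
import re
import urllib.request

def parse_prediction(text):
    """Extract (digit, probability) from a response such as
    'Prediction is [9] with probability of 92.52161979675293%'."""
    m = re.search(r"Prediction is \[(\d+)\] with probability of ([\d.]+)%", text)
    if m is None:
        raise ValueError("unexpected response: %r" % text)
    return int(m.group(1)), float(m.group(2))

def predict(image_path, url="http://localhost:8080/predictions/mnist"):
    # POST the raw image bytes, as `curl -T` does above.
    with open(image_path, "rb") as f:
        req = urllib.request.Request(url, data=f.read(), method="POST")
    with urllib.request.urlopen(req) as resp:
        return parse_prediction(resp.read().decode())
```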
- Install Java:

  ```
  brew tap caskroom/versions
  brew update
  brew cask install java8
  ```
- Set up a virtual environment:

  ```
  pip install virtualenv --user
  export PATH=~/Library/Python/2.7/bin:$PATH
  # create a Python 2.7 virtual environment
  virtualenv -p /usr/bin/python /tmp/pyenv2
  # enter this virtual environment
  source /tmp/pyenv2/bin/activate
  ```

  The location of the `virtualenv` binary may be different. It can be found using the `pip show virtualenv` command.
- Install MXNet for CPU inference:

  ```
  pip install mxnet-mkl
  ```
- Install MXNet Model Server:

  ```
  pip install mxnet-model-server
  ```
A Model Archive is an artifact that MMS can consume natively. This archive can be easily created from the trained artifacts. A copy of this archive is available at `samples/mnist/inference/archived_model/mnist_cnn.mar`.

Skip the rest of this section if you are using the pre-generated archive. The steps below explain how to generate an MMS archive from the artifacts produced by model training.
- Two artifacts were generated at the end of training: a symbols file (`mnist_cnn-symbol.json`) and a params file (`mnist_cnn-0000.params`). These artifacts are provided in the `saved_model` directory. Copy them to the `/tmp/models` directory:

  ```
  mkdir /tmp/models
  cp samples/mnist/training/mxnet/saved_model/mnist_cnn-* /tmp/models
  ```
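As a sanity check before archiving, you can verify that the checkpoint pair follows MXNet's `<prefix>-symbol.json` / `<prefix>-0000.params` naming convention and that the symbol file is valid JSON. `check_checkpoint` is a hypothetical helper written for this sketch, not part of the sample.

```python
import json
import os

def check_checkpoint(model_dir, prefix="mnist_cnn", epoch=0):
    """Return a list of missing checkpoint files (empty if the pair is present
    and the symbol file parses as JSON)."""
    symbol = os.path.join(model_dir, "%s-symbol.json" % prefix)
    params = os.path.join(model_dir, "%s-%04d.params" % (prefix, epoch))
    missing = [p for p in (symbol, params) if not os.path.exists(p)]
    if missing:
        return missing
    with open(symbol) as f:
        json.load(f)  # raises ValueError if the symbol file is corrupt
    return []
```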
- The `model-archiver` tool is installed as part of the MMS installation. It can also be installed manually:

  ```
  pip install model-archiver
  ```
- Create a `model-store` location under `/tmp`:

  ```
  mkdir /tmp/model-store
  ```
- Copy `samples/mnist/inference/mxnet/mnist_cnn_inference.py` to the `/tmp/models` directory:

  ```
  cp samples/mnist/inference/mxnet/mnist_cnn_inference.py /tmp/models
  ```
- Generate the model archive:

  ```
  model-archiver \
    --model-name mnist_cnn \
    --model-path /tmp/models \
    --export-path /tmp/model-store \
    --handler mnist_cnn_inference:handle -f
  ```

  This command creates a model archive called `mnist_cnn.mar` under `/tmp/model-store`.
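For reference, an MMS handler module exposes a `handle(data, context)` entry point, which MMS first calls with `data=None` so the service can initialize. The real logic lives in `mnist_cnn_inference.py`; the sketch below is only a hypothetical skeleton with a placeholder response instead of an actual MXNet forward pass.

```python
# Hypothetical sketch of an MMS-style handler module; the real handler is
# samples/mnist/inference/mxnet/mnist_cnn_inference.py.

_initialized = False

def _initialize(context):
    # In a real handler, load the MXNet symbol/params from
    # context.system_properties here.
    global _initialized
    _initialized = True

def handle(data, context):
    # MMS calls handle() once with data=None so the service can initialize.
    if not _initialized:
        _initialize(context)
    if data is None:
        return None
    # For each request, a real handler decodes the image bytes and runs the
    # network; here we only echo a placeholder response per request.
    return ["Prediction is [0] with probability of 0.0%" for _ in data]
```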
- Update `~/.keras/keras.json` so that it looks like:

  ```
  {
    "epsilon": 1e-07,
    "floatx": "float32",
    "image_data_format": "channels_last",
    "backend": "mxnet"
  }
  ```

  This ensures that the `backend` is `mxnet` and `image_data_format` is `channels_last`.
- Run MXNet Model Server:

  ```
  mxnet-model-server \
    --start \
    --model-store samples/mnist/inference/mxnet/archived_model \
    --models mnist=mnist_cnn.mar
  ```

  The above command creates an endpoint called `mnist`. If you generated your own archive at `/tmp/model-store`, make sure to specify that directory as the `--model-store` parameter.
- In a new terminal, run the inference:

  ```
  curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/9.png
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100  8042  100    56  100  7986   3105   432k --:--:-- --:--:-- --:--:--  458k
  Prediction is [9] with probability of 92.52161979675293%
  ```

  Run another inference:

  ```
  curl -X POST localhost:8080/predictions/mnist -T samples/mnist/inference/mxnet/utils/7.jpg
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100   608  100    52  100   556    568   6081 --:--:-- --:--:-- --:--:--  6109
  Prediction is [7] with probability of 99.9999761581%
  ```
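Before sending predictions, you can also confirm that the server is up. The sketch below assumes MMS's `/ping` health route and its JSON response body; `server_healthy` and `parse_health` are illustrative names, not part of the sample.

```python
import json
import urllib.request

def parse_health(body):
    """True if a ping response body reports a healthy server."""
    try:
        return json.loads(body).get("status") == "Healthy"
    except ValueError:
        return False

def server_healthy(url="http://localhost:8080/ping"):
    # The /ping route is an assumption based on the MMS REST API.
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return parse_health(resp.read().decode())
    except OSError:
        return False
```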