Develop container and Kubernetes artefacts to perform DL training and host DL inference on an IBM Kubernetes cluster.
- Create the VPC on IBM Cloud
- Create the Kubernetes (k8s) cluster
- On the local VM, install all the necessary packages to run the IBM Cloud CLI, kubectl and minikube (to test locally). I used the Vagrant VM from the Kubernetes lab and installed all the necessary packages mentioned in the slides.
- Developed a deep learning model for MNIST digit classification.
- The files containing the code for training and inference of the model are train.py and inference.py.
- The files containing the code for the frontend are front.html and backend.html.
- Two Dockerfiles are used to build two containers: one for training and one for inference. The train container trains the model and saves the model weights; the inference container then loads these weights and performs the prediction/classification.
- The container images are pushed to Docker Hub (repository: sjdocker3409/k8_dl).
- To run the containers locally (without Kubernetes), the container's working directory was mapped to a local directory in the docker run for training. The same was done in the docker run for inference, along with mapping the container's port to a localhost port.
- Then access the server at
http://localhost:39000/
- Commands used:
sudo docker run -it -v /home:/mnist mnist_train:latest
sudo docker run -it -v /home:/mnist -p 39000:9000 mnist_inference:latest
- Once the containers were working as required, the program was tested locally on minikube.
- Created 4 YAML files: deployment.yaml, train.yaml, service.yaml, kustomization.yaml.
- The train container was run using train.yaml, which creates a Pod (kind: Pod) for training. The YAML file mounts a local directory at the path where the model weights are saved.
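A minimal sketch of what train.yaml could look like; the Pod name, volume name and image tag are my assumptions (adjust to the tag actually pushed), while the mount paths mirror the docker run command above:

apiVersion: v1
kind: Pod
metadata:
  name: mnist-train                     # assumed Pod name
spec:
  restartPolicy: Never                  # training runs once to completion
  containers:
  - name: mnist-train
    image: sjdocker3409/k8_dl:train     # assumed tag on the Docker Hub repo
    volumeMounts:
    - name: model-store
      mountPath: /mnist                 # directory where train.py saves the weights
  volumes:
  - name: model-store
    hostPath:
      path: /home                       # local directory, as in the -v /home:/mnist mapping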
- The inference container was run using deployment.yaml, which creates a Deployment for inference (kind: Deployment) and mounts the same local directory as in the step above.
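A sketch of the corresponding deployment.yaml, under the same naming assumptions; containerPort 9000 matches the -p 39000:9000 mapping used earlier:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mnist-inference                     # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mnist-inference
  template:
    metadata:
      labels:
        app: mnist-inference
    spec:
      containers:
      - name: mnist-inference
        image: sjdocker3409/k8_dl:inference # assumed tag
        ports:
        - containerPort: 9000               # port the inference server listens on
        volumeMounts:
        - name: model-store
          mountPath: /mnist                 # reads the weights written by the train Pod
      volumes:
      - name: model-store
        hostPath:
          path: /home                       # same local directory as in train.yaml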
- The service was created using service.yaml (kind: Service, type: LoadBalancer). This creates a Service that maps the Deployment's port to an external port, so the URL can be accessed from outside the cluster.
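A sketch of service.yaml under the same assumptions; the selector must match the pod labels in the Deployment:

apiVersion: v1
kind: Service
metadata:
  name: mnist-service           # assumed name; used with minikube service below
spec:
  type: LoadBalancer
  selector:
    app: mnist-inference        # routes traffic to the inference pods
  ports:
  - port: 9000                  # service port
    targetPort: 9000            # containerPort of the inference container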
- Also created a kustomization.yaml, which lists the manifests so they can be applied together.
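A minimal kustomization.yaml just lists the resources, so that kubectl apply -k . applies all of them at once:

resources:
- train.yaml
- deployment.yaml
- service.yaml

Note that applying everything in one shot does not wait for training to finish; the step-by-step apply sequence used later on IBM Cloud avoids that ordering issue.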
- To get the URL of the service, use the command:
minikube service <servicename> --url
- The server can then be accessed at the URL:
http://<external-ip>:<node-port>/
- Log in to IBM Cloud and select your resource group.
- Run the command:
ibmcloud ks cluster config -c <cluster-id>
- Now kubectl can be used against the IBM Cloud cluster.
- Create a new YAML file, pvc.yaml, which defines a Persistent Volume Claim (PVC) on IBM Cloud.
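A sketch of pvc.yaml; the claim name is an assumption, and the storage class and size are placeholders. IBM Cloud provides file storage classes such as ibmc-file-silver, which support ReadWriteMany and so can be shared between the train and inference pods:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mnist-pvc                        # assumed claim name, referenced from the other manifests
spec:
  accessModes:
  - ReadWriteMany                        # both the train pod and the inference pods mount it
  resources:
    requests:
      storage: 20Gi                      # placeholder size
  storageClassName: ibmc-file-silver     # one of IBM Cloud's file storage classes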
- Create the train.yaml, deployment.yaml, service.yaml and kustomization.yaml as before. In train.yaml and deployment.yaml, mount the PVC to be used as the shared volume between the containers, as sketched below.
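Concretely, the hostPath volume from the minikube manifests is replaced in both files by a reference to the claim, roughly:

  volumes:
  - name: model-store
    persistentVolumeClaim:
      claimName: mnist-pvc    # the PVC defined in pvc.yaml

The volumeMounts sections stay the same, so train.py and inference.py keep writing and reading the weights under /mnist.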
- Run the following commands in order:
cd <directory which contains all yaml files>
kubectl apply -f pvc.yaml
Wait till the PVC is up and ready.
kubectl apply -f train.yaml
Wait till the pod has completed training.
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl get service
- Get the external IP address from the output. The port will be the “port” specified in service.yaml.