This repository has been archived by the owner on Feb 6, 2023. It is now read-only.

chainer-cfn: CloudFormation Template for ChainerMN on AWS

This template automates building a ChainerMN cluster on AWS. The AWS resources created by this template are listed below:

  • VPC and Subnet where the cluster is placed (you can configure an existing VPC/Subnet)
  • S3 Bucket for sharing an ephemeral SSH key, which is used for communication among MPI processes in the cluster
  • Placement group for optimizing network performance
  • ChainerMN cluster, which consists of:
    • 1 master EC2 instance
    • N (>=0) worker instances (via AutoScalingGroup)
    • chainer user to run MPI jobs on each instance
    • hostfile to run MPI jobs on each instance
    • All the instances are launched from the Chainer AMI
  • (Optional) Amazon Elastic File System (you can configure an existing filesystem)
    • This is mounted on cluster instances automatically to share your code and data.
  • Several required SecurityGroups and an IAM Role
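
To illustrate the hostfile item above, here is a minimal sketch of how hostfile text for an MPI job could be generated. It assumes the common Open MPI `host slots=N` line format; the function name `build_hostfile`, the `slots_per_host` parameter, and the example addresses are hypothetical, and the actual template may generate its hostfile differently (see template/main.py).

```python
from typing import Iterable, List


def build_hostfile(hosts: Iterable[str], slots_per_host: int = 1) -> str:
    """Return hostfile text with one 'host slots=N' line per unique host.

    Assumption: Open MPI-style hostfile syntax. This is an illustration,
    not the format chainer-cfn is confirmed to write.
    """
    if slots_per_host < 1:
        raise ValueError("slots_per_host must be at least 1")

    lines: List[str] = []
    seen = set()
    for host in hosts:
        host = host.strip()
        # Skip blank entries and duplicates while keeping the first-seen order.
        if not host or host in seen:
            continue
        seen.add(host)
        lines.append(f"{host} slots={slots_per_host}\n")
    return "".join(lines)
```

For example, calling `build_hostfile(["10.0.0.10", "10.0.0.11"], slots_per_host=8)` with two hypothetical private IP addresses yields two lines, one per instance, each declaring 8 slots.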

Please see template/main.py for detailed resource definitions.

The Latest Published Template

Quick Start

Please also refer to our blog: ChainerMN on AWS with CloudFormation

launch stack

Development Manual

How to build a template

make build

How to test

# Configure AWS account properly first.

# Create a stack from the template you built.
make create-stack TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME

# Run ChainerMN's train_mnist.py on the stack.
make e2e-test TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME

# cleanup stack
make delete-stack TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
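
The three test steps above can be chained in a small driver script. The following is a sketch only: the helper names `make_command` and `run_e2e` are hypothetical and not part of this repository, and it assumes the `make` targets accept exactly the variables shown above. It always runs `delete-stack` at the end, even if an earlier step fails, so that a failed test does not leave billable resources behind.

```python
import subprocess
from typing import Callable, List


def make_command(target: str, stack: str, key_pair: str) -> List[str]:
    """Build the argv for one of the make targets used in the test flow."""
    return ["make", target, f"TEST_STACK={stack}", f"KEY_PAIR_NAME={key_pair}"]


def run_e2e(stack: str, key_pair: str, runner: Callable = subprocess.run) -> None:
    """Run create-stack and e2e-test, then always attempt delete-stack.

    `runner` defaults to subprocess.run; passing a stub makes dry runs possible.
    """
    try:
        for target in ("create-stack", "e2e-test"):
            # check=True raises CalledProcessError on a non-zero exit status.
            runner(make_command(target, stack, key_pair), check=True)
    finally:
        # Cleanup runs whether or not the steps above succeeded.
        runner(make_command("delete-stack", stack, key_pair), check=True)
```

Calling `run_e2e("YOUR_TEST_STACK_NAME", "YOUR_KEY_PAIR_NAME")` from the repository root would then run the same three commands in order.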

How to release

# Configure AWS account properly first.

# build template
make build

# perform e2e test
make create-stack TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
make e2e-test TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME
make delete-stack TEST_STACK=YOUR_TEST_STACK_NAME KEY_PAIR_NAME=YOUR_KEY_PAIR_NAME

# publish to a stage (set STAGE to either production or staging)
make publish STAGE=(production|staging)

Release Notes

Version 0.1.0

License

MIT License (see LICENSE file).