
Example (single & multinode) (v0.0.6 ‐ v0.0.10)

Setepenre edited this page Mar 13, 2024 · 2 revisions

Setup

  • 2 DGX A100 nodes, each with 8x A100 SXM 80 GB (the example uses the hostnames cn-d003 and cn-d004)

Script

The script below configures milabench for both single-node and multi-node benchmarks and runs both in one go. A full run takes 1 hour.

Note that this script only works for milabench v0.0.6 - v0.0.10.
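Because the script is tied to this version range, a guard on VERSION can catch an unsupported image tag before anything is pulled. The sketch below assumes a sort that supports -V (version sort, as in GNU coreutils); version_supported is a hypothetical helper, not part of milabench or of the original script.

```shell
#!/bin/bash
# Sketch: succeed only if the given milabench tag is within v0.0.6 to v0.0.10.
# version_supported is a hypothetical helper, not part of milabench.
version_supported() {
    local version="$1" min="v0.0.6" max="v0.0.10"
    local lowest highest
    # sort -V compares each dotted component numerically,
    # so v0.0.10 sorts after v0.0.9 (a plain sort would not).
    lowest=$(printf '%s\n' "$min" "$version" | sort -V | head -n1)
    highest=$(printf '%s\n' "$max" "$version" | sort -V | tail -n1)
    # In range when min <= version <= max
    [ "$lowest" = "$min" ] && [ "$highest" = "$max" ]
}
```

In run.sh it could be called right after VERSION is set, exiting with an error message when it returns a non-zero status.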

To make it work on your system, only the following values need to be tweaked:

  • USERNAME: username used to ssh to both machines
  • SSH_KEY_FILE: key used to ssh to both machines
  • ARCH: GPU architecture (cuda or rocm)
  • WORKER_0: IP address or hostname of the main machine
  • WORKER_1: IP address or hostname of the secondary machine
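Since a full run takes an hour, it can be worth checking these values before launching it. The sketch below is a hypothetical helper (check_config is not part of milabench): it verifies that the values are set, that ARCH is one of the two values the script handles, and that the key file is readable.

```shell
#!/bin/bash
# Sketch: sanity-check the values at the top of run.sh before a long run.
# check_config is a hypothetical helper, not part of milabench.
check_config() {
    local status=0

    # Every host/user value must be set
    [ -n "$USERNAME" ] || { echo "error: USERNAME is empty" >&2; status=1; }
    [ -n "$WORKER_0" ] || { echo "error: WORKER_0 is empty" >&2; status=1; }
    [ -n "$WORKER_1" ] || { echo "error: WORKER_1 is empty" >&2; status=1; }

    # Only the two architectures handled by run.sh are accepted
    case "$ARCH" in
        cuda|rocm) ;;
        *) echo "error: ARCH must be cuda or rocm (got '$ARCH')" >&2; status=1 ;;
    esac

    # The key is mounted into the container, so it must exist and be readable
    [ -r "$SSH_KEY_FILE" ] || { echo "error: cannot read SSH_KEY_FILE '$SSH_KEY_FILE'" >&2; status=1; }

    return "$status"
}
```

It could be called right after the values are defined, for example as check_config || exit 1.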

Save the script as run.sh on the main machine and adjust the values above, then run it with bash run.sh. You can find the full result folder generated by the example. The report output is included below.

#!/bin/bash

# Enable job control so the background docker pulls below can be resumed with fg
set -m

#
#
#

echo ">> Configure the benchmark"
echo "=========================="


#
# Tweak the values to fit your system
#

USERNAME=${USER:-"mila"}
SSH_KEY_FILE=$HOME/.ssh/id_rsa
ARCH="cuda"
WORKER_0="cn-d003"
WORKER_1="cn-d004"
VERSION="v0.0.10"

# Derived
IMAGE="ghcr.io/mila-iqia/milabench:$ARCH-$VERSION"


# Create the config file
cat >overrides.yaml <<EOL
opt-6_7b-multinode:
  docker_image: "$IMAGE"

  worker_user: "$USERNAME"
  manager_addr: "$WORKER_0"
  worker_addrs:
    - "$WORKER_1"

  num_machines: 2
  capabilities:
    nodes: 2

opt-1_3b-multinode:
  docker_image: "$IMAGE"

  worker_user: "$USERNAME"
  manager_addr: "$WORKER_0"
  worker_addrs:
    - "$WORKER_1"

  num_machines: 2
  capabilities:
    nodes: 2

EOL

echo "<< ======================="
echo ""
echo ">> Prepare docker images"
echo "========================"

echo ssh $USERNAME@$WORKER_0 "docker pull $IMAGE"
echo ssh $USERNAME@$WORKER_1 "docker pull $IMAGE" 

ssh $USERNAME@$WORKER_0 "docker pull $IMAGE" &
ssh $USERNAME@$WORKER_1 "docker pull $IMAGE" &
fg
fg

echo "<< ====================="
echo ""

#
#
#

echo ">> Run milabench"
echo "================"

if [ "$ARCH" = "cuda" ]; then
    docker run -it --rm --gpus all --network host --ipc=host --privileged           \
        -v "$SSH_KEY_FILE":/milabench/id_milabench                                  \
        -v "$(pwd)"/results:/milabench/envs/runs                                    \
        "$IMAGE"                                                                    \
        milabench run --override "$(cat overrides.yaml)"

elif [ "$ARCH" = "rocm" ]; then
    docker run -it --rm --network host --ipc host --privileged                      \
        --security-opt seccomp=unconfined --group-add video                         \
        -v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids  \
        -v /opt/rocm:/opt/rocm                                                      \
        -v "$SSH_KEY_FILE":/milabench/id_milabench                                  \
        -v "$(pwd)"/results:/milabench/envs/runs                                    \
        "$IMAGE"                                                                    \
        milabench run --override "$(cat overrides.yaml)"

else
    echo "Unsupported ARCH: $ARCH (expected cuda or rocm)" >&2
    exit 1
fi

echo "<< ============="
echo ""

#
#
#

echo ">> Print report"
echo "==============="
docker run -it --rm                                                                 \
    -v "$(pwd)"/results:/milabench/envs/runs                                        \
    "$IMAGE"                                                                        \
    milabench report --runs /milabench/envs/runs

echo "<< ============"
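The script pulls the image on both machines in parallel by backgrounding two ssh commands and resuming them with fg. Because it does not use set -e, a failed pull does not stop the run. Below is a minimal sketch, assuming bash, of an alternative that waits on each background job and reports which host failed; run_on_hosts and pull_image are hypothetical helper names, not part of milabench.

```shell
#!/bin/bash
# Sketch: run a command once per host in parallel, wait for each background
# job individually and report every host on which it failed.
# run_on_hosts and pull_image are hypothetical helpers, not part of milabench.
run_on_hosts() {
    local func="$1"
    shift
    local pids=() hosts=() host i status=0

    # Start one background job per host; the host is passed as the argument
    for host in "$@"; do
        "$func" "$host" &
        pids+=("$!")
        hosts+=("$host")
    done

    # wait PID returns the exit status of that specific job
    for i in "${!pids[@]}"; do
        if ! wait "${pids[$i]}"; then
            echo "error: '$func' failed on ${hosts[$i]}" >&2
            status=1
        fi
    done
    return "$status"
}

# Per-host command as in run.sh (relies on USERNAME and IMAGE set there)
pull_image() {
    ssh "$USERNAME@$1" "docker pull $IMAGE"
}

# Usage in run.sh, replacing the two backgrounded ssh commands and fg calls:
#   run_on_hosts pull_image "$WORKER_0" "$WORKER_1" || exit 1
```

With this approach, set -m is no longer needed, since it only serves the fg calls.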

Output

===============
Source: /milabench/envs/runs
=================
Benchmark results
=================
                         fail n       perf   sem%   std% peak_memory          score weight
bert-fp16                   0 8     155.09   0.3%   4.3%       24600    1241.393129   0.00
bert-fp32                   0 8      29.56   0.0%   0.5%       31564     236.597358   0.00
bert-tf32                   0 8     119.87   0.3%   5.5%       31566     959.552126   0.00
bert-tf32-fp16              0 8     154.75   0.3%   4.2%       24600    1238.401888   3.00
convnext_large-fp16         0 8     339.21   0.9%  13.3%       27462    2749.709059   0.00
convnext_large-fp32         0 8      45.52   0.6%   9.0%       49582     359.979549   0.00
convnext_large-tf32         0 8     146.50   0.9%  14.2%       49582    1168.371136   0.00
convnext_large-tf32-fp16    0 8     338.60   0.9%  13.6%       27462    2746.288606   3.00
davit_large                 0 8     310.22   0.2%   5.2%       34504    2487.821250   1.00
davit_large-multi           0 1    2311.79   1.0%   7.6%       41824    2311.786629   5.00
dlrm                        0 1  178899.18   1.9%  14.5%        3490  178899.178933   1.00
focalnet                    0 8     387.85   0.2%   5.3%       26350    3112.201303   2.00
opt-1_3b                    0 1      28.32   0.0%   0.2%       42262      28.323743   5.00
opt-1_3b-multinode          0 1      32.37   0.0%   0.2%       41578      32.367767  10.00
opt-6_7b                    0 1      13.20   0.0%   0.1%       56820      13.204616   5.00
opt-6_7b-multinode          0 1      10.79   0.0%   0.0%       47694      10.785362  10.00
reformer                    0 8      61.38   0.0%   1.0%       25404     491.560766   1.00
regnet_y_128gf              0 8      83.80   0.2%   5.2%       31554     671.706002   2.00
resnet152                   0 8     665.49   0.3%   6.1%       36398    5338.088352   1.00
resnet152-multi             0 1    5115.10   1.3%  10.2%       38934    5115.104518   5.00
resnet50                    0 8     998.41   0.5%  11.8%        4730    8008.913503   1.00
stargan                     0 8      43.79   1.4%  30.3%       37426     352.550097   1.00
super-slomo                 0 8      41.50   0.1%   1.9%       33800     331.820253   1.00
t5                          0 8      47.73   0.2%   3.7%       34388     381.825063   2.00
whisper                     0 8     547.80   0.2%   3.3%        9278    4384.366568   1.00

Scores
------
Failure rate:       0.00% (PASS)
Score:             205.26
<< ============
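To see at a glance which benchmarks carry a non-zero weight in the report, the table can be filtered on its last column. This sketch assumes the report text was saved to a file, for instance by running the same report command without -it and redirecting its output to report.txt; weighted_benchmarks is a hypothetical helper, not part of milabench.

```shell
#!/bin/bash
# Sketch: print "name weight" for every benchmark row of a saved milabench
# report whose weight (the last column) is non-zero, highest weight first.
# weighted_benchmarks is a hypothetical helper, not part of milabench.
weighted_benchmarks() {
    awk '
        # Drop a trailing carriage return, e.g. from output captured via a TTY
        { sub(/\r$/, "") }
        # Benchmark rows have 9 fields and end with a numeric weight
        NF == 9 && $NF ~ /^[0-9]+(\.[0-9]+)?$/ && $NF + 0 > 0 { print $1, $NF }
    ' "$1" | sort -k2,2nr -k1,1
}
```

For the report above, weighted_benchmarks report.txt would list opt-1_3b-multinode and opt-6_7b-multinode (weight 10.00) first and omit rows such as bert-fp16 whose weight is 0.00.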