Supports per-GPU compute limits (number of processes, utilization rate, memory usage) on a per-(UNIX) user/worker basis, load balancing, multiple nodes (machines), and more.
Tested on tensorflow-gpu tasks.
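The limits above boil down to polling each GPU for its process list, utilization, and memory use. Below is a minimal sketch of that kind of per-GPU check using the pynvml bindings; it is illustrative only and not necessarily how this package gathers GPU statistics internally.

import pynvml

# Query per-GPU process count, utilization, and memory usage (illustrative).
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)          # % utilization
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)                  # bytes used/total
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)   # running compute processes
    print(f'GPU {i}: util={util.gpu}%, mem={mem.used / mem.total:.0%}, procs={len(procs)}')
pynvml.nvmlShutdown()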
Installation (a virtual Python environment such as venv or conda is recommended)
cd /path/to/install
git clone https://github.com/jigangkim/nvidia-gpu-scheduler.git
cd /path/to/install/nvidia-gpu-scheduler
pip install . # standard installation
pip install -e . # editable (develop mode) installation
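A quick sanity check that the install resolved; the import name nvidia_gpu_scheduler is assumed from the repository name.

import nvidia_gpu_scheduler
print(nvidia_gpu_scheduler.__file__)  # path of the installed (or editable) package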
Usage (dummy example: json)
cd /path/to/install/nvidia-gpu-scheduler
# Run the job server (scheduler)
python example.py --identity scheduler --config_ext .json
# Run worker
python example.py --identity worker --config_ext .json
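The worker consumes one config file per job. The sketch below generates a small sweep of .json job configs; the directory and field names are hypothetical, so consult example.py for the schema the dummy example actually expects.

import json
import pathlib

config_dir = pathlib.Path('configs_json')  # hypothetical location
config_dir.mkdir(exist_ok=True)
for i, lr in enumerate([1e-2, 1e-3, 1e-4]):
    # Hypothetical fields; one file per job.
    cfg = {'job_name': f'dummy_{i}', 'learning_rate': lr, 'num_steps': 1000}
    (config_dir / f'job_{i}.json').write_text(json.dumps(cfg, indent=2))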
Usage (dummy example: gin)
cd /path/to/install/nvidia-gpu-scheduler
# Run the job server (scheduler)
python example.py --identity scheduler --config_ext .gin
# Run worker
python example.py --identity worker --config_ext .gin
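The gin variant works the same way, with .gin files binding parameters to configurable functions. A minimal gin-config illustration follows; the function and parameter names are hypothetical, not the schema used by example.py.

import gin

@gin.configurable
def train(learning_rate=1e-3, num_steps=1000):
    print(f'training with lr={learning_rate} for {num_steps} steps')

# A .gin job file would contain bindings such as:
#   train.learning_rate = 1e-4
#   train.num_steps = 5000
gin.parse_config(['train.learning_rate = 1e-4', 'train.num_steps = 5000'])
train()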
Usage (OpenAI baselines example)
cd /path/to/install/nvidia-gpu-scheduler
# Run the job server (scheduler)
python example_openaibaselines.py --identity scheduler
# Run worker
python example_openaibaselines.py --identity worker
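For context, each job in this example wraps an OpenAI baselines training run. The sketch below shows the kind of per-job command a worker would dispatch; the --alg/--env/--num_timesteps flags come from the baselines CLI, while how example_openaibaselines.py actually launches jobs is defined in that script.

import subprocess

# Launch one baselines training run as a subprocess (illustrative).
cmd = ['python', '-m', 'baselines.run',
       '--alg=ppo2', '--env=CartPole-v0', '--num_timesteps=1e4']
subprocess.run(cmd, check=True)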