Admin options adjustments (different compute devices support) (#267)
Minor fixes; adjusted the Admin settings UI and backend to support an optional
compute device (NVIDIA/AMD/CPU) in the daemon configuration.

---------

Signed-off-by: Andrey Borysenko <andrey18106x@gmail.com>
andrey18106 authored Apr 9, 2024
1 parent 5acfac7 commit 9fb115b
Showing 18 changed files with 321 additions and 138 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [2.5.0 - 2024-04-xx]

### Added

- Different compute device configuration for Daemon (NVIDIA, AMD, CPU)

## [2.4.0 - 2024-04-04]

### Added
3 changes: 2 additions & 1 deletion appinfo/info.xml
@@ -43,7 +43,7 @@ to join us in shaping a more versatile, stable, and secure app landscape.
*Your insights, suggestions, and contributions are invaluable to us.*
]]></description>
<version>2.4.0</version>
<version>2.5.0</version>
<licence>agpl</licence>
<author mail="andrey18106x@gmail.com" homepage="https://github.com/andrey18106">Andrey Borysenko</author>
<author mail="bigcat88@icloud.com" homepage="https://github.com/bigcat88">Alexander Piskun</author>
@@ -72,6 +72,7 @@ to join us in shaping a more versatile, stable, and secure app landscape.
<install>
<step>OCA\AppAPI\Migration\DataInitializationStep</step>
<step>OCA\AppAPI\Migration\DaemonUpdateV2RepairStep</step>
<step>OCA\AppAPI\Migration\DaemonUpdateGPUSRepairStep</step>
</install>
</repair-steps>
<commands>
11 changes: 7 additions & 4 deletions docs/CreationOfDeployDaemon.rst
@@ -32,7 +32,7 @@ Register

Register Deploy Daemon (DaemonConfig).

Command: ``app_api:daemon:register [--net NET] [--gpu] [--] <name> <display-name> <accepts-deploy-id> <protocol> <host> <nextcloud_url>``
Command: ``app_api:daemon:register [--net NET] [--haproxy_password HAPROXY_PASSWORD] [--compute_device COMPUTE_DEVICE] [--set-default] [--] <name> <display-name> <accepts-deploy-id> <protocol> <host> <nextcloud_url>``

Arguments
*********
@@ -49,7 +49,7 @@ Options

* ``--net [network-name]`` - ``[required]`` network name to bind docker container to (default: ``host``)
* ``--haproxy_password HAPROXY_PASSWORD`` - ``[optional]`` password for AppAPI Docker Socket Proxy
* ``--gpu GPU`` - ``[optional]`` GPU device to expose to the daemon (e.g. ``/dev/dri``)
* ``--compute_device COMPUTE_DEVICE`` - ``[optional]`` compute device to use for the daemon (one of ``cpu|cuda|rocm``, default: ``cpu``)
* ``--set-default`` - ``[optional]`` set created daemon as default for ExApps installation

DeployConfig
@@ -64,7 +64,10 @@ ExApp container.
"net": "host",
"nextcloud_url": "https://nextcloud.local",
"haproxy_password": "some_secure_password",
"gpus": true,
"computeDevice": {
"id": "cuda",
"label": "CUDA (NVIDIA)",
},
}
DeployConfig options
@@ -73,7 +76,7 @@ DeployConfig options
* ``net`` **[required]** - network name to bind docker container to (default: ``host``)
* ``nextcloud_url`` **[required]** - Nextcloud URL (e.g. ``https://nextcloud.local``)
* ``haproxy_password`` *[optional]* - password for AppAPI Docker Socket Proxy
* ``gpus`` *[optional]* - GPU device to attach to the daemon (e.g. ``/dev/dri``)
* ``computeDevice`` *[optional]* - Compute device to attach to the daemon (e.g. ``{ "id": "cuda", "label": "CUDA (NVIDIA)" }``)

Unregister
----------
27 changes: 25 additions & 2 deletions lib/Command/Daemon/RegisterDaemon.php
@@ -38,11 +38,12 @@ protected function configure(): void {
$this->addOption('net', null, InputOption::VALUE_REQUIRED, 'DeployConfig, the name of the docker network to attach App to');
$this->addOption('haproxy_password', null, InputOption::VALUE_REQUIRED, 'AppAPI Docker Socket Proxy password for HAProxy Basic auth');

$this->addOption('gpu', null, InputOption::VALUE_NONE, 'Enable support of GPUs for containers');
$this->addOption('compute_device', null, InputOption::VALUE_REQUIRED, 'Compute device for GPU support (cpu|cuda|rocm)');

$this->addOption('set-default', null, InputOption::VALUE_NONE, 'Set DaemonConfig as default');

$this->addUsage('local_docker "Docker local" "docker-install" "http" "/var/run/docker.sock" "http://nextcloud.local" --net=nextcloud');
$this->addUsage('local_docker "Docker local" "docker-install" "http" "/var/run/docker.sock" "http://nextcloud.local" --net=nextcloud --set-default --compute_device=cuda');
}

protected function execute(InputInterface $input, OutputInterface $output): int {
@@ -57,7 +58,7 @@ protected function execute(InputInterface $input, OutputInterface $output): int
'net' => $input->getOption('net') ?? 'host',
'nextcloud_url' => $nextcloudUrl,
'haproxy_password' => $input->getOption('haproxy_password') ?? '',
'gpu' => $input->getOption('gpu') ?? false,
'computeDevice' => $this->buildComputeDevice($input->getOption('compute_device') ?? 'cpu'),
];

if (($protocol !== 'http') && ($protocol !== 'https')) {
@@ -95,4 +96,26 @@ protected function execute(InputInterface $input, OutputInterface $output): int
$output->writeln('Daemon successfully registered.');
return 0;
}

private function buildComputeDevice(string $computeDevice): array {
switch ($computeDevice) {
case 'cpu':
return [
'id' => 'cpu',
'label' => 'CPU',
];
case 'cuda':
return [
'id' => 'cuda',
'label' => 'CUDA (NVIDIA)',
];
case 'rocm':
return [
'id' => 'rocm',
'label' => 'ROCm (AMD)',
];
default:
throw new \InvalidArgumentException('Invalid compute device value.');
}
}
}
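The `buildComputeDevice` switch in this file maps a CLI value to an `id`/`label` pair and rejects anything else. A minimal standalone sketch of the same mapping written as a lookup table follows; the constant `COMPUTE_DEVICE_LABELS` and the free function `buildComputeDeviceFromTable` are hypothetical names used only for illustration and are not part of the commit.

```php
<?php

declare(strict_types=1);

// Hypothetical lookup table (not part of AppAPI): the same id => label pairs
// that buildComputeDevice() returns in the diff.
const COMPUTE_DEVICE_LABELS = [
    'cpu' => 'CPU',
    'cuda' => 'CUDA (NVIDIA)',
    'rocm' => 'ROCm (AMD)',
];

// Hypothetical helper: returns the id/label pair for a known compute device.
function buildComputeDeviceFromTable(string $computeDevice): array {
    if (!array_key_exists($computeDevice, COMPUTE_DEVICE_LABELS)) {
        // Mirrors the default branch of the switch in the commit.
        throw new \InvalidArgumentException('Invalid compute device value.');
    }
    return [
        'id' => $computeDevice,
        'label' => COMPUTE_DEVICE_LABELS[$computeDevice],
    ];
}
```

With a table, adding another device would mean adding one entry instead of another `case` block; whether that is preferable here is a style choice.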
50 changes: 4 additions & 46 deletions lib/DeployActions/AIODockerActions.php
@@ -14,7 +14,6 @@
*/
class AIODockerActions {
public const AIO_DAEMON_CONFIG_NAME = 'docker_aio';
public const AIO_DAEMON_CONFIG_NAME_GPU = 'docker_aio_gpu';
public const AIO_DOCKER_SOCKET_PROXY_HOST = 'nextcloud-aio-docker-socket-proxy:2375';

public function __construct(
@@ -46,13 +45,12 @@ public function registerAIODaemonConfig(): ?DaemonConfig {
'net' => 'nextcloud-aio', // using the same host as default network for Nextcloud AIO containers
'nextcloud_url' => 'https://' . getenv('NC_DOMAIN'),
'haproxy_password' => null,
'gpu' => false,
'computeDevice' => [
'id' => 'cpu',
'label' => 'CPU',
],
];

if ($this->isGPUsEnabled()) {
$this->registerAIODaemonConfigWithGPU();
}

$daemonConfigParams = [
'name' => self::AIO_DAEMON_CONFIG_NAME,
'display_name' => 'AIO Docker Socket Proxy',
@@ -68,44 +66,4 @@ public function registerAIODaemonConfig(): ?DaemonConfig {
}
return $daemonConfig;
}

/**
* Registers DaemonConfig with default params to use AIO Docker Socket Proxy with GPU
*/
private function registerAIODaemonConfigWithGPU(): ?DaemonConfig {
$daemonConfigWithGPU = $this->daemonConfigService->getDaemonConfigByName(self::AIO_DAEMON_CONFIG_NAME_GPU);
if ($daemonConfigWithGPU !== null) {
return $daemonConfigWithGPU;
}

$deployConfig = [
'net' => 'nextcloud-aio', // using the same host as default network for Nextcloud AIO containers
'nextcloud_url' => 'https://' . getenv('NC_DOMAIN'),
'haproxy_password' => null,
'gpu' => true,
];

$daemonConfigParams = [
'name' => self::AIO_DAEMON_CONFIG_NAME_GPU,
'display_name' => 'AIO Docker Socket Proxy with GPU',
'accepts_deploy_id' => 'docker-install',
'protocol' => 'http',
'host' => self::AIO_DOCKER_SOCKET_PROXY_HOST,
'deploy_config' => $deployConfig,
];

return $this->daemonConfigService->registerDaemonConfig($daemonConfigParams);
}

/**
* Check if /dev/dri folder mounted to the container.
* In AIO this means that NEXTCLOUD_ENABLE_DRI_DEVICE=true
*/
private function isGPUsEnabled(): bool {
$devDri = '/dev/dri';
if (is_dir($devDri)) {
return true;
}
return false;
}
}
43 changes: 30 additions & 13 deletions lib/DeployActions/DockerActions.php
@@ -125,11 +125,20 @@ public function createContainer(string $dockerUrl, array $imageParams, array $pa
$containerParams['NetworkingConfig'] = $networkingConfig;
}

if (isset($params['gpu']) && filter_var($params['gpu'], FILTER_VALIDATE_BOOLEAN)) {
if (isset($params['deviceRequests'])) {
$containerParams['HostConfig']['DeviceRequests'] = $params['deviceRequests'];
} else {
$containerParams['HostConfig']['DeviceRequests'] = $this->buildDefaultGPUDeviceRequests();
if (isset($params['computeDevice'])) {
if ($params['computeDevice']['id'] === 'cuda') {
if (isset($params['deviceRequests'])) {
$containerParams['HostConfig']['DeviceRequests'] = $params['deviceRequests'];
} else {
$containerParams['HostConfig']['DeviceRequests'] = $this->buildDefaultGPUDeviceRequests();
}
}
if ($params['computeDevice']['id'] === 'rocm') {
if (isset($params['devices'])) {
$containerParams['HostConfig']['Devices'] = $params['devices'];
} else {
$containerParams['HostConfig']['Devices'] = $this->buildDevicesParams(['/dev/kfd', '/dev/dri']);
}
}
}

@@ -346,10 +355,15 @@ public function buildDeployParams(DaemonConfig $daemonConfig, array $appInfo): a
$externalApp = $appInfo['external-app'];
$deployConfig = $daemonConfig->getDeployConfig();

if (isset($deployConfig['gpu']) && filter_var($deployConfig['gpu'], FILTER_VALIDATE_BOOLEAN)) {
$deviceRequests = $this->buildDefaultGPUDeviceRequests();
if (isset($deployConfig['computeDevice'])) {
if ($deployConfig['computeDevice']['id'] === 'cuda') {
$deviceRequests = $this->buildDefaultGPUDeviceRequests();
} elseif ($deployConfig['computeDevice']['id'] === 'rocm') {
$devices = $this->buildDevicesParams(['/dev/kfd', '/dev/dri']);
}
} else {
$deviceRequests = [];
$devices = [];
}
$storage = $this->buildDefaultExAppVolume($appId)[0]['Target'];

@@ -375,8 +389,9 @@ public function buildDeployParams(DaemonConfig $daemonConfig, array $appInfo): a
'port' => $appInfo['port'],
'net' => $deployConfig['net'] ?? 'host',
'env' => $envs,
'computeDevice' => $deployConfig['computeDevice'] ?? null,
'devices' => $devices,
'deviceRequests' => $deviceRequests,
'gpu' => count($deviceRequests) > 0,
];

return [
@@ -398,10 +413,14 @@ public function buildDeployEnvs(array $params, array $deployConfig): array {
sprintf('NEXTCLOUD_URL=%s', $deployConfig['nextcloud_url'] ?? str_replace('https', 'http', $this->urlGenerator->getAbsoluteURL(''))),
];

// Always set COMPUTE_DEVICE=cpu|cuda|rocm
$autoEnvs[] = sprintf('COMPUTE_DEVICE=%s', $deployConfig['computeDevice']['id']);
// Add required GPU runtime envs if daemon configured to use GPU
if (isset($deployConfig['gpu']) && filter_var($deployConfig['gpu'], FILTER_VALIDATE_BOOLEAN)) {
$autoEnvs[] = sprintf('NVIDIA_VISIBLE_DEVICES=%s', 'all');
$autoEnvs[] = sprintf('NVIDIA_DRIVER_CAPABILITIES=%s', 'compute,utility');
if (isset($deployConfig['computeDevice'])) {
if ($deployConfig['computeDevice']['id'] === 'cuda') {
$autoEnvs[] = sprintf('NVIDIA_VISIBLE_DEVICES=%s', 'all');
$autoEnvs[] = sprintf('NVIDIA_DRIVER_CAPABILITIES=%s', 'compute,utility');
}
}
return $autoEnvs;
}
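The `buildDeployEnvs` hunk always emits `COMPUTE_DEVICE` and adds the NVIDIA runtime variables only for `cuda`. The sketch below isolates that logic in a hypothetical free function, `buildComputeEnvs`, which is not part of the commit. It also assumes a fallback to `cpu` when no `computeDevice` is stored; the hunk itself reads `$deployConfig['computeDevice']['id']` before its `isset` check, so that fallback is an addition of this sketch.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of AppAPI): builds only the compute-related
// environment variables shown in the buildDeployEnvs hunk.
function buildComputeEnvs(array $deployConfig): array {
    // Assumption: treat a missing computeDevice as 'cpu'.
    $id = $deployConfig['computeDevice']['id'] ?? 'cpu';

    // COMPUTE_DEVICE is always set, as in the commit.
    $envs = [sprintf('COMPUTE_DEVICE=%s', $id)];

    // The NVIDIA runtime variables are added only for CUDA daemons.
    if ($id === 'cuda') {
        $envs[] = 'NVIDIA_VISIBLE_DEVICES=all';
        $envs[] = 'NVIDIA_DRIVER_CAPABILITIES=compute,utility';
    }
    return $envs;
}
```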
@@ -518,8 +537,6 @@ private function isGPUAvailable(): bool {

/**
* Return default GPU device requests for container.
* For now only NVIDIA GPUs supported.
* TODO: Add support for other GPU vendors
*/
private function buildDefaultGPUDeviceRequests(): array {
return [
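In the `buildDeployParams` hunk of this file, as the flattened diff reads, `$deviceRequests` is assigned only in the `cuda` branch and `$devices` only in the `rocm` branch, and both are set to `[]` only when no `computeDevice` is configured. A minimal sketch that initializes both up front is shown below. The function name `resolveDeviceParams` is hypothetical, and the array shapes are assumptions based on the Docker Engine API `HostConfig.DeviceRequests` and `HostConfig.Devices` fields, since the bodies of `buildDefaultGPUDeviceRequests()` and `buildDevicesParams()` are not visible in this diff.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not the commit's code): resolves the HostConfig
// fragments for a compute device and always returns both keys.
function resolveDeviceParams(?array $computeDevice): array {
    $result = ['deviceRequests' => [], 'devices' => []];
    $id = $computeDevice['id'] ?? 'cpu';

    if ($id === 'cuda') {
        // Assumed DeviceRequest shape: expose all NVIDIA GPUs to the container.
        $result['deviceRequests'][] = [
            'Driver' => 'nvidia',
            'Count' => -1,
            'Capabilities' => [['gpu']],
        ];
    } elseif ($id === 'rocm') {
        // The device paths come from the commit; the mapping shape is assumed.
        foreach (['/dev/kfd', '/dev/dri'] as $path) {
            $result['devices'][] = [
                'PathOnHost' => $path,
                'PathInContainer' => $path,
                'CgroupPermissions' => 'rwm',
            ];
        }
    }
    return $result;
}
```

Returning both keys for every device id means a caller never reads an unassigned variable, including in the `cpu` case.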
72 changes: 72 additions & 0 deletions lib/Migration/DaemonUpdateGPUSRepairStep.php
@@ -0,0 +1,72 @@
<?php

declare(strict_types=1);

namespace OCA\AppAPI\Migration;

use OCA\AppAPI\Db\DaemonConfig;
use OCA\AppAPI\Db\DaemonConfigMapper;
use OCP\DB\Exception;
use OCP\Migration\IOutput;
use OCP\Migration\IRepairStep;
use Psr\Log\LoggerInterface;

class DaemonUpdateGPUSRepairStep implements IRepairStep {
public function __construct(
private readonly DaemonConfigMapper $daemonConfigMapper,
private readonly LoggerInterface $logger,
) {
}

public function getName(): string {
return 'AppAPI Daemons configuration GPU params update';
}

public function run(IOutput $output): void {
$daemons = $this->daemonConfigMapper->findAll();
$daemonsUpdated = 0;
// Update manual-install daemons
/** @var DaemonConfig $daemon */
foreach ($daemons as $daemon) {
$daemonsUpdated += $this->updateDaemonConfiguration($daemon);
}
$output->info(sprintf('Daemons configuration GPU params updated: %s', $daemonsUpdated));
}

private function updateDaemonConfiguration(DaemonConfig $daemonConfig): int {
$updated = false;

$deployConfig = $daemonConfig->getDeployConfig();
if (isset($deployConfig['gpu'])) {
if (filter_var($deployConfig['gpu'], FILTER_VALIDATE_BOOLEAN)) {
$deployConfig['computeDevice'] = [
'id' => 'cuda',
'label' => 'CUDA (NVIDIA)',
];
} else {
$deployConfig['computeDevice'] = [
'id' => 'cpu',
'label' => 'CPU',
];
}
unset($deployConfig['gpu']);
$daemonConfig->setDeployConfig($deployConfig);
$updated = true;
}

if ($updated) {
try {
$this->daemonConfigMapper->update($daemonConfig);
return 1;
} catch (Exception $e) {
$this->logger->error(
sprintf('Failed to update Daemon config (%s: %s)',
$daemonConfig->getAcceptsDeployId(), $daemonConfig->getName()),
['exception' => $e]
);
return 0;
}
}
return 0;
}
}
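The repair step above rewrites the legacy boolean `gpu` flag into a `computeDevice` entry (`cuda` when the flag was truthy, `cpu` otherwise) and then drops the flag. The same transformation is sketched below as a pure function on the deploy config array, which makes it easy to check in isolation; `migrateGpuFlag` is a hypothetical name and does not exist in the commit.

```php
<?php

declare(strict_types=1);

// Hypothetical pure function (not part of AppAPI): applies the same
// gpu -> computeDevice conversion as DaemonUpdateGPUSRepairStep.
function migrateGpuFlag(array $deployConfig): array {
    if (!isset($deployConfig['gpu'])) {
        // Nothing to migrate; the config is returned unchanged.
        return $deployConfig;
    }

    $useCuda = filter_var($deployConfig['gpu'], FILTER_VALIDATE_BOOLEAN);
    $deployConfig['computeDevice'] = $useCuda
        ? ['id' => 'cuda', 'label' => 'CUDA (NVIDIA)']
        : ['id' => 'cpu', 'label' => 'CPU'];

    // The legacy flag is removed once it has been converted.
    unset($deployConfig['gpu']);
    return $deployConfig;
}
```

Calling it a second time on its own output is a no-op, since the `gpu` key is gone after the first pass.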
3 changes: 2 additions & 1 deletion package.json
@@ -29,7 +29,8 @@
"lint": "eslint --ext .js,.vue src",
"lint:fix": "eslint --ext .js,.vue src --fix",
"stylelint": "stylelint src/**/*.vue src/**/*.scss src/**/*.css",
"stylelint:fix": "stylelint src/**/*.vue src/**/*.scss src/**/*.css --fix"
"stylelint:fix": "stylelint src/**/*.vue src/**/*.scss src/**/*.css --fix",
"serve": "NODE_ENV=development webpack serve --allowed-hosts all --config webpack.js"
},
"browserslist": [
"extends @nextcloud/browserslist-config"
Binary file modified screenshots/app_api_1.png
Binary file modified screenshots/app_api_2.png
100755 → 100644
Binary file modified screenshots/app_api_3.png
100755 → 100644
Binary file modified screenshots/app_api_4.png
100755 → 100644
1 change: 1 addition & 0 deletions src/components/AdminSettings.vue
@@ -39,6 +39,7 @@
:options="['no', 'always', 'unless-stopped']"
:placeholder="t('app_api', 'ExApp container restart policy')"
:aria-label="t('app_api', 'ExApp container restart policy')"
:aria-label-combobox="t('app_api', 'ExApp container restart policy')"
@input="onInput" />
</NcSettingsSection>
</div>