This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Releases: lablup/backend.ai-manager

20.09.3

17 May 05:09

20.09.3 (2021-01-04)

Fixes

  • Fix reloading of new container registry configs when rescanning, a regression caused by the registry abstraction introduced for Harbor v2 support (#382)
  • Fix cluster_idx for single-node sessions with only one container (main0 should be main1 to be consistent with multi-node sessions) (#383)
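For illustration only, here is a minimal sketch of the naming rule implied by #383. It assumes that each container's hostname is its cluster role followed by a 1-based cluster_idx, and that a session has one "main" container with any remaining containers as "sub". The helper name cluster_hostnames is hypothetical and is not the manager's API.

```python
from typing import List


def cluster_hostnames(num_containers: int) -> List[str]:
    """Hypothetical helper: derive per-container hostnames for a session.

    Assumes one "main" container plus (num_containers - 1) "sub" containers,
    each numbered with a 1-based cluster_idx within its role.
    """
    if num_containers < 1:
        raise ValueError("a session needs at least one container")
    # 1-based numbering: a single-container session gets "main1", not "main0",
    # which matches how multi-container sessions are numbered.
    names = ["main1"]
    names += [f"sub{idx}" for idx in range(1, num_containers)]
    return names


print(cluster_hostnames(1))  # ['main1']
print(cluster_hostnames(3))  # ['main1', 'sub1', 'sub2']
```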

20.09.2

20.09.2 (2020-12-30)

Fixes

  • Fix duplicated passed/failed predicate lists in status_data (#379)
  • Fix a missing kernel ID field during cancel-termination upon failed creation of multi-container sessions (#380)

20.09.1

20.09.1 (2020-12-28)

Fixes

  • Put aiodocker in the default dependencies instead of the testing dependencies. (#377)
  • Remove use of deprecated asyncio API (asyncio.Task.{all_tasks,current_task} -> asyncio.{all_tasks,current_task}) to run with Python 3.9 (#378)
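The API change in #378 can be shown with a small standalone sketch. asyncio.Task.all_tasks() and asyncio.Task.current_task() were removed in Python 3.9; the module-level asyncio.all_tasks() and asyncio.current_task() (available since Python 3.7) replace them. The function names below are illustrative and are not taken from the manager code.

```python
import asyncio


async def count_other_tasks() -> int:
    # Python 3.7+: module-level functions replace the removed
    # asyncio.Task.current_task() / asyncio.Task.all_tasks() classmethods.
    current = asyncio.current_task()
    return sum(1 for task in asyncio.all_tasks() if task is not current)


async def demo() -> int:
    # Start three background tasks; they stay pending while we count them.
    sleepers = [asyncio.create_task(asyncio.sleep(0.1)) for _ in range(3)]
    n = await count_other_tasks()
    for task in sleepers:
        task.cancel()
    await asyncio.gather(*sleepers, return_exceptions=True)
    return n


print(asyncio.run(demo()))  # 3
```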

20.09.0

20.09.0 (2020-12-27)

Features

  • Implement max_container_per_session keypair resource policy as a part of predicate checks (#376)
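As a rough illustration of what a scheduler predicate check for this policy might look like, here is a minimal sketch. The PredicateResult type, the function name, and the message text are all hypothetical; the sketch only assumes that a predicate returns a pass/fail flag plus an optional reason, which matches the passed/failed predicate lists recorded in status_data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredicateResult:
    # Hypothetical result type: whether the check passed, plus a reason if not.
    passed: bool
    message: Optional[str] = None


def check_max_containers_per_session(requested: int, limit: int) -> PredicateResult:
    """Hypothetical predicate: reject sessions requesting more containers than allowed."""
    if requested > limit:
        return PredicateResult(
            False,
            f"Your keypair resource policy allows up to {limit} container(s) "
            f"per session, but {requested} were requested.",
        )
    return PredicateResult(True)


print(check_max_containers_per_session(2, 4).passed)  # True
print(check_max_containers_per_session(8, 4).passed)  # False
```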

Fixes

  • Additional stabilization of multi-node multi-container support (#376)
    • Fix destruction of multi-container sessions when using multiple nodes
    • Adopt MultiAgentError to represent and handle partial failures of multi-container session creation

20.09.0 (beta-phase logs)

Breaking Changes

  • The latest API version is now v6.20200815. (#0)
  • Configuration/DB changes are required for storage proxies (#312)
    • To use vfolders, now the storage proxy must be installed and configured.
    • The "volumes" key in etcd must include the storage proxy configurations,
      as demonstrated in the config/sample.volume.json file.
    • All vfolder hosts are now in the format "{proxy-name}:{volume-name}",
      where proxy-name is the proxy's key in etcd and volume-name values are retrieved from
      the storage proxies at runtime.
    • All allowed_vfolder_hosts configurations in the database must be updated to
      the above new vfolder host format.
    • Clients should use the same vfolder host format when making API requests.
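A minimal sketch of handling the new host format on the client or migration side is shown below. The helper names and the example values "local" and "volume1" are hypothetical. The only assumption taken from the notes above is that a host is a proxy name and a volume name joined by a colon; splitting on the first colon keeps any later colons in the volume name.

```python
from typing import Iterable, NamedTuple


class VFolderHost(NamedTuple):
    proxy: str   # key of the storage proxy in etcd
    volume: str  # volume name reported by that proxy at runtime


def parse_vfolder_host(host: str) -> VFolderHost:
    """Split a "{proxy-name}:{volume-name}" string on its first colon."""
    proxy, sep, volume = host.partition(":")
    if not sep or not proxy or not volume:
        raise ValueError(f"invalid vfolder host {host!r}; expected 'proxy-name:volume-name'")
    return VFolderHost(proxy, volume)


def is_allowed(host: str, allowed_vfolder_hosts: Iterable[str]) -> bool:
    """Check a requested host against allowed_vfolder_hosts entries in the new format."""
    parse_vfolder_host(host)  # rejects legacy-format hosts such as "local"
    return host in set(allowed_vfolder_hosts)


print(parse_vfolder_host("local:volume1"))  # VFolderHost(proxy='local', volume='volume1')
print(is_allowed("local:volume1", ["local:volume1"]))  # True
```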

Features

  • Add support for multi-container sessions (#217)
  • Add a generic query filter expression language parser (ai.backend.manager.models.minilang.queryfilter) for GraphQL paginated list queries using the Lark parser framework (#305)
  • Add support for storage proxies (#312, #337)
    • Storage proxies have a multi-backend architecture, so storage-specific optimizations such as per-directory quotas, performance measurements, and fast metadata scanning now all become available to Backend.AI users.
    • Offload and unify vfolder upload/download operations to storage proxies via HTTP range requests and the tus.io protocol.
    • Support multiple storage proxies configured via etcd; each storage proxy may provide multiple volumes in the mount points shared with agents.
    • Manager instances no longer have mount points for the storage volumes, and the mount/fstab management APIs skip the manager-side queries and manipulations.
  • Include user information (full_name) in keypair gql query. (#313)
  • Add an endpoint that allows users to leave a shared virtual folder (#317)
  • Make the mgr dbshell command smarter: it auto-detects the halfstack DB container and uses the container-provided psql command for maximum compatibility, with an optional --psql-container option to set the container ID or name explicitly, and forwards additional arguments to the psql command (#318)
  • Add a script to migrate /vroot/local to, e.g., the /vroot/vfs structure in accordance with the new storage proxy implementation. (#319)
  • Make the maximum websocket message size configurable, which affects the operation of streaming APIs including service-port proxy (#320)
  • Add a new vfolder API to clone a vfolder and a vfolder property specifying whether it is cloneable (#323, #338)
  • Add quota argument when creating vfolders (currently only supported in the xfs storage backend) (#325)
  • Add support for listing/updating/deleting/creating domain dotfiles and group dotfiles (#329)
  • Make vfolder's mkdir accept and deliver the parents and exist_ok options. (#336)
  • Add a new API path for per-agent hardware metadata queries (#366)
  • Support destroying PREPARING/TERMINATING/ERROR sessions when the forced parameter is set to True. (#363)
  • Add support for Harbor v2 and generalize internal abstractions of container registries, making it easier to add new registries in the future (#357)
  • Add a new idle checker that automatically kills sessions based on various criteria (#341)
  • Add a new API /session/{session_ref}/shutdown-service for shutting down running in-container services (#327)
  • Add a hook point for AUTHORIZE with the FIRST_COMPLETED requirement. (#339)
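The parents and exist_ok options in #336 share their names with the arguments of Python's pathlib.Path.mkdir. Here is a minimal sketch of how a storage backend could forward them, assuming the proxy simply passes them through to pathlib. The helper vfolder_mkdir is hypothetical and is not the storage proxy's actual code.

```python
import tempfile
from pathlib import Path


def vfolder_mkdir(root: Path, relpath: str, *, parents: bool = False,
                  exist_ok: bool = False) -> Path:
    """Hypothetical helper: create a directory inside a vfolder root.

    parents and exist_ok are forwarded unchanged to pathlib.Path.mkdir.
    """
    base = root.resolve()
    target = (base / relpath).resolve()
    target.relative_to(base)  # raises ValueError if relpath escapes the vfolder root
    target.mkdir(parents=parents, exist_ok=exist_ok)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    vfolder_mkdir(root, "a/b/c", parents=True)   # creates missing parent directories
    vfolder_mkdir(root, "a/b/c", exist_ok=True)  # no error if it already exists
    try:
        vfolder_mkdir(root, "x/y")               # parent "x" is missing
    except FileNotFoundError:
        print("parents=False requires existing parent directories")
```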

Fixes

  • Stabilize multi-container sessions further. (#373)
    • Include predicate check results in kernels.status_data for better user-side diagnosis
    • Fix a further race condition in the session-level creation routines by adopting session_creation_id, like the kernel_creation_id introduced in #374.
    • Remove the unused agents.clusterized column, as it is no longer included in agent heartbeats after the k8s branch refactoring.
    • Use dual DB connections to rollback agent-specific (resource) updates but to keep kernel-specific (status) updates when scheduling failures occur.
    • Fix potential duplication of DB cursors caused by use-after-free of a DB connection in the manager scheduler.
    • Relax row-level locks in scheduler in favor of the REPEATABLE READ isolation level.
    • Synchronize the candidate agent list when iterating over multiple kernels for a session in the multi-node mode, because they should now be updated
      according to the transaction savepoints.
    • Fix scheduling being blocked from proceeding to the next scaling group when a scheduling failure occurs in a scaling group.
    • Shield the DB queries in the event queue handlers so that interrupting the manager service works better.
  • Fix races of kernel creation events by attaching a unique creation request ID to distinguish and catch the events by the caller manager instance (#374)
  • Fix the inability to request the watcher API due to an incorrect reference to the watcher endpoint. (#370)
  • Replace deprecated reference of kernels.c.role with kernels.c.cluster_role in legacy compute session list query. (#371)
  • Fix various bugs related to multi-node execution paths and provide richer error information with agent-side errors as a new JSON column status_data in the kernels table (#372)
  • Fix a bug that prevented purging users and/or groups because the manager accessed the vfroot filesystem directly instead of delegating the tasks to the storage proxy. (#361)
  • Fix a race condition that sometimes corrupts the kernel status to be stuck at PREPARING even when kernels are successfully created (#368)
  • Fix a regression in image list graph queries due to a missing renaming of local_config (#367)
  • Update aiotools to v1.1.1 to fix a potential memory leak in long-lived taskgroup instances (#362)
  • Fix a hang-up issue when shutting down the gateway daemon with running service-port streaming sessions (#365)
  • Update fixtures to put the default registry as cr.backend.ai and remove unused ones (#358)
  • Improve handling of storage-manager invocation failures using a new VFolderOperationFailed API exception. (#352)
  • Fix missed replacements of config_server with shared_config in GraphQL context objects and in references via request.app['registry'].config_server in some API handlers (#354)
  • Fix a regression of fetching live_stat GraphQL field values (#355)
  • Improve the error message for image-not-found errors when creating new sessions so that users can figure out the cause more quickly (#356)
  • Implement and use a new global timer which works regardless of the number of manager instances to sustain periodic tasks such as session scheduling and idle checks (#341)
  • Replace legacy sess_id with session_name. (#348)
  • Fix intermittent resource warnings from aiopg by replacing all fetchone() calls with first() and scalar(), which always close the database cursor automatically (#349)
  • Update compute_plugins info from heartbeats of ALIVE agents. (#350)
  • Fix broken signo...
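The idea behind #374 (and the session_creation_id follow-up in #373) is to tag each creation request with a unique ID so that, when kernel-started events are broadcast, only the manager instance and request that issued the ID consume them. Here is a minimal in-process sketch, assuming a simple registry of pending futures. The class and method names are hypothetical and do not reflect the manager's actual event-handling code.

```python
import asyncio
import uuid
from typing import Dict, Tuple


class KernelCreationTracker:
    """Hypothetical tracker that matches creation events to pending requests."""

    def __init__(self) -> None:
        self._pending: Dict[str, "asyncio.Future[str]"] = {}

    def register(self) -> Tuple[str, "asyncio.Future[str]"]:
        # Attach a unique creation ID to each request before dispatching it.
        creation_id = uuid.uuid4().hex
        fut: "asyncio.Future[str]" = asyncio.get_running_loop().create_future()
        self._pending[creation_id] = fut
        return creation_id, fut

    def on_kernel_started(self, creation_id: str, kernel_id: str) -> bool:
        # Ignore events whose creation ID belongs to another request or instance.
        fut = self._pending.pop(creation_id, None)
        if fut is None or fut.done():
            return False
        fut.set_result(kernel_id)
        return True


async def demo() -> Tuple[str, str]:
    tracker = KernelCreationTracker()
    id_a, fut_a = tracker.register()
    id_b, fut_b = tracker.register()
    tracker.on_kernel_started(id_b, "kernel-b")        # events may arrive out of order
    tracker.on_kernel_started("unknown", "kernel-x")   # not ours: ignored
    tracker.on_kernel_started(id_a, "kernel-a")
    return await fut_a, await fut_b


print(asyncio.run(demo()))  # ('kernel-a', 'kernel-b')
```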

19.09.19