This repository has been archived by the owner on Aug 2, 2023. It is now read-only.
Releases: lablup/backend.ai-manager
20.09.3
20.09.3 (2021-01-04)
Fixes
20.09.2
20.09.1
20.09.0
20.09.0 (2020-12-27)
Features
- Implement `max_container_per_session` keypair resource policy as a part of predicate checks (#376)
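As a rough illustration of the `max_container_per_session` predicate check, the following minimal sketch rejects a session request that asks for more containers than the policy allows. `PredicateResult` and `check_max_containers` are hypothetical names chosen for this example and are not taken from the manager's code base.

```python
from dataclasses import dataclass


@dataclass
class PredicateResult:
    # Hypothetical result type: whether the check passed, plus a user-facing reason.
    passed: bool
    message: str = ""


def check_max_containers(requested: int, max_container_per_session: int) -> PredicateResult:
    """Reject a session that asks for more containers than the policy allows."""
    if requested > max_container_per_session:
        return PredicateResult(
            False,
            f"requested {requested} containers, "
            f"but the policy allows at most {max_container_per_session}",
        )
    return PredicateResult(True)
```

For example, `check_max_containers(3, 2)` fails with an explanatory message, while `check_max_containers(2, 2)` passes.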
Fixes
- Additional stabilization of multi-node multi-container support (#376)
- Fix destruction of multi-container sessions when using multiple nodes
- Adopt `MultiAgentError` to represent and handle partial failures of multi-container session creation
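The idea of reporting partial failures of a multi-container session can be sketched as follows. This is not the actual `MultiAgentError` implementation: `PartialFailure`, `create_container`, and `create_session` are hypothetical stand-ins, and the sketch only assumes that each container is created by an independent asynchronous call that may fail on its own.

```python
import asyncio


class PartialFailure(Exception):
    # Hypothetical stand-in for an aggregate error such as MultiAgentError.
    def __init__(self, errors: dict):
        super().__init__(f"{len(errors)} container(s) failed to start")
        self.errors = errors


async def create_container(name: str) -> str:
    # Placeholder for an agent RPC call; "bad" simulates a failing node.
    if name == "bad":
        raise RuntimeError("agent unreachable")
    return f"{name}: created"


async def create_session(names: list) -> list:
    # Collect exceptions instead of aborting on the first one.
    results = await asyncio.gather(
        *(create_container(n) for n in names), return_exceptions=True
    )
    errors = {n: r for n, r in zip(names, results) if isinstance(r, Exception)}
    if errors:
        raise PartialFailure(errors)
    return results
```

The caller can inspect `errors` on the raised exception to see which containers failed and clean up the ones that were created.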
20.09.0 (beta-phase logs)
Breaking Changes
- The latest API version is now bumped to `v6.20200815`. (#0)
- Configuration/DB changes are required for storage proxies (#312)
- To use vfolders, now the storage proxy must be installed and configured.
- The "volumes" key in the etcd must include storage proxy configurations, as demonstrated in the `config/sample.volume.json` file.
- All vfolder hosts are now in the format of "{proxy-name}:{volume-name}" where proxy-name is the key in the etcd and volume-name values are retrieved from the storage proxies at runtime.
- All `allowed_vfolder_hosts` configurations in the database must be updated to the above new vfolder host format.
- Clients should use the same vfolder host format when making API requests.
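A client or migration script that needs to validate the new "{proxy-name}:{volume-name}" host format might do something like the sketch below. `VFolderHost` and `parse_vfolder_host` are hypothetical helpers written for this example. Splitting on the first colon only is an assumption, since the notes do not say whether volume names may contain colons.

```python
from typing import NamedTuple


class VFolderHost(NamedTuple):
    proxy_name: str
    volume_name: str


def parse_vfolder_host(host: str) -> VFolderHost:
    # Split on the first colon only; both parts must be non-empty.
    proxy_name, sep, volume_name = host.partition(":")
    if not sep or not proxy_name or not volume_name:
        raise ValueError(f"expected '{{proxy-name}}:{{volume-name}}', got {host!r}")
    return VFolderHost(proxy_name, volume_name)
```

A legacy host value without a proxy name, such as `"local"`, raises `ValueError`, which makes stale `allowed_vfolder_hosts` entries easy to spot.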
Features
- Add support for multi-container sessions (#217)
- Add a generic query filter expression language parser (`ai.backend.manager.models.minilang.queryfilter`) for GraphQL paginated list queries using the Lark parser framework (#305)
- Add support for storage proxies (#312, #337)
- Storage proxies have a multi-backend architecture, so storage-specific optimizations such as per-directory quota, performance measurements, and fast metadata scanning all become available to Backend.AI users.
- Offload and unify vfolder upload/download operations to storage proxies via the HTTP ranged queries and the tus.io protocol.
- Support multiple storage proxies configured via etcd, and each storage proxy may provide multiple volumes in the mount points shared with agents.
- Now the manager instances don't have mount points for the storage volumes, and mount/fstab management APIs skip the manager-side queries and manipulations.
- Include user information (full_name) in keypair gql query. (#313)
- Add an endpoint that allows users to leave a shared virtual folder (#317)
- Make the `mgr dbshell` command smarter so that it auto-detects the halfstack db container and uses the container-provided psql command for maximum compatibility, with an optional ability to set the container ID or name explicitly via the `--psql-container` option and forward additional arguments to the psql command (#318)
- Add a script to migrate /vroot/local to, for example, the /vroot/vfs structure in accordance with the new Storage Proxy implementation. (#319)
- Make the maximum websocket message size configurable, which affects the operation of streaming APIs including service-port proxy (#320)
- Add a new vfolder API to clone a vfolder and a property to vfolders for specifying cloneable or not (#323, #338)
- Add `quota` argument when creating vfolders (currently only supported in the xfs storage backend) (#325)
- Add support for listing/updating/deleting/creating domain dotfiles and group dotfiles (#329)
- Make vfolder's mkdir accept and deliver the parents and exist_ok options. (#336)
- Add a new API path for per-agent hardware metadata queries (#366)
- Support destroying PREPARING/TERMINATING/ERROR sessions when the forced parameter is delivered as True. (#363)
- Add support for Harbor v2 and generalize internal abstractions of container registries, making it easier to add new registries in the future (#357)
- New idle checker to automatically kill sessions using various criteria (#341)
- Add a new API `/session/{session_ref}/shutdown-service` for shutting down running in-container services (#327)
- Add hooking point for AUTHORIZE with FIRST_COMPLETED requirement. (#339)
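The HTTP ranged queries mentioned in the storage proxy feature above rely on the standard `Range` request header. The sketch below shows how a client could split a download into inclusive byte ranges. `range_headers` is a hypothetical helper, and the chunking strategy is an assumption, since these notes do not describe how the storage proxy or its clients actually chunk transfers.

```python
def range_headers(total_size: int, chunk_size: int) -> list:
    """Build one HTTP Range header per chunk; byte ranges are inclusive."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    headers = []
    for start in range(0, total_size, chunk_size):
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, total_size) - 1
        headers.append({"Range": f"bytes={start}-{end}"})
    return headers
```

For a 10-byte file and 4-byte chunks this yields `bytes=0-3`, `bytes=4-7`, and `bytes=8-9`.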
Fixes
- Stabilize multi-container sessions further. (#373)
- Include predicate check results in `kernels.status_data` for better user-side diagnosis
- Fix a further race condition in the session-level creation routines by adopting `session_creation_id`, like the `kernel_creation_id` introduced in #374.
- Remove the no longer used `agents.clusterized` column as it is no longer included in agent heartbeats after k8s branch refactoring.
- Use dual DB connections to roll back agent-specific (resource) updates but to keep kernel-specific (status) updates when scheduling failures occur.
- Fix a potential duplicate of DB cursors caused by use-after-free of a DB connection in the manager scheduler.
- Relax row-level locks in scheduler in favor of the `REPEATABLE READ` isolation level.
- Synchronize the candidate agent list when iterating over multiple kernels for a session in the multi-node mode, because they should now be updated according to the transaction savepoints.
- Fix blocked proceeding to the next scaling group when a scheduling failure occurs in a scaling group.
- Shield the DB queries in the events queue handlers and now interrupting the manager service works better.
- Fix races of kernel creation events by attaching a unique creation request ID to distinguish and catch the events by the caller manager instance (#374)
- Fix being unable to request the watcher API due to an incorrect reference to the watcher endpoint. (#370)
- Replace deprecated reference of kernels.c.role with kernels.c.cluster_role in legacy compute session list query. (#371)
- Fix various bugs related to multi-node execution paths and provide richer error information with agent-side errors as a new JSON column `status_data` in the kernels table (#372)
- Fix a bug that prevents purging users and/or groups due to accessing the vfroot filesystem directly from the manager, instead of delegating the tasks to storage-proxy. (#361)
- Fix a race condition that sometimes corrupts the kernel status to be stuck at PREPARING even when kernels are successfully created (#368)
- Fix a regression in image list graph queries due to a missing renaming of local_config (#367)
- Update aiotools to v1.1.1 to fix a potential memory leak in long-lived taskgroup instances (#362)
- Fix a hang-up issue when shutting down the gateway daemon with running service-port streaming sessions (#365)
- Update fixtures to put the default registry as `cr.backend.ai` and remove unused ones (#358)
- Improve handling of storage-manager invocation failures using a new `VFolderOperationFailed` API exception. (#352)
- Fix missing replacement of `config_server` to `shared_config` in GraphQL context objects and references via `request.app['registry'].config_server` in some API handlers (#354)
- Fix a regression of fetching `live_stat` GraphQL field values (#355)
- Improve the error message for image not found errors when creating new sessions so that users can guess the reason more quickly (#356)
- Implement and use a new global timer which works regardless of the number of manager instances to sustain periodic tasks such as session scheduling and idle checks (#341)
- Replace legacy sess_id with session_name. (#348)
- Fix intermittent resource warnings from aiopg by replacing all `fetchone()` with `first()` and `scalar()`, which always close the database cursor automatically (#349)
- Update compute_plugins info from heartbeat for an ALIVE agent. (#350)
- Fix broken signo...
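The creation request ID fix (#374) listed above can be pictured with the following minimal sketch, in which each caller registers a unique ID and only reacts to events that carry it. `CreationTracker` and its methods are hypothetical and written for this example; the real manager delivers events through its own event bus, which is not modeled here.

```python
import asyncio
import uuid


class CreationTracker:
    # Hypothetical helper: maps creation IDs to futures awaited by the caller.
    def __init__(self) -> None:
        self._pending = {}

    def register(self) -> str:
        creation_id = uuid.uuid4().hex
        self._pending[creation_id] = asyncio.get_running_loop().create_future()
        return creation_id

    def on_event(self, creation_id: str, payload: str) -> bool:
        # Events carrying an unknown ID belong to another manager instance.
        fut = self._pending.get(creation_id)
        if fut is None or fut.done():
            return False
        fut.set_result(payload)
        return True

    async def wait(self, creation_id: str) -> str:
        try:
            return await self._pending[creation_id]
        finally:
            del self._pending[creation_id]


async def demo() -> tuple:
    tracker = CreationTracker()
    cid = tracker.register()
    ignored = tracker.on_event("someone-elses-id", "kernel_started")
    accepted = tracker.on_event(cid, "kernel_started")
    return ignored, accepted, await tracker.wait(cid)
```

Running `asyncio.run(demo())` shows that the event with a foreign ID is ignored while the one with the registered ID is delivered to the waiting caller.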
19.09.19
- Update changelog (missing release date for 19.09.19) - @achimnol
- Bump version to 19.09.19 - @achimnol
- Update changelog - @achimnol
- Add missing GQL mutation privilege checks (#254) - @achimnol
- Add ForgetImage GQL mutation and forget_image mutation field (#255) - @achimnol
- backport: cli.etcd: Add forget-image commands - @achimnol
- Update changelog - @achimnol
- Fix some code editing error in #252. oops. - @achimnol
- Support pagination for rescan-images (#252) - @achimnol
- ci: Update mypy to 0.770 and fix a type-error found by the new version - @achimnol
- Strip owner_access_key from IV's input (#251) - @achimnol
- Make debug logs debug (#232) - @achimnol
- Only return the container logs from db for surely dead sessions (#232) - @achimnol
- gateway.admin: Fix type error due to mypy update - @achimnol
- Fix up global eventloop error logging - @achimnol
- Support for forced termination of sessions (#250) - @achimnol
- Restrict Creating directory already exists (#248) - @lizable
- registry.sync_kernel_stats(): Fix SQL syntax error - @achimnol
- alembic: Add a merge migration - @achimnol