add more details about resizing only pvc in asts
Signed-off-by: Abner-1 <yuanyuxing.yyx@alibaba-inc.com>
ABNER-1 committed Sep 29, 2024
1 parent fcc9c1b commit 328767a
Showing 3 changed files with 117 additions and 59 deletions.
Binary file added docs/img/en-asts-resize-pvc.png
60 changes: 27 additions & 33 deletions docs/proposals/20240626-asts-volume-resize-zh.md
Expand Up @@ -13,26 +13,24 @@ status:
# Advanced StatefulSet Volume Resize Support

## Table of Contents
* [Advanced StatefulSet Volume Resize Support](#advanced-statefulset-支持卷变配)
* [Table of Contents](#目录)
* [Motivation](#motivation)
* [User Scenarios](#用户场景)
* [User Failure-Recovery Scenarios](#用户失败恢复场景)
* [Fundamental Issues](#本质问题)
* [Goals](#目标)
* [Non-Goals](#非目标)
* [Proposal](#proposal)
* [API Changes](#api-修改)
* [Adding Webhook Validation](#增加-webhook-校验)
* [Changes to PVC Reconciliation](#pvc-调谐过程修改)
* [What to Do After an In-Place PVC Update Fails?](#原地变配-pvc-更新失败后怎么办)
* [How to Determine Update Failure](#如何认定更新失败)
* [Post-Failure Process](#失败后处理流程)
* [Scheme A](#方案a)
* [Scheme B](#方案b)
* [Implementation](#implementation)
* [Why continue the KEP-661 approach of not tracking historical versions of vct?](#为什么选择延续-kep-661-的思路不追踪-vct-的历史版本)
* [Why add the VolumeClaimUpdateStrategy field instead of following KEP-661 exactly?](#为什么需要增加-volumeclaimupdatestrategy-字段而不是完全参照-kep-661)

* [Motivation](#motivation)
* [User Scenarios](#用户场景)
* [User Failure-Recovery Scenarios](#用户失败恢复场景)
* [Fundamental Issues](#本质问题)
* [Goals](#目标)
* [Non-Goals](#非目标)
* [Proposal](#proposal)
* [API Changes](#api-修改)
* [Adding Webhook Validation](#增加-webhook-校验)
* [Changes to PVC Reconciliation](#pvc-调谐过程修改)
* [What to Do After an In-Place PVC Update Fails?](#原地变配-pvc-更新失败后怎么办)
* [How to Determine Update Failure](#如何认定更新失败)
* [Post-Failure Process](#失败后处理流程)
* [Implementation](#implementation)
* [Why continue the KEP-661 approach of not tracking historical versions of vct?](#为什么选择延续-kep-661-的思路不追踪-vct-的历史版本)
* [Why add the VolumeClaimUpdateStrategy field instead of following KEP-661 exactly?](#为什么需要增加-volumeclaimupdatestrategy-字段而不是完全参照-kep-661)


## Motivation

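Expand All @@ -45,10 +43,12 @@ asts currently ignores changes to Volume Claim Templates entirely, only for new pod
### User Scenarios

1. **[H]** For StorageClasses that support resizing, the pvc storage field can be edited directly to increase the size (decreasing is not supported)
- Linked with pod changes: update pod fields and the pvc together, according to the configured concurrency
- Modify only the pvc
2. For StorageClasses that do not support resizing, once the contents of the existing pvc are confirmed to be no longer needed, the pvc and pod can be deleted (manually or automatically); the newly reconciled pvc and pod then use the latest configuration (this user scenario needs refinement; in theory it is needed)
a. In some consumer-type scenarios, the disk becomes partly fragmented after a period of use, so it is sometimes recreated after consumption finishes to improve performance (could deleting and recreating the sts achieve the same?)
- In some consumer-type scenarios, the disk becomes partly fragmented after a period of use, so it is sometimes recreated after consumption finishes to improve performance (could deleting and recreating the sts achieve the same?)
3. For scenarios that require changing the StorageClass, the operation is similar to scenario 2
a. Changing ssd -> essd / migrating to the cloud, etc.
- Changing ssd -> essd / migrating to the cloud, etc.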

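### User Failure-Recovery Scenarios

Expand Down Expand Up @@ -76,6 +76,7 @@ asts currently ignores changes to Volume Claim Templates entirely, only for new pod
### Goals

1. Automate expanding the Volume Claim Templates size, provided the sc supports capacity expansion
1. Support changes that modify only the volume claim; this can be used for storage operations work.
2. Ensure users can tell whether a pvc resize has completed or failed
3. Do not prevent users from attempting to recover from abnormal situations
4. In clusters with the RecoverVolumeExpansionFailure feature gate enabled, allow users to achieve recovery expectation 4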
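Expand All @@ -87,11 +88,9 @@ asts currently ignores changes to Volume Claim Templates entirely, only for new pod

1. Do not implement KEP-1790
2. Do not implement version management and tracking for volume claims; for the detailed impact, see [Why continue the KEP-661 approach of not tracking historical versions of vct?](#为什么选择延续-kep-661-的思路不追踪-vct-的历史版本)
1. Do not implement changes that modify only the volume claim; this need can be met through operational means.
3. Do not implement a reconciliation mechanism linked to marking pvcs as deletable
4. Do not implement a backup and migration mechanism combined with VolumeSnapshot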


## Proposal

### API Changes
Expand Down Expand Up @@ -212,24 +211,19 @@ status:
The main changes are in the `rollingUpdateStatefulsetPods` function
![asts-resize-pvc](../img/asts-resize-pvc.png)


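### Why continue the KEP-661 approach of not tracking historical versions of vct?
Currently asts/cloneset does not track the history of volumeClaimTemplates in controller revisions and only considers the current value. The main reasons for continuing this behavior:
1. Adding vct information to the controller revision means that **a change to vct alone would also trigger a version change of the asts**
1. No such requirement has been collected so far
2. It would have a large impact on the existing controller flow, involving many changes and considerable risk
3. This need can be met by running a script to batch patch, or by dispatching a job

2. A direct rollback can be handled by the upper layer re-applying the configuration; in most expected scenarios the pvc configuration does not need to be rolled back (or the rollback is not urgent)
1. A direct rollback can be handled by the upper layer re-applying the configuration; in most expected scenarios the pvc configuration does not need to be rolled back (or the rollback is not urgent)
- Lower priority than the need to expand pvcs; it can evolve later if necessary

3. With historical version tracking, a pvc that the rollout has not yet reached would, if deleted, be recreated at the historical version rather than the latest one
2. With historical version tracking, a pvc that the rollout has not yet reached would, if deleted, be recreated at the historical version rather than the latest one
- The pvc data still cannot be recovered. Is the user deleting a pvc in order to bring up the old pvc configuration? There seems to be little difference

In the absence of further feedback on the three scenarios above, and considering the complexity, this is left to gradual evolution and is not implemented for now.
In the absence of further feedback on the two scenarios above, and considering the complexity, this is left to gradual evolution and is not implemented for now.

### Why add the VolumeClaimUpdateStrategy field instead of following KEP-661 exactly?
1. sts previously did not allow modifying any vct field; 661 implements a feature enhancement
2. asts previously allowed modifying any vct field; allowing only size changes would not guarantee compatibility with earlier user scenarios. The VolumeClaimUpdateStrategy field is added to stay compatible with the previous behavior
3. It can be used to unify cloneset's current recreate behavior, making it easier to understand
3. It can be used to unify CloneSet's current recreate behavior, making it easier to understand
4. It can be used for possible future functionality that integrates VolumeSnapshot.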
116 changes: 90 additions & 26 deletions docs/proposals/20240626-asts-volume-resize.md
Expand Up @@ -18,7 +18,34 @@ A table of contents is helpful for quickly jumping to sections of a proposal and
any additional information provided beyond the standard proposal template.
[Tools for generating](https://github.com/ekalinin/github-markdown-toc) a table of contents from markdown are available.

* [Table of Contents](#table-of-contents)
* [Advanced StatefulSet Volume Resize](#advanced-statefulset-volume-resize)
* [Table of Contents](#table-of-contents)
* [Motivation](#motivation)
* [User Story](#user-story)
* [Failure Recovery User Story](#failure-recovery-user-story)
* [Fundamental Issues](#fundamental-issues)
* [Goal](#goal)
* [None Goal](#none-goal)
* [Proposal](#proposal)
* [API Definition](#api-definition)
* [Adding Webhook Validation](#adding-webhook-validation)
* [Updating PVC Process](#updating-pvc-process)
* [Handling In-place PVC Update Failures](#handling-in-place-pvc-update-failures)
* [What to Do After a PVC Update Fails In-Place?](#what-to-do-after-a-pvc-update-fails-in-place)
* [How to Determine Update Failure](#how-to-determine-update-failure)
* [Post-Failure Process](#post-failure-process)
* [Scheme A](#scheme-a)
* [Scheme B](#scheme-b)
* [Implementation](#implementation)
* [Reasons for Not Tracking Historical Versions of VolumeClaimTemplates per KEP-661](#reasons-for-not-tracking-historical-versions-of-volumeclaimtemplates-per-kep-661)
* [Reasons for adding the VolumeClaimUpdateStrategy Field](#reasons-for-adding-the-volumeclaimupdatestrategy-field)

<!-- Created by https://github.com/ekalinin/github-markdown-toc -->
➜ kruise git:(opt_proposal_about_pvc_resize) ✗ gh-md-toc docs/proposals/20240626-asts-volume-resize.md

Table of Contents
=================

* [Motivation](#motivation)
* [User Story](#user-story)
* [Failure Recovery User Story](#failure-recovery-user-story)
Expand All @@ -30,8 +57,13 @@ any additional information provided beyond the standard proposal template.
* [Adding Webhook Validation](#adding-webhook-validation)
* [Updating PVC Process](#updating-pvc-process)
* [Handling In-place PVC Update Failures](#handling-in-place-pvc-update-failures)
* [Reasons for Not Tracking Historical Versions of VolumeClaimTemplates per KEP-661](#reasons-for-not-tracking-historical-versions-of-volumeclaimtemplates-per-kep-661)
* [How to Determine Update Failure](#how-to-determine-update-failure)
* [Post-Failure Process](#post-failure-process)
* [Scheme A](#scheme-a)
* [Scheme B](#scheme-b)
* [Implementation](#implementation)
* [Reasons for Not Tracking Historical Versions of VolumeClaimTemplates](#reasons-for-not-tracking-historical-versions-of-volumeclaimtemplates)
* [Reasons for adding the VolumeClaimUpdateStrategy Field](#reasons-for-adding-the-volumeclaimupdatestrategy-field)

## Motivation

Expand All @@ -47,6 +79,8 @@ the creation of new pods. Users may encounter situations such as:

1. **[H]** In cases where StorageClasses support expansion, users can directly edit the PVC's storage capacity to
increase it (decreasing is not supported).
- Concurrently update pod fields and PVCs.
- Modify only the PVC.
2. For StorageClasses that do not support expansion, users must ensure that the contents of existing PVCs are no longer
needed before manually or automatically deleting the PVC and associated pods. New PVCs and pods will then be
reconciled with the latest configuration. (This scenario requires refinement as it is theoretically necessary.)
Expand Down Expand Up @@ -107,8 +141,9 @@ Users may expect the following recovery options:
### None Goal

1. **Do Not Implement KEP-1790**.
2. **No Version Management**: Do not implement version management and tracking for volume claims.

2. **No Version Management**: Do not implement version management and tracking for volume claims. For the detailed impact, see [Reasons for Not Tracking Historical Versions of VolumeClaimTemplates](#reasons-for-not-tracking-historical-versions-of-volumeclaimtemplates).
3. Do not implement a reconciliation mechanism linked to marking PVCs as deletable.
4. Do not implement a backup and migration mechanism combined with VolumeSnapshot.
## Proposal

### API Definition
Expand Down Expand Up @@ -205,37 +240,66 @@ Webhook validation can be implemented using the `allowVolumeExpansion` field in

### Handling In-place PVC Update Failures

In theory, after an `InPlace + LockStep` failure, user intervention is typically necessary, often involving the creation of a new PVC and Pod.
#### How to Determine Update Failure
1. An explicit update error counts as a failure.
2. An ambiguous error that might succeed on retry: the controller waits up to maxWaitTime (a global setting, default 60 seconds) and treats the update as failed once the timeout expires.

After a failure is recognized, an error event is recorded on the sts resource.

#### Post-Failure Process
Theoretically, once an `OnPodRollingUpdate` update fails, user intervention is required; this generally means rebuilding the pvc (which in turn means the pod must be rebuilt).

Take a three-replica scenario as an example, in which the in-place update of pvc2 fails:
0. After the failure is recognized, an error event is recorded on the sts resource.

The ideal response would be to revert to the `OnDelete + LockStep` process. For example, using a three-replica scenario where `PVC2` fails to update in place:
There are two schemes for failure handling:
##### Scheme A

1. Delete `Pod2` and simultaneously apply a label to `PVC2`.
2. Detect the label on `PVC2` to prevent the creation of a new `Pod2`.
3. Await `PVC2` compatibility and label removal (which can be automated once compatibility is confirmed).
- **User intervention scenarios**:
- If the data on `PVC2` is expendable: Delete `PVC2`.
- If backup is required: create a job mounting `PVC2` to perform necessary operations. After completion, delete `PVC2`.
4. If `Pod0` is deleted in the meantime, it should be recreated without being hindered by `PVC0` incompatibility.
5. Once compatibility is restored, recreate `Pod2`. After `Pod2` is ready, proceed to update the subsequent replicas.
1. Delete pod2 and label pvc2.
2. While the label is present on pvc2, no new pod2 is created.
3. Wait until pvc2 is compatible and the label is removed (the label can be removed automatically once pvc compatibility is detected).
   1. At this point, user intervention is expected, in one of several forms:
      - The pvc2 data is not needed: delete pvc2.
      - Run a job that mounts pvc2 to back up or snapshot the data, then delete pvc2 once it succeeds.
      - If the storage class supports snapshots, create a `VolumeSnapshot` resource and restore from it at an appropriate time.
4. (3.5) If pod0 is deleted in the meantime, pod0 is recreated; it is not blocked by pvc0 being incompatible.
5. Once pvc2 is compatible, create pod2 again; after pod2 is ready, update the next ordinal.

### Reasons for Not Tracking Historical Versions of `VolumeClaimTemplates` per KEP-661
Scheme A suits cases where the pvc update clearly cannot succeed, for example when the patch is rejected.

Currently, `AdvancedStatefulSet/CloneSet` do not track historical `VolumeClaimTemplates` in controller revisions, focusing on current values. This approach is maintained for the following reasons:
However, timeout-based failure detection may delete the pod too early, so a pvc that supports only online updates would then never be able to complete its change.
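
The label gate in Scheme A can be sketched as below. This is only an illustration under stated assumptions: the proposal does not name the label, so `resizeFailedLabel` is a made-up key, and `canCreatePod` and `clearIfCompatible` are hypothetical helpers operating on a plain label map rather than a real PVC object.

```go
package main

import "fmt"

// resizeFailedLabel is a hypothetical label key; the proposal does not name one.
const resizeFailedLabel = "example.kruise.io/pvc-resize-failed"

// canCreatePod reports whether a replacement pod may be created for a PVC
// with the given labels: creation is blocked while the failure label is present.
func canCreatePod(pvcLabels map[string]string) bool {
	_, blocked := pvcLabels[resizeFailedLabel]
	return !blocked
}

// clearIfCompatible removes the failure label once the PVC is compatible
// with the template again (step 3 of Scheme A).
func clearIfCompatible(pvcLabels map[string]string, compatible bool) {
	if compatible {
		delete(pvcLabels, resizeFailedLabel)
	}
}

func main() {
	pvc2 := map[string]string{resizeFailedLabel: "true"} // labeled after the failed update
	pvc0 := map[string]string{}                          // never labeled, so pod0 can be recreated

	fmt.Println("pod2 may be created:", canCreatePod(pvc2))
	fmt.Println("pod0 may be created:", canCreatePod(pvc0))

	clearIfCompatible(pvc2, true) // user fixed or replaced pvc2
	fmt.Println("pod2 may be created after fix:", canCreatePod(pvc2))
}
```

Because only the labeled PVC blocks pod creation, deleting pod0 in the meantime still leads to its recreation, which matches step 4.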

1. **Impact on Controller Revision**:
- Incorporating `VolumeClaimTemplates` in controller revisions would trigger version changes in `AdvancedStatefulSet` even with mere modifications to `VolumeClaimTemplates`. This could significantly disrupt existing controller processes, necessitating extensive modifications and posing high risks.
##### Scheme B

2. **Handling Rollbacks via Upper Layer**:
- Rollback operations can typically be managed by reapplying configurations from the upper layer. In most scenarios, reverting PVC configurations (or the lack of urgency for rollbacks) is deemed sufficient.
- The demand for expanding PVCs is more pressing. Future evolution can be considered based on necessity.
1. After patching the pvc, the controller waits for the pvc change to complete.
2. If the change never completes, the controller keeps waiting and the rollout stays blocked.
3. The user notices the error event and intervenes, choosing one of:
   1. Manually resolve the problem so that the pvc change completes.
   2. If the data in the original pvc is no longer needed, delete the original pvc (both the pod and the pvc must be deleted); the controller then creates a new pvc automatically.
   3. Back up or snapshot the data in the original pvc, then perform option 2.
   4. If the problem cannot be handled for now, change `OnPodRollingUpdate` to `OnDelete`, so that the controller no longer changes the pvc.

3. **Historical Version Tracking and PVC Deletion**:
- With historical version tracking, an un-updated PVC would revert to an older version (instead of the latest) upon deletion.
- **Data Integrity**: The restoration of PVC data is still not guaranteed. If a user deletes a PVC, is the intention to reinstate an older version configuration? The distinction appears negligible.
In this design, every decision is left to the user, so it suits any scenario.

In the absence of specific user scenarios, the core advantage of reverting to historical version configurations over the latest ones is unclear, particularly in cases of data loss for persistent storage.
Comparing the two, Scheme A currently cannot handle the edge cases of all scenarios, so Scheme B is preferred for implementation; it can be optimized later as user cases accumulate.
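
As a rough sketch of the preferred Scheme B, the per-replica decision could look like the following. The strategy names `OnPodRollingUpdate` and `OnDelete` come from the proposal; `nextAction`, the `Action` values, and the use of plain byte counts in place of Kubernetes quantities are assumptions made for illustration.

```go
package main

import "fmt"

// Strategy values named in the proposal.
const (
	OnPodRollingUpdate = "OnPodRollingUpdate"
	OnDelete           = "OnDelete"
)

// Action is a hypothetical name for what the controller does next for one replica.
type Action string

const (
	PatchPVC    Action = "patch-pvc"         // raise the pvc request to the template size
	WaitForPVC  Action = "wait-for-pvc"      // patched, but the resize has not completed
	ContinuePod Action = "continue-with-pod" // pvc is done, or pvcs are not managed
)

// nextAction decides the next step from the template size, the pvc's requested
// size, and the pvc's reported capacity (all in bytes, a simplification).
func nextAction(strategy string, templateSize, specSize, statusSize int64) Action {
	if strategy == OnDelete {
		// The user opted out: the controller no longer changes the pvc.
		return ContinuePod
	}
	if specSize < templateSize {
		return PatchPVC
	}
	if statusSize < specSize {
		// Scheme B: keep waiting until the resize completes or the user intervenes.
		return WaitForPVC
	}
	return ContinuePod
}

func main() {
	const gi = int64(1) << 30
	fmt.Println(nextAction(OnPodRollingUpdate, 20*gi, 10*gi, 10*gi))
	fmt.Println(nextAction(OnPodRollingUpdate, 20*gi, 20*gi, 10*gi))
	fmt.Println(nextAction(OnPodRollingUpdate, 20*gi, 20*gi, 20*gi))
	fmt.Println(nextAction(OnDelete, 20*gi, 10*gi, 10*gi))
}
```

There is deliberately no timeout branch here: unlike Scheme A, a stuck resize simply keeps returning `wait-for-pvc` until the user resolves it or switches the strategy to `OnDelete`.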

### Implementation
The main modification is in the `rollingUpdateStatefulsetPods` function.
![asts-resize-pvc](../img/asts-resize-pvc.png)
![asts-resize-pvc](../img/en-asts-resize-pvc.png)

### Reasons for Not Tracking Historical Versions of `VolumeClaimTemplates`
Currently, asts/cloneset does not track the history of volumeClaimTemplates in controller revisions and considers only the current value. The main reasons for keeping this behavior are:

1. A direct rollback can be handled by re-applying the configuration from the upper layer, and most envisioned scenarios do not need to roll back PVC configurations (or the rollback is not urgent).
   - This has lower priority than PVC expansion and can be added later if necessary.

2. With historical version tracking, a pvc that is deleted before the rollout reaches it would be recreated at the historical version rather than the latest one.
   - The pvc data is still unrecoverable. Does a user delete a pvc in order to bring back the old pvc configuration? It seems to make no difference.

In the absence of further feedback on these two scenarios, and given the complexity, this is deferred in favor of gradual evolution and is not implemented for now.

### Reasons for adding the VolumeClaimUpdateStrategy Field
1. Previously, sts did not allow any vct field to be modified; KEP-661 is a feature enhancement on top of that.
2. Previously, asts allowed any vct field to be modified. Allowing only size changes would break compatibility with existing user scenarios, so the VolumeClaimUpdateStrategy field is added to preserve the previous behavior.
3. It can be used to unify the current recreate behavior of CloneSet, making it easier to understand.
4. It can be used for potential future integration with VolumeSnapshot features.
