
<fix>[compute]: hard delete host file if created state vm destroyed#3889

Open
zstack-robot-2 wants to merge 1 commit into zsv_5.0.0 from sync/wenhao.zhang/c-2@@2

Conversation

@zstack-robot-2
Collaborator

Resolves: ZSV-11845
Related: ZSV-11310

Change-Id: I776f676a797473756a6f6166756c6a786f717869

sync from gitlab !9769

@coderabbitai

coderabbitai Bot commented May 1, 2026

Overview

Adjust the VM host-file cascade deletion logic so that the backup files and host files belonging to VM instances in the Created state are force-deleted when the VM is deleted, avoiding orphaned leftover files.

Changes

Cohort / File(s) | Summary
VM cascade deletion logic refinement
compute/src/main/java/org/zstack/compute/vm/devices/VmHostBackupFileCascadeExtension.java, compute/src/main/java/org/zstack/compute/vm/devices/VmHostFileCascadeExtension.java
Reroute the cascade logic so that the VM_INSTANCE_EXPUNGE_CODE path calls handleDeletion directly. When deciding whether to defer deletion of associated files, detect VM instances in the Created state in the deletion context; if any are found, disable deferral and log it. In the actual deletion phase, query the file rows referencing Created-state VMs and mark their deletion operations as forced (forceDelete=true), ensuring no files are left behind after the hard delete.
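The change summarized above can be sketched in plain Java. Everything below (FileRow, the vmStates map, the method shapes) is an illustrative stand-in for the real zstack types and the actual DB query, not the real API:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative model of the change: among the file rows swept up by the
// cascade, only rows referencing a VM in Created state are marked
// forceDelete=true so the hard delete leaves nothing behind.
public class ForceDeleteSketch {
    // Hypothetical stand-in for a VmHostFileVO / VmHostBackupFileVO row.
    static final class FileRow {
        final String vmUuid;
        boolean forceDelete;
        FileRow(String vmUuid) { this.vmUuid = vmUuid; }
    }

    // Stand-in for the DB query findVmUuidsInCreatedState: keep only the
    // candidate uuids that exist in the VM table with state == Created.
    static Set<String> findVmUuidsInCreatedState(Set<String> candidates,
                                                 Map<String, String> vmStates) {
        return candidates.stream()
                .filter(uuid -> "Created".equals(vmStates.get(uuid)))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Map<String, String> vmStates = Map.of("vm-1", "Created", "vm-2", "Stopped");
        List<FileRow> rows = List.of(new FileRow("vm-1"), new FileRow("vm-2"));

        Set<String> created = findVmUuidsInCreatedState(
                rows.stream().map(r -> r.vmUuid).collect(Collectors.toSet()),
                vmStates);
        // Only rows of Created VMs get the force flag; all others keep
        // the original defer semantics.
        rows.forEach(r -> r.forceDelete = created.contains(r.vmUuid));

        System.out.println("vm-1 forceDelete=" + rows.get(0).forceDelete);
        System.out.println("vm-2 forceDelete=" + rows.get(1).forceDelete);
    }
}
```

The point of the DB query step is that only uuids confirmed to be Created-state VMs are force-deleted, even if the candidate set contains other uuids.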

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 The cascade path now winds with care,
Created state inspected everywhere,
Force delete guards against the stray,
No orphan file slips away,
Backup and host files, both addressed,
The cleanup flow now at its best. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the PR's core change: force-delete the host file when a VM in the Created state is destroyed.
Description check ✅ Passed The description is related to the changeset and lists the linked issue IDs (ZSV-11845, ZSV-11310) and the sync source.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@compute/src/main/java/org/zstack/compute/vm/devices/VmHostBackupFileCascadeExtension.java`:
- Around line 75-80: The current logic in VmHostBackupFileCascadeExtension uses
hasCreatedVmInDeletionContext(action) to skip deferring backup-file deletion for
the entire action, which causes backup rows for non-Created VMs to be
prematurely removed; change handleDeletion(...) so that you only bypass defer
for backup rows belonging to Created VMs: use
hasCreatedVmInDeletionContext(action) and
formatCreatedVmUuidsFromContext(action) (or iterate VmDeletionStructs) to
collect Created VM UUIDs, and when generating deletion messages for
VmHostBackupFileVO resources only mark those whose resourceUuid matches a
Created VM UUID as non-deferred while preserving the original defer semantics
for all other resourceUuid entries.

In
`@compute/src/main/java/org/zstack/compute/vm/devices/VmHostFileCascadeExtension.java`:
- Around line 75-80: The current code aborts deferring for the whole action when
any Created VM is present; change this to per-VM handling: remove the early
"return false" triggered by hasCreatedVmInDeletionContext(action), instead
iterate the VmDeletionStruct list from voFromAction(action) and separate Created
VMs from others, log created UUIDs using
formatCreatedVmUuidsFromContext(action), skip/force immediate deletion of host
files for Created VMs (VmHostFileVO entries), and only call handleDeletion(...)
(or pass the filtered list) for VMs whose deletion policy still allows
Delay/Never. Adjust logic in VmHostFileCascadeExtension to operate on each
VmDeletionStruct (or a filtered collection) rather than on the whole action.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 46ec7606-5c1a-4aa6-a408-b171581e3517

📥 Commits

Reviewing files that changed from the base of the PR and between 81b7e11 and 6c19c4b.

📒 Files selected for processing (2)
  • compute/src/main/java/org/zstack/compute/vm/devices/VmHostBackupFileCascadeExtension.java
  • compute/src/main/java/org/zstack/compute/vm/devices/VmHostFileCascadeExtension.java

Comment on lines +75 to +80
if (hasCreatedVmInDeletionContext(action)) {
    logger.info(String.format(
            "VmHostBackupFileCascadeExtension: skip deferring backup-file deletion for Created VM(s): %s; "
                    + "destroy uses DBOnly hard-delete without expunge, backup rows must be removed in this cascade",
            formatCreatedVmUuidsFromContext(action)));
    return false;

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't let a single Created VM drag the whole batch of backup files past the defer.

This is again an action-level switch: as soon as one Created VM appears in the batch of VmDeletionStructs, the entire action proceeds into handleDeletion(...), which then issues deletion messages for the VmHostBackupFileVO of every resourceUuid. If the same batch also contains non-Created VMs whose deletion policy should still be Delay/Never, their backup files are cleaned up prematurely as well. Consider lifting the defer only for the backup rows belonging to Created VMs, while keeping the original policy for all other resources.
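The per-row alternative this comment argues for might look roughly like the following plain-Java sketch; DeletionMsg and buildMessages are hypothetical stand-ins, not the real cascade API:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the reviewer's suggestion: instead of one action-level switch,
// decide defer per backup row, so only rows of Created VMs bypass the defer.
public class PerRowDeferSketch {
    // Hypothetical deletion message: the resource uuid plus a defer flag.
    record DeletionMsg(String resourceUuid, boolean deferred) {}

    static List<DeletionMsg> buildMessages(List<String> resourceUuids,
                                           Set<String> createdVmUuids) {
        return resourceUuids.stream()
                // Rows of Created VMs are sent non-deferred (immediate hard
                // delete); every other row keeps the original defer semantics.
                .map(uuid -> new DeletionMsg(uuid, !createdVmUuids.contains(uuid)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DeletionMsg> msgs = buildMessages(
                List.of("vm-created", "vm-stopped"), Set.of("vm-created"));
        msgs.forEach(m ->
                System.out.println(m.resourceUuid() + " deferred=" + m.deferred()));
    }
}
```

The batch is never short-circuited as a whole: a mixed batch simply yields a mixed list of messages, so non-Created VMs keep their expunge window.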


Comment on lines +75 to +80
if (hasCreatedVmInDeletionContext(action)) {
    logger.info(String.format(
            "VmHostFileCascadeExtension: skip deferring host-file deletion for Created VM(s): %s; "
                    + "destroy uses DBOnly hard-delete without expunge, host files must be removed in this cascade",
            formatCreatedVmUuidsFromContext(action)));
    return false;

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't switch off defer for the entire cascade batch because of one Created VM.

Defer is lifted here at the action level: once hasCreatedVmInDeletionContext(action) hits a single Created VM, the subsequent voFromAction(action) / handleDeletion(...) processes every VmHostFileVO in the whole List<VmDeletionStruct>. If the same action also contains non-Created VMs whose deletion policy should still be Delay/Never, their host files are deleted prematurely too. The defer needs to be lifted per VM, not per action.


@MatheMatrix
Owner

Comment from yaohua.wu:

Review: MR !9769 — ZSV-11845

Business context recap

One root cause of the global rekey failure in ZSV-11845: when a VM is left in the Created state (e.g. after a failed clone), destroyVmInstance takes the DBOnly hard-delete path, but the cascade extensions for VmHostBackupFileVO / VmHostFileVO defer on that path, waiting for the expunge stage to trigger cleanup. DBOnly never goes through expunge, so those two VOs linger, and the associated EncryptedResourceKeyRefVO fails during rekey because its origin resource cannot be found.

This MR is the 6th incremental patch for the issue (the earlier !9751 / !9755 / !9765 / !9766 / !9767 are already merged).

Fix correctness assessment

Dimension | Conclusion
Fix direction | ✅ Correct: shouldDeferVmAssociatedDeletion short-circuits to return false early for the Created state, letting this cascade handle the deletion immediately; findVmUuidsInCreatedState uses a DB query to ensure force=true is set only on rows associated with Created-state VMs
Branch refactor | Splitting VM_INSTANCE_EXPUNGE_CODE into its own branch makes the semantics clearer; the expunge path no longer goes through the defer check
Parallel-patch compatibility | ✅ No conflicts with !9766 / !9767 / !13735 / !13740 / !13754, which each cover different paths; the target branch zsv_5.0.0 has diverged_commits_count=0 and can be merged directly
Round-trip consistency | ✅ The changes in the two files mirror each other; the field difference between VmHostFileVO.vmInstanceUuid and VmHostBackupFileVO.resourceUuid is handled correctly

Findings

🟡 Warning 1: duplicated implementation in the two cascade extensions

hasCreatedVmInDeletionContext, formatCreatedVmUuidsFromContext, and findVmUuidsInCreatedState are repeated almost verbatim in both files, differing only in the name of the input field.

Locations:

  • compute/src/main/java/org/zstack/compute/vm/devices/VmHostBackupFileCascadeExtension.java:99-129, 205-213
  • compute/src/main/java/org/zstack/compute/vm/devices/VmHostFileCascadeExtension.java:99-129, 202-210

Impact: per the "follow-up: complete the cascades" plan in the ZSV-11310 comments, more similar cascade extensions (TpmCascade / VmHostCascade(NvRam), etc.) will be added, spreading the current duplication to more files.

Suggestions:

  • Short term: within this MR, extract a static VmInstanceCascadeUtils utility class (Created-state detection + log formatting)
  • Long term: in the "complete the cascades" MR, introduce an AbstractVmAssociatedAsyncCascadeExtension (extending AbstractAsyncCascadeExtension) that hoists the Created-state short-circuit + DB query + force-flag decision into the base class, leaving subclasses to declare only a vmUuidExtractor

If merging quickly to close out ZSV-11845 is urgent, this MR can land first, but please list the dedup item explicitly in the title of the follow-up "complete the cascades" MR.

🟢 Suggestion 1: the shouldDeferVmAssociatedDeletion short-circuit should be per VO, not per batch

Current behavior: as long as any one VM in the batch is Created, the whole batch skips defer.

Locations:

  • VmHostBackupFileCascadeExtension.java:75-80
  • VmHostFileCascadeExtension.java:75-80

Theoretical risk: when the cascade propagates down from a higher-level resource such as Account/Cluster/Zone, the VmDeletionStruct list may contain VMs in mixed states (one Created + several Stopped/Running). The host files of the non-Created VMs would then be deleted immediately along with the short-circuit, losing the chance of recovery via recoverVmInstance during the expunge window.

Actual likelihood: low, since the action code within a single cascade is usually uniform and account-level deletion generally leaves no expunge window; but nothing in the code enforces this.

Suggested direction: keep the original shouldDeferVmAssociatedDeletion behavior and push the Created short-circuit down into VO filtering inside handleDeletion:

// Handle a VO immediately only when it belongs to a Created VM and we are
// on the non-expunge / non-force-delete path
if (!action.isActionCode(VM_INSTANCE_EXPUNGE_CODE)
        && !action.isActionCode(DELETION_FORCE_DELETE_CODE)
        && !createdVmUuids.contains(vo.getResourceUuid())) {
    // skip VOs of non-Created VMs; let them take the expunge path
    continue;
}

This decouples the "defensive defer protection" from the "Created-state force delete" and avoids side effects from the behavior change.

🟢 Suggestion 2: no automated regression test

ZSV-11845 has already gone through several iterations (each one bounced back in testing because of a newly discovered failure path), yet this MR ships without any test case. Suggest adding a minimal regression case:

  • Construct a VmHostBackupFileVO (resourceUuid = vmUuid) + VmInstanceVO (state=Created)
  • Run the destroy flow
  • Assert: the VmHostBackupFileVO is deleted and the associated EncryptedResourceKeyRefVO is cleaned up

Otherwise the follow-up "complete the cascades" refactor can easily regress the very scenario this MR fixes.
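As a rough shape for such a case (purely illustrative: a real test would use the zstack test framework, while the in-memory sets below merely mimic the two tables):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal in-memory simulation of the regression scenario: destroying a
// Created-state VM must remove both the backup-file row and the key ref.
public class CreatedVmDestroyRegressionSketch {
    static Map<String, String> vmState = new HashMap<>();    // vmUuid -> state
    static Set<String> hostBackupFileRows = new HashSet<>(); // resourceUuid
    static Set<String> encryptedKeyRefRows = new HashSet<>();// resourceUuid

    // Stand-in for the fixed destroy flow: Created VMs are hard-deleted
    // together with their associated rows in the same cascade.
    static void destroyVm(String vmUuid) {
        if ("Created".equals(vmState.get(vmUuid))) {
            hostBackupFileRows.remove(vmUuid);  // forceDelete=true path
            encryptedKeyRefRows.remove(vmUuid);
        }
        vmState.remove(vmUuid);
    }

    public static void main(String[] args) {
        String vm = "vm-uuid-1";
        vmState.put(vm, "Created");
        hostBackupFileRows.add(vm);
        encryptedKeyRefRows.add(vm);

        destroyVm(vm);

        // The assertions the suggested regression case would make:
        if (hostBackupFileRows.contains(vm) || encryptedKeyRefRows.contains(vm)) {
            throw new AssertionError("orphan rows remain after destroy");
        }
        System.out.println("regression scenario passes");
    }
}
```

The real case would exercise destroyVmInstance against an actual DB, but the two assertions at the end are exactly the ones the suggestion above calls for.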

🟢 Suggestion 3: add a comment to findVmUuidsInCreatedState documenting its implicit filtering semantics

Depending on context, VmHostBackupFileVO.resourceUuid may point at different resource types (VM/Snapshot/Backup); findVmUuidsInCreatedState implicitly filters out non-VM resourceUuids via IN VmInstanceVO.uuid. A one-line javadoc is suggested:

/**
 * Filter the candidate uuids to those that exist in VmInstanceVO and are in Created state.
 * Non-VM resourceUuids (e.g. Snapshot/Backup) are naturally excluded by the IN clause.
 */
private Set<String> findVmUuidsInCreatedState(Set<String> candidateUuids) { ... }

This spares future readers the puzzle of "why are Snapshot resourceUuids fed into a VmInstanceVO query?".

Coverage

Item | Count / status
Files modified | 2 (both logic classes)
Lines modified | ~80 (including comments and logging)
Cross-persona consistency | Warning 1 (duplication) is independently visible; Suggestion 1 requires business context to judge
Upstream conflicts | -
Pre-existing risks | The mixed semantics of the VmHostBackupFileVO.resourceUuid field (not VM-only) are a historical design issue, not introduced by this MR

Verdict: APPROVED

The fix direction is correct and coordinated with the already-merged parallel patches, with no P0/P1 blocking issues. Warning 1 (code duplication) can be handled in the follow-up "complete the cascades" MR; Suggestions 1-3 are defensive/testing/documentation improvements and optional to adopt.

Given that ZSV-11845 is a P0 blocker and this MR is the last patch converging the issue (residual scenario), merging soon and starting test verification is recommended; in the follow-up cascade refactor, please treat the Created-state paths touched by this MR as a mandatory test item.


🤖 Robot Reviewer
