fix(primaryStorage): rollback persisted records on controller build failure #3868
zstack-robot-1 wants to merge 240 commits into 5.5.6
Conversation
…igration. support starting vm without nic in other clusters. Resolves/Related: ZSTAC-80468 Change-Id: I6370646f7a796265677a6a656c716f6867706d69
…isk offering Resolves: ZSTAC-74683 Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP call now times out after 30s instead of 5+ minutes, and automatically retries the next available MDS via tryNext mechanism. Resolves: ZSTAC-80595 Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
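The bounded-timeout-plus-failover behaviour described above can be sketched generically. This is an illustrative sketch only: MdsSelector, tryNext, and the probe predicate are hypothetical names, not ZStack's actual ZBS client classes; the real code issues an HTTP call bounded by a 30s timeout instead of the stubbed reachability check used here.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of the "tryNext" idea: probe each MDS endpoint with a
// bounded timeout and fall through to the next candidate instead of blocking
// for minutes on a disconnected node.
public class MdsSelector {
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(30); // per the fix

    // 'reachable' stands in for a real HTTP health check bounded by CONNECT_TIMEOUT.
    static Optional<String> tryNext(List<String> mdsUrls, Predicate<String> reachable) {
        for (String url : mdsUrls) {
            if (reachable.test(url)) {
                return Optional.of(url); // first responsive MDS wins
            }
            // timed out or refused: move on to the next candidate
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> urls = List.of("mds-1", "mds-2", "mds-3");
        // simulate mds-1 being the disconnected node picked first
        String picked = tryNext(urls, u -> !u.equals("mds-1")).orElse("none");
        System.out.println(picked);
    }
}
```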
When MN restarts during a destroy operation, the hypervisor may report the VM as Stopped. Without this transition, the state machine throws an exception and the VM stays stuck in Destroying state forever. Resolves: ZSTAC-80620 Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
…pacity Resolves: ZSTAC-79709 Change-Id: I45a2133bbb8c51c25ae3549d59e588976192a08d
Resolves: ZSTAC-78989 Change-Id: I0fe3a56ab724978944c69afadaab7ff7353e4c0f
Resolves: ZSTAC-82153 Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
…fterAddIpAddress to prevent NPE during rollback Resolves: ZSTAC-81741 Change-Id: I53bcf20a10306afc7b6172da294d347b74e6c41f
Resolves: ZSTAC-81182 Change-Id: Id1bb642154dc66ae9995dcc4d9fc00cdce9bcaf8
<fix>[vm]: add Destroying->Stopped state transition See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation See merge request zstackio/zstack!9155
…talling In dual management node scenarios, concurrent modifications to the consistent hash ring from heartbeat reconciliation and canonical event callbacks can cause NodeHash/Nodes inconsistency, leading to message routing failures and task timeouts. Fix: (1) synchronized all ResourceDestinationMakerImpl methods to ensure atomic nodeHash+nodes updates, (2) added lifecycleLock in ManagementNodeManagerImpl to serialize heartbeat reconciliation with event callbacks, (3) added two-round delayed confirmation before removing nodes from hash ring to avoid race with NodeJoin events. Resolves: ZSTAC-77711 Change-Id: I3d33d53595dd302784dff17417a5b25f2d0f3426
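The first part of the fix, atomic nodeHash+nodes updates, boils down to putting both structures behind one lock so they can never be observed out of sync. A minimal sketch under assumed names (DestinationMakerSketch is not the real ResourceDestinationMakerImpl):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of direction (1) of the fix: synchronize every method that touches
// the ring and the node map, so NodeHash/Nodes can never diverge under
// concurrent heartbeat reconciliation and event callbacks.
public class DestinationMakerSketch {
    private final SortedSet<String> nodeHash = new TreeSet<>(); // stand-in for the consistent hash ring
    private final Map<String, String> nodes = new HashMap<>();  // nodeUuid -> node info

    // synchronized => nodeHash and nodes are updated atomically
    public synchronized void addNode(String uuid, String info) {
        nodeHash.add(uuid);
        nodes.put(uuid, info);
    }

    public synchronized void removeNode(String uuid) {
        nodeHash.remove(uuid);
        nodes.remove(uuid);
    }

    public synchronized boolean consistent() {
        return nodeHash.equals(nodes.keySet());
    }

    public static void main(String[] args) throws InterruptedException {
        DestinationMakerSketch m = new DestinationMakerSketch();
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) m.addNode("n" + i, "info"); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) m.removeNode("n" + i); });
        a.start(); b.start(); a.join(); b.join();
        // with atomic updates the two structures always agree, whatever interleaving happened
        System.out.println(m.consistent());
    }
}
```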
<fix>[network]: filter reserved IPs in getFreeIp See merge request zstackio/zstack!9170
<fix>[ceph]: apply over-provisioning ratio when releasing snapshot capacity See merge request zstackio/zstack!9162
<fix>[network]: add null check for L3 network system tags in API interceptor See merge request zstackio/zstack!9169
The mdsUrls field in ExternalPrimaryStorage config contains user:password@host format credentials. Add desensitization to mask credentials as ***@host in API/CLI output. Resolves: ZSTAC-80664 Change-Id: I94bdede5a1b52eb039de70efb5458693484405f7
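A sketch of the masking idea: rewrite each user:password@host entry to ***@host before the value is echoed. The class name and the regex are illustrative assumptions, not ZStack's actual desensitization code.

```java
// Hypothetical sketch of the desensitization described above: mask the
// credential part of user:password@host entries in mdsUrls before they
// appear in API/CLI output.
public class MdsUrlMasker {
    static String mask(String mdsUrl) {
        // replace each "user:password@" prefix within a comma-separated list;
        // ',' , '@' and '/' are excluded so the match stays inside one entry
        return mdsUrl.replaceAll("[^,@/]+:[^,@/]+@", "***@");
    }

    public static void main(String[] args) {
        System.out.println(mask("admin:s3cret@10.0.0.1:8080,admin:s3cret@10.0.0.2:8080"));
    }
}
```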
Add ORG_ZSTACK_STORAGE_BACKUP_CANCEL_TIMEOUT constant to CloudOperationsErrorCode for use in premium volumebackup module. Resolves: ZSTAC-82195 Change-Id: Ibc405876e1171b637cf76b91a6822574fb6e7811
<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition See merge request zstackio/zstack!9154
SyncTaskFuture constructor calls Context.current() unconditionally, triggering ServiceLoader for ContextStorageProvider even when telemetry is disabled. If sentry-opentelemetry-bootstrap jar is on classpath, ServiceLoader fails with "not a subtype" due to ClassLoader isolation in Tomcat, throwing ServiceConfigurationError (extends Error) that escapes all catch(Exception) blocks. 1. Why is this change necessary? MN startup crashes with ORG_ZSTACK_CORE_WORKFLOW_10001 because Context.current() triggers ServiceLoader unconditionally in SyncTaskFuture constructor, even when telemetry is disabled. 2. How does it address the problem? Only call Context.current() when isTelemetryEnabled() returns true, matching the existing guard pattern used in other DispatchQueueImpl code paths (lines 351, 1069). 3. Are there any side effects? None. When telemetry is disabled, parentContext was never used. # Summary of changes (by module): - core/thread: conditionalize Context.current() in SyncTaskFuture Related: ZSTAC-82275 Change-Id: I5c0e1f15769c746c630028a29df8cf1815620608
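The guard pattern the commit describes can be reduced to a small sketch. Everything here is a simplified stand-in: contextCurrent() mimics io.opentelemetry.context.Context.current() triggering a ServiceLoader lookup on first use, and captureParentContext() mirrors the fixed SyncTaskFuture constructor.

```java
// Minimal sketch of the guard pattern: only touch Context.current() when
// telemetry is enabled, so the ServiceLoader path (which can throw
// ServiceConfigurationError under Tomcat's ClassLoader isolation) is never
// entered on a telemetry-disabled node.
public class GuardSketch {
    static boolean telemetryEnabled = false;
    static int serviceLoaderCalls = 0;

    // stand-in for Context.current(); the real call hits ServiceLoader
    // for a ContextStorageProvider the first time it runs
    static Object contextCurrent() {
        serviceLoaderCalls++;
        return new Object();
    }

    // the fix: capture the parent context only behind the telemetry switch,
    // matching the guard already used elsewhere in DispatchQueueImpl
    static Object captureParentContext() {
        return telemetryEnabled ? contextCurrent() : null;
    }

    public static void main(String[] args) {
        captureParentContext(); // telemetry disabled: ServiceLoader never runs
        System.out.println(serviceLoaderCalls);
    }
}
```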
<fix>[thread]: guard Context.current() with telemetry check See merge request zstackio/zstack!9202
Resolves: ZSTAC-82195 Change-Id: I3d5e91d09d7c088d3c53e3839f8b32f4bce32dec
<fix>[loadBalancer]: block SLB deletion during grayscale upgrade See merge request zstackio/zstack!9187
<fix>[volumebackup]: add backup cancel timeout error code See merge request zstackio/zstack!9200
<fix>[core]: add @nologging to sensitive config fields See merge request zstackio/zstack!9171
…plit-brain When a management node departs, its VM skip-trace entries were immediately removed. If VMs were still being started by kvmagent, the next VM sync would falsely detect them as Stopped and trigger HA, causing split-brain. Fix: transfer departed MN skip-trace entries to an orphaned set with 10-minute TTL instead of immediate deletion. VMs in the orphaned set remain skip-traced until the TTL expires or they are explicitly continued, preventing false HA triggers during MN restart scenarios. Resolves: ZSTAC-80821 Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
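The orphaned-set-with-TTL idea can be sketched as follows; all names (SkipTraceSketch, onMnDeparted, shouldSkipTrace) are hypothetical stand-ins for the actual skip-trace bookkeeping, with time passed in explicitly to keep the sketch deterministic.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: when a management node departs, its skip-trace entries move to an
// 'orphaned' map with an expiry instead of being deleted, so a VM sync inside
// the window still skips them and no false HA is triggered.
public class SkipTraceSketch {
    static final long TTL_MILLIS = 10 * 60 * 1000; // 10-minute TTL, per the fix

    final Map<String, Set<String>> skipTraceByMn = new HashMap<>(); // mnUuid -> vmUuids
    final Map<String, Long> orphaned = new HashMap<>();             // vmUuid -> expiry

    void onMnDeparted(String mnUuid, long now) {
        Set<String> vms = skipTraceByMn.remove(mnUuid);
        if (vms == null) return;
        for (String vm : vms) orphaned.put(vm, now + TTL_MILLIS);
    }

    boolean shouldSkipTrace(String vmUuid, long now) {
        Long expiry = orphaned.get(vmUuid);
        if (expiry == null) return false;
        if (now >= expiry) { orphaned.remove(vmUuid); return false; } // TTL expired
        return true; // still inside the TTL window: keep skipping
    }

    public static void main(String[] args) {
        SkipTraceSketch s = new SkipTraceSketch();
        s.skipTraceByMn.put("mn-1", new HashSet<>(List.of("vm-1")));
        s.onMnDeparted("mn-1", 0);
        System.out.println(s.shouldSkipTrace("vm-1", 60_000));     // within TTL
        System.out.println(s.shouldSkipTrace("vm-1", TTL_MILLIS)); // expired
    }
}
```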
…anup Resolves: ZSTAC-80821 Change-Id: I59284c4e69f5d2ee357b1836b7c243200e30949a
Resolves: ZSTAC-77544 Change-Id: I1f711bff9c1e87a8cbf6a2eb310ca6086f0f99ba
Resolves: ZSTAC-80821 Change-Id: Ia9a9597feceb96b3e6e22259e2d0be7bde8ae499
Add virtiofs-based model mounting capability for user VMs: - Add VmModelMountVO database table and schema with hostUuid tracking - Add AttachVirtiofsCmd/DetachVirtiofsCmd for KVM agent - Add MountModelCenterCmd with storageUrl field - Add SDK actions for mount/unmount/query operations - Add error codes 10138-10149 Resolves: ZSTAC-83157 Change-Id: I746679736f7a7176646e646d797969766f697a76
Resolves: ZSTAC-83157 Change-Id: I62696a6d667468766a6575656763707374757277
- add dedicated container image pull error code mapping Resolves: ZSTAC-84175 Change-Id: Iabcdef1234567890abcdef1234567890abcdef12
Resolves: ZSTAC-83157 Change-Id: I756c7073707a6468676c70696a7a746870767777
<fix>[conf]: update global-error See merge request zstackio/zstack!9557
Revert the flat userdata hostname fallback so VM creation and reboot do not inject derived hostname metadata into userdata payloads when no hostname is configured. Add CreateRebootVmKeepUserdataContentCase to capture flat userdata apply requests during VM create and reboot, and assert the userdata list remains unchanged while hostname metadata stays absent. Resolves: ZSTAC-84529 Change-Id: I6f6e6b6b796e6e7a73727673626b7569696a6172
Expose requestCpu/requestMemory on the generated PodInventory so SDK callers can deserialize the fields returned by ZQL. Upgrade script adds matching columns on PodVO and backfills existing pods with request=limit for legacy-equivalent behaviour. Resolves: ZSTAC-80103 Change-Id: I5b2a9d4e8c6f3b7a1d0e9c5b4a8f2d7e6c1b3a9f
<feature>[dpu-bm2]: support attaching novlan and vxlan network to baremetal2 instance See merge request zstackio/zstack!9530
fix(network): keep userdata content immutable See merge request zstackio/zstack!9686
… redeploy Resolves: ZSTAC-84446 Change-Id: I8af5e3887a5bad286b43dda00c874c9de999e1cb
<fix>[kvm]: unify TLS cert IP collection to avoid reconnect-triggered redeploy See merge request zstackio/zstack!9683
Resolves: ZSTAC-84751 Change-Id: I84751b1c2d3e4f5a678901234567890abcdef1235
<fix>[aios]: add i18n mappings for fatal image pull error See merge request zstackio/zstack!9670
<fix>[sdk,db]: add PodInventory request fields See merge request zstackio/zstack!9682
…0165
Resolves: ZSTAC-84677
Related: ZSTAC-80103
Two overlapping problems are addressed in the ai module's global error
codes; both root from the same author reusing codes that were already
allocated for other semantics.
(1) ZSTAC-84677: AIModelManagerImpl model-service startup-timeout path
passed the string literal "ORG_ZSTACK_AI_10144" to operr(). That
code is owned by VmModelMountManager (mount path conflict, 5 %s
template); the timeout only supplies 3 formatArgs, so the UI
rendered a garbled "挂载失败:路径..." message for a health-check
timeout.
(2) ZSTAC-80103: error codes 10138-10143 had two conflicting owners:
- AIModelManagerApiInterceptor: CPU / memory request validation
(this is the canonical semantics per the AI error-code spec)
- VmModelMountManager: VM/model not-found, VM-state checks,
cross-account mount, ModelCenter not-found
The mount feature wrote zh_CN i18n templates with mount semantics,
so when the CPU/memory validation actually fired, users saw the
unrelated mount text with %s slots filled by CPU/memory args
(e.g. "虚拟机「1」(UUID: 4) 必须处于运行状态才能挂载模型,当前状态: 4"
for a CPU-below-min error).
This commit (zstack side):
- Add 7 new error-code constants in CloudOperationsErrorCode:
10159 (model service startup timeout)
10160-10165 (relocated mount-feature errors that previously squatted
on 10138-10143)
- Rewrite globalErrorCodeMapping entries for 10138-10143 to the
canonical CPU/memory validation semantics
- Add localized templates for 10159-10165 across all 10 locales
(zh_CN, en_US, zh_TW, ja-JP, ko-KR, de-DE, fr-FR, ru-RU, id-ID,
th-TH) so non-Chinese users see localized messages
The companion premium MR switches:
- AIModelManagerImpl startup-timeout call site to 10159
- VmModelMountManager 6 call sites from 10138-10143 to 10160-10165
Change-Id: I076f8da0b935a47cd4956e039e1e0b05206d085e
<fix>[sdk]: sync OSPF neighbor adjacency fields See merge request zstackio/zstack!9737
Walkthrough: this PR bumps the version to 5.5.16 and delivers several core enhancements: fault-tolerant primary storage initialization, VM static IP and DNS management, L3 network IPAM improvements, KVM host TLS migration support, error-code internationalization, an external-service configuration API, and extensive database schema updates.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy`:
- Around line 729-753: Replace the Chinese comment block above the test method
testAddExternalPrimaryStorageWithMalformedJsonShouldRollback with an English
comment that explains: this test verifies AddExternalPrimaryStorage rolls back
when config contains malformed JSON (JSONObjectUtil.toObject throws), and no
ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO
records remain to avoid dirty VO causing buildPsController failures and
QueryPrimaryStorage 503s; update any inline Chinese in the same comment region
to concise, correct English while keeping references to ExternalPrimaryStorageVO
and the failure expectations intact.
- Around line 733-755: The test only checks ExternalPrimaryStorageVO but misses
asserting rollback for PrimaryStorageVO and PrimaryStorageOutputProtocolRefVO;
capture counts for PrimaryStorageVO and PrimaryStorageOutputProtocolRefVO (using
Q.New(PrimaryStorageVO.class).count() and
Q.New(PrimaryStorageOutputProtocolRefVO.class).count()) before the failing
addExternalPrimaryStorage call, then after the expect(AssertionError.class)
block assert those counts are unchanged and also assert no rows exist with name
"zbs-bad-json" in PrimaryStorageVO and no protocol refs tied to that primary
storage in PrimaryStorageOutputProtocolRefVO, so failures don't leave residual
rows across those related tables.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: 1e13331a-5bab-4deb-bbc0-e3ac9ea06519
📒 Files selected for processing (2)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy
Resolves: ZSTAC-84446 Change-Id: I9bed31c0cefddd6ed11f59cd13e36eb1c2abc029
<fix>[conf]: add ORG_ZSTACK_AI_10159 for model service startup timeout See merge request zstackio/zstack!9734
<fix>[kvm]: decouple TLS cert detection from libvirtd restart toggle See merge request zstackio/zstack!9741
f0dc8d3 to 04a51b2
Actionable comments posted: 3
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)
80-92: ⚠️ Potential issue | 🟠 Major: this returns null on a cache miss.
Lines 88-89 assign the return value of Map.put() to info, but put() returns the previous mapping. So the first time a node is loaded from the DB, getNodeInfo() actually returns null rather than the freshly created NodeInfo; the caller touched by this change then falls into its exception branch, and other callers may trigger an NPE outright.
🔧 Suggested fix
if (info == null) {
    ManagementNodeVO vo = dbf.findByUuid(nodeUuid, ManagementNodeVO.class);
    if (vo == null) {
        throw new ManagementNodeNotFoundException(nodeUuid);
    }
    nodeHash.add(nodeUuid);
-   info = nodes.put(nodeUuid, new NodeInfo(vo));
+   info = new NodeInfo(vo);
+   nodes.put(nodeUuid, info);
}
return info;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java` around lines 80 - 92, getNodeInfo currently assigns info = nodes.put(nodeUuid, new NodeInfo(vo)) which stores the new NodeInfo but returns the previous value (null on first insert), causing getNodeInfo to return null; change the logic in getNodeInfo so that after fetching ManagementNodeVO via dbf.findByUuid and creating new NodeInfo(vo), you put that new NodeInfo into the nodes map and then set info to that newly created NodeInfo (or retrieve it with nodes.get(nodeUuid)), ensuring the method returns the actual NodeInfo instance; ensure references to nodeHash.add(nodeUuid) remain correct and preserve throwing ManagementNodeNotFoundException when vo is null.
conf/tools/install.sh (1)
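The finding above hinges on the JDK contract of java.util.Map.put, which returns the previous mapping for the key (null on first insert), not the value just stored. A minimal demonstration:

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates the Map.put contract behind the review finding:
// put() returns the *previous* mapping (null on first insert),
// never the value that was just stored.
public class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, String> nodes = new HashMap<>();
        String first = nodes.put("node-1", "info-A");  // no prior mapping -> null
        String second = nodes.put("node-1", "info-B"); // replaces info-A -> returns it
        System.out.println(first);
        System.out.println(second);
    }
}
```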
143-151: ⚠️ Potential issue | 🟠 Major: the upgrade path keeps reusing the old zstack-dashboard virtualenv.
Line 145 only checks whether the directory exists; if the host already has an old Python 2 zstack-dashboard venv, the 3.11 environment is never rebuilt, and the later activate and pip install still land in the old interpreter. zstack-cli/zstack-ctl/zstack-sys above already go through ensure_python3_venv() uniformly; this path should stay consistent.
🛠️ Suggested change
elif [ $tool = 'zstack-dashboard' ]; then
    UI_VIRENV_PATH=/var/lib/zstack/virtualenv/zstack-dashboard
    [ ! -z $force ] && rm -rf $UI_VIRENV_PATH
-   if [ ! -d "$UI_VIRENV_PATH" ]; then
-       python3.11 -m venv $UI_VIRENV_PATH
-       if [ $? -ne 0 ]; then
-           rm -rf $UI_VIRENV_PATH
-           exit 1
-       fi
-   fi
+   ensure_python3_venv "$UI_VIRENV_PATH"
    . $UI_VIRENV_PATH/bin/activate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@conf/tools/install.sh` around lines 143 - 151, the current logic only checks that UI_VIRENV_PATH exists and may reuse an old Python 2 venv; switch to the ensure_python3_venv() helper used by the other components, or validate the interpreter version here: invoke $UI_VIRENV_PATH/bin/python (or bin/python3) and check that it is Python 3.11, deleting and recreating the venv if it is missing or mismatched (referencing UI_VIRENV_PATH and ensure_python3_venv()); make sure the subsequent activate and pip install use the freshly created Python 3.11 environment.
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (1)
3182-3195: ⚠️ Potential issue | 🟠 Major: do not tie migration TLS to RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE.
Lines 3194-3195 turn migration TLS off whenever LIBVIRT_TLS_ENABLED=true but RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE=false. The latter is a policy for whether libvirtd may be restarted on reconnect, not a switch for whether the migration channel is encrypted; as written this silently changes behavior for existing clusters and downgrades migrations that should use TLS to plaintext.
💡 Suggested change
- cmd.setUseTls(KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class)
-     && rcf.getResourceConfigValue(KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE, self.getUuid(), Boolean.class));
+ cmd.setUseTls(KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class));
As per coding guidelines: "Backward-compatibility principle: do not directly change behavior produced by existing code; even a bug may be something users depend on. Consider one behavior for upgraded users and another for fresh installs, controlled by a switch."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3182 - 3195, The code incorrectly binds migration TLS to KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE when setting cmd.setUseTls in KVMHost.java; remove the dependency on RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE and decide TLS solely from the TLS-related config (e.g. KVMGlobalConfig.LIBVIRT_TLS_ENABLED) or a dedicated migration-TLS config if one exists; update the setUseTls(...) call to use only KVMGlobalConfig.LIBVIRT_TLS_ENABLED.value(Boolean.class) (or the proper migration-specific resource/config via rcf.getResourceConfigValue if such a key exists) and remove the rcf.getResourceConfigValue(..., self.getUuid(), Boolean.class) check that references RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE.
network/src/main/java/org/zstack/network/l3/NormalIpRangeFactory.java (1)
73-95: ⚠️ Potential issue | 🟠 Major: the rollback path does not restore UsedIpVO.ipRangeUuid, leaving dirty references after a failure.
Lines 73-95 bind UsedIpVO records whose ipRangeUuid == null to the newly created IP range, but the rollback at lines 112-116 only deletes the IpRangeVO and never reverts those UsedIpVO rows. A later failure in the flow therefore produces inconsistent data.
Suggested fix (record the rebinds and restore them in rollback)
@@
- List<UsedIpVO> updateVos = new ArrayList<>();
+ List<UsedIpVO> updateVos = new ArrayList<>();
+ List<String> reboundUsedIpUuids = new ArrayList<>();
@@
  ipvo.setIpRangeUuid(vo.getUuid());
  updateVos.add(ipvo);
+ reboundUsedIpUuids.add(ipvo.getUuid());
@@
  ipvo.setIpRangeUuid(vo.getUuid());
  updateVos.add(ipvo);
+ reboundUsedIpUuids.add(ipvo.getUuid());
@@
  if (!updateVos.isEmpty()) {
      dbf.updateCollection(updateVos);
+     List<String> allReboundUuids = (List<String>) data.computeIfAbsent("ReboundUsedIpUuids", k -> new ArrayList<String>());
+     allReboundUuids.addAll(reboundUsedIpUuids);
  }
@@
  public void rollback(FlowRollback trigger, Map data) {
+     List<String> reboundUsedIpUuids = (List<String>) data.get("ReboundUsedIpUuids");
+     if (reboundUsedIpUuids != null && !reboundUsedIpUuids.isEmpty()) {
+         SQL.New("update UsedIpVO u set u.ipRangeUuid = null where u.uuid in (:uuids)")
+             .param("uuids", reboundUsedIpUuids)
+             .execute();
+     }
      List<IpRangeVO> vos = (List<IpRangeVO>) data.get("IpRangeVO");
      dbf.removeCollection(vos, IpRangeVO.class);
      trigger.rollback();
  }
Also applies to: 112-116
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@network/src/main/java/org/zstack/network/l3/NormalIpRangeFactory.java` around lines 73 - 95, The code in NormalIpRangeFactory updates UsedIpVO.ipRangeUuid for matching records (see UsedIpVO, NetworkUtils/IPv6NetworkUtils checks and dbf.updateCollection) but the rollback path only deletes the IpRangeVO and does not restore those UsedIpVO entries, leaving dirty references; modify the create path to record the list of modified UsedIpVO UUIDs and their previous ipRangeUuid values (or at least mark which were changed), and in the rollback handler (the block that currently deletes IpRangeVO) iterate those recorded UUIDs to restore ipRangeUuid to null (or their prior value) via dbf.updateCollection/DAO so the UsedIpVO state is reverted on failure.
🟡 Minor comments (10)
header/src/main/java/org/zstack/header/vm/VmInstanceState.java-174-175 (1)
174-175: ⚠️ Potential issue | 🟡 Minor: please confirm whether the ticket number in this comment is wrong.
The comment at line 174 reads ZSTAC-80898, which does not match ZSTAC-84817 in the PR target; if these are not the same issue, later troubleshooting and traceability will be misled. Change it to the correct number or to a more generic description of the reason.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/VmInstanceState.java` around lines 174 - 175, the ticket number in the comment may be wrong: the VmInstanceState comment references "ZSTAC-80898", which is inconsistent with this PR's target "ZSTAC-84817"; open the comment and change the number to the correct one (ZSTAC-84817), or replace it with a generic description (e.g. "Expunging safety net: if expunge fails, allow recovery") so traceability is unambiguous while the original intent is preserved.
test/src/test/groovy/org/zstack/test/integration/configuration/systemTag/TagPatternResourceTypeCase.groovy-12-22 (1)
12-22: ⚠️ Potential issue | 🟡 Minor: point 3 of the class comment does not match the actual assertions; fix it.
The current case never verifies the behavior of "backfilling resourceType = "ModelVO" after prepareDbInitialValue()". Either add the corresponding test step and assertions, or delete/rewrite that point so it does not mislead readers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/src/test/groovy/org/zstack/test/integration/configuration/systemTag/TagPatternResourceTypeCase.groovy` around lines 12 - 22, The class-level doc's third bullet claims old AI tags are backfilled with resourceType = "ModelVO" on the next prepareDbInitialValue() run but the test does not exercise or assert that; update the test or the comment: either add a test step that calls prepareDbInitialValue() (or the helper that triggers DB migration) and assert TagPatternVO.resourceType for upgraded tags becomes "ModelVO" (use the existing test utilities and query methods in TagPatternResourceTypeCase to locate the TagPatternVO), or remove/rewrite the third bullet to reflect the actual assertions present; reference prepareDbInitialValue(), TagPatternVO.resourceType and the test class TagPatternResourceTypeCase.groovy when making the change.
search/src/main/java/org/zstack/query/QueryFacadeImpl.java-381-381 (1)
381-381: ⚠️ Potential issue | 🟡 Minor: keep the full exception context instead of returning only e.getMessage().
Throwing after concatenating only e.getMessage() loses the stack trace; when a query fails in production, the root cause becomes very hard to locate. At minimum, log the full stack trace here while keeping the current user-facing error semantics. As per coding guidelines: "When necessary, keep propagating the exception so the upper-level business handler can convert it into a user-friendly error message."
Optional fix example
} catch (Exception e) {
-   throw new OperationFailureException(inerr("failed to query: %s", e.getMessage()));
+   logger.warn(String.format("failed to query for API[%s]", msg.getClass().getName()), e);
+   throw new OperationFailureException(inerr("failed to query: %s",
+       Optional.ofNullable(e.getMessage()).orElse(e.getClass().getSimpleName())));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@search/src/main/java/org/zstack/query/QueryFacadeImpl.java` at line 381, The throw in QueryFacadeImpl that currently uses throw new OperationFailureException(inerr("failed to query: %s", e.getMessage())) loses the original stacktrace; change it to preserve the full exception context by attaching the caught exception as the cause (or logging the full stacktrace) when creating the OperationFailureException so the original Throwable e is available for debugging; locate the throw site in QueryFacadeImpl (the block catching the query exception) and pass e into the OperationFailureException constructor or call processLogger/error with the full exception before rethrowing.
core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java-67-79 (1)
67-79: ⚠️ Potential issue | 🟡 Minor: do not downgrade ERROR to WARN.
Lines 67 and 76 currently log ERROR at warn level, which degrades log-level filtering and alerting accuracy. Suggest ERROR -> logger.error and WARNING -> logger.warn.
Suggested change
- case ERROR: logger.warn("[Sentry] " + formatted); break;
+ case ERROR: logger.error("[Sentry] " + formatted); break;
  case WARNING: logger.warn("[Sentry] " + formatted); break;
  default: logger.info("[Sentry] " + formatted); break;
- case ERROR: logger.warn("[Sentry] " + message, throwable); break;
+ case ERROR: logger.error("[Sentry] " + message, throwable); break;
  case WARNING: logger.warn("[Sentry] " + message, throwable); break;
  default: logger.info("[Sentry] " + message, throwable); break;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java` around lines 67 - 79, In SentryInitHelper update the log mappings so ERROR is logged at error level instead of warn: in both overridden methods log(SentryLevel level, String formatted) and log(SentryLevel level, String message, Throwable throwable) change the case for ERROR from logger.warn(...) to logger.error(...), keep WARNING mapped to logger.warn(...) and the default to logger.info(...); this preserves correct severity for Sentry ERROR events.
core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java-931-932 (1)
931-932: ⚠️ Potential issue | 🟡 Minor: the log message is ungrammatical; fix the wording.
Line 931 currently reads "not entity events will be fired"; change it to "no entity events will be fired" so the message reads correctly during troubleshooting.
Suggested change
- logger.warn(String.format("cannot find EntityInfo for the class[%s], not entity events will be fired",
+ logger.warn(String.format("cannot find EntityInfo for the class[%s], no entity events will be fired",
      entity.getClass()));
As per coding guidelines: "The code must not contain Chinese; error messages, comments, etc. should be written in correct, typo-free English."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java` around lines 931 - 932, The logger.warn call in DatabaseFacadeImpl (the warning for missing EntityInfo) contains a grammatical error: replace the message text in the logger.warn(String.format(...)) that currently ends with "not entity events will be fired" to read "no entity events will be fired" so the log becomes clear and grammatically correct when EntityInfo for the class (entity.getClass()) is missing.
plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java-42-44 (1)
42-44: ⚠️ Potential issue | 🟡 Minor: fix the misleading placeholder in this warning log.
The message says "fallback to mgmtIp", but the second %s actually receives r.getExitErrorMessage(); during online troubleshooting, the error reason can be mistaken for the fallback IP.
Suggested change
- logger.warn(String.format(
-     "ssh-collect host IPs failed on host[uuid:%s], fallback to mgmtIp: %s",
-     hostUuid, r.getExitErrorMessage()));
+ logger.warn(String.format(
+     "ssh-collect host IPs failed on host[uuid:%s], reason: %s, fallback to managementIp: %s",
+     hostUuid, r.getExitErrorMessage(), managementIp));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java` around lines 42 - 44, The log message in KVMHostUtils.java incorrectly labels the second %s as "fallback to mgmtIp" while passing r.getExitErrorMessage(); update the logger.warn call so the placeholder matches the actual argument: either pass the mgmtIp variable if you meant to log the fallback IP, or change the text to something like "ssh-collect host IPs failed on host[uuid:%s], error: %s" to correctly show r.getExitErrorMessage(); ensure hostUuid and r.getExitErrorMessage() (or hostUuid and mgmtIp if you choose that) are the two arguments to String.format in the logger.warn invocation.
plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java-371-376 (1)
371-376: ⚠️ Potential issue | 🟡 Minor: letting null IPs through at line 375 returns NICs that cannot be bound.
Filter out nicIp == null here so the attachable list contains no candidates that can never be bound successfully.
♻️ Suggested change
- filtered = filtered.stream().filter(nic -> {
-     String nicIp = nic.getIp();
-     String nicL3 = nic.getL3NetworkUuid();
-     return nicIp == null || IpRangeHelper.isIpInL3NetworkCidr(nicIp, nicL3);
- }).collect(Collectors.toList());
+ filtered = filtered.stream().filter(nic -> {
+     String nicIp = nic.getIp();
+     String nicL3 = nic.getL3NetworkUuid();
+     return nicIp != null && IpRangeHelper.isIpInL3NetworkCidr(nicIp, nicL3);
+ }).collect(Collectors.toList());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java` around lines 371 - 376, The current filter in PortForwardingManagerImpl that builds the attachable NIC list lets through NICs with null IPs (nic.getIp() == null), which later cannot be bound; change the predicate to exclude null IPs instead of allowing them by removing the nicIp == null branch so only NICs with a non-null nic.getIp() that satisfy IpRangeHelper.isIpInL3NetworkCidr(nicIp, nicL3) remain; update the stream/filter logic where "filtered = filtered.stream().filter(nic -> { ... }).collect(...)" to drop null-IP NICs.
header/src/main/java/org/zstack/header/core/external/service/APIUpdateExternalServiceConfigurationMsgDoc_zh_cn.groovy-10-20 (1)
10-20: ⚠️ Potential issue | 🟡 Minor: complete the API documentation here.
desc is still placeholder text and the request description is empty; as written this generates nearly unusable interface documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/core/external/service/APIUpdateExternalServiceConfigurationMsgDoc_zh_cn.groovy` around lines 10 - 20, Replace the placeholder desc and empty request description with meaningful Chinese documentation: update the top-level desc to briefly explain what APIUpdateExternalServiceConfigurationMsg does (e.g., updates the external service's configuration and returns the result), and populate rest.request.desc to describe the HTTP method/URL semantics, required header (Authorization: OAuth the-session-uuid), the path parameter {uuid}, and the request body fields as defined on APIUpdateExternalServiceConfigurationMsg.class (list required vs optional fields, expected formats and an example payload); ensure you keep the existing url, header and clz entries but provide clear usage, validation rules, and a short success/failure behavior summary.
compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java-409-425 (1)
409-425: ⚠️ Potential issue | 🟡 Minor: trim and write back the DNS values before validating.
dnsAddresses is user input, and the current implementation validates the raw values; input like " 8.8.8.8 " is wrongly rejected, and the later system-tag write does not get a clean value either. Since this is already the unified validation entry point, trim() and write back first.
💡 A possible direction
private void validateDnsAddresses(List<String> dnsAddresses) {
    if (dnsAddresses == null || dnsAddresses.isEmpty()) {
        return;
    }
    if (dnsAddresses.size() > VmInstanceConstant.MAXIMUM_NIC_DNS_NUMBER) {
@@
-   for (String dns : dnsAddresses) {
-       if (!NetworkUtils.isIpv4Address(dns) && !IPv6NetworkUtils.isIpv6Address(dns)) {
+   ListIterator<String> iterator = dnsAddresses.listIterator();
+   while (iterator.hasNext()) {
+       String dns = iterator.next();
+       String trimmedDns = dns == null ? null : dns.trim();
+       iterator.set(trimmedDns);
+       if (!NetworkUtils.isIpv4Address(trimmedDns) && !IPv6NetworkUtils.isIpv6Address(trimmedDns)) {
            throw new ApiMessageInterceptionException(argerr(
                ORG_ZSTACK_COMPUTE_VM_10322, "invalid DNS address[%s], must be a valid IPv4 or IPv6 address", dns));
        }
    }
}
As per coding guidelines: "Check whether parameters from a Message have been trimmed; data a user copy-pastes from a browser may carry spaces, newlines, etc."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java` around lines 409 - 425, The validateDnsAddresses method should trim and overwrite entries in the dnsAddresses list before any validation so inputs like " 8.8.8.8 " are normalized; modify validateDnsAddresses to iterate dnsAddresses (e.g., using index or ListIterator), replace each element with its trimmed value (and remove empty strings if appropriate) prior to the size checks and calls to NetworkUtils.isIpv4Address / IPv6NetworkUtils.isIpv6Address, and then throw ApiMessageInterceptionException via argerr as currently done when a trimmed value is invalid.
compute/src/main/java/org/zstack/compute/vm/StaticIpOperator.java-662-669 (1)
662-669: ⚠️ Potential issue | 🟡 Minor: inconsistent calls to the IPv6 tag-encoding method in StaticIpOperator.
The IPv6NetworkUtils utility class defines two methods: ipv6AddessToTagValue (line 376, misspelled) and ipv6AddressToTagValue (line 391, correctly spelled). StaticIpOperator mixes the two within the same file: lines 201 and 245 call the misspelled version, while line 669 calls the correct one. Standardize on ipv6AddressToTagValue (the correct spelling) for consistency, in line with the English-spelling requirement in the coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/StaticIpOperator.java` around lines 662 - 669, In StaticIpOperator, replace all uses of the misspelled IPv6NetworkUtils.ipv6AddessToTagValue with the correctly spelled IPv6NetworkUtils.ipv6AddressToTagValue to ensure consistent encoding of IPv6 tag values; specifically update the calls at the locations where ipv6AddessToTagValue is referenced so they match the existing correct call used near the IPv6_GATEWAY instantiation and remove any lingering references to the misspelled symbol.
🧹 Nitpick comments (21)
header/src/main/java/org/zstack/header/identity/RBACInfo.java (1)
15-15: Please confirm that making APIQueryAccountMsg admin-only is intentional (backward-compatibility risk)
Line 15 puts APIQueryAccountMsg into adminOnlyAPIs, which directly tightens an existing API's permissions and may cause regressions in non-admin query flows and automation scripts. If this is not a deliberate permission-policy change, revert the line; if it is intended, add corresponding tests and a change note. Optional fix (if the permission tightening is unintended):

-        APIQueryAccountMsg.class,

As per coding guidelines, "Backward-compatibility principle: do not directly change behavior produced by earlier code… record the original state, double-confirm, and prepare a rollback."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/identity/RBACInfo.java` at line 15, adding APIQueryAccountMsg to adminOnlyAPIs (in RBACInfo) tightens that API's permissions and may break backward compatibility; if this is not an intentional permission change, remove the APIQueryAccountMsg entry from adminOnlyAPIs to restore the original behavior; if it is an intended policy change, keep APIQueryAccountMsg, add regression/permission tests, document the compatibility break in the change notes (and state the rationale at RBACInfo), and prepare rollback steps so the change can be reverted if necessary.

compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java (1)
105-106: Eliminate the new magic values; extract named constants before assembling the skip set
This adds two hypervisor-type string literals; extract them into dedicated constants (or reuse existing type constants) before building the collection, which makes later extension and troubleshooting more robust.
♻️ Optional refactor

-    private static final List<String> SKIP_ARCH_CHECK_HYPERVISOR_TYPES = Arrays.asList("baremetal2", "baremetal2Dpu");
+    private static final String HYPERVISOR_TYPE_BAREMETAL2 = "baremetal2";
+    private static final String HYPERVISOR_TYPE_BAREMETAL2_DPU = "baremetal2Dpu";
+    private static final Set<String> SKIP_ARCH_CHECK_HYPERVISOR_TYPES = new HashSet<>(
+            Arrays.asList(HYPERVISOR_TYPE_BAREMETAL2, HYPERVISOR_TYPE_BAREMETAL2_DPU)
+    );

As per coding guidelines, "Avoid magic values: raw, undefined numbers or strings should be replaced with enums or constants."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java` around lines 105 - 106, Replace the two inline hypervisor string literals with named constants and use those constants to build SKIP_ARCH_CHECK_HYPERVISOR_TYPES; specifically, declare descriptive constants (e.g., BAREMETAL2_HYPERVISOR_TYPE and BAREMETAL2_DPU_HYPERVISOR_TYPE or reuse existing type constants), then initialize SKIP_ARCH_CHECK_HYPERVISOR_TYPES via Arrays.asList(BAREMETAL2_HYPERVISOR_TYPE, BAREMETAL2_DPU_HYPERVISOR_TYPE) so the magic values are centralized for easier reuse and debugging (refer to SKIP_ARCH_CHECK_HYPERVISOR_TYPES to locate the usage).header/src/main/java/org/zstack/header/allocator/HostAllocatorSpec.java (1)
169-170: Normalize with trim inside setAccountUuid.
The value comes from a Message; if it carries leading or trailing whitespace, later UUID comparisons and queries may be subtly inconsistent. Converge the handling in one place, the setter.
Suggested change

 public void setAccountUuid(String accountUuid) {
-    this.accountUuid = accountUuid;
+    this.accountUuid = accountUuid == null ? null : accountUuid.trim();
 }

As per coding guidelines: check whether parameters from a Message have been trimmed; data users copy-paste from a browser may carry spaces, newlines, and the like.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/allocator/HostAllocatorSpec.java` around lines 169 - 170, The setter setAccountUuid in HostAllocatorSpec should normalize incoming values by trimming whitespace to avoid mismatches when later comparing or querying UUIDs; update setAccountUuid(String accountUuid) to assign a null-safe trimmed value (e.g., set this.accountUuid = accountUuid == null ? null : accountUuid.trim()) so leading/trailing spaces or newlines from Message inputs are removed consistently.compute/src/main/java/org/zstack/compute/host/HostBase.java (1)
1445-1445: Use a stable identifier instead of "unknown" as the fallback dedup key.
The null-pointer guard is correct, but uniformly falling back to "unknown" may let different hosts share the same dedup key in extreme scenarios. Fall back to a stable instance-level identifier (such as id) to preserve dedup granularity.
Optional change example

-    return String.format("connect-host-%s", self == null ? "unknown" : self.getUuid());
+    return String.format("connect-host-%s", self == null ? id : self.getUuid());

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/host/HostBase.java` at line 1445, The fallback literal "unknown" in the connect-host key can cause different hosts to share the same dedupe key; update the return in HostBase that builds the connect key so the fallback uses a stable instance-level identifier instead of the string "unknown" (e.g. use self.getId() or this.id when available) while preserving the existing null check around self and still formatting as "connect-host-%s"; adjust the ternary or surrounding logic in the method that contains the String.format(...) to return "connect-host-" + (self == null ? <instance-stable-id> : self.getUuid()) so each host has a unique fallback key.plugin/securityGroup/src/main/java/org/zstack/network/securitygroup/SecurityGroupApiInterceptor.java (1)
1181-1185: These can be merged into a single check.
The bodies and exception messages of the conditions at Line 1181 and Line 1184 are identical; they only differ in checking the ingress versus egress count. Merging them into a single toCreateIngressRuleCount > 0 || toCreateEgressRuleCount > 0 check is clearer and reduces the risk of future partial edits.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/securityGroup/src/main/java/org/zstack/network/securitygroup/SecurityGroupApiInterceptor.java` around lines 1181 - 1185, Combine the two identical priority checks in SecurityGroupApiInterceptor into a single if that tests msg.getPriority() against SecurityGroupGlobalConfig.SECURITY_GROUP_RULES_NUM_LIMIT.value(Integer.class) and also checks (toCreateIngressRuleCount > 0 || toCreateEgressRuleCount > 0); if true, throw the ApiMessageInterceptionException once (use the existing error message and one of the existing error codes) so the logic references msg.getPriority(), SecurityGroupGlobalConfig.SECURITY_GROUP_RULES_NUM_LIMIT.value(Integer.class), toCreateIngressRuleCount and toCreateEgressRuleCount and prevents duplicated conditions.test/src/test/groovy/org/zstack/test/integration/network/l2network/AttachL2NetworkCase.groovy (1)
107-135: Add a blank-string VLAN case and tighten the failure assertions.
The VLAN path currently only covers "physicalInterface not passed"; add a blank-string scenario (such as " " or ""). Also, if the test framework supports it, assert the specific error code ORG_ZSTACK_NETWORK_L2_10021 so the test cannot pass by accident on an unrelated exception.
🧪 Minimal additional case

     // creating L2 Vlan network without physicalInterface should fail when vSwitchType is LinuxBridge
     expect(AssertionError.class) {
         createL2VlanNetwork {
             name = "test-vlan-no-physical-interface"
             zoneUuid = zone.uuid
             vlan = 100
         }
     }
+
+    // creating L2 Vlan network with blank physicalInterface should also fail
+    expect(AssertionError.class) {
+        createL2VlanNetwork {
+            name = "test-vlan-blank-physical-interface"
+            zoneUuid = zone.uuid
+            vlan = 101
+            physicalInterface = " "
+        }
+    }
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/src/test/groovy/org/zstack/test/integration/network/l2network/AttachL2NetworkCase.groovy` around lines 107 - 135, Add a new VLAN test case in testCreateL2NetworkWithoutPhysicalInterface to cover empty-string physicalInterface (e.g., physicalInterface = "" or " ") when calling createL2VlanNetwork, and change the broad expect(AssertionError.class) checks for createL2NoVlanNetwork/createL2VlanNetwork to assert the specific error code ORG_ZSTACK_NETWORK_L2_10021 (or the framework's way to match error codes) so the test only fails for the intended validation error.plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java (2)
247-248: Extract 999 into a constant with a meaningful name
999 is a new policy parameter here, but the context gives no hint of what boundary or quota it represents. Extract it into a named constant to avoid misuse in later changes. As per coding guidelines, "Avoid magic values."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java` around lines 247 - 248, The magic value 999 passed to ExternalHostIdGetter(...) is unclear; replace it with a well-named constant (e.g., NEW_POLICY_PARAM_LIMIT or DEFAULT_HOST_ID_STRATEGY) declared near the class level in KvmIscsiNodeServer (or a related constants holder) and use that constant in the call to ExternalHostIdGetter.getOrAllocateHostIdRef(param.getHostUuid(), param.getPrimaryStorage().getUuid()); ensure the constant name documents what the value represents and add a brief comment describing the boundary/quota meaning.
240-243: Read only the hostId column
Only hostId is used afterwards; there is no need to load the whole ExternalPrimaryStorageHostRefVO entity. Narrow the query to a single column, and let the fallback branch take the value from the newly allocated ref, which saves an unnecessary entity load. As per coding guidelines, "Query only the columns actually needed; avoid loading the whole entity."
Also applies to: 257-257
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/iscsi/src/main/java/org/zstack/iscsi/kvm/KvmIscsiNodeServer.java` around lines 240 - 243, The code currently loads the full ExternalPrimaryStorageHostRefVO via Q.New(...).eq(...).find() even though only hostId is used; change the query to select only the hostId column (use ExternalPrimaryStorageHostRefVO_.hostId) and fetch the scalar value (e.g., via the Q select/findValue API) or map it to a lightweight ref object, then use that hostId; apply the same change to the second occurrence around ExternalPrimaryStorageHostRefVO usage at the other location.core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java (1)
58-58: Avoid the magic value 1000 at Line 58.
Make the value configurable (for example, add Telemetry.sentryMaxQueueSize) so later tuning does not require a code change and release. As per coding guidelines, a raw undefined number is a magic value and should be replaced with a constant or configuration item.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/telemetry/SentryInitHelper.java` at line 58, Line uses magic value 1000 in options.setMaxQueueSize(1000); update SentryInitHelper to replace the literal with a configurable property (e.g., Telemetry.sentryMaxQueueSize) or a constant: add a new Telemetry configuration entry (sentryMaxQueueSize) with a sensible default and read it in SentryInitHelper to pass into options.setMaxQueueSize(...); ensure validation/defaulting when the config is absent and reference the Telemetry property name in logs/comments.core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java (1)
911-915: uninstallEntityLifeCycleCallback silently ignores unknown entity classes, which can mask caller errors.
At Lines 911-915, an unknown clz is a plain no-op, whereas installEntityLifeCycleCallback fails with an assertion; keep them consistent: at least Assert or warn once so callback-leak problems are not swallowed.
Suggested change

 public void uninstallEntityLifeCycleCallback(Class clz, EntityEvent evt, EntityLifeCycleCallback cb) {
     if (clz != null) {
         EntityInfo info = entityInfoMap.get(clz);
-        if (info != null) {
-            info.uninstallLifeCycleCallback(evt, cb);
-        }
+        DebugUtils.Assert(info != null, String.format("cannot find EntityInfo for the class[%s]", clz));
+        info.uninstallLifeCycleCallback(evt, cb);
     } else {
         for (EntityInfo info : entityInfoMap.values()) {
             info.uninstallLifeCycleCallback(evt, cb);
         }
     }
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/db/DatabaseFacadeImpl.java` around lines 911 - 915, In DatabaseFacadeImpl, make uninstallEntityLifeCycleCallback mirror installEntityLifeCycleCallback’s error handling: when clz is null or entityInfoMap.get(clz) returns null, do not silently no-op—either assert or at minimum log a warning including the evt, cb and clz class name to surface misuse; then only call info.uninstallLifeCycleCallback(evt, cb) when info is non-null (EntityInfo and entityInfoMap are the referenced symbols).core/src/main/java/org/zstack/core/db/DatabaseFacade.java (1)
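The symmetric install/uninstall behavior the finding asks for can be sketched with a tiny self-contained registry. CallbackRegistrySketch and its Runnable callbacks are illustrative stand-ins, not the zstack DatabaseFacade API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: uninstalling from an unknown class fails loudly,
// mirroring install(), instead of silently doing nothing.
public class CallbackRegistrySketch {
    private final Map<Class<?>, List<Runnable>> callbacks = new HashMap<>();

    void install(Class<?> clz, Runnable cb) {
        callbacks.computeIfAbsent(clz, k -> new ArrayList<>()).add(cb);
    }

    void uninstall(Class<?> clz, Runnable cb) {
        List<Runnable> list = callbacks.get(clz);
        if (list == null) {
            // surface misuse instead of swallowing it
            throw new IllegalStateException("cannot find callbacks for class " + clz.getName());
        }
        list.remove(cb);
    }

    public static void main(String[] args) {
        CallbackRegistrySketch registry = new CallbackRegistrySketch();
        Runnable cb = () -> {};
        registry.install(String.class, cb);
        registry.uninstall(String.class, cb);      // known class: fine
        try {
            registry.uninstall(Integer.class, cb); // unknown class: fails loudly
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // cannot find callbacks for class java.lang.Integer
        }
    }
}
```

Whether to assert or merely warn is a policy choice; the point is that install and uninstall should agree on it.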
88-88: Add Javadoc for the newly added interface method to pin down its behavioral boundaries.
Line 88 adds a public interface capability without a method comment; at minimum, state whether entityClass == null applies to all entities and what happens when no matching callback is found.
Suggested change

 void installEntityLifeCycleCallback(Class entityClass, EntityEvent evt, EntityLifeCycleCallback cb);

+/**
+ * Unregister an entity lifecycle callback.
+ * @param entityClass target entity class; if null, unregister from all entity classes
+ * @param evt lifecycle event
+ * @param cb callback instance to remove
+ */
 void uninstallEntityLifeCycleCallback(Class entityClass, EntityEvent evt, EntityLifeCycleCallback cb);

As per coding guidelines, "Interface methods should not carry redundant modifiers (such as public) and must have valid Javadoc comments."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/db/DatabaseFacade.java` at line 88, Add a Javadoc to the DatabaseFacade#uninstallEntityLifeCycleCallback method that documents its exact behavior: state that this interface method has no extra access modifier (do not add `public`), explain that passing entityClass == null means the uninstall applies to all entity types, specify whether evt can be null and how it is treated (e.g., match all events or require non-null), and describe the no-op behavior when the specified callback cb is not found (e.g., silently return without error). Also document thread-safety expectations and any return/exception behavior so callers know the boundary cases.test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/LibvirtTlsMigrateCase.groovy (1)
146-173: Restore GlobalConfig with try/finally to avoid interference between scenarios.
These scenarios run serially inside the same env.create {}; if an assertion fails early in one of them, the later scenarios inherit the wrong global config, and triage then shows cascading failures.
Also applies to: 178-203, 212-236, 241-272
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/LibvirtTlsMigrateCase.groovy` around lines 146 - 173, The test modifies global config keys (KVMGlobalConfig.LIBVIRT_TLS_ENABLED and KVMGlobalConfig.RECONNECT_HOST_RESTART_LIBVIRTD_SERVICE) but does not restore them on failure; wrap the config changes in each test (e.g., testMigrateWithTlsEnabled) with a try/finally: save the original values, update to the desired values before the scenario, run the test body, and always restore the originals in the finally block so other scenarios in the same env.create block don’t inherit mutated globals; apply the same try/finally restore pattern to the other TLS-related tests in this file referenced in the comment.plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java (1)
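The save/mutate/restore pattern recommended above can be sketched as follows; ConfigRestoreSketch and its string-keyed map are hypothetical stand-ins for the test framework's GlobalConfig access:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the original config value is restored even when the scenario
// body throws, so later scenarios never inherit a mutated global.
public class ConfigRestoreSketch {
    static final Map<String, String> config = new HashMap<>(Map.of("tls.enabled", "false"));

    static void withConfig(String key, String value, Runnable body) {
        String original = config.get(key);
        config.put(key, value);
        try {
            body.run();
        } finally {
            config.put(key, original); // restored even if body throws
        }
    }

    public static void main(String[] args) {
        try {
            withConfig("tls.enabled", "true", () -> { throw new AssertionError("scenario failed"); });
        } catch (AssertionError ignored) {
            // the failure still propagates to the test runner
        }
        System.out.println(config.get("tls.enabled")); // false
    }
}
```

Wrapping each scenario in such a helper keeps the restore logic in one place instead of repeating try/finally in every test method.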
140-146: Avoid exposing three boolean switches directly in a public method signature.
Call sites can easily pass allowRestartLibvirtd and isNewAdded in the wrong order, and if more conditions are added later the signature only gets harder to read. A safer shape is to converge them into a small context object, or to lift "whether restart is allowed" into a clearer enum/strategy.
As per coding guidelines, "Avoid boolean parameters whose meaning is unclear."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostUtils.java` around lines 140 - 146, The public method KVMHostUtils.shouldForceTlsRedeploy exposes three ambiguous boolean flags; refactor its signature to accept a small context object (e.g., TlsRedeployContext) or a clear enum/strategy instead of allowRestartLibvirtd and isNewAdded so callers cannot accidentally swap flags and future conditions remain readable; update the method to read from context.isNewlyAdded(), context.allowsLibvirtdRestart() (or a RedeployPolicy enum) and change callers to construct that context/enum before calling shouldForceTlsRedeploy.plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerBase.java (1)
1058-1063: Extract this NIC CIDR filtering into a single helper.
The same filtering logic is now written out in each of the two candidate APIs. After the earlier occurrence is fixed, this one can easily keep the old semantics, leaving the two candidate APIs returning inconsistent results. With a shared helper there is only one place to maintain.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerBase.java` around lines 1058 - 1063, Extract the NIC CIDR filtering logic used in LoadBalancerBase into a single reusable helper method to avoid duplicate logic across candidate APIs: create a private/static method (e.g., filterNicsByL3Cidr or isNicInL3Cidr) referenced from LoadBalancerBase that encapsulates the stream filter currently applied to "nics" (the lambda using nic.getIp(), nic.getL3NetworkUuid() and IpRangeHelper.isIpInL3NetworkCidr), replace the inline filter with a call to this helper, and update the other candidate API to call the same helper so both candidate-returning paths share identical behavior.network/src/main/java/org/zstack/network/service/NetworkServiceManager.java (1)
21-29: The interface name and comment will mislead callers.
getVmNicDns looks like a lookup by vmNicUuid, but the signature actually takes vmUuid + l3NetworkUuid, and the Javadoc says "for a VM NIC". This is a public interface, so later implementations and callers can easily get the semantics wrong. At minimum reword the comment to "for a VM on the specified L3 network"; if the method is not yet widely depended on, align the method name with the signature as well. As per coding guidelines, "Names should express intent with complete words and reflect the data type or purpose."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@network/src/main/java/org/zstack/network/service/NetworkServiceManager.java` around lines 21 - 29, The method getVmNicDns(vmUuid, l3NetworkUuid) and its Javadoc are misleading because the name implies a vmNicUuid but the signature takes a VM UUID and an L3 network UUID; rename the method to reflect the actual inputs (for example getVmDnsForL3Network or getVmDnsForVmOnL3Network) and update the Javadoc to read something like “Get DNS servers for a VM on the specified L3 network. Priority: VM NIC system tag > L3 Network DNS”; then refactor all implementations and callers of getVmNicDns to the new method name and adjust any unit tests and docs accordingly to keep behavior unchanged.plugin/applianceVm/src/main/java/org/zstack/appliancevm/ApplianceVmNicTO.java (1)
38-50: Do not fall back to SLAAC on an ApplianceVm IPv6 NIC.
By constraint, an ApplianceVm NIC is always bound to an ipRangeUuid. The new prefixLen + SLAAC fallback silently converts a data anomaly that should be surfaced into a seemingly valid but possibly wrong addressMode; once the NormalIpRangeVO lookup fails, the agent side receives an incorrect IPv6 configuration. It is safer to keep NormalIpRangeVO as the single source of truth for addressMode and fail explicitly when ipRangeUuid == null. Based on learnings: In VirtualRouterManagerImpl.java (plugin/virtualRouterProvider/src/main/java/org/zstack/network/service/virtualrouter/VirtualRouterManagerImpl.java), ApplianceVMs (including virtual router VMs) are by design only allowed to have IPs within an IP range (ipRangeUuid is always non-null for appliance VM NICs).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/applianceVm/src/main/java/org/zstack/appliancevm/ApplianceVmNicTO.java` around lines 38 - 50, The code currently defaults addressMode to IPv6Constants.SLAAC when uip.getPrefixLen() is present, which can silently produce incorrect IPv6 configs for ApplianceVm NICs; change ApplianceVmNicTO handling so that ipRangeUuid is the authoritative source: remove the fallback that sets addressMode = IPv6Constants.SLAAC in the uip.getPrefixLen() branch, only set prefixLength/addressMode from NormalIpRangeVO when uip.getIpRangeUuid() is non-null (use ipRangeVO.getPrefixLen() and ipRangeVO.getAddressMode()), and if uip.getIpRangeUuid() is null return/throw an explicit error (fail fast) instead of silently defaulting; reference symbols: ApplianceVmNicTO, uip, prefixLength, addressMode, IPv6Constants.SLAAC, NormalIpRangeVO, ipRangeUuid.header/src/main/java/org/zstack/header/core/external/service/APIQueryExternalServiceConfigurationMsg.java (1)
16-18: Use a plural resource name in the query path.
This is a new API, and /external/service/configuration is still singular; change it to a plural resource path before the API stabilizes to avoid external-compatibility costs later.
As per coding guidelines, path: use the plural form for resources.
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/core/external/service/APIQueryExternalServiceConfigurationMsg.java` around lines 16 - 18, The REST path uses singular resource names; update the RestRequest annotation on the APIQueryExternalServiceConfigurationMsg class to use plural resource names (e.g. change path="/external/service/configuration" to a plural form like "/external/services/configurations" and update optionalPaths accordingly such as "/external/services/configurations/{uuid}") so the API follows the plural-noun resource convention; ensure only the annotation values are changed and any references to the old path in tests or routing are updated to the new plural paths.header/src/main/java/org/zstack/header/core/progress/TaskProgressInventory.java (1)
21-22: Also populate progressDetail inside TaskProgressInventory(TaskProgressVO vo).
The new field currently depends on callers additionally invoking setProgressDetail(), while the sibling VO-mapped fields are already filled uniformly in the constructor. As a result, any path that goes straight through this constructor never gets progressDetail, so the same inventory is inconsistent across construction paths.
♻️ Optional change

 public TaskProgressInventory(TaskProgressVO vo) {
     taskUuid = vo.getTaskUuid();
     parentUuid = vo.getParentUuid();
     type = vo.getType().toString();
     if (vo.getOpaque() != null) {
         opaque = JSONObjectUtil.toObject(vo.getOpaque(), LinkedHashMap.class);
     }
     taskName = vo.getTaskName();
     time = vo.getTime();
     arguments = vo.getArguments();
+    progressDetail = LongJobProgressDetailBuilder.fromTaskProgressVO(vo);
 }

Also applies to: 111-117
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/core/progress/TaskProgressInventory.java` around lines 21 - 22, The TaskProgressInventory(TaskProgressVO vo) constructor currently doesn't populate the new progressDetail field, forcing callers to call setProgressDetail() separately; update that constructor to parse vo.getOpaque() (handling nulls and parse errors) into a LongJobProgressDetail instance and assign it to the progressDetail field so inventories built from VO contain the same typed detail as other constructors; use the same parsing utility or factory used elsewhere for LongJobProgressDetail (or catch/log JSON/parse exceptions and leave progressDetail null) and apply the same change to the other VO-based mapping code paths noted around the lines referenced (the existing setProgressDetail()/progressDetail references).plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java (1)
660-682: The batch NIC validation has a clear N+1 query.
Both passages first call findByUuid() per nicUuid and then access getUsedIps(); adding backends in bulk amplifies the interceptor path into multiple rounds of database round trips. A safer approach is to query UsedIpVO once for all vmNicUuid values, then group by NIC for the CIDR validation. As per coding guidelines, "No queries inside loops; avoid the performance cost of nested queries."
Also applies to: 1614-1637
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/loadBalancer/src/main/java/org/zstack/network/service/lb/LoadBalancerApiInterceptor.java` around lines 660 - 682, The interceptor LoadBalancerApiInterceptor performs an N+1 DB lookup by calling dbf.findByUuid for each nicUuid and then accessing VmNicVO.getUsedIps(); fix it by batching the UsedIpVO fetch for all msg.getVmNicUuids at once (query UsedIpVO where vmNicUuid IN (...)), group the results by vmNicUuid, then iterate the grouped UsedIpVOs to run the existing IpRangeHelper.isIpOutsideL3NetworkCidr checks against lbVO (vipUuid/ipv6VipUuid) and throw the same ApiMessageInterceptionException when needed; replace per-nic dbf.findByUuid + getUsedIps logic with this batched query and grouping to avoid looped DB calls.core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nServiceImpl.java (1)
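The batched fetch-then-group shape suggested in the prompt can be sketched generically; UsedIp and the in-memory rows below stand in for a single `IN (...)` query against UsedIpVO, not real zstack calls:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: one round trip fetches all rows for the requested NICs,
// then grouping happens in memory instead of one query per NIC.
public class BatchedLookupSketch {
    record UsedIp(String vmNicUuid, String ip) {}

    static Map<String, List<UsedIp>> groupByNic(List<UsedIp> rows) {
        return rows.stream().collect(Collectors.groupingBy(UsedIp::vmNicUuid));
    }

    public static void main(String[] args) {
        // pretend these rows came back from a single WHERE vmNicUuid IN (...) query
        List<UsedIp> rows = List.of(
                new UsedIp("nic-1", "10.0.0.5"),
                new UsedIp("nic-1", "fd00::5"),
                new UsedIp("nic-2", "10.0.0.6"));
        Map<String, List<UsedIp>> byNic = groupByNic(rows);
        System.out.println(byNic.get("nic-1").size()); // 2
    }
}
```

The CIDR check then iterates the grouped map, so the database cost stays constant regardless of how many NICs the message carries.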
102-106: Stop hard-coding the default locale as en_US.
getTemplate() and localizeErrorCode() now each maintain their own source for the default locale; as soon as one side changes, the fallback behavior silently diverges. Reusing LocaleUtils.DEFAULT_LOCALE here directly is more robust.
Suggested change

-// fallback to en_US
-if (!"en_US".equals(locale)) {
-    Map<String, String> enMessages = localeMessages.get("en_US");
+// fallback to default locale
+if (!LocaleUtils.DEFAULT_LOCALE.equals(locale)) {
+    Map<String, String> enMessages = localeMessages.get(LocaleUtils.DEFAULT_LOCALE);
     if (enMessages != null) {
         return enMessages.get(globalErrorCode);
     }
 }

As per coding guidelines, "Avoid magic values."
Verify each finding against the current code and only fix it if needed. In `@core/src/main/java/org/zstack/core/errorcode/GlobalErrorCodeI18nServiceImpl.java` around lines 102 - 106, The fallback hard-coded "en_US" literal should be replaced with the shared default locale constant to avoid divergent behavior; update GlobalErrorCodeI18nServiceImpl (specifically the localizeErrorCode() fallback) to use LocaleUtils.DEFAULT_LOCALE instead of "en_US" so getTemplate() and localizeErrorCode() rely on the same default source and eliminate the magic value.test/src/test/groovy/org/zstack/test/integration/core/ErrorCodeI18nCase.groovy (1)
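A minimal sketch of the single-constant fallback; DEFAULT_LOCALE plays the role of LocaleUtils.DEFAULT_LOCALE, and the message maps are made-up data:

```java
import java.util.Map;

// Sketch: every lookup path falls back through the same constant,
// so changing the default locale cannot diverge between call sites.
public class LocaleFallbackSketch {
    static final String DEFAULT_LOCALE = "en_US";
    static final Map<String, Map<String, String>> messages = Map.of(
            "en_US", Map.of("E1001", "disk full"),
            "zh_CN", Map.of());

    static String localize(String code, String locale) {
        String msg = messages.getOrDefault(locale, Map.of()).get(code);
        if (msg != null) {
            return msg;
        }
        // single source of truth for the fallback locale
        return messages.getOrDefault(DEFAULT_LOCALE, Map.of()).get(code);
    }
}
```

With the constant in one shared utility, getTemplate() and localizeErrorCode() cannot drift apart.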
146-180: An assertion on loaded mappings would make this more robust.
These cases all call new GlobalErrorCodeI18nServiceImpl() directly without going through start(), so only the message fallback path is currently covered. Add a post-initialization case that hits at least one real globalErrorCode -> localized template mapping, and cover the recursive ErrorCodeList.causes branch while at it.
Verify each finding against the current code and only fix it if needed. In `@test/src/test/groovy/org/zstack/test/integration/core/ErrorCodeI18nCase.groovy` around lines 146 - 180, add a post-initialization test case: new a GlobalErrorCodeI18nServiceImpl and call start() to load the mappings, then build at least one ErrorCode (for example, one whose code exists in the global mapping) and set causes on it (ErrorCodes chained recursively), call localizeErrorCode(mid, "en_US"), assert that mid.getMessage() returns the mapped localized template rather than description/details, and also assert that getMessage() on every cause in the chain is filled correctly per the mapping or fallback rules; referenced symbols: GlobalErrorCodeI18nServiceImpl.start(), GlobalErrorCodeI18nServiceImpl.localizeErrorCode(), ErrorCode.setCause()/getMessage(), and ErrorCodeList.causes.
     output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\' | tee -a %s",
             AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments), AnsibleConstant.LOG_PATH),
             AnsibleConstant.ROOT_DIR);
 } else if (AnsibleGlobalProperty.DEBUG_MODE) {
-    output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
+    output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
             AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
             AnsibleConstant.ROOT_DIR);
 } else {
-    output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s --private-key %s -e '\\''%s'\\'",
+    output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s --private-key %s -e '\\''%s'\\'",
             AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
🧩 Analysis chain
🌐 Web query:
In the official Ansible documentation, what does the -B option of ansible/ansible-playbook mean? Can it replace the positional argument that normally holds the playbook path?
💡 Result:
In the official Ansible documentation, the -B option means different things for ansible and ansible-playbook. For the ansible command (ad hoc mode), -B or --background SECONDS runs the operation asynchronously and fails it after the given number of seconds (default N/A). It is used together with -P (the poll interval) for async tasks. For ansible-playbook, the official documentation has no -B option; -b or --become runs operations with privilege escalation (become) and does not imply a password prompt. -B cannot replace the positional playbook-path argument: the playbook path is a required positional argument of ansible-playbook, whereas -B is ansible's async option, does not apply to playbook execution, and the two have different syntax.
Citations:
- 1: https://docs.ansible.com/ansible/latest/cli/ansible.html
- 2: https://docs.ansible.com/projects/ansible/latest/cli/ansible-playbook.html
Remove the -B option; playBookPath should be passed as a positional argument.
The official Ansible documentation confirms that -B is the async timeout parameter of the ansible command (ad hoc mode) and that the option does not exist for ansible-playbook; the playbook path is a required positional argument of ansible-playbook. Lines 260, 264, and 268 pass playBookPath to -B %s, producing a syntax error: the playbook argument is lost, and all three branches fail at runtime. Remove -B and pass the playbook path as the positional argument right after the executable.
Proposed fix
- output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\' | tee -a %s",
+ output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\' | tee -a %s",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments), AnsibleConstant.LOG_PATH),
AnsibleConstant.ROOT_DIR);
} else if (AnsibleGlobalProperty.DEBUG_MODE) {
- output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
+ output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s -vvvv --private-key %s -e '\\''%s'\\'",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
AnsibleConstant.ROOT_DIR);
} else {
- output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s -B %s -i %s --private-key %s -e '\\''%s'\\'",
+ output = ShellUtils.run(String.format("bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; PYTHONPATH=%s timeout %d %s %s -i %s --private-key %s -e '\\''%s'\\'",
AnsibleConstant.ZSTACKLIB_ROOT, timeout, executable, playBookPath, AnsibleConstant.INVENTORY_FILE, msg.getPrivateKeyFile(), JSONObjectUtil.dumpPretty(arguments)),
AnsibleConstant.ROOT_DIR);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@core/src/main/java/org/zstack/core/ansible/AnsibleFacadeImpl.java` around
lines 260 - 269, In AnsibleFacadeImpl update the three ShellUtils.run format
strings to stop using the invalid "-B %s" option and instead pass playBookPath
as a positional argument to the ansible-playbook command; specifically remove
the "-B %s" token from the format templates (including the LOG_PATH, DEBUG_MODE
and default branches) and place %s for playBookPath immediately after the
executable variable in each String.format call, adjusting the format arguments
order so JSONObjectUtil.dumpPretty(arguments) and other params remain correct.
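The corrected command shape the prompt describes can be sketched in isolation: the playbook path becomes a positional argument right after the executable, and the invalid `-B %s` token disappears. The helper class and parameter values below are hypothetical, only the format-string layout mirrors the snippet:

```java
public class AnsibleCmdSketch {
    // Builds the ansible-playbook invocation with playBookPath passed
    // positionally after the executable, with no "-B" flag.
    static String buildCommand(String zstacklibRoot, int timeout, String executable,
                               String playBookPath, String inventory, String privateKey, String args) {
        return String.format(
                "bash -c '. /var/lib/zstack/virtualenv/zstacksys/bin/activate; "
                        + "PYTHONPATH=%s timeout %d %s %s -i %s --private-key %s -e '\\''%s'\\'",
                zstacklibRoot, timeout, executable, playBookPath, inventory, privateKey, args);
    }

    public static void main(String[] args) {
        String cmd = buildCommand("/var/lib/zstacklib", 300, "ansible-playbook",
                "host.yaml", "inventory", "/root/key", "{}");
        System.out.println(cmd.contains(" -B "));                        // false
        System.out.println(cmd.contains("ansible-playbook host.yaml -i")); // true
    }
}
```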
        } catch (Throwable t) {
            logger.warn(String.format("[%s] failed to calculate result for item %s", getName(), req.item), t);
            req.notifyFailure(operr(ORG_ZSTACK_CORE_THREAD_10003, "failed to calculate result: %s", t.getMessage()));
        }
    }
    cleanup(syncSignature);
    chain.next();
}

protected final void handleFailure(String syncSignature, List<PendingRequest> requests, ErrorCode errorCode, SyncTaskChain chain) {
    for (PendingRequest req : requests) {
        req.notifyFailure(errorCode);
    }
    cleanup(syncSignature);
    chain.next();
A single completion.fail() throwing an exception can wedge the entire chain.
In handleFailure() and the degraded branch of handleSuccess(), req.notifyFailure(...) is not isolated. Once any callback throws, the remaining requests are never notified, and cleanup(syncSignature) and chain.next() never run.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@core/src/main/java/org/zstack/core/thread/AbstractCoalesceQueue.java` around
lines 111 - 125, The failure is that a thrown exception from any
req.notifyFailure / req.notifySuccess can abort the loop and prevent
cleanup(syncSignature) and chain.next() from running; update methods
handleFailure and handleSuccess (and any downgrade branches where
PendingRequest.notifyFailure/notifySuccess or completion.fail is called) to wrap
each per-request callback invocation in its own try/catch so a single bad
callback cannot stop notifying remaining requests, and ensure
cleanup(syncSignature) and chain.next() are executed afterwards (use a finally
or guarantee after the loop) so the sync chain always advances.
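The isolation pattern described above can be sketched with simplified stand-ins for the real PendingRequest/SyncTaskChain types (the interfaces and runnables here are illustrative, not ZStack's actual API):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CallbackIsolationSketch {
    interface PendingRequest { void notifyFailure(String err); }

    // Each callback runs in its own try/catch; cleanup and chain.next()
    // sit in a finally so the sync chain always advances.
    static void handleFailure(List<PendingRequest> requests, String errorCode,
                              Runnable cleanup, Runnable chainNext) {
        try {
            for (PendingRequest req : requests) {
                try {
                    req.notifyFailure(errorCode);
                } catch (Throwable t) {
                    // log and keep notifying the remaining requests
                }
            }
        } finally {
            cleanup.run();
            chainNext.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger notified = new AtomicInteger();
        AtomicInteger advanced = new AtomicInteger();
        PendingRequest ok = err -> notified.incrementAndGet();
        PendingRequest bad = err -> { throw new RuntimeException("boom"); };
        handleFailure(List.of(ok, bad, ok), "err", () -> {}, advanced::incrementAndGet);
        System.out.println(notified.get() + " " + advanced.get()); // 2 1
    }
}
```

The throwing callback no longer stops the loop, and the chain advances exactly once.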
// createIfAbsent: DB has no record → write; DB has record → return DB value
JsonLabelInventory caInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_KEY, ca);
JsonLabelInventory keyInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_PRIVATE_KEY, key);

// Use DB as source of truth — overwrite local files (HA: MN2 uses MN1's CA from DB)
FileUtils.writeStringToFile(caFile, caInv.getLabelValue());
FileUtils.writeStringToFile(keyFile, keyInv.getLabelValue());
Persisting the CA and the private key via two separate createIfAbsent() calls can write out a mismatched pair when multiple MNs start concurrently.
libvirtTLSCA and libvirtTLSPrivateKey are two independent keys, and each is written by its own createIfAbsent() call. If two management nodes initialize for the first time at the same moment, a cross-write is entirely possible: the CA from MN1 and the private key from MN2. Every node then overwrites its local files with this mismatched PEM pair, and TLS breaks outright. The CA and private key need to be persisted as one atomic object, or at least inside the same transaction/the same label.
A safer direction
- private static final String LIBVIRT_TLS_CA_KEY = "libvirtTLSCA";
- private static final String LIBVIRT_TLS_PRIVATE_KEY = "libvirtTLSPrivateKey";
+ private static final String LIBVIRT_TLS_CA_BUNDLE_KEY = "libvirtTLSCABundle";

- JsonLabelInventory caInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_KEY, ca);
- JsonLabelInventory keyInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_PRIVATE_KEY, key);
- FileUtils.writeStringToFile(caFile, caInv.getLabelValue());
- FileUtils.writeStringToFile(keyFile, keyInv.getLabelValue());
+ String bundleValue = JSONObjectUtil.toJsonString(new LibvirtTlsCaBundle(ca, key));
+ JsonLabelInventory bundleInv = new JsonLabel().createIfAbsent(LIBVIRT_TLS_CA_BUNDLE_KEY, bundleValue);
+ LibvirtTlsCaBundle bundle = JSONObjectUtil.toObject(bundleInv.getLabelValue(), LibvirtTlsCaBundle.class);
+ FileUtils.writeStringToFile(caFile, bundle.ca);
+ FileUtils.writeStringToFile(keyFile, bundle.privateKey);

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java` around lines 501
- 507, The two separate createIfAbsent calls for LIBVIRT_TLS_CA_KEY and
LIBVIRT_TLS_PRIVATE_KEY in KVMHostFactory risk cross-writes under concurrent MN
startup; instead persist CA+private-key atomically by writing a single label
(e.g., combine ca and key into one JSON/PEM payload) via one
JsonLabel.createIfAbsent call or perform both writes inside a single DB
transaction/synchronized upsert so both values are inserted or read together;
then read that single label and split to caFile and keyFile (replace references
to separate LIBVIRT_TLS_CA_KEY and LIBVIRT_TLS_PRIVATE_KEY usages with the
combined label access).
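The atomic-pair idea can be demonstrated with a plain concurrent map standing in for the JsonLabel table. The bundle type and the createIfAbsent helper below are hypothetical stand-ins, not ZStack's API; only the first-writer-wins semantics mirror the suggestion:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicBundleSketch {
    // One value carries both halves, so a reader can never observe a CA from
    // one writer paired with a private key from another.
    static final class TlsBundle {
        final String ca;
        final String privateKey;
        TlsBundle(String ca, String privateKey) { this.ca = ca; this.privateKey = privateKey; }
    }

    static final ConcurrentMap<String, TlsBundle> labels = new ConcurrentHashMap<>();

    // createIfAbsent semantics: the first writer wins; later callers read back
    // the winner's value, and both fields always come from the same writer.
    static TlsBundle createIfAbsent(String key, TlsBundle candidate) {
        TlsBundle prev = labels.putIfAbsent(key, candidate);
        return prev != null ? prev : candidate;
    }

    public static void main(String[] args) {
        TlsBundle seenByMn1 = createIfAbsent("libvirtTLSCABundle", new TlsBundle("ca-1", "key-1"));
        TlsBundle seenByMn2 = createIfAbsent("libvirtTLSCABundle", new TlsBundle("ca-2", "key-2"));
        // MN2 adopts MN1's pair as a unit; a ca-1/key-2 mix is impossible
        System.out.println(seenByMn2.ca + " " + seenByMn2.privateKey); // ca-1 key-1
    }
}
```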
04a51b2 to 29ef261 Compare
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java`:
- Around line 183-190: The catch block after calling saveControllerIfNeed(vo)
can leave a half-initialized state because saveControllerIfNeed() inserts into
controllers before fully creating nodes; update the error handling so that on
any Throwable you remove the partially inserted entries for that UUID from both
controllers and nodes (use vo.getUuid()) to avoid skipping future retries, or
preferably refactor saveControllerIfNeed() to fully construct controller and
node services off-map and only put them into the controllers and nodes maps in
one atomic step once both are successfully created; modify
ExternalPrimaryStorageFactory's catch to perform the cleanup if you choose the
minimal fix.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: d80f4d12-31b4-4ae0-8d9c-eb9818881cc8
📒 Files selected for processing (1)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
if (controllers.containsKey(vo.getUuid())) {
    continue;
}
try {
    saveControllerIfNeed(vo);
} catch (Throwable t) {
    logger.warn(String.format("skip building controller for primary storage[uuid:%s, identity:%s]; service stays up so other primary storages remain usable",
            vo.getUuid(), vo.getIdentity()), t);
The failure branch needs to clean up the half-initialized controllers / nodes state.
saveControllerIfNeed() first puts the object into controllers and only then builds/writes nodes. If the exception lands between those two steps, this catch only logs and does not roll back the in-memory state; on the next pass, line 183 skips the rebuild because controllers.containsKey() is true, leaving this PS permanently stuck in a half-initialized state with a controller but no nodeSvc.
At minimum, remove this UUID from both maps here; more robustly, change saveControllerIfNeed() to fully construct both svcs first and then publish them to the maps in one step.
💡 Minimal fix example
try {
saveControllerIfNeed(vo);
} catch (Throwable t) {
+ controllers.remove(vo.getUuid());
+ nodes.remove(vo.getUuid());
logger.warn(String.format("skip building controller for primary storage[uuid:%s, identity:%s]; service stays up so other primary storages remain usable",
vo.getUuid(), vo.getIdentity()), t);
}
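The publish-atomically alternative mentioned above can be sketched with plain maps. The builder interface and map values are stand-ins for the real controller/node service types:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AtomicPublishSketch {
    static final Map<String, Object> controllers = new ConcurrentHashMap<>();
    static final Map<String, Object> nodes = new ConcurrentHashMap<>();

    interface Builder { Object build() throws Exception; }

    // Build both services off-map first; publish only when both succeed,
    // so a failure can never leave a controller without its node service.
    static void saveControllerIfNeed(String uuid, Builder ctrl, Builder node) throws Exception {
        Object controllerSvc = ctrl.build();   // may throw: nothing published yet
        Object nodeSvc = node.build();         // may throw: nothing published yet
        controllers.put(uuid, controllerSvc);
        nodes.put(uuid, nodeSvc);
    }

    public static void main(String[] args) {
        try {
            saveControllerIfNeed("ps-1", () -> new Object(),
                    () -> { throw new Exception("node build failed"); });
        } catch (Exception ignored) {
        }
        // neither map holds a half-initialized entry, so a later retry can succeed
        System.out.println(controllers.containsKey("ps-1") + " " + nodes.containsKey("ps-1")); // false false
    }
}
```

With this shape, the containsKey() guard at the top of the loop can never latch onto a broken entry.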
29ef261 to 6a403bf Compare
♻️ Duplicate comments (1)
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java (1)
183-190: ⚠️ Potential issue | 🟠 Major

The startup-time failure branch still leaves a half-initialized controller state.
saveControllerIfNeed() writes controllers first and fills in nodes afterwards. If an exception is thrown between those two steps, the new guard at lines 183-185 will continue forever afterwards, and that UUID will never be rebuilt successfully. At minimum the catch must clear both maps in sync; more robustly, construct both svcs completely and only then publish them to the maps in one step.

💡 Minimal fix example

try {
    saveControllerIfNeed(vo);
} catch (Throwable t) {
+   controllers.remove(vo.getUuid());
+   nodes.remove(vo.getUuid());
    logger.warn(String.format("skip building controller for primary storage[uuid:%s, identity:%s]; service stays up so other primary storages remain usable",
            vo.getUuid(), vo.getIdentity()), t);
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java` around lines 183 - 190, the current saveControllerIfNeed() flow can leave a partially-initialized controller in the controllers map if an exception occurs between writing controllers and nodes, causing the early guards (controllers.containsKey(vo.getUuid())) to permanently skip reconstruction; fix by making publication atomic: change the logic in ExternalPrimaryStorageFactory so you fully construct the controller and its node(s) off-map and only put them into the controllers and nodes maps together in one step, or if you prefer a minimal fix, ensure the catch block removes any partial entries from both controllers and nodes for vo.getUuid() before continuing; reference saveControllerIfNeed(), controllers, nodes and the catch block around the try in ExternalPrimaryStorageFactory.
🧹 Nitpick comments (1)
search/src/main/java/org/zstack/zql/ZQL.java (1)
120-124: Consider catching Exception instead of Throwable

Catching Throwable also catches Error types (such as OutOfMemoryError and StackOverflowError); these serious errors should normally propagate upward rather than be silently handled. Catch Exception instead, to stay sensitive to severe system errors.

♻️ Suggested change

- } catch (Throwable t) {
+ } catch (Exception e) {
      logger.warn(String.format("skipping a VO that failed inventory conversion in query[target:%s, vo:%s]; the rest of the result is unaffected",
              astResult != null && astResult.inventoryMetadata != null ? astResult.inventoryMetadata.fullInventoryName() : "?",
-             identifyVoForLog(vo)), t);
+             identifyVoForLog(vo)), e);
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@search/src/main/java/org/zstack/zql/ZQL.java` around lines 120 - 124, the catch currently swallows all Throwables; change the catch clause in ZQL (the block that handles inventory conversion failures) from "catch (Throwable t)" to "catch (Exception e)" and update the logger call to use the new exception variable (e) so only non-fatal exceptions are logged/handled while Errors can propagate. Ensure the surrounding code (the try that contains identifyVoForLog(vo) and astResult usage) remains unchanged.
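The Exception-vs-Throwable distinction can be demonstrated in isolation. This is a minimal sketch, unrelated to ZStack's own classes:

```java
public class CatchExceptionSketch {
    // Per-item processing that only swallows Exceptions; Errors propagate.
    static int processAll(Runnable... items) {
        int ok = 0;
        for (Runnable item : items) {
            try {
                item.run();
                ok++;
            } catch (Exception e) {
                // log and skip this item; the rest of the result is unaffected
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        int ok = processAll(
                () -> {},
                () -> { throw new RuntimeException("bad row"); },
                () -> {});
        System.out.println(ok); // 2: the RuntimeException was contained

        boolean errorPropagated = false;
        try {
            processAll(() -> { throw new AssertionError("fatal"); });
        } catch (Error err) {
            errorPropagated = true; // an Error escapes the per-item catch
        }
        System.out.println(errorPropagated); // true
    }
}
```

A RuntimeException skips one item; an AssertionError (an Error subtype) is not caught and surfaces to the caller.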
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: b2ff67e7-c3d9-446e-8080-769ae7c05512
📒 Files selected for processing (3)
search/src/main/java/org/zstack/zql/ZQL.java
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java
… marshal time
Per ZSTAC-84817, malformed JSON in APIAddExternalPrimaryStorage.config slipped
past the API boundary and surfaced as SYS.1000 InternalError from a deep Gson
parse, after the message had already reached the cloud-bus handler.
Fix:
* PrimaryStorageApiInterceptor now validates msg.getConfig() with org.json
(strict — rejects unquoted keys / missing colons that Gson lenient mode
silently tolerates) and throws ApiMessageInterceptionException with
ORG_ZSTACK_STORAGE_PRIMARY_10053 if it isn't a valid JSON object.
* conf/serviceConfig/externalPrimaryStorage.xml wires
PrimaryStorageApiInterceptor at service level so the interceptor actually
runs for APIAddExternalPrimaryStorageMsg (the existing wiring in
primaryStorage.xml does not cover messages declared in this file).
* CloudOperationsErrorCode.ORG_ZSTACK_STORAGE_PRIMARY_10053 constant added.
Test: ZbsPrimaryStorageCase gains
testAddExternalPrimaryStorageWithMalformedJsonRejectedByInterceptor — sends
"{this is not valid json" via API, asserts the call fails AND no
PrimaryStorageVO/ExternalPrimaryStorageVO/PrimaryStorageOutputProtocolRefVO
row leaks. Verified locally: Tests run: 1, Failures: 0, Errors: 0,
Time elapsed: 92.788 sec, BUILD SUCCESS.
Resolves: ZSTAC-84817
Change-Id: Icf6c648133d7866edf35940d56a28f74f4c64817
6a403bf to f8e039a Compare
🧹 Nitpick comments (1)
storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java (1)
91-102: Normalize config at lines 92-97 and write it back into the message body

Currently trim() is used only for the emptiness check. Trim uniformly and then call msg.setConfig(config), so downstream handlers do not have to repeatedly strip leading/trailing whitespace or face the resulting ambiguity.

♻️ Suggested change

  private void validate(APIAddExternalPrimaryStorageMsg msg) {
      String config = msg.getConfig();
-     if (config == null || config.trim().isEmpty()) {
+     if (config == null) {
+         return;
+     }
+     config = config.trim();
+     if (config.isEmpty()) {
          return;
      }
+     msg.setConfig(config);
      try {
          new JSONObject(config);
      } catch (JSONException e) {
          throw new ApiMessageInterceptionException(argerr(ORG_ZSTACK_STORAGE_PRIMARY_10053,
                  "config is not a valid JSON object: %s", e.getMessage()));
      }
  }

As per coding guidelines:
**/*Interceptor.java: check whether parameters taken from a Message have been trimmed; data users copy-paste from a browser may carry spaces, newlines, and the like.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java` around lines 91 - 102, in validate(APIAddExternalPrimaryStorageMsg msg) trim and normalize the incoming config before further checks: obtain config, if non-null then do config = config.trim(); call msg.setConfig(config) to write the normalized value back into the message, then proceed with the existing empty check and JSON parse (in the validate method of PrimaryStorageApiInterceptor) so downstream handlers see the trimmed config.
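The trim-and-write-back normalization can be sketched with a plain message class. The Msg type is a stand-in for APIAddExternalPrimaryStorageMsg, and the trailing prefix/suffix check is only a placeholder for the real org.json parse:

```java
public class TrimConfigSketch {
    static class Msg {
        private String config;
        String getConfig() { return config; }
        void setConfig(String c) { config = c; }
    }

    // Normalize once at the interceptor boundary and write the trimmed
    // value back so downstream handlers never see stray whitespace.
    static boolean validate(Msg msg) {
        String config = msg.getConfig();
        if (config == null) return true;          // nothing to validate
        config = config.trim();
        if (config.isEmpty()) return true;        // treat blank as absent
        msg.setConfig(config);                    // write back the normalized value
        return config.startsWith("{") && config.endsWith("}"); // placeholder for the real JSON parse
    }

    public static void main(String[] args) {
        Msg msg = new Msg();
        msg.setConfig("  {\"mon\":\"10.0.0.1\"}\n");   // pasted with whitespace
        System.out.println(validate(msg) + " [" + msg.getConfig() + "]");
        // true [{"mon":"10.0.0.1"}]
    }
}
```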
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: b80f17df-d3b3-42e6-90af-a6110ea2696e
⛔ Files ignored due to path filters (1)
conf/serviceConfig/externalPrimaryStorage.xml is excluded by !**/*.xml
📒 Files selected for processing (1)
storage/src/main/java/org/zstack/storage/primary/PrimaryStorageApiInterceptor.java
Summary
AddExternalPrimaryStorage with malformed JSON config left dirty rows in DB, breaking PrimaryStorage service permanently (QueryPrimaryStorage 503).
Root Cause
ExternalPrimaryStorageFactory.createPrimaryStorage persists ExternalPrimaryStorageVO/PrimaryStorageVO/PrimaryStorageOutputProtocolRefVO before invoking saveControllerIfNeed → buildControllerSvc → ZbsStorageController.reloadDbInfo → JSONObjectUtil.toObject. A RuntimeException from JSON parsing left those rows persisted with no rollback. The dirty VO then made every subsequent buildPsController() throw the same parse error, so the PrimaryStorage service stayed unhealthy and QueryPrimaryStorage kept returning 503.
Changes
- storage/ExternalPrimaryStorageFactory.java: wrap saveControllerIfNeed in try/catch; on Throwable, dbf.remove(ref) + dbf.remove(lvo), clear controllers/nodes map entries, then rethrow.
- test/.../ZbsPrimaryStorageCase.groovy: new SubCase testAddExternalPrimaryStorageWithMalformedJsonShouldRollback asserts no leftover rows after a malformed-JSON Add and that QueryPrimaryStorage still works.
Testing
- mvn compile -pl storage -am -Dmaven.test.skip passes locally.
- ZbsPrimaryStorageCase runs the new SubCase.
Resolves: ZSTAC-84817
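The rollback shape described in Changes (persist first, then build; on failure remove the rows and map entries before rethrowing) can be sketched with in-memory stand-ins for the DB and the controller map. All names below are illustrative, not the actual ZStack facade API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RollbackSketch {
    static final Set<String> dbRows = new HashSet<>();        // stands in for the persisted VO rows
    static final Map<String, Object> controllers = new HashMap<>();

    interface ControllerBuilder { Object build(String config) throws Exception; }

    // Persist first (mirroring createPrimaryStorage), then build the controller;
    // on any failure, remove the rows and the map entry before rethrowing, so a
    // later retry starts from a clean state and queries keep working.
    static void createPrimaryStorage(String uuid, String config, ControllerBuilder builder) throws Exception {
        dbRows.add(uuid);                       // rows are persisted before the build
        try {
            controllers.put(uuid, builder.build(config));
        } catch (Throwable t) {
            dbRows.remove(uuid);                // roll back the persisted rows
            controllers.remove(uuid);           // and the half-initialized entry
            throw t;                            // precise rethrow keeps the original type
        }
    }

    public static void main(String[] args) {
        try {
            createPrimaryStorage("ps-1", "{this is not valid json",
                    cfg -> { throw new RuntimeException("JSON parse failed"); });
        } catch (Throwable expected) {
        }
        // no dirty row survives, so the service stays queryable and a retry can succeed
        System.out.println(dbRows.isEmpty() + " " + controllers.isEmpty()); // true true
    }
}
```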
sync from gitlab !9743