sync master to feature branch #7086
Closed
liulinC wants to merge 34 commits into
Add a new DynamicRO field to track Secure Boot certificate status per VM. The field indicates whether UEFI Secure Boot certificates need updating.

- Define enum (ok, update_available, update_on_boot) and field in datamodel
- Check certificate state via varstore-nvram-certcheck on import and DB upgrade for UEFI Secure Boot VMs
- Skip control domains, default templates, and non-Secure Boot VMs in the DB upgrade rule

Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
…eter

Add a new versioned parameter 'update' to VM.set_NVRAM_EFI_variables, allowing varstored to indicate whether Secure Boot certificates were changed during an NVRAM write. This enables xapi to maintain the VM.secureboot_certificates_state field accurately.

The 'update' parameter is an enum with three values:

- 'yes': certificates were updated, set state to 'ok'
- 'no': certificates unchanged, keep current state as-is
- 'unspecified': (default for v1 callers) run certcheck to determine state

Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Also fix other comments during review Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
…ing (#7015)

## Background

Microsoft Secure Boot certificates from 2011 are reaching end-of-life, and legacy VMs may still contain only the old certificate set. This PR implements the xapi side of the out-of-band mechanism to track and update per-VM UEFI Secure Boot variables safely, as described in the [design doc](#7006).

## Changes

Sorry for the large PR, but the changes form a single feature. The two main commits are:

### 1. CP-311907: Add `VM.secureboot_certificates_state` field

Add a new `DynamicRO` field to track Secure Boot certificate status per VM. The field indicates whether UEFI Secure Boot certificates need updating.

- Invoke `varstore-nvram-certcheck` to determine certificate state from the NVRAM EFI-variables blob
- On DB upgrade: compute state for existing UEFI Secure Boot VMs, skipping control domains and default templates
- On import: compute state for VMs imported
- On clone/snapshot: copy the state to the new VM

### 2. CP-311908: Add versioned `update` parameter to `VM.set_NVRAM_EFI_variables`

Add a versioned `update` parameter (enum: `yes`/`no`/`unspecified`) so varstored can report whether certificates were changed during an NVRAM write. This avoids invoking the certcheck binary on every NVRAM write.

- `update=yes`: certificates were updated → set state to `ok`
- `update=no`: certificates not changed → preserve current state
- `update=unspecified` (default for legacy v1 callers): run certcheck to determine state
- Register `set_NVRAM_v2` RPC in xapi-guard, mapping the string parameter to the enum

## Testing

Tested the following scenarios with the updated varstored:

- State transitions: `ok` → `update_available` → `update_on_boot` → `ok` verified correct
- VM reboot with certificate update: varstored sends `update=yes`, state transitions to `ok`
- VM reboot without certificate update: varstored sends `update=no`, state is preserved
- Cross-host live migration (old host → new host, new host → new host): `secureboot_certificates_state` is correctly preserved
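The three `update` cases above can be sketched as a small decision function. This is a minimal Python illustration, not the xapi (OCaml) implementation: the names `CertState`, `Update`, `state_after_nvram_write` and the `run_certcheck` callback are hypothetical stand-ins, and the callback only models "invoke the certcheck binary and map its result to a state".

```python
from enum import Enum
from typing import Callable


class CertState(Enum):
    """Values of VM.secureboot_certificates_state as listed in the PR."""
    OK = "ok"
    UPDATE_AVAILABLE = "update_available"
    UPDATE_ON_BOOT = "update_on_boot"


class Update(Enum):
    """Values of the versioned 'update' parameter."""
    YES = "yes"
    NO = "no"
    UNSPECIFIED = "unspecified"


def state_after_nvram_write(
    current: CertState,
    update: Update,
    run_certcheck: Callable[[], CertState],  # hypothetical stand-in for the certcheck call
) -> CertState:
    """Return the state to store after an NVRAM write."""
    if update is Update.YES:
        # Certificates were updated during this write.
        return CertState.OK
    if update is Update.NO:
        # Certificates untouched: keep whatever state was recorded.
        return current
    # Legacy v1 callers: the state has to be recomputed from the NVRAM blob.
    return run_certcheck()
```

Only the `unspecified` branch calls the checker, which matches the stated goal of not invoking certcheck on every NVRAM write.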
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Admin marks a VM for update: update_available -> update_on_boot
The tgroup.opam update fixes a "make test" failure.
Sync with the latest master branch
Signed-off-by: Chunjie Zhu <chunjie.zhu@citrix.com>
According to the design doc, non-Secure Boot VMs are treated the same as Secure Boot VMs; that is, the certificates and the certificate state are also updated for non-Secure Boot VMs.
A test of a VM with 239 VBDs (1 system disk + 238 iSCSI VBDs)
fails with `Timed out while waiting on task xxx`.
In the failing task:
```
VM.start (task 61)
→ perform_atomics([..., Parallel("Devices.plug"), ...])
→ perform_atomic(Parallel("Devices.plug"))
↓
Parallel:task=61.atoms=2.(Devices.plug (no qemu))
│
│ queue_atomics_and_wait(ops=[atom0, atom1])
│ expiration(atom0) = atomic_expires_after(Nested_parallel(239))
│ = Float.max(3600, ...) = 3600s ← BUG
│ event_wait(task=79, timeout=3600s)
│
├─ task 79 (worker): Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW)
│ │
│ │ queue_atomics_and_wait(239 ops, chunks of 10)
│ ├─ chunk 0: 10 × Serial(VBD_set_active, VBD_epoch_begin, VBD_plug)
│ ├─ chunk 1: 10 × ...
│ ├─ ...
│ └─ chunk 23: 9 × ...
│ └─ total: 82 minutes
│
└─ task 80 (worker): Serial(VIF_set_active, VIF_plug)
```
The task expiry calculation for Parallel and Nested_parallel does not
account for chunking: it just takes the maximum expiry among the atoms
in the parallel list. In fact, the 239 parallel atoms are split into
24 chunks, and the whole run (82 minutes here) exceeds the 3600s timeout.
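The arithmetic behind the timeout can be illustrated with a short Python sketch. It assumes, as the diagram suggests, that chunks run one after another and that the atoms within a chunk run in parallel. The function names are hypothetical, and the way the actual fix scales the expiry is not stated in this commit message.

```python
import math
from typing import Sequence


def chunk_count(n_ops: int, chunk_size: int) -> int:
    """Number of chunks needed to run n_ops atoms, chunk_size at a time."""
    return math.ceil(n_ops / chunk_size)


def naive_expiry(op_timeouts: Sequence[float]) -> float:
    """The behaviour described as buggy: only the slowest single atom counts."""
    return max(op_timeouts, default=0.0)


def chunk_aware_expiry(op_timeouts: Sequence[float], chunk_size: int) -> float:
    """Assumed alternative: sum the slowest atom of each sequential chunk."""
    total = 0.0
    for start in range(0, len(op_timeouts), chunk_size):
        chunk = op_timeouts[start:start + chunk_size]
        total += max(chunk)  # atoms inside one chunk run in parallel
    return total
```

For 239 atoms in chunks of 10, `chunk_count` gives 24, so with a 3600s per-atom expiry the chunk-aware bound is 24 times larger than the naive one.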
Signed-off-by: Changlei Li <changlei.li@citrix.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@citrix.com>
If a global proxy is set in /etc/dnf/dnf.conf, local repositories (pointing to http://127.0.0.1 ...) do not work. A proxy should not be used for such repositories.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
This feature does the following:

1. detects whether a VM, snapshot, or template still uses the Microsoft 2011 UEFI certificates
2. exposes a simple state that admins can use in XenCenter, CLI, and API to identify affected objects
3. allows admins to mark eligible VMs so XenServer updates their certificates automatically on the next boot
4. allows admins to unmark a pending update before the VM is rebooted
5. supports Windows and Linux UEFI VMs (irrespective of whether the VM has the Secure Boot function enabled) that still carry the affected Microsoft 2011 certificates
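The mark and unmark operations in the list above can be pictured as transitions between the state values named elsewhere in this PR. The Python sketch below is only an illustration: `mark_for_update` and `unmark_update` are hypothetical names, and the assumption that unmarking returns the VM to `update_available` is inferred from the description rather than confirmed by it.

```python
UPDATE_AVAILABLE = "update_available"
UPDATE_ON_BOOT = "update_on_boot"


def mark_for_update(state: str) -> str:
    """Admin marks an eligible VM: update_available -> update_on_boot."""
    if state != UPDATE_AVAILABLE:
        raise ValueError(f"cannot mark a VM in state {state!r}")
    return UPDATE_ON_BOOT


def unmark_update(state: str) -> str:
    """Admin cancels a pending update before reboot (assumed reverse transition)."""
    if state != UPDATE_ON_BOOT:
        raise ValueError(f"no pending update in state {state!r}")
    return UPDATE_AVAILABLE
```

Rejecting other states keeps a VM that is already `ok` from being marked by mistake.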
dnf was upgraded to dnf5, which prints transaction errors in a new format. The existing parsers in pool_update.precheck and pool_update.apply were written against dnf4 and silently fell through to a generic "unknown error" for every failure mode, hiding the real reason from XAPI. Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
If a global proxy is set in `/etc/dnf/dnf.conf`, local repositories (pointing to `http://127.0.0.1 ...`) do not work. A proxy should not be used for such repositories.
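One way to picture the idea is the Python sketch below, which detects a loopback `baseurl` and emits an empty `proxy=` line for that repository. It assumes that an empty `proxy` value in a repository section overrides the proxy inherited from the main section, which is worth checking against the dnf configuration reference. The helper names and the minimal set of repo options are hypothetical, and this is not necessarily how the actual change is implemented.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True when the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name): treat as non-local.
        return False


def repo_section(name: str, baseurl: str) -> str:
    """Render a minimal repo section, bypassing the proxy for local repos."""
    lines = [f"[{name}]", f"baseurl={baseurl}", "enabled=1"]
    if is_loopback_url(baseurl):
        # Assumption: an empty value disables the globally configured proxy.
        lines.append("proxy=")
    return "\n".join(lines) + "\n"
```

Remote repositories are left untouched, so they keep using the global proxy.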
The api_version is actually rewritten by the flag passed via `--xapi_api_version_minor`. Bumping the value here only affects local builds made with the make command. Signed-off-by: Changlei Li <changlei.li@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
The api_version is actually rewritten by the flag passed via `--xapi_api_version_minor`. Bump the value here for local builds. Also add the new version to `release_order_full` for SDK generation.
A test of a VM with 239 VBDs (1 system disk + 238 iSCSI VBDs) fails
with `Timed out while waiting on task xxx`. In the failing task:
```
VM.start (task 61)
→ perform_atomics([..., Parallel("Devices.plug"), ...])
→ perform_atomic(Parallel("Devices.plug"))
↓
Parallel:task=61.atoms=2.(Devices.plug (no qemu))
│
│ queue_atomics_and_wait(ops=[atom0, atom1])
│ expiration(atom0) = atomic_expires_after(Nested_parallel(239))
│ = Float.max(3600, ...) = 3600s ← BUG
│ event_wait(task=79, timeout=3600s)
│
├─ task 79 (worker): Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW)
│ │
│ │ queue_atomics_and_wait(239 ops, chunks of 10)
│ ├─ chunk 0: 10 × Serial(VBD_set_active, VBD_epoch_begin, VBD_plug)
│ ├─ chunk 1: 10 × ...
│ ├─ ...
│ └─ chunk 23: 9 × ...
│ └─ total: 82 minutes
│
└─ task 80 (worker): Serial(VIF_set_active, VIF_plug)
```
The task expiry calculation for Parallel and Nested_parallel does not
account for chunking: it just takes the maximum expiry among the atoms
in the parallel list. In fact, the 239 parallel atoms are split into
24 chunks, and the whole run (82 minutes here) exceeds the 3600s timeout.
Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
In some rare cases, if the peer server's port is open and it replies to the TCP SYN but never replies to the TLS hello, the openssl s_client command in `fetch_server_cert` may hang indefinitely. Add a timeout to address this.
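The general pattern of bounding an external command with a timeout can be sketched in Python as follows. `run_with_timeout` is a hypothetical helper; how `fetch_server_cert` actually invokes openssl, and the timeout value it uses, are not stated here.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[str]:
    """Run cmd and return its stdout, or None if it exceeds timeout_s."""
    try:
        done = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # never wait for interactive input
            capture_output=True,
            text=True,
            timeout=timeout_s,         # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout
```

A caller would pass the openssl command line as `cmd` and treat a `None` result as "peer did not complete the TLS handshake in time" instead of blocking forever.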
dnf was upgraded to dnf5, which prints transaction errors in a new format. The existing parsers in pool_update.precheck and pool_update.apply were written against dnf4 and silently fell through to a generic "unknown error" for every failure mode, hiding the real reason from XAPI.

Tested and passed:

- 4648126 updatenegative.seq
- 4648212 supppackduringhostinstall.seq
- 4648128 supppackduringhostupgrade.seq

Also did manual tests for typical dnf5 transaction errors (conflict, non-existent package, missing provides).
No description provided.