sync master to feature branch #7086

Closed
liulinC wants to merge 34 commits into feature/ldaps from master

Conversation

@liulinC

@liulinC liulinC commented May 21, 2026

Collaborator

No description provided.

Stephen Cheng and others added 30 commits April 16, 2026 10:09
Add a new DynamicRO field to track Secure Boot certificate status per VM.
The field indicates whether UEFI Secure Boot certificates need updating.

- Define enum (ok, update_available, update_on_boot) and field in datamodel
- Check certificate state via varstore-nvram-certcheck on import and
  DB upgrade for UEFI Secure Boot VMs
- Skip control domains, default templates, and non-Secure Boot VMs
  in the DB upgrade rule

Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
…eter

Add a new versioned parameter 'update' to VM.set_NVRAM_EFI_variables,
allowing varstored to indicate whether Secure Boot certificates were
changed during an NVRAM write. This enables xapi to maintain the
VM.secureboot_certificates_state field accurately.

The 'update' parameter is an enum with three values:
- 'yes': certificates were updated, set state to 'ok'
- 'no': certificates unchanged, keep current state as-is
- 'unspecified': (default for v1 callers) run certcheck to determine state

Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Also address other comments raised during review

Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
…ing (#7015)

## Background

Microsoft Secure Boot certificates from 2011 are reaching end-of-life,
and legacy VMs may still contain only the old certificate set. This PR
implements the xapi side of the out-of-band mechanism to track and
update per-VM UEFI Secure Boot variables safely, as described in the
[design doc](#7006).

## Changes

Sorry for the large PR, but the changes form a single feature.
The two main commits are:

### 1. CP-311907: Add `VM.secureboot_certificates_state` field

Add a new `DynamicRO` field to track Secure Boot certificate status per
VM. The field indicates whether UEFI Secure Boot certificates need
updating.

- Invoke `varstore-nvram-certcheck` to determine certificate state from
the NVRAM EFI-variables blob
- On DB upgrade: compute state for existing UEFI Secure Boot VMs,
skipping control domains and default templates
- On import: compute state for imported VMs
- On clone/snapshot: copy the state to the new VM

### 2. CP-311908: Add versioned `update` parameter to
`VM.set_NVRAM_EFI_variables`

Add a versioned `update` parameter (enum: `yes`/`no`/`unspecified`) so
varstored can report whether certificates were changed during an NVRAM
write. This avoids invoking the certcheck binary on every NVRAM write.

- `update=yes`: certificates were updated → set state to `ok`
- `update=no`: certificates not changed → preserve current state
- `update=unspecified` (default for legacy v1 callers): run certcheck to
determine state
- Register `set_NVRAM_v2` RPC in xapi-guard, mapping the string
parameter to the enum
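
The rules above can be read as a small decision function. The following is a minimal Python sketch of that logic, not the xapi implementation: the names `CertState`, `Update`, `state_after_nvram_write`, and the `run_certcheck` callback are hypothetical and stand in for the datamodel enums and the call to `varstore-nvram-certcheck`.

```python
from enum import Enum
from typing import Callable


class CertState(Enum):
    """Values of VM.secureboot_certificates_state (names are illustrative)."""
    OK = "ok"
    UPDATE_AVAILABLE = "update_available"
    UPDATE_ON_BOOT = "update_on_boot"


class Update(Enum):
    """Values of the versioned 'update' parameter."""
    YES = "yes"
    NO = "no"
    UNSPECIFIED = "unspecified"


def state_after_nvram_write(
    current: CertState,
    update: Update,
    run_certcheck: Callable[[], CertState],
) -> CertState:
    """Return the certificate state to record after an NVRAM write.

    run_certcheck is a hypothetical callback standing in for the
    certcheck binary; it is only invoked for 'unspecified'.
    """
    if update is Update.YES:
        # Certificates were updated during this write.
        return CertState.OK
    if update is Update.NO:
        # Certificates unchanged: keep whatever was recorded before.
        return current
    # Legacy v1 callers do not say, so the state has to be recomputed.
    return run_certcheck()
```

Because the enum members carry the wire strings as values, `Update("yes")` gives `Update.YES`, which is one way the string-to-enum mapping mentioned for `set_NVRAM_v2` could be expressed.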


## Testing

Tested the following scenarios with the updated varstored:

- State transitions: `ok` → `update_available` → `update_on_boot` → `ok`
verified correct
- VM reboot with certificate update: varstored sends `update=yes`, state
transitions to `ok`
- VM reboot without certificate update: varstored sends `update=no`,
state is preserved
- Cross-host live migration (old host → new host, new host → new host):
`secureboot_certificates_state` is correctly preserved
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Admin marks a VM for update: update_available -> update_on_boot

The tgroup.opam update fixes a "make test" failure
Signed-off-by: Chunjie Zhu <chunjie.zhu@citrix.com>
According to the design doc, non-Secure Boot VMs are treated the same as
Secure Boot VMs: the certificates and the certificate state should also
be updated for non-Secure Boot VMs.
A test with a VM that has 239 VBDs (1 system disk + 238 iSCSI VBDs)
fails with `Timed out while waiting on task xxx`.
In the failing task:
```
VM.start (task 61)
  → perform_atomics([..., Parallel("Devices.plug"), ...])
    → perform_atomic(Parallel("Devices.plug"))
      ↓
      Parallel:task=61.atoms=2.(Devices.plug (no qemu))
      │
      │ queue_atomics_and_wait(ops=[atom0, atom1])
      │   expiration(atom0) = atomic_expires_after(Nested_parallel(239))
      │                     = Float.max(3600, ...) = 3600s  ← BUG
      │   event_wait(task=79, timeout=3600s)
      │
      ├─ task 79 (worker): Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW)
      │    │
      │    │ queue_atomics_and_wait(239 ops, chunks of 10)
      │    ├─ chunk 0:  10 × Serial(VBD_set_active, VBD_epoch_begin, VBD_plug)
      │    ├─ chunk 1:  10 × ...
      │    ├─ ...
      │    └─ chunk 23:  9 × ...
      │    └─ total: 82 minutes
      │
      └─ task 80 (worker): Serial(VIF_set_active, VIF_plug)
```
The task expiry calculation for Parallel and Nested_parallel
does not take the chunking into account: it only takes the maximum
expiry among the atoms in the parallel list. In fact, the parallel
task is split into 24 chunks to run.
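
To make the arithmetic concrete, here is a minimal Python sketch of one way an expiry could account for chunking. It is an illustration under stated assumptions, not the change made in this commit: the function names are hypothetical, the chunk size of 10 is taken from the trace, and the per-atom expiry of 3600s is only an example value.

```python
import math
from typing import Sequence


def naive_parallel_expiry(atom_expiries: Sequence[float]) -> float:
    """Expiry as described above: just the largest per-atom expiry."""
    return max(atom_expiries, default=0.0)


def chunked_parallel_expiry(
    atom_expiries: Sequence[float], chunk_size: int = 10
) -> float:
    """Hypothetical chunk-aware bound.

    Assumes at most chunk_size atoms are queued per chunk, so the
    worst case is (number of chunks) x (slowest atom).
    """
    if not atom_expiries:
        return 0.0
    chunks = math.ceil(len(atom_expiries) / chunk_size)
    return chunks * max(atom_expiries)


# 239 VBD atoms, each with an illustrative 3600s expiry.
expiries = [3600.0] * 239
naive = naive_parallel_expiry(expiries)      # one atom's worth of time
chunked = chunked_parallel_expiry(expiries)  # 24 chunks' worth of time
```

With 239 atoms and chunks of 10, `math.ceil(239 / 10)` is 24, which matches the 24 chunks in the trace. The naive bound stays at 3600s even though the run took 82 minutes (4920s).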

Signed-off-by: Changlei Li <changlei.li@citrix.com>
Signed-off-by: Chunjie Zhu <chunjie.zhu@citrix.com>
If a global proxy is set in /etc/dnf/dnf.conf, local repositories
(pointing to http://127.0.0.1 ...) do not work.
A proxy should not be used for such repositories.
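
As an illustration of the idea, the Python sketch below writes a repository stanza that opts out of the global proxy when the base URL points at the local host. It is not the code from this commit: the helper names and the repository id are hypothetical, and it relies on the behaviour documented for dnf.conf that an empty `proxy=` in a repository section disables the proxy inherited from `[main]`.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (an ordinary DNS name): treat as remote.
        return False


def repo_stanza(repo_id: str, baseurl: str) -> str:
    """Render a .repo stanza (hypothetical helper, minimal fields only)."""
    lines = [
        f"[{repo_id}]",
        f"name={repo_id}",
        f"baseurl={baseurl}",
        "enabled=1",
    ]
    if is_loopback_url(baseurl):
        # Empty value: do not inherit the proxy from the [main] section.
        lines.append("proxy=")
    return "\n".join(lines) + "\n"
```

For example, `repo_stanza("local-update", "http://127.0.0.1:8080/repo")` ends with a `proxy=` line, while a stanza for a remote mirror does not and so keeps using the global proxy.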

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
This feature:
1. detects whether a VM, snapshot, or template still uses the Microsoft
2011 UEFI certificates
2. exposes a simple state that admins can use in XenCenter, CLI, and API
to identify affected objects
3. allows admins to mark eligible VMs so XenServer updates their
certificates automatically on the next boot
4. allows admins to unmark a pending update before the VM is rebooted
5. supports Windows and Linux UEFI VMs (regardless of whether the VM has
Secure Boot enabled) that still carry the affected Microsoft
2011 certificates
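
The mark and unmark operations in the list above can be pictured as a small transition table. This Python sketch is illustrative only: the function and action names are hypothetical, and the unmark transition back to `update_available` is an assumption inferred from the description, not something confirmed here.

```python
# (current state, admin action) -> new state
TRANSITIONS = {
    # Stated in this PR: marking moves update_available -> update_on_boot.
    ("update_available", "mark"): "update_on_boot",
    # Assumption: unmarking a pending update reverses the mark.
    ("update_on_boot", "unmark"): "update_available",
}


def apply_admin_action(state: str, action: str) -> str:
    """Return the new state, or raise if the action does not apply."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not valid in state {state!r}"
        ) from None
```

In this sketch a VM already in state `ok` is not eligible for marking, so `apply_admin_action("ok", "mark")` raises `ValueError`.
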
dnf was upgraded to dnf5, which prints transaction errors in a new
format. The existing parsers in pool_update.precheck and pool_update.apply
were written against dnf4 and silently fell through to a generic
"unknown error" for every failure mode, hiding the real reason from
XAPI.

Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
If a global proxy is set in `/etc/dnf/dnf.conf`, local repositories
(pointing to `http://127.0.0.1 ...`) do not work.
A proxy should not be used for such repositories.
The api_version is actually rewritten by the value passed via the
`--xapi_api_version_minor` flag, so bumping the value here only
affects local builds made with the make command.

Signed-off-by: Changlei Li <changlei.li@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
The api_version is actually rewritten by the value passed via the
`--xapi_api_version_minor` flag. The value is bumped here for local
builds. Also add the new version to `release_order_full` for SDK generation.
A test with a VM that has 239 VBDs (1 system disk + 238 iSCSI VBDs) fails
with `Timed out while waiting on task xxx`. In the failing task:
```
VM.start (task 61)
  → perform_atomics([..., Parallel("Devices.plug"), ...])
    → perform_atomic(Parallel("Devices.plug"))
      ↓
      Parallel:task=61.atoms=2.(Devices.plug (no qemu))
      │
      │ queue_atomics_and_wait(ops=[atom0, atom1])
      │   expiration(atom0) = atomic_expires_after(Nested_parallel(239))
      │                     = Float.max(3600, ...) = 3600s  ← BUG
      │   event_wait(task=79, timeout=3600s)
      │
      ├─ task 79 (worker): Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW)
      │    │
      │    │ queue_atomics_and_wait(239 ops, chunks of 10)
      │    ├─ chunk 0:  10 × Serial(VBD_set_active, VBD_epoch_begin, VBD_plug)
      │    ├─ chunk 1:  10 × ...
      │    ├─ ...
      │    └─ chunk 23:  9 × ...
      │    └─ total: 82 minutes
      │
      └─ task 80 (worker): Serial(VIF_set_active, VIF_plug)
```
The task expiry calculation for Parallel and Nested_parallel does not
take the chunking into account: it only takes the maximum expiry among
the atoms in the parallel list. In fact, the parallel task is split into
24 chunks to run.
Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>
changlei-li and others added 4 commits May 20, 2026 15:53
Signed-off-by: Changlei Li <changlei.li@citrix.com>
Signed-off-by: Changlei Li <changlei.li@citrix.com>
In some rare cases, if the peer server's port is open and it answers the
TCP SYN but does not reply to the TLS hello, the `openssl s_client` command
in `fetch_server_cert` may hang indefinitely. Add a timeout to address this.
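
The general pattern of bounding an external command can be sketched in Python as follows. This is an illustration, not the change made to `fetch_server_cert`: the helper names and the timeout value are hypothetical, and only the standard `-connect` and `-showcerts` options of `openssl s_client` are used.

```python
import subprocess
from typing import List, Optional


def s_client_argv(host: str, port: int) -> List[str]:
    """Build an openssl s_client command line (hypothetical helper)."""
    return ["openssl", "s_client", "-connect", f"{host}:{port}", "-showcerts"]


def run_with_timeout(argv: List[str], timeout_s: float) -> Optional[str]:
    """Run argv and return its stdout, or None if it exceeds timeout_s.

    subprocess.run kills the child process when the timeout expires,
    so a peer that never answers the TLS hello cannot block forever.
    """
    try:
        result = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,   # s_client would otherwise wait on stdin
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

A caller could then use `run_with_timeout(s_client_argv(host, 443), 10.0)` and treat `None` as "peer did not complete the handshake in time"; the 10 second value is only an example.
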
dnf was upgraded to dnf5, which prints transaction errors in a new
format. The existing parsers in pool_update.precheck and
pool_update.apply were written against dnf4 and silently fell through to
a generic "unknown error" for every failure mode, hiding the real reason
from XAPI.

Tested and passed:
4648126 updatenegative.seq
4648212 supppackduringhostinstall.seq
4648128 supppackduringhostupgrade.seq

Also ran manual tests for typical dnf5 transaction errors (conflict,
non-existent package, missing provides)
@liulinC liulinC closed this May 21, 2026