From e82754585a62e9d6a1515b28ad03e2f7227c7f91 Mon Sep 17 00:00:00 2001
From: alex newman
Date: Tue, 2 Jun 2026 12:58:37 -0400
Subject: [PATCH 1/3] fix(cp): boot the TDX CP on TDX-enlightened OVMF, not generic OVMF.fd
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

local-cp.sh pinned the CP VM's firmware to /usr/share/ovmf/OVMF.fd, the
generic, non-TDX OVMF. A TDX confidential guest cannot run on it: the
firmware hits a TDX-context-invalid instruction and faults with `#UD`
(invalid opcode) before the kernel boots, so the TD reset-loops (no
kernel, no network, no /health) and every deploy fails its readiness
gate.

This was latent until the baremetal host was reprovisioned (fresh OS,
2026-05-29): the new image's /usr/share/ovmf/OVMF.fd is a plain, non-TDX
build, where before it happened to be TDX-capable. Confirmed on the
host: the guest's firmware log shows the #UD; swapping the live VM's
loader to OVMF.inteltdx.fd boots it straight into the kernel.

Fix: resolve a firmware that is both TDX-enlightened *and*
non-secure-boot (OVMF.inteltdx.fd, the same one the easyenclave-local
base domain boots; the non-".ms" variant avoids the "Access Denied" the
unsigned UKI gets from secure-boot OVMF), searching the standard
locations and failing loudly if none is present. Keeps the stateless ROM
loader + firmware-auto-selection-disable that the unsigned UKI requires.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 apps/_infra/local-cp.sh | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/apps/_infra/local-cp.sh b/apps/_infra/local-cp.sh
index 57f9264..f9d9035 100755
--- a/apps/_infra/local-cp.sh
+++ b/apps/_infra/local-cp.sh
@@ -245,26 +245,43 @@ x = pattern.sub("\n" + replacement, x, count=1)
 with open(p, "w") as f: f.write(x)
 PY

-  # The local-tdx-qcow2 UKI is intentionally unsigned; this host's secure
-  # boot OVMF rejects it with UEFI "Access Denied". Use the non-secure
-  # ROM loader when present and disable firmware auto-selection below.
-  if [ -r /usr/share/ovmf/OVMF.fd ]; then
-    python3 - "$out" <<'PY'
-import re, sys
+  # The local-tdx-qcow2 UKI is intentionally unsigned, so a secure-boot
+  # OVMF (the ".ms" Microsoft-keys variant) rejects it with UEFI "Access
+  # Denied". But a TDX guest *also* cannot boot on the generic, non-TDX
+  # OVMF (`OVMF.fd`): its firmware executes a TDX-context-invalid
+  # instruction and faults with `#UD` (invalid opcode), reset-looping
+  # before the kernel ever starts. We therefore need a firmware that is
+  # both TDX-enlightened *and* non-secure-boot: `OVMF.inteltdx.fd`. Pin
+  # the first one available as a stateless ROM loader and disable
+  # libvirt's firmware auto-selection below. (This is the same firmware
+  # the easyenclave-local base domain boots.)
+  tdx_fw=""
+  for c in /usr/local/share/ovmf/OVMF.inteltdx.fd \
+           /usr/share/ovmf/OVMF.inteltdx.fd \
+           /usr/share/OVMF/OVMF.inteltdx.fd; do
+    if [ -r "$c" ]; then tdx_fw="$c"; break; fi
+  done
+  if [ -z "$tdx_fw" ]; then
+    echo "no TDX-enlightened OVMF found (looked for OVMF.inteltdx.fd in /usr/local/share/ovmf, /usr/share/ovmf, /usr/share/OVMF); cannot boot a TDX CP" >&2
+    exit 1
+  fi
+  echo "local-cp: TDX firmware -> $tdx_fw"
+  TDX_FW="$tdx_fw" python3 - "$out" <<'PY'
+import re, sys, os
 p = sys.argv[1]
+fw = os.environ["TDX_FW"]
 with open(p) as f: x = f.read()
 x = re.sub(r"", "", x, count=1)
 x = re.sub(r"\n\s*.*?", "", x, count=1, flags=re.DOTALL)
 x = re.sub(r"\n\s*]*>.*?", "", x, count=1, flags=re.DOTALL)
 x = re.sub(
     r"]*>.*?",
-    "/usr/share/ovmf/OVMF.fd",
+    lambda _m: "%s" % fw,
     x,
     count=1,
 )
 with open(p, "w") as f: f.write(x)
 PY
-  fi

   # CP sizing.
   local mem_kib=16777216  # 16 GiB

From e4db9b6cb257220500910c562c42702e67abeb04 Mon Sep 17 00:00:00 2001
From: alex newman
Date: Tue, 2 Jun 2026 13:10:33 -0400
Subject: [PATCH 2/3] fix(cp): log firmware choice to stderr, not stdout
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

render_domain_xml() streams the domain XML on stdout (the caller
captures it for `virsh define`). The firmware-selection progress line
went to stdout, prepending non-XML to the captured document → `virsh
define` failed with "Start tag expected, '<' not found". Redirect it to
stderr.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 apps/_infra/local-cp.sh | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/apps/_infra/local-cp.sh b/apps/_infra/local-cp.sh
index f9d9035..bb00751 100755
--- a/apps/_infra/local-cp.sh
+++ b/apps/_infra/local-cp.sh
@@ -265,7 +265,10 @@ PY
     echo "no TDX-enlightened OVMF found (looked for OVMF.inteltdx.fd in /usr/local/share/ovmf, /usr/share/ovmf, /usr/share/OVMF); cannot boot a TDX CP" >&2
     exit 1
   fi
-  echo "local-cp: TDX firmware -> $tdx_fw"
+  # NB: render_domain_xml() streams the finished XML on stdout (`cat "$out"`
+  # at the end) and the caller captures it, so any progress logging here
+  # must go to stderr, or it corrupts the domain XML fed to `virsh define`.
+  echo "local-cp: TDX firmware -> $tdx_fw" >&2
   TDX_FW="$tdx_fw" python3 - "$out" <<'PY'
 import re, sys, os
 p = sys.argv[1]

From 4ecf88dd97f6323aefa955d2fa9f18bd7bf325d5 Mon Sep 17 00:00:00 2001
From: alex newman
Date: Tue, 2 Jun 2026 13:30:14 -0400
Subject: [PATCH 3/3] fix(agents): boot TDX agent VMs on TDX OVMF, not generic OVMF.fd
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Same firmware bug as local-cp.sh, agent side: local-agents.sh pinned the
agent VMs' loader to /usr/share/ovmf/OVMF.fd (generic, non-TDX), so on
the reprovisioned host they fault with #UD in firmware and reset-loop:
the preview deploy's CP came up but dd-local-preview / -oracle never
registered (their consoles show the same #UD).

Resolve a TDX-enlightened, non-secure-boot OVMF (OVMF.inteltdx.fd) the
same way local-cp.sh now does, logging to stderr so the captured domain
XML stays clean.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 apps/_infra/local-agents.sh | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/apps/_infra/local-agents.sh b/apps/_infra/local-agents.sh
index 8aec03f..cafde1c 100755
--- a/apps/_infra/local-agents.sh
+++ b/apps/_infra/local-agents.sh
@@ -476,26 +476,45 @@ PY
   sed -i "s|/var/log/ee-local\\.log|/var/log/ee-local-$name.log|g" "$out"
   sed -i "s|||g" "$out"

-  # The local-tdx-qcow2 UKI is intentionally unsigned; this host's secure
-  # boot OVMF rejects it with UEFI "Access Denied". Use the non-secure
-  # ROM loader when present and disable firmware auto-selection below.
-  if [ -r /usr/share/ovmf/OVMF.fd ]; then
-    python3 - "$out" <<'PY'
-import re, sys
+  # The local-tdx-qcow2 UKI is intentionally unsigned, so a secure-boot
+  # OVMF (the ".ms" Microsoft-keys variant) rejects it with UEFI "Access
+  # Denied". But a TDX guest *also* cannot boot on the generic, non-TDX
+  # OVMF (`OVMF.fd`): its firmware faults with `#UD` (invalid opcode) and
+  # reset-loops before the kernel starts. Use a firmware that is both
+  # TDX-enlightened *and* non-secure-boot (`OVMF.inteltdx.fd`, the same one
+  # the easyenclave-local base domain boots); pin it as a stateless ROM
+  # loader and disable firmware auto-selection below.
+  #
+  # NB: render_domain_xml() streams the finished XML on stdout and the
+  # caller captures it; any logging here must go to stderr or it corrupts
+  # the domain XML fed to `virsh define`.
+  tdx_fw=""
+  for c in /usr/local/share/ovmf/OVMF.inteltdx.fd \
+           /usr/share/ovmf/OVMF.inteltdx.fd \
+           /usr/share/OVMF/OVMF.inteltdx.fd; do
+    if [ -r "$c" ]; then tdx_fw="$c"; break; fi
+  done
+  if [ -z "$tdx_fw" ]; then
+    echo "no TDX-enlightened OVMF found (looked for OVMF.inteltdx.fd in /usr/local/share/ovmf, /usr/share/ovmf, /usr/share/OVMF); cannot boot a TDX agent" >&2
+    exit 1
+  fi
+  echo "local-agents: TDX firmware -> $tdx_fw" >&2
+  TDX_FW="$tdx_fw" python3 - "$out" <<'PY'
+import re, sys, os
 p = sys.argv[1]
+fw = os.environ["TDX_FW"]
 with open(p) as f: x = f.read()
 x = re.sub(r"", "", x, count=1)
 x = re.sub(r"\n\s*.*?", "", x, count=1, flags=re.DOTALL)
 x = re.sub(r"\n\s*]*>.*?", "", x, count=1, flags=re.DOTALL)
 x = re.sub(
     r"]*>.*?",
-    "/usr/share/ovmf/OVMF.fd",
+    lambda _m: "%s" % fw,
     x,
     count=1,
 )
 with open(p, "w") as f: f.write(x)
 PY
-  fi

   # CPU-only agent sizing. Dogfood is meant for real interactive
   # development, so give it enough room for Codex + nested containers.
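
Patches 1/3 and 3/3 carry the same firmware lookup, and 2/3 exists only because a progress line leaked onto stdout. A minimal sketch of how both concerns might be kept in one place, assuming a hypothetical helper named `resolve_tdx_fw` (it is not part of this series, and neither script defines it): the helper prints the chosen path on stdout and sends every diagnostic to stderr, so a caller that captures stdout receives only the path.

```shell
#!/bin/sh
# Hypothetical helper, not part of this patch series.
# Prints the first readable candidate path on stdout; all diagnostics go
# to stderr so that a caller capturing stdout gets nothing but the path.
resolve_tdx_fw() {
  for c in "$@"; do
    if [ -r "$c" ]; then
      echo "TDX firmware -> $c" >&2
      printf '%s\n' "$c"
      return 0
    fi
  done
  echo "no TDX-enlightened OVMF found (looked in: $*)" >&2
  return 1
}

# Possible call site, using the candidate list from the patches above:
#   tdx_fw="$(resolve_tdx_fw \
#     /usr/local/share/ovmf/OVMF.inteltdx.fd \
#     /usr/share/ovmf/OVMF.inteltdx.fd \
#     /usr/share/OVMF/OVMF.inteltdx.fd)" || exit 1
```

Returning non-zero instead of calling `exit` inside the helper leaves the failure policy to each script, which keeps the "fail loudly" behaviour of the series while letting the two callers word their own error context.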