CKS cluster setup hangs in guest waiting for /mnt/k8sdisk binaries (CloudStack 4.22 + RHEL 9.7 KVM + Ubuntu CKS template) #13147
rishabhjain1997 started this conversation in General
## Summary
CKS cluster provisioning consistently fails in our environment. The control VM boots and cloud-init runs to completion with `DataSourceCloudStack`, but the in-VM setup script then spends ~25 minutes polling for `/mnt/k8sdisk/` to be populated from the Kubernetes binaries ISO, gives up, and `kubelet.service` never gets installed. CKS then reports `Failed to setup Kubernetes cluster : <name> is not in usable state as the system is unable to access control node VMs of the cluster`, which is a downstream symptom: the real failure is that the binaries ISO contents never appear inside the guest.

We've reproduced this with both a custom-patched Ubuntu cloud image and the canonical ShapeBlue/Apache CKS template (`cks-ubuntu-2204-kvm.qcow2.bz2`, md5 `627c49a5523fc2cfddebd0a1a396512f`) on two separate KVM hosts. The failure is identical in both cases, which rules out a template-specific or per-host misconfiguration.

## Environment
- CloudStack 4.22 management server
- KVM hosts on RHEL 9.7
- Kubernetes binaries ISO: `setup-v1.33.1-fresh.iso`
- CKS template: `cks-ubuntu-2204-kvm.qcow2.bz2` from `download.cloudstack.org/testing/custom_templates/ubuntu/22.04/`

## Symptoms
### Inside the guest (control VM, via VNC console)
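One quick way to narrow this down from that VNC console is to check whether the attached ISO is visible as a block device at all. This is a sketch: `/dev/sr0` is an assumption (where a libvirt-attached CD-ROM usually surfaces), not something confirmed from our guest.

```shell
# Guest-side sanity check (run inside the control VM's console).
# /dev/sr0 is assumed; your guest may expose the CD-ROM elsewhere.
check_iso_device() {
    local dev="${1:-/dev/sr0}"
    if [ -e "$dev" ]; then
        echo "present"
    else
        echo "absent"
    fi
}

# If the device is present, the ISO reached the guest and only the mount
# step is failing; try mounting it by hand:
#   sudo mount -o ro /dev/sr0 /mnt/k8sdisk
if [ "$(check_iso_device)" = "absent" ]; then
    echo "no CD-ROM device: the ISO attach never surfaced inside the guest"
fi
```

If the device exists but `/mnt/k8sdisk` stays empty, the failure is in the CKS mount logic; if the device is missing entirely, the attach never propagated past the libvirt layer.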
So: cloud-init's `DataSourceCloudStack` works (user-data, SSH keys, and hostname are all applied), but the binaries ISO contents never show up at `/mnt/k8sdisk/`, and the setup loop times out.

### Management server log
### KVM agent log (patch.sh)
We separately see `patch.sh` timing out every 5 minutes via qemu-guest-agent, and `virsh qemu-agent-command i-2-171-VM '{"execute":"guest-ping"}'` fails. The QEMU monitor confirms the host-side socket is open but the guest side is disconnected.
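To separate "channel not wired into the domain XML" from "guest never connected", a small host-side sketch (the domain name comes from our logs; `org.qemu.guest_agent.0` is the conventional default channel name):

```shell
# Host-side sketch: check whether a libvirt domain's XML defines the
# virtio-serial guest-agent channel. Reads domain XML on stdin.
has_qga_channel() {
    if grep -q "org.qemu.guest_agent.0"; then
        echo "channel defined"
    else
        echo "channel missing"
    fi
}

# Typical use on the KVM host:
#   virsh dumpxml i-2-171-VM | has_qga_channel
```

If the channel is defined in the XML but the monitor still reports the socket as disconnected, the problem is inside the guest (qga not starting, or the virtio-serial device never probed), which would point back at the machine-type question below.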
The qga issue may be a contributing factor (since it's how `patch.sh` injects the cloud-init cmdline), but cloud-init clearly still ran via the metadata-server path, so user-data delivery itself isn't the blocker. The cluster failure happens in the ISO-mount step that runs after cloud-init.

## What we've tried
- Building a custom-patched Ubuntu cloud image (installed `qemu-guest-agent`, `containerd`, and dependencies via `virt-customize --install`; added a manual `multi-user.target.wants/qemu-guest-agent.service` symlink; experimented with clearing `BindsTo=` via a systemd drop-in). All variants hit the same failure.
- Downloading the canonical CKS template from `download.cloudstack.org` and registering it as a new template. Same failure.
- Confirming `/var/www/html/setup-v1.33.1-fresh.iso` exists on secondary storage, registered as ISO template id 227, NFS-mounted at `nfs://10.96.32.32/mnt/raid0/secondary`.

We've validated that:
- The host-side virtio-serial socket for the guest agent is open (`server=on`) and just lacks a guest-side connection.
- The VMs run with machine type `pc-i440fx-rhel7.6.0` (qemu reports this is deprecated).
- The management server's `KubernetesClusterStartWorker` does attach the binaries ISO (and later logs `Detached Kubernetes binaries from VM`).

## Questions for the community
1. Is there a known issue with the binaries ISO never appearing at `/mnt/k8sdisk/` inside Ubuntu CKS guests on this stack? Is anyone successfully running CKS on this combo?
2. The setup script polls `/mnt/k8sdisk/` for 25 min. Where does the canonical CKS template expect that mount to be sourced from (CD-ROM device path inside the guest)? Should we see a `/dev/sr0` block device, and if not, what would cause it to be missing despite CKS having attached the ISO at the libvirt layer?
3. Is the `pc-i440fx-rhel7.6.0` machine type known to break virtio-serial-based qemu-guest-agent on recent Ubuntu cloud kernels? If so, what's the recommended override (the `hypervisor.kvm.machine.type` global setting)?
4. The guest-agent channel shows `disconnected` even though both the patched Ubuntu cloud image AND the canonical CKS template have qga installed. Is there a known wiring issue for the virtio-serial channel name `org.qemu.guest_agent.0` on RHEL 9.7 / qemu 9.1?

Happy to provide more logs, mgmt-server traces, or `virsh dumpxml` of the affected VMs on request. Cross-linked from discussion #13056 (original report from @anirudh-kaushal).

Thanks!
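P.S. For anyone trying to reproduce the guest-side wait without standing up a full cluster, here is our rough paraphrase of the polling behaviour described above (a sketch only; the real CKS setup script's interval, timeout, and success condition may differ):

```shell
# Hypothetical re-creation of the setup script's poll: wait up to N
# seconds for a directory to be populated, then give up.
wait_for_k8sdisk() {
    local dir="${1:-/mnt/k8sdisk}" timeout="${2:-1500}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # "populated" here means: directory exists and is non-empty
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "populated"
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "timed out"
    return 1
}
```

Running this against `/mnt/k8sdisk` with a short timeout while manually attaching the ISO makes it easy to see exactly when (or whether) the contents appear.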