fix(#405): honest CPU fallback for local embeddings + ADR-101 ROCm strategy by aaronsb · Pull Request #408 · aaronsb/knowledge-graph-system

aaronsb · 2026-05-24T21:40:22Z

Closes #405. End-to-end verified on AMD 7900 XTX (gfx1100) + iGPU 9950X3D (gfx1036), Arch ROCm 7.2.3 host.

Summary

Issue #405 reported that the ./operator.sh init "AMD ROCm" path produced a usable-looking config, but every ingest then failed with "Embedding model not loaded. Call load_model() first." The startup log claimed "Falling back to API-based embeddings", which was false: no fallback actually occurred. The root cause is image packaging: the published kg-api:latest ships PyPI's default torch wheel (CUDA runtime bundled, no ROCm). This PR ships both halves of the fix: an honest CPU fallback and a published ROCm image variant.

What's in this PR

ADR-101 (Accepted) — docs/architecture/infrastructure/ADR-101-rocm-image-variant-and-install-time-selection.md. Frames the root cause, proposes three ROCm variant tags (rocm60, rocm61, rocm72-host) with install-time selection via KG_API_IMAGE_TAG substitution in docker-compose.ghcr.yml. NVIDIA stays on :latest (default PyPI torch wheel carries the CUDA runtime — works on NVIDIA and CPU hosts unchanged).

Honest CPU fallback — api/app/lib/embedding_model_manager.py. When load_model() fails on the configured device (CUDA-built torch on an AMD host, missing /dev/kfd for ROCm, MPS on non-Apple hardware, etc.), init retries once with device='cpu' rather than leaving the platform silently broken. This also closes a global-pollution bug where a failed init left _model_manager pointing at a half-built manager (the actual reason concepts saw "model not loaded" instead of "not initialized"). Hot-reload deliberately does NOT carry the fallback: the operator has just made an explicit device choice, the atomic swap preserves the previous working model on failure, and silently downgrading would be the worse failure.

Honest startup logs — api/app/main.py. No more "Falling back to API-based embeddings" when the active profile is still local. Mirror fix at line 218 for visual-embedding init. On both-paths-failure, the message surfaces what actually happened with concrete remediation paths.

Published ROCm image — ghcr.io/aaronsb/knowledge-graph-system/kg-api:rocm72-host. Built from api/Dockerfile.rocm-host on AMD's official rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.9.1 base image. Single-arch linux/amd64 (ROCm has no production arm64 path). publish.sh images-rocm is the new target group; rocm60/rocm61 variants stay gated behind --force until someone runs them on real hardware (#409).

Operator wiring with single-source-of-truth helper — operator/lib/image-tag.sh is the one place GPU_MODE → KG_API_IMAGE_TAG is defined. operator.sh:load_config, operator/lib/common.sh:load_operator_config, and operator/lib/guided-init.sh all source it. The earlier shape, with the mapping inlined three times, drifted between commits; the helper ends that. docker/docker-compose.ghcr.yml is parameterized to substitute the tag. operator/lib/guided-init.sh persists the tag in .operator.conf, warns about ROCR_VISIBLE_DEVICES=0 on hosts with both a discrete GPU and an iGPU, and reframes wizard option [3] as "preview" (it now resolves to :latest + CPU fallback because rocm60/rocm61 are deferred; previously it pointed at an unpublished image). operator.sh:get_compose_cmd gained the missing amd-host GPU overlay case. Without it there were no /dev/kfd device mounts and no privileged: true, so the ROCm runtime had nothing to talk to inside the container.
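The "Honest CPU fallback" item above can be sketched roughly as follows. This is a minimal illustration, not the code in api/app/lib/embedding_model_manager.py: `SketchManager`, `build_with_cpu_fallback`, `init_manager`, `get_manager`, and the injected `loader` callable are all hypothetical names chosen for the example. It shows the three behaviours the PR describes: retry once on CPU, skip the retry when CPU was already the configured device (case-insensitively), and publish the module global only after a successful load.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("embedding_fallback_sketch")

# Hypothetical loader signature: (model_name, device) -> loaded model object.
Loader = Callable[[str, str], object]


class SketchManager:
    """Stand-in for the real manager; `loader` simulates model loading."""

    def __init__(self, model_name: str, device: str, loader: Loader) -> None:
        self.model_name = model_name
        self.device = device
        self._loader = loader
        self.model: Optional[object] = None

    def load_model(self) -> None:
        self.model = self._loader(self.model_name, self.device)


_model_manager: Optional[SketchManager] = None


def build_with_cpu_fallback(*, model_name: str, device: str, loader: Loader) -> SketchManager:
    """Try the configured device, then retry exactly once on CPU."""
    manager = SketchManager(model_name, device, loader)
    try:
        manager.load_model()
        return manager
    except Exception as exc:
        if device.strip().lower() == "cpu":
            raise  # CPU was the configured device; a retry would only double-log
        logger.warning("Loading %s on %r failed: %s", model_name, device, exc)

    logger.warning("Attempting CPU fallback...")
    fallback = SketchManager(model_name, "cpu", loader)
    fallback.load_model()  # if this raises too, the caller sees the CPU error
    logger.warning("Loaded %s on CPU (configured device=%s was unavailable).", model_name, device)
    return fallback


def init_manager(*, model_name: str, device: str, loader: Loader) -> SketchManager:
    """Publish the global only after a load succeeded; clear it on terminal failure."""
    global _model_manager
    try:
        manager = build_with_cpu_fallback(model_name=model_name, device=device, loader=loader)
    except Exception:
        _model_manager = None
        raise
    _model_manager = manager
    return manager


def get_manager() -> SketchManager:
    if _model_manager is None:
        raise RuntimeError("Embedding model manager not initialized")
    return _model_manager
```

Injecting the loader keeps the sketch runnable without torch; in a real module the loader would be whatever constructs the embedding model.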

Verified on real hardware (7900 XTX)

| Check | Result |
| --- | --- |
| Container image pulled | `ghcr.io/.../kg-api:rocm72-host` |
| `/dev/kfd` + `/dev/dri` in container | Mounted |
| `rocminfo` inside container | gfx1100 (7900 XTX) + gfx1036 (iGPU) |
| `torch.cuda.is_available()` | True |
| `torch.version.hip` | 7.2.53211-c2d9476115 |
| Text embedding device | `nomic-ai/nomic-embed-text-v1.5 (768 dims, cuda)` |
| Vision embedding device | `nomic-ai/nomic-embed-vision-v1.5 on cuda` |
| GPU under load | 7900 XTX 12% busy, iGPU 1% idle — work routes to discrete |
| VRAM detected | 24268 MB free |

Diagnosis also incidentally verified the CPU fallback fires correctly on the ROCm container when the GPU overlay wasn't yet applied — proof the two halves of the ADR work together as designed.
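For readers who want to repeat the torch-level rows of the table inside their own container, a small probe along these lines may help. It is a sketch under assumptions: `describe_torch_backend` is a hypothetical helper, not part of this repository. It relies on ROCm builds of PyTorch exposing the GPU through the `torch.cuda` namespace, which is why the table reports `cuda` as the device on AMD hardware.

```python
def describe_torch_backend() -> dict:
    """Report which GPU runtime, if any, the installed torch build can use."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # Index 0 is what a bare "cuda" device string resolves to by default.
        devices = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]

    return {
        "torch_installed": True,
        "gpu_available": available,
        # A version string on ROCm builds, None on CUDA or CPU-only builds.
        "hip_version": getattr(torch.version, "hip", None),
        # A version string on CUDA builds, None otherwise.
        "cuda_version": getattr(torch.version, "cuda", None),
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```

On a host like the one above, a non-empty `hip_version` together with `gpu_available: True` would indicate the ROCm image and the GPU overlay are both in effect; `gpu_available: False` on the ROCm image points at missing device mounts.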

Commits

fac3daa3 — ADR-101 draft → Accepted
49fc13ba — Real CPU fallback in init_embedding_model_manager; global-pollution fix; honest main.py log
fd2e72cf — Review follow-ups: extract _build_with_cpu_fallback helper, case-insensitive 'cpu' retry guard, real-torch integration test, attempt-then-success log ordering, ADR status promoted to Accepted, async-reload asymmetry comment
9c7db593 — publish.sh images-rocm target + kg-api:rocm72-host published; operator/lib/common.sh KG_API_IMAGE_TAG export; ghcr compose parameterization; guided-init persistence + multi-GPU note; Dockerfile.rocm-host base bumped to ROCm 7.2.3
c448f098 — Drift fix between operator.sh (top-level) and operator/lib/common.sh: KG_API_IMAGE_TAG export, missing amd-host overlay case
2583051c — Final review pass: collapse GPU_MODE→TAG triplication into operator/lib/image-tag.sh, fix wizard option [3] footgun (was pointing at unpublished rocm60 image), honest visual-embedding log at main.py:218, doc-surface drift in ADR-101 + publish.sh comment

Test plan

pytest tests/unit/ tests/api/ — 921 passed, 15 skipped (no regressions)
tests/unit/lib/test_embedding_model_manager_fallback.py — 8 cases (happy path, CUDA-fails / CPU-rescues, both-fail terminal state with cleared global, CPU-as-configured no-retry, case-insensitive 'cpu' parametrized)
tests/unit/lib/test_embedding_model_manager_real_torch.py — exercises real sentence-transformers loader (no load_model mock) on no-CUDA host, proves the catchable-exception contract
End-to-end ROCm verification on 7900 XTX — model loads on ROCm device, 24GB VRAM detected, embeddings generate on discrete GPU
ADR-101 lints clean (docs/scripts/adr lint)
Shared derive_kg_api_image_tag smoke-tested across all three call sites

Tracking issues for deferred follow-ups

#409 — Build & publish kg-api ROCm 6.x variants (rocm60, rocm61)
#410 — CPU fallback for visual embedding generator (parity with text path)
#411 — Operator: --image-source flag silently dropped by guided-init.sh
#412 — Operator: .operator.conf-persisted KG_API_IMAGE_TAG goes stale on manual GPU_MODE edits

Housekeeping

The stale branch fix/catalog-refresh-embedding-coupling was deleted locally and on the remote during this work (its commits were already on main via direct push and were never PR'd).

Frames issue #405's root cause as image packaging: the published kg-api:latest ships PyPI's default torch wheel, which bundles CUDA runtime (works on NVIDIA and CPU) but has no ROCm support. AMD users need a separate wheel from the ROCm-specific PyTorch index. Proposes three published ROCm variants alongside kg-api:latest — kg-api:rocm60, kg-api:rocm61 (PYTORCH_VARIANT wheels), and kg-api:rocm71-host (AMD's official rocm/pytorch base image) — with KG_API_IMAGE_TAG in docker-compose.ghcr.yml driving install-time selection from operator GPU_MODE. NVIDIA support unchanged: nvidia and cpu modes continue to use kg-api:latest. Single-arch (linux/amd64) for ROCm variants — no production-quality arm64 ROCm path exists. publish.sh gains an api-rocm target group that builds all three variants in one invocation. Visual-embedding parallel path noted as a deferred follow-up; #405 only names the text path.

Issue #405 reported AMD ROCm init looking fixed but silently broken: the API logs "Falling back to API-based embeddings" while the active embedding profile is still 'local', and every ingest concept then fails with "Embedding model not loaded. Call load_model() first."

Two coupled defects, both addressed here:

1. init_embedding_model_manager assigned the module-global _model_manager *before* calling load_model(). When load_model() raised, the global was left pointing at a half-built manager with self.model = None — so downstream get_embedding_model_manager() returned the broken manager instead of raising "not initialized". Now the global is only published after load_model() succeeds, and is explicitly cleared on terminal failure.
2. No CPU fallback existed when the configured device was unusable. Per ADR-101, retry once with device='cpu' before giving up — that matches the workaround documented in #405 and means a user who picks the wrong ROCm variant gets a degraded-but-working install, not a silent-broken one.

main.py's misleading "Falling back to API-based embeddings" log line becomes an honest error: the active profile is 'local', and if both the configured device and CPU fail, ingestion will fail until the user switches profiles or fixes the model config. No more lying about a fallback that never happened.

Tests lock in: configured-device success path, CUDA-fails / CPU-rescues fallback, both-fail terminal state (global stays None, get_embedding_* raises "not initialized"), and CPU-as-configured failure short-circuits without a redundant retry.

…ch test, honest log ordering

Addresses code-review feedback on PR #408:

* Extracts the build-with-CPU-fallback logic to a module-level `_build_with_cpu_fallback` helper. The boot-time init path uses it; hot-reload deliberately does NOT — see the long-form comment in `reload_embedding_model_manager` for the asymmetry argument (boot uses stale config that may predate hardware changes and benefits from fallback; reload carries fresh just-typed operator intent and should fail loudly while the atomic-swap pattern preserves the previous working manager).
* Case-insensitive 'cpu' check in the retry guard — config strings like 'CPU', ' cpu', 'Cpu' no longer trigger a redundant retry that double-logs the same failure. Parametrized test locks the contract.
* Reorders the fallback log lines: 'Attempting CPU fallback...' before the retry, success-after-the-fact 'Loaded <model> on CPU (configured device=<x> was unavailable). Re-run...' once the fallback completes. Reads correctly whether the retry succeeded or failed, instead of the prior "Falling back" message that implied an action which might then error out.
* Adds tests/unit/lib/test_embedding_model_manager_real_torch.py — integration-style test that calls the real sentence-transformers loader (no `load_model` mock) on a CUDA-less host with the all-MiniLM tiny model. Proves the catchable-exception contract the mocked tests can only assume (torch raises catchable Exception on cuda-no-driver rather than segfaulting past the except clause), and verifies the fallback manager actually produces real 384-dim embeddings. Skipped on hosts with usable CUDA — assertions are meaningful only when the configured device is genuinely missing.
* ADR-101 status Draft → Accepted. The image-variant scheme is operational from this PR forward; outstanding image-build/publish and operator-selection work tracks as follow-up tasks.

… (ADR-101, #405)

Implements the image-build and operator-selection halves of ADR-101.

Image build (publish.sh images-rocm):

* New cmd_images_rocm publishes kg-api ROCm variants. Builds the appropriate Dockerfile (rocm60/rocm61 → api/Dockerfile with PYTORCH_VARIANT; rocm72-host → api/Dockerfile.rocm-host) and pushes tags `kg-api:<variant>`, `kg-api:<VERSION>-<variant>`, and `kg-api:sha-<GIT_SHA>-<variant>`. Does NOT touch :latest.
* Single-arch linux/amd64 (ROCm has no production arm64 path).
* Two-tier variant gating: DEFAULT_VARIANTS (rocm72-host) build by default; DEFERRED_VARIANTS (rocm60, rocm61) require --force because no maintainer-verified hardware test has run for them yet.
* api/Dockerfile.rocm-host updated: base image bumped rocm/pytorch:rocm7.1_ubuntu24.04_py3.13 → rocm/pytorch:rocm7.2.3_ubuntu24.04_py3.12 to match modern Arch ROCm 7.2.3 hosts (closer than 7.1's ABI-compatible guess). Python 3.12 — no 3.13-specific syntax in the API code.
* Tag rename: rocm71-host → rocm72-host to reflect the actual base.

Operator wiring:

* docker/docker-compose.ghcr.yml: api service image becomes `kg-api:${KG_API_IMAGE_TAG:-latest}`, so AMD hosts pull the matching variant. NVIDIA/CPU/mac default to latest (PyPI torch ships CUDA runtime bundled — works for both).
* operator/lib/common.sh load_operator_config: derives KG_API_IMAGE_TAG from GPU_MODE when not explicitly set, exports for docker-compose substitution. amd → rocm60 (rocm61 with ROCM_VERSION=rocm61); amd-host → rocm72-host; else latest.
* operator/lib/guided-init.sh: persists KG_API_IMAGE_TAG into .operator.conf during init so the choice survives env resets and is visible to operators. Adds a ROCR_VISIBLE_DEVICES=0 hint when AMD mode is chosen — discrete + iGPU coexistence is a real foot-gun on Ryzen 7000+ hosts.

ADR-101 updated to reflect the operational tag name (rocm72-host) and adds a naming-convention note: the suffix tracks the base image's ROCm version. Future base bumps (rocm/pytorch:rocm7.3) add new tags rather than overloading the existing one — keeps tag meaning immutable.

Published this commit (manually verified):

    ghcr.io/aaronsb/knowledge-graph-system/kg-api:rocm72-host
    ghcr.io/aaronsb/knowledge-graph-system/kg-api:0.13.1-rocm72-host

Outstanding: end-to-end verification on the AMD 7900 XTX host (task #6) — operator tear-down + fresh init → choose AMD GPU (host ROCm) → confirm ROCm device picked, model loads, ingest runs. The rocm60/rocm61 variants remain deferred per ADR-101 until a tester volunteers.

…nd amd-host overlay (ADR-101, #405)

Two drifts between operator.sh (top-level standalone entry-point) and operator/lib/common.sh (dev helpers) that prevented end-to-end ROCm verification:

1. load_config never derived/exported KG_API_IMAGE_TAG from GPU_MODE. common.sh's load_operator_config got it in the previous commit, but operator.sh has its own load_config that the standalone start path (`./operator.sh start` after `init --image-source ghcr`) goes through. Without the export, docker-compose substituted ${KG_API_IMAGE_TAG:-latest} with `latest`, so AMD hosts pulled the CUDA-bundled image and silently landed on the #405 failure mode (now caught by #408's CPU fallback but defeating the purpose of the ROCm variant).
2. get_compose_cmd's GPU overlay case only handled nvidia, amd, mac — it was missing amd-host entirely. So with GPU_MODE=amd-host, no docker-compose.gpu-amd-host.yml overlay applied: no /dev/kfd or /dev/dri device mounts, no privileged:true / ipc:host for ROCm HSA init, and the container's ROCm runtime had nothing to talk to — "No HIP GPUs are available" inside the container despite a perfectly functional 7900 XTX on the host.

Verified on 7900 XTX (gfx1100) + iGPU 9950X3D (gfx1036) host running Arch ROCm 7.2.3:

    Container: ghcr.io/aaronsb/knowledge-graph-system/kg-api:rocm72-host
    rocminfo (inside): both gfx1100 and gfx1036 enumerated
    torch.cuda.is_available(): True
    torch.version.hip: 7.2.53211-c2d9476115
    device_count: 2, [0] AMD Radeon RX 7900 XTX, [1] iGPU
    Embedding model loaded: nomic-ai/nomic-embed-text-v1.5 (768 dims, cuda)
    Vision model loaded: nomic-ai/nomic-embed-vision-v1.5 on cuda
    rocm-smi: GPU[0] 12% busy under embedding load, GPU[1] idle (1%)

Default cuda:0 device routes work to the discrete 7900 XTX without requiring ROCR_VISIBLE_DEVICES — the iGPU is enumerated but not used. The wizard hint about ROCR_VISIBLE_DEVICES=0 remains useful as a contingency for hosts where the runtime enumerates the iGPU first.

Closes the ROCm half of ADR-101. The CPU fallback half (PR #408 core) also showed honestly during diagnosis: when devices weren't mounted, embedding load failed with 'No HIP GPUs are available' and the new fallback engaged with the new log wording 'Attempting CPU fallback...' 'Loaded ... on CPU (configured device='cuda' was unavailable)' — proof that both halves work together as the ADR claimed.

aaronsb · 2026-05-25T03:33:12Z

Review — post-`c448f098` final state

Focused on fd2e72cf, 9c7db593, c448f098. End-to-end verification on the 7900 XTX confirms the amd-host half lands clean; the amd (wheel-based) half ships a footgun that should be addressed before merge.

Must-fix / discuss

1. Wizard offers an image that doesn't exist in GHCR.

gh api .../kg-api/versions shows only latest, rocm72-host, and versioned aliases published. kg-api:rocm60 and kg-api:rocm61 are not in the registry — cmd_images_rocm gates them behind --force and they've never been built.

But the wizard at operator/lib/guided-init.sh:187-193 offers [3] "Linux with AMD GPU (ROCm wheels)" as a peer option, with description "For systems with ROCm 6.x installed." A user on Ubuntu 22.04 + ROCm 6.x picks the option that names their hardware → GPU_MODE=amd → both operator.sh:75 and operator/lib/common.sh:47 derive KG_API_IMAGE_TAG="${ROCM_VERSION:-rocm60}" → docker compose pull fails with manifest unknown for kg-api:rocm60.

This is the exact "looks fixed, silently broken at install" failure mode #405 was filed to eliminate — except now it bites the wheel-based AMD users instead of the host-mode ones. The PR claims to close #405 for AMD users; today it only closes it for amd-host users.

Pick one (don't ship as-is):

Gate option 3 in the wizard with `[EXPERIMENTAL — requires --image-source local to build]` until rocm60 ships
Move option 3 behind a "Show advanced/unverified options" prompt
Have operator.sh print a clear pre-pull warning when KG_API_IMAGE_TAG resolves to a variant that won't be in GHCR
Build + push rocm60/rocm61 unverified (contradicts your own cmd_images_rocm "refuse to ship untested" stance — probably wrong, listed for completeness)

2. Doc/code drift on the publish-flow surface.

ADR-101.md:138-141 documents the command as ./publish.sh images api-rocm --variants rocm60.
scripts/publish.sh:767 header comment also says --variants rocm60,rocm61.
The actual implemented surface is ./publish.sh images-rocm rocm60 rocm61 — a separate subcommand with positional args, no --variants flag.

Two stale references to a non-existent flag, both in load-bearing maintainer-facing docs. Cheap to fix; should align with what cmd_images_rocm actually parses.

3. Triplication of the GPU_MODE → KG_API_IMAGE_TAG derivation.

operator.sh:73-80, operator/lib/common.sh:39-54, and operator/lib/guided-init.sh:316-318 all carry the same case statement. The fact that c448f098 was necessary at all — because operator.sh's copy fell out of sync with common.sh's — is empirical evidence the duplication is unsafe.

You asked whether the two-copy version is intentional separation by entry-point. The answer would be yes for two. For three, the next ROCm version bump means three edits and a high probability of repeating exactly this drift. Suggest a tiny shared file — operator/lib/image-tag.sh exporting a derive_kg_api_image_tag() function — sourced by all three. The triplication is now the load-bearing problem, not the original two-copy split.

Should-fix

4. The asymmetry comment defends a contract no test locks.

The fix to extract _build_with_cpu_fallback + the long comment in reload_embedding_model_manager are sound, but tests/unit/lib/test_embedding_model_manager_fallback.py only exercises the init path. The reload path's claims — (a) does NOT fall back, (b) atomic-swap preserves the previous working manager on failure — are asserted by code reading only.

Add a test: existing _model_manager set → call reload_embedding_model_manager with a load_model that raises → assert _model_manager is still the original instance and a RuntimeError propagates. Otherwise a future refactor of reload_* breaks the contract silently and the comment becomes a lie.
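A rough shape for that test, written against stand-ins so it runs on its own. Everything here is hypothetical: `FakeManager`, `reload_manager`, and `current_manager` only mimic the contract described above (load first, publish on success, no CPU fallback), and a real test would import `reload_embedding_model_manager` and patch its loader instead.

```python
from typing import Optional


class FakeManager:
    """Hypothetical stand-in whose load can be made to fail on demand."""

    def __init__(self, device: str, should_fail: bool = False) -> None:
        self.device = device
        self.should_fail = should_fail
        self.model: Optional[object] = None

    def load_model(self) -> None:
        if self.should_fail:
            raise RuntimeError(f"cannot load on {self.device}")
        self.model = object()


_model_manager: Optional[FakeManager] = None


def reload_manager(candidate: FakeManager) -> FakeManager:
    """Atomic swap: load the candidate first, publish only if that succeeded."""
    global _model_manager
    candidate.load_model()  # raises before the global is touched
    _model_manager = candidate
    return candidate


def current_manager() -> Optional[FakeManager]:
    return _model_manager


def test_reload_failure_keeps_previous_manager() -> None:
    original = reload_manager(FakeManager("cuda"))

    failing = FakeManager("mps", should_fail=True)
    try:
        reload_manager(failing)
    except RuntimeError:
        raised = True
    else:
        raised = False

    assert raised, "reload must fail loudly instead of falling back"
    assert current_manager() is original, "previous working manager must survive"
    assert failing.model is None, "no silent CPU retry on the reload path"
```

The last assertion is the one that would catch a refactor that quietly routes reload through the boot-time fallback helper.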

5. Visual embedding init log still uses the dishonest pattern this PR explicitly killed for text.

api/app/main.py:218:

logger.info("   Visual embedding features may be limited")

Same shape as the "Falling back to API-based embeddings" line you just rewrote at line 199. When init_visual_embedding_generator raises and an image-using profile is active, "may be limited" is just as misleading as the old text-path message. The full fallback is correctly deferred per ADR-101, but the log line is a one-line edit (downgrade severity, surface what actually happens — ingestion of image content will fail until the profile is changed or the device is fixed) and it's in scope for this PR. Skipping it leaves a half-honest startup log.

6. No tracking issues for the deferred work.

PR description lists three follow-ups (rocm60/rocm61 publish, visual fallback, --image-source parity); GitHub issue search returns only #405. "Deferred to follow-up PR" without an issue is the kind of promise that decays. Open three before merge.

7. .operator.conf KG_API_IMAGE_TAG becomes stale if a user edits GPU_MODE by hand.

guided-init.sh persists the derived tag at init time. Both loaders short-circuit on if [ -z "$KG_API_IMAGE_TAG" ]. So a user who later edits GPU_MODE=cpu → GPU_MODE=amd-host in .operator.conf (which the auto-generated header comment at guided-init.sh:313 actively suggests they do) will keep the old KG_API_IMAGE_TAG=latest and silently land on the CUDA-bundled image — the same shape of bug c448f098 just fixed.

Two viable shapes:

Persist only the user-overridable bits (GPU_MODE, DEV_MODE, IMAGE_SOURCE) and always re-derive KG_API_IMAGE_TAG on load. Honors edits.
Keep persisting, but document inline in the generated file: # DO NOT edit GPU_MODE without also updating KG_API_IMAGE_TAG. Worse — relies on the user noticing.

Related: the .operator.conf auto-generated comment (./operator.sh config --dev true --gpu nvidia|amd|mac|cpu) references a config subcommand that doesn't exist in operator.sh. Another doc/code drift, low severity.

Nits

8. ROCM_VERSION override is undocumented in the persisted config.

common.sh:47 honors ROCM_VERSION as a shell-level override (amd → ${ROCM_VERSION:-rocm60}). Nobody writes it; the header comment in .operator.conf doesn't mention it; the wizard doesn't ask. Either drop the override path (currently dead weight outside test/dev) or document it as a comment line in the generated file.

9. guided-init.sh:317 doesn't honor ROCM_VERSION (only the other two copies do).

common.sh and operator.sh both honor the override for amd mode, but guided-init.sh ignores it. Either way, fixing this is moot if you collapse the triplication (item 3).

Praise

The _build_with_cpu_fallback extraction is cleanly keyword-only-args and module-scope; the asymmetry comment on reload_* is exactly the kind of long-form-rationale-in-code that earns its keep three months from now when someone asks "why don't we fall back here too?"
The case-insensitive cpu retry guard with the parametrized test is a small fix that prevents real double-log noise.
test_embedding_model_manager_real_torch.py is the right shape — proves the catchable-exception contract the mocked tests can only assume. The skip-if-CUDA guard makes the assertions honest.
The rocm71-host → rocm72-host rename is sound (the old tag was never published; gh api confirms) and the "naming convention" addition to ADR-101 — future ROCm 7.3 lands as a new tag rather than overloading — is the right immutability stance.
End-to-end verification on real hardware (7900 XTX gfx1100 + iGPU 9950X3D gfx1036), with the CPU fallback also firing correctly during diagnosis, is the kind of evidence that turns "should work" into "verified."
The ROCR_VISIBLE_DEVICES=0 hint at yellow-info-warning severity is the right level — discrete + iGPU coexistence on Ryzen 7000+ is a real foot-gun and runtime ordering isn't guaranteed across kernel/ROCm versions, even if your hardware happened to enumerate the discrete card first.

Empirical checks done in this review

gh api .../kg-api/versions → confirmed rocm60/rocm61 not in registry
requires-python floor → only fuse/pyproject.toml pins (>=3.11); API has no constraint. py3.13→py3.12 base bump safe.
grep -rn "rocm71" → no stale references survive the rename
ADR-101's configure.py embedding --device reference → verified the flag exists in operator/configure.py:774
kg-api:latest multi-arch (amd64+arm64) → Apple Silicon path through mac → latest is sound

…visual log, doc-surface fixes

Addresses code-reviewer findings on PR #408:

* **operator/lib/image-tag.sh** is the single source of truth for GPU_MODE → KG_API_IMAGE_TAG mapping. operator.sh's load_config, operator/lib/common.sh's load_operator_config, and operator/lib/guided-init.sh all source it and call derive_kg_api_image_tag(). The earlier inline-three-times shape drifted (commit c448f09 was the empirical proof — fixed one of the copies after the bug had silently shipped). One definition now.
* **Wizard option [3] no longer points at an unpublished image.** Previously `amd` mode resolved to KG_API_IMAGE_TAG=rocm60, but rocm60/rocm61 are deferred per ADR-101 §Negative — GHCR has only `latest` and `rocm72-host`. Users picking option [3] would have hit `manifest unknown` on pull — same "looks fixed, silently broken" shape this PR is supposed to eliminate. Now `amd` mode resolves to `latest` and relies on ADR-101's CPU fallback. Setting ROCM_VERSION=rocm60 in .operator.conf forces the variant tag once a tester confirms a build (tracked in #409). Wizard text reframed: "Linux with AMD GPU (ROCm 6.x — preview) / Falls back to CPU embeddings via :latest until variant ships".
* **Visual-embedding startup log honest at main.py:218.** Parity with the text-embedding fix at main.py:199. No more "Visual embedding features may be limited" when the active profile asks for visual embeddings and the load failed (image ingestion will actually fail, not "be limited"). The new message names the failure mode and gives two concrete remediation paths (switch profile, or repair model config). Mirrors the text path.
* **Doc/code surface drift fixed.** ADR-101 examples and publish.sh comment referenced `--variants rocm60,rocm61` flag and `images api-rocm` command that don't exist. Real surface is `images-rocm rocm60 --force` (positional variant args + force flag to opt into deferred variants).

Issues opened for the items intentionally deferred from this PR:

    #409 — Build & publish ROCm 6.x variants (rocm60, rocm61)
    #410 — CPU fallback for visual embedding generator (text-parity)
    #411 — Operator: --image-source flag silently dropped by guided-init.sh
    #412 — Operator: persisted KG_API_IMAGE_TAG goes stale on manual edits

Reload-asymmetry contract test (review item 4) intentionally deferred — covers existing pre-PR behavior, separate concern.

Verified the helper across the three call sites:

    bash> source operator/lib/image-tag.sh
    bash> for m in amd amd-host nvidia cpu; do
            echo "$m -> $(derive_kg_api_image_tag $m)"
          done
    amd -> latest            # was rocm60 (unpublished — review footgun)
    amd-host -> rocm72-host
    nvidia -> latest
    cpu -> latest
    bash> derive_kg_api_image_tag amd rocm60
    rocm60                   # ROCM_VERSION override still works
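The mapping that session reports can be restated as a small table, which is handy if one wants to pin it in a unit test. The sketch below is a Python rendering for illustration only; the actual helper is the bash function in operator/lib/image-tag.sh, and the Python signature here (a second optional argument standing in for the ROCM_VERSION override) is an assumption based on the output shown.

```python
from typing import Optional

# GPU modes with a dedicated published image; every other mode uses "latest".
_FIXED_TAGS = {"amd-host": "rocm72-host"}


def derive_kg_api_image_tag(gpu_mode: str, rocm_version: Optional[str] = None) -> str:
    """Return the kg-api image tag for a GPU mode (illustrative Python mirror)."""
    if gpu_mode == "amd":
        # Wheel-based ROCm variants are deferred, so amd resolves to latest
        # unless an explicit override such as "rocm60" is supplied.
        return rocm_version or "latest"
    return _FIXED_TAGS.get(gpu_mode, "latest")


if __name__ == "__main__":
    for mode in ("amd", "amd-host", "nvidia", "cpu"):
        print(f"{mode} -> {derive_kg_api_image_tag(mode)}")
```

Keeping the exceptions in one dictionary means a future base bump adds one entry rather than a new branch in three loaders.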

aaronsb added 5 commits May 24, 2026 16:39

aaronsb merged commit fceabf3 into main May 25, 2026
3 checks passed

aaronsb merged 6 commits into main from fix/issue-405-fallback-and-rocm-adr