manage: extract _fetch_image_info to reduce duplication #2217
Draft
Four `osism manage image` commands (octavia, clusterapi,
clusterapi-gardener, gardenlinux) fetch marker and checksum
files from nbg1.your-objectstorage.com using bare
requests.get() with no error handling and no retry. When the
Ceph RGW backend transiently returns an XML S3 error document,
the code parses <?xml as the checksum and
openstack-image-manager rejects it with
'sha256:<?xml' is not a valid checksum.
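The failure mode can be reproduced in isolation. This sketch assumes the old code took the first whitespace-separated token of the response body and prefixed it with `sha256:`, without checking the HTTP status. The XML body is a generic S3-style error document, not a captured RGW response, and `naive_checksum` is a hypothetical name.

```python
# Hypothetical reconstruction of the unguarded parsing: no status check,
# no content check, first token of the body taken as the checksum.
S3_ERROR_BODY = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Error><Code>ServiceUnavailable</Code></Error>\n"
)


def naive_checksum(body: str) -> str:
    """Return 'sha256:<first token>' whatever that token happens to be."""
    return "sha256:" + body.split()[0]


print(naive_checksum(S3_ERROR_BODY))  # sha256:<?xml
```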
Analysis of 295 .CHECKSUM fetch events logged across testbed
builds in the window 2025-12-14 – 2026-04-27 shows 84 XML
failures (28.5 % of fetches). 94 % of those failures returned
in ≤ 2 s (fast canned RGW 503 response); the remaining 6 %
returned in 8–60 s. Zero failures were connection-level errors;
every failure returned an HTTP response. Successful fetches
span 0–53 s (p99 = 9 s).
New module osism/utils/http.py exports fetch_text, which wraps
requests.get with:
- Retry on {408, 429} ∪ 5xx (covers the observed RGW 503)
- Retry on non-HTTPError RequestException (connection / DNS / TLS)
- Retry when an optional validate callback rejects the body
(guards against HTTP 200 with unexpected content)
- Immediate HTTPError propagation on non-retryable 4xx (404, 403)
- Structured INFO log lines per attempt for observability in Zuul
Default schedule: 3 retries, 2 s / 4 s / 8 s sleeps (14 s budget).
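The retry policy above can be sketched as follows. This is a minimal illustration, not the code in osism/utils/http.py: the parameter names `retries` and `backoff` and the injectable `get`/`sleep` hooks are assumptions, added so the sketch can be exercised without network access or real sleeps.

```python
import logging
import time
from typing import Callable, Optional

import requests

LOG = logging.getLogger(__name__)


def _is_retryable_status(status: int) -> bool:
    # {408, 429} ∪ 5xx; every other 4xx (404, 403, ...) fails immediately.
    return status in (408, 429) or 500 <= status <= 599


def fetch_text(
    url: str,
    validate: Optional[Callable[[str], bool]] = None,
    retries: int = 3,
    backoff: float = 2.0,
    get: Callable[[str], requests.Response] = requests.get,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch url as text, retrying transient failures (hypothetical sketch)."""
    for attempt in range(retries + 1):
        last = attempt == retries
        try:
            response = get(url)
        except requests.RequestException as exc:
            # No HTTP response at all (connection, DNS, TLS): retry.
            LOG.info("fetch_text url=%s attempt=%d error=%r", url, attempt + 1, exc)
            if last:
                raise
        else:
            status = response.status_code
            LOG.info("fetch_text url=%s attempt=%d status=%d", url, attempt + 1, status)
            if status >= 400:
                # Non-retryable 4xx propagate at once; retryable ones only
                # propagate when the retry budget is exhausted.
                if last or not _is_retryable_status(status):
                    raise requests.HTTPError(f"{status} for url: {url}", response=response)
            elif validate is None or validate(response.text):
                return response.text
            elif last:
                # HTTP 2xx/3xx, but the body failed validation every time.
                raise ValueError(f"unexpected response body from {url}")
        sleep(backoff * 2**attempt)  # 2 s, 4 s, 8 s with the defaults
    raise AssertionError("unreachable: the last attempt always returns or raises")
```

With the defaults, a 503 on every attempt sleeps 2 s, 4 s, then 8 s before the final HTTPError propagates, which is the 14 s budget; a 404 raises on the first attempt without sleeping.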
Two validators are added to manage.py:
- _validate_marker: generic YYYY-MM-DD <name>.qcow2 contract;
rejects XML bodies without hard-coding any image-name prefix,
so production deployments with unfamiliar names pass through
to downstream validation rather than burning the retry budget.
- _is_sha256: requires a 64-char lowercase hex first token,
matching sha256sum(1) output; accepting uppercase would mask a
downstream mismatch rather than surface it.
All seven requests.get call sites in manage.py are replaced:
- clusterapi: marker + .CHECKSUM (take_action lines 110, 125)
- clusterapi-gardener: marker + .CHECKSUM (take_action lines 229, 245)
- gardenlinux: .sha256 (take_action line 354)
- octavia: marker + .CHECKSUM (take_action lines 440, 451)
The checksum_url_status log line added to octavia in ce844a0 is
removed; fetch_text emits the status code on every attempt.
No per-attempt timeout is added. The distributions of slow
failures (8–60 s) and slow successes (9–53 s) overlap — a 41 s
duration appears as both a failure and a success in the data.
No timeout value cleanly separates the two populations without
introducing false positives on legitimate slow responses.
34 unit tests across three new files cover the retry helper
(test_http.py, 15 tests), the validators
(test_manage_validators.py, 15 tests), and the call-site wiring
(test_manage_wiring.py, 4 tests).
AI-assisted: Claude Code
Signed-off-by: Roger Luethi <luethi@osism.tech>
ImageClusterapi, ImageClusterapiGardener, and ImageOctavia all share
the same marker-fetch and checksum-fetch sequence: fetch a marker
file, parse the date and image filename, construct the image URL,
fetch the .CHECKSUM file, and log each step. This identical block
was repeated verbatim in all three take_action() implementations.
Extract this sequence into a private _fetch_image_info(base_url,
marker_url) helper that returns (date, image_filename, url, checksum).
Callers that need the image filename for version extraction
(ImageClusterapi, ImageClusterapiGardener) unpack it; ImageOctavia
discards it with _.
ImageGardenlinux is deliberately excluded: it constructs the image
URL directly from a known pattern rather than fetching a marker file,
so it shares only the checksum-fetch half of the pattern and does not
fit this helper without contortion.
AI-assisted: Claude Code
Signed-off-by: Roger Luethi <luethi@osism.tech>
Hey, I've left some high-level feedback:
- In `_fetch_image_info`, consider validating the number of whitespace-separated fields in the marker body before doing `strip().split()[:2]` so that a malformed marker produces a clear, custom error instead of an unhandled `ValueError` from tuple unpacking.
Stack created with GitHub Stacks CLI