Skip to content

[smoke-test] fix a flake: retry tag validation to tolerate Docker Hub rate limiting#1903

Open
V-Subhankar-infy wants to merge 3 commits into
devcontainers:mainfrom
V-Subhankar-infy:fix-python
Open

[smoke-test] fix a flake: retry tag validation to tolerate Docker Hub rate limiting#1903
V-Subhankar-infy wants to merge 3 commits into
devcontainers:mainfrom
V-Subhankar-infy:fix-python

Conversation

@V-Subhankar-infy

@V-Subhankar-infy V-Subhankar-infy commented Jun 23, 2026

Copy link
Copy Markdown
Member

Summary

Smoke-test tag validation failed intermittently with false Invalid variants
errors (e.g. 3.13-trixie) when anonymous Docker Hub calls got rate limited.

Root cause

validate-tags.sh ran a single docker manifest inspect per tag and treated
any non-zero exit — including HTTP 429 rate limiting — as "tag does not exist",
so one throttled call failed the whole job despite the tag existing upstream.

Changes (check_image_exists() only)

  • Retry the registry query up to 3× with a 10s pause before marking a tag invalid.
  • On failure, flag the one unambiguous transient cause — rate limiting (HTTP 429);
    everything else keeps the original ✗ Invalid - tag does not exist message.
  • Echo the raw registry response for debugging.

Preserved

Fail-closed: genuinely missing tags still exit 1, and the default message is unchanged and displays "invalid-tag"

Testing

bash -n passes; classification self-test confirms 429 → rate-limit message and
all else → original message;

@V-Subhankar-infy V-Subhankar-infy marked this pull request as ready for review June 23, 2026 07:22
@V-Subhankar-infy V-Subhankar-infy requested a review from a team as a code owner June 23, 2026 07:23
Copilot AI review requested due to automatic review settings June 23, 2026 07:23

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the smoke-test tag validator so transient Docker Hub failures (rate limiting/network/5xx) aren’t incorrectly reported as missing image tags, reducing false CI failures during variant validation.

Changes:

  • Adds retry-with-backoff around docker manifest inspect and distinguishes between valid, invalid (missing), and unverified (transient) outcomes.
  • Treats only known “missing manifest” errors as invalid; other errors are retried and then downgraded to warnings if still unresolved.
  • Introduces UNVERIFIED reporting to allow CI to proceed when tags cannot be verified due to transient issues.

Comment thread .github/actions/smoke-test/validate-tags.sh Outdated
return 1
fi

echo " ! Transient error (attempt ${attempt}/${max_attempts}): $(echo "$output" | tr '\n' ' ' | cut -c1-200)"
done
echo "These were not treated as missing; the build will continue."
echo ""
echo "✓ No invalid base image tags found (${#UNVERIFIED_TAGS[@]} could not be verified and were skipped)."
@V-Subhankar-infy V-Subhankar-infy changed the title fix: avoid false "Invalid variants" failures from Docker Hub rate limiting [python] fix: avoid false "Invalid variants" failures from Docker Hub rate limiting Jun 23, 2026
@V-Subhankar-infy V-Subhankar-infy marked this pull request as draft June 23, 2026 08:16
Comment thread .github/actions/smoke-test/validate-tags.sh Outdated
Comment thread .github/actions/smoke-test/validate-tags.sh Outdated
… rate limiting

validate-tags.sh ran a single anonymous `docker manifest inspect` per tag, so on
shared-IP CI runners any non-zero exit (including HTTP 429 rate limiting) was
misreported as "tag does not exist" and failed the whole smoke-test job with a
false "Invalid variants" error, even though the tag exists upstream.

- Retry the registry query up to 3 times with a 10s pause before marking a tag
  invalid, so a transient failure isn't mistaken for a missing tag.
- After all retries fail, flag the one unambiguous transient cause, Docker Hub
  rate limiting (HTTP 429); every other case keeps the original
  "tag does not exist" message unchanged.
- Echo the raw registry response to aid debugging.

Fail-closed behaviour is unchanged: a genuinely missing tag still exits 1.
@V-Subhankar-infy V-Subhankar-infy changed the title [python] fix: avoid false "Invalid variants" failures from Docker Hub rate limiting [smoke-test] fix a flake: retry tag validation to tolerate Docker Hub rate limiting Jun 23, 2026
@V-Subhankar-infy V-Subhankar-infy marked this pull request as ready for review June 23, 2026 17:59
@V-Subhankar-infy V-Subhankar-infy dismissed Kaniska244’s stale review June 23, 2026 18:01

I have considered all the points and made the following changes incorporating the review into the code.

  1. Removed the complex string match that depends on dockerhub response data type.
  2. Removed Complex code to handle multiple cases and sticked to simple sleep to wait for 10 seconds and retry 2 more times before stopping the build.
  3. Only when the docker-hub sends a definitive response code of 429 that only means rate limiting we will log that onto console.
  4. The code only updates the log displayed in case of failure, the core logic is untouched & the error validation is unaffected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants