
Commit b9f2ebb

Merge pull request #1328 from Open-Source-Legal/fix/dockerfile-pip-retries
Harden Dockerfile pip against mid-stream download failures
2 parents ff5d1e9 + fe58be9 commit b9f2ebb

3 files changed

Lines changed: 29 additions & 0 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed

+- **Backend CI build aborted on transient SSL failure mid-wheel-download** (`compose/local/django/Dockerfile`, `compose/production/django/Dockerfile`): Backend CI run `24646911374` on commit `233a9b67` (push to `main`) failed in the `Build the Stack` step with `ssl.SSLError: [SSL] record layer failure (_ssl.c:2590)` at 22 MB of 60.4 MB while `pip wheel` was downloading `opencv-python-headless`, aborting the entire build and cascading skips through `Run DB Migrations`, `Verify Docker Containers`, and `pytest`. Pip's default `--retries 5` only covers connection setup and does not resume broken mid-stream downloads; that behaviour is gated on `--resume-retries`, added in pip 25.1 (env var `PIP_RESUME_RETRIES`). Added `ENV PIP_RETRIES=10 PIP_TIMEOUT=60 PIP_RESUME_RETRIES=5` to both the `python-build-stage` and `python-run-stage` of both Dockerfiles so every `pip install` / `pip wheel` invocation (wheel build, `--upgrade pip`, spacy model downloads) picks up the hardened settings. Verified the base image (`pytorch/pytorch:2.7.1-cuda12.6-cudnn9-runtime`) ships pip 25.1.1 and recognises all three env vars (`pip config list` reports `:env:.resume-retries='5'`, `pip install --help` shows `--timeout (default 60.0 seconds)`).
- **`fullDatacellList` no-args path was unbounded** (Issue #1256, follow-up to PR #1235, `config/graphql/extract_types.py:131-148`): `ExtractType.resolve_full_datacell_list` capped the `limit` and `offset-only` branches at `MAX_FULL_DATACELL_LIST_LIMIT` but returned the entire queryset when called with no arguments. Authenticated callers hitting `fullDatacellList` directly (no `limit`, no `offset`) could bypass the payload bound. Collapsed all three code paths so every call — no-args, offset-only, or `limit`+`offset` — returns `qs[start : start + min(limit_or_max, MAX_FULL_DATACELL_LIST_LIMIT)]`. The embed UI is unaffected (it already passes `limit: 500`); direct API callers now receive at most 500 cells and must paginate via `offset` to walk the rest. Added regression test `test_full_datacell_list_no_args_capped_at_server_max` in `opencontractserver/tests/test_extract_queries.py` which creates 501 cells and asserts the no-args response returns exactly `MAX_FULL_DATACELL_LIST_LIMIT` while `datacellCount` reports the true total.

### Changed
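The clamping described in the `fullDatacellList` entry above can be illustrated on a plain list. This is a minimal sketch, not the project's resolver: `clamp_slice` is a hypothetical helper, the value 500 is taken from the changelog text, and the handling of negative inputs is an assumption added only to keep the example well defined.

```python
# Value quoted in the changelog entry ("at most 500 cells").
MAX_FULL_DATACELL_LIST_LIMIT = 500


def clamp_slice(items, limit=None, offset=None):
    """Return at most MAX_FULL_DATACELL_LIST_LIMIT items starting at offset.

    Hypothetical helper: mirrors the slice expression from the changelog,
    qs[start : start + min(limit_or_max, MAX_FULL_DATACELL_LIST_LIMIT)].
    """
    # Assumption: negative offsets are treated as 0.
    start = max(offset or 0, 0)
    # A missing limit means "server maximum"; any explicit limit is still capped.
    limit_or_max = MAX_FULL_DATACELL_LIST_LIMIT if limit is None else max(limit, 0)
    size = min(limit_or_max, MAX_FULL_DATACELL_LIST_LIMIT)
    return items[start : start + size]


cells = list(range(501))
no_args = clamp_slice(cells)                  # no limit, no offset
second_page = clamp_slice(cells, offset=500)  # paginate to reach the rest
```

With 501 items, the no-args call yields the first 500 and a follow-up call with `offset=500` yields the remaining one, which matches the pagination behaviour the entry describes.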

compose/local/django/Dockerfile

Lines changed: 14 additions & 0 deletions
@@ -9,6 +9,15 @@ FROM python as python-build-stage
ARG BUILD_ENVIRONMENT=local
ARG GITHUB_ACTIONS

+# Harden pip against transient network failures during large wheel downloads.
+# PIP_RESUME_RETRIES (pip 25.1+) resumes broken downloads; the others bump
+# connection retry count and socket timeout. Without these, a mid-stream SSL
+# drop while fetching a large wheel (e.g. opencv-python-headless) aborts the
+# entire build.
+ENV PIP_RETRIES=10 \
+    PIP_TIMEOUT=60 \
+    PIP_RESUME_RETRIES=5
+
# Debugging line for build args
RUN echo "GITHUB_ACTIONS: $GITHUB_ACTIONS"

@@ -56,6 +65,11 @@ ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1
ENV BUILD_ENV ${BUILD_ENVIRONMENT}

+# Harden pip against transient network failures (see python-build-stage for context).
+ENV PIP_RETRIES=10 \
+    PIP_TIMEOUT=60 \
+    PIP_RESUME_RETRIES=5
+
# CUDA-specific environment variables for optimal performance
ENV CUDA_MODULE_LOADING=LAZY
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6;8.9;9.0"
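
The `PIP_*` names in the `ENV` lines follow pip's documented convention for environment variables: take the long option name, drop the leading dashes, upper-case it, replace dashes with underscores, and prefix `PIP_`. A small sketch of that mapping, where `pip_env_var` is a hypothetical helper written for illustration and not part of pip:

```python
def pip_env_var(option: str) -> str:
    """Map a long pip option such as '--resume-retries' to its env var name."""
    name = option.lstrip("-")
    return "PIP_" + name.upper().replace("-", "_")


# The three options this commit sets, with the values from the Dockerfiles.
settings = {"--retries": 10, "--timeout": 60, "--resume-retries": 5}

# Render them as a single-line Dockerfile ENV instruction.
env_line = "ENV " + " ".join(
    f"{pip_env_var(opt)}={val}" for opt, val in settings.items()
)
```

Setting the values through `ENV` rather than per-command flags means later `pip` invocations in the same stage inherit them without each `RUN` line having to repeat the options.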

compose/production/django/Dockerfile

Lines changed: 14 additions & 0 deletions
@@ -6,6 +6,15 @@ FROM python as python-build-stage

ARG BUILD_ENVIRONMENT=production

+# Harden pip against transient network failures during large wheel downloads.
+# PIP_RESUME_RETRIES (pip 25.1+) resumes broken downloads; the others bump
+# connection retry count and socket timeout. Without these, a mid-stream SSL
+# drop while fetching a large wheel (e.g. opencv-python-headless) aborts the
+# entire build.
+ENV PIP_RETRIES=10 \
+    PIP_TIMEOUT=60 \
+    PIP_RESUME_RETRIES=5
+
# Install apt packages
RUN apt-get update && apt-get install --no-install-recommends -y \
  # dependencies for building Python packages
@@ -50,6 +59,11 @@ ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1
ENV BUILD_ENV ${BUILD_ENVIRONMENT}

+# Harden pip against transient network failures (see python-build-stage for context).
+ENV PIP_RETRIES=10 \
+    PIP_TIMEOUT=60 \
+    PIP_RESUME_RETRIES=5
+
# CUDA-specific environment variables for optimal performance
ENV CUDA_MODULE_LOADING=LAZY
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6;8.9;9.0"
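
The distinction the commit relies on, retrying a connection versus resuming a stream that broke partway through, can be shown with a toy model. The sketch below is not pip's implementation: `FlakySource` and `download_with_resume` are hypothetical names, and the in-memory source stands in for a server that honours byte-offset requests. It only illustrates why a resume budget lets a download finish after mid-stream drops.

```python
class FlakySource:
    """In-memory stand-in for a remote file whose stream drops at set offsets."""

    def __init__(self, payload, fail_at):
        self.payload = payload
        self.fail_at = sorted(fail_at)  # byte offsets that trigger one drop each

    def read_from(self, offset, chunk_size=4):
        """Yield chunks starting at offset; raise once per failure point crossed."""
        pos = offset
        while pos < len(self.payload):
            end = min(pos + chunk_size, len(self.payload))
            if self.fail_at and pos <= self.fail_at[0] < end:
                self.fail_at.pop(0)
                raise ConnectionError(f"stream dropped near byte {pos}")
            yield self.payload[pos:end]
            pos = end


def download_with_resume(source, resume_retries):
    """Fetch the whole payload, resuming from the last received byte on a drop."""
    received = bytearray()
    attempts_left = resume_retries
    while True:
        try:
            # Resume from what we already hold instead of starting over.
            for chunk in source.read_from(len(received)):
                received.extend(chunk)
            return bytes(received)
        except ConnectionError:
            if attempts_left == 0:
                raise  # budget exhausted: the download (and the build) fails
            attempts_left -= 1


data = b"0123456789abcdef"
result = download_with_resume(FlakySource(data, [6, 13]), resume_retries=5)
```

With a resume budget of zero, the first drop propagates as an error, which corresponds to the failed CI run described in the changelog entry.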
