Skip to content

Production readiness: fix password hashing, add CI, align docs#15

Merged
whittenator merged 5 commits into
mainfrom
claude/production-readiness-review-Qcoz0
May 29, 2026
Merged

Production readiness: fix password hashing, add CI, align docs#15
whittenator merged 5 commits into
mainfrom
claude/production-readiness-review-Qcoz0

Conversation

@whittenator

Copy link
Copy Markdown
Owner

Summary

A production-readiness pass over VisionForge. The codebase is further along than its own docs claim — but there were two real deploy blockers plus several hygiene gaps. Everything below is implemented and verified.

Critical fixes

  • Password hashing crashed on a fresh deploy. requirements.txt pinned passlib[bcrypt]>=1.7 (unmaintained since 2020) with no bcrypt ceiling, so a clean install pulls bcrypt 5.0, which passlib can't drive — every signup/login threw ValueError (module 'bcrypt' has no attribute '__about__' / 72-byte error). Replaced passlib with the maintained bcrypt library directly, truncate inputs to bcrypt's 72-byte limit, and pinned bcrypt>=4.1,<6.
  • No CI/CD. Added .github/workflows/ci.yml (backend lint + unit tests, frontend lint + build, agent tests). It installs the real requirements.txt (CPU torch) — exactly what would have caught the bcrypt break. The absence of CI is why broken tests had reached main unnoticed.

Docs

  • Rewrote the Security Notes in README.md and CLAUDE.md. They described SHA-256 hashing, token-{user_id} auth, and hardcoded CORS as TODOs — all three were already replaced with bcrypt, signed JWTs, and env-driven CORS. Now documents the genuinely-remaining hardening (rotate secrets, Redis-backed rate limiter for multi-replica, TLS, prod frontend build).

Production deployment

  • Added frontend/Dockerfile.prod (multi-stage Vite build served by nginx), frontend/nginx.conf (SPA history fallback + API reverse-proxy + /healthz), and frontend/.dockerignore. The compose frontend service was shipping the Vite dev server.
  • Added a backend healthcheck to docker-compose.yml.

Hygiene

  • Fixed stale test_embed_texts_deterministic_fallback (asserted dim 3 vs the service's actual 512) and made it hermetic.
  • Removed dead frontend/src/pages/admin/Users.jsx (the router resolves users.tsx).
  • Fixed pre-existing lint violations (B904 / B905 / F841 / I001) and whitelisted the framework Depends (B008) / alembic+bootstrap E402 patterns so the repo passes its own linters under CI. lint_all.sh had been swallowing these with || true.

Verification

  • Backend unit tests: 96 passed
  • Agent tests: 8 passed
  • ruff check and black --check: clean

Notes / out of scope

  • The auth rate limiter is in-memory / per-process — fine for a single replica, but needs Redis backing before horizontal scaling. Documented rather than re-architected.

https://claude.ai/code/session_01K5vvMbgiRvkvfRrcKeFxe8


Generated by Claude Code

claude added 5 commits May 28, 2026 22:52
Critical fixes
- Replace passlib with the maintained bcrypt library directly. passlib 1.7.4
  is unmaintained and crashes against bcrypt>=4.1 ("module 'bcrypt' has no
  attribute '__about__'" / 72-byte ValueError), so a fresh install broke every
  signup/login. Truncate inputs to bcrypt's 72-byte limit; pin bcrypt>=4.1,<6.
- Add a GitHub Actions CI workflow (backend lint+unit tests, frontend
  lint+build, agent tests). There was no CI, which is why broken tests reached
  main unnoticed.

Docs
- Rewrite the Security Notes in README.md and CLAUDE.md: they described
  SHA-256 hashing, token-{user_id} auth, and hardcoded CORS as TODOs, but all
  three were already replaced with bcrypt, signed JWTs, and env-driven CORS.
  Document the genuinely-remaining hardening items instead.

Production deployment
- Add frontend/Dockerfile.prod (multi-stage Vite build served by nginx) plus
  nginx.conf (SPA fallback + API reverse-proxy + /healthz) and a .dockerignore;
  the compose frontend service was shipping the Vite dev server.
- Add a backend healthcheck to docker-compose.yml.

Hygiene
- Fix stale test_embed_texts_deterministic_fallback (asserted dim 3 vs the
  service's 512) and make it hermetic.
- Remove dead pages/admin/Users.jsx (router resolves users.tsx).
- Fix pre-existing lint violations (B904/B905/F841/I001) and whitelist the
  framework Depends/E402 patterns so the repo passes its own linters in CI.

https://claude.ai/code/session_01K5vvMbgiRvkvfRrcKeFxe8
The .gitignore ignored the entire .github directory (two entries), so the
new .github/workflows/ci.yml was silently never staged or pushed — which is
why no CI run appeared. Remove the .github ignore rules and harden the
workflow for this repo's setup: no committed package-lock.json (use
npm install, drop lockfile cache) and explicit pip cache-dependency-path.

https://claude.ai/code/session_01K5vvMbgiRvkvfRrcKeFxe8
test_send_once_posts_correct_payload substring-matched the serialized JSON
with no separator whitespace (e.g. "register_token":"tok"), but httpx 0.27.2
serializes with spaces, so it failed under the agent's pinned deps. Parse the
JSON and assert on values instead.

https://claude.ai/code/session_01K5vvMbgiRvkvfRrcKeFxe8
Frontend job failed on 7 pre-existing ESLint errors (now that lint actually
runs in CI):
- The flat config set `globals: { browser: true }`, which declares a global
  literally named "browser" instead of the browser globals — so console,
  setInterval, clearInterval, structuredClone and Image were all no-undef.
  Import the `globals` package and spread globals.browser/node.
- App.jsx had UI text starting with "//" that ESLint read as a stray JSX
  comment (react/jsx-no-comment-textnodes) — wrap it as a string expression.
- Annotator.jsx had an eslint-disable for react-hooks/exhaustive-deps, but
  that plugin isn't loaded, so the directive itself errored — remove it.

Backend job timed out/failed installing the full ~1.5GB ML stack (torch,
ultralytics, onnx, onnxruntime, open-clip, opencv) just to run unit tests
that never import it (those libs are loaded lazily inside services). Add
backend/requirements-test.txt with the lightweight runtime deps and point the
unit-test job at it: install drops from minutes to ~15s and is deterministic.
The auth/bcrypt-class coverage is preserved since those deps remain.

Verified locally: 96 backend unit tests pass, ruff+black clean, frontend lint
+ build clean, agent tests pass.

https://claude.ai/code/session_01K5vvMbgiRvkvfRrcKeFxe8
…lects tests

The backend unit-test job failed at collection: test_agents_install.py does
`from tests.conftest import client`, which needs the backend root on sys.path.
conftest.py only adds src/, so it worked under `python -m pytest` (CWD on path)
but not under the bare `pytest` console script CI uses — ModuleNotFoundError:
No module named 'tests'. Add `pythonpath = [".", "src"]` to the pytest config
so collection works under any invocation.

Also remove the dead frontend/.eslintrc.cjs: ESLint 9 uses the flat config
(eslint.config.js) and warned that .eslintrc.cjs is not a valid flat-config
name on every lint run.

Verified locally with the CI-exact bare console scripts: 96 backend unit tests
pass, 8 agent tests pass, frontend lint + build clean.

https://claude.ai/code/session_01K5vvMbgiRvkvfRrcKeFxe8
@whittenator whittenator merged commit dcc4f82 into main May 29, 2026
3 checks passed
@whittenator whittenator deleted the claude/production-readiness-review-Qcoz0 branch May 29, 2026 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants