CI fixes #290

Merged
luarss merged 2 commits into The-OpenROAD-Project:master from luarss:fix-ciii2
Jun 5, 2026

@luarss luarss commented Jun 5, 2026

  • Initialize the retriever graph asynchronously so it no longer blocks the Docker health check
  • Reduce the Docker health-check timeout
  • Add a for loop to the health-check simulation block in the secret CI workflow
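
The polling idea behind the last bullet (the commits describe a 30 min budget at 10 s intervals against /conversations/ready) might look roughly like the sketch below. The actual loop lives in the CI workflow and is not shown on this page, so this is a Python illustration only. The names `fetch_status` and `wait_until_ready`, the substring check on the response body, and the host and port in the usage comment are all assumptions.

```python
import time
import urllib.request
from typing import Callable


def fetch_status(url: str) -> str:
    """Return the probe's response body, or '' on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return ""


def wait_until_ready(
    probe: Callable[[], str],
    timeout_s: float = 30 * 60,   # 30 min budget, per the commit message
    interval_s: float = 10,       # 10 s between polls, per the commit message
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until its response mentions 'ready' or the budget runs out."""
    attempts = int(timeout_s // interval_s)
    for _ in range(attempts):
        # Assumption: the body contains 'ready' once initialization is done
        # and 'initializing' before that; the exact format is not shown here.
        if "ready" in probe():
            return True
        sleep(interval_s)
    return False


# Hypothetical usage (host and port are placeholders):
# ok = wait_until_ready(lambda: fetch_status("http://localhost:8000/conversations/ready"))
```

Passing the probe and the sleep function as parameters keeps the loop testable without a running server.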

luarss added 2 commits June 5, 2026 15:54
Split document embedding into 100-chunk batches with a 1s delay
between batches so a 429 only retries one batch (~1 API call) rather
than restarting FAISS.from_documents from scratch (~87 calls). Also
raise retry wait times from max 120s to max 600s to give the quota
time to reset before the next attempt.

Signed-off-by: Jack Luar <jluar@precisioninno.com>
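
As an illustration of the batching strategy in the commit above, here is a minimal sketch that retries only the batch that was rate limited. It does not reproduce the repository's code or call FAISS. `RateLimitError`, `embed_in_batches`, the `embed_batch` callable, the exponential backoff schedule, and `base_wait_s` are hypothetical; only the 100-chunk batch size, the 1 s delay, and the 600 s wait cap come from the commit message.

```python
import time
from typing import Callable, List, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 raised by an embedding client."""


def embed_in_batches(
    chunks: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,      # from the commit message
    delay_s: float = 1.0,       # pause between batches, from the commit message
    max_attempts: int = 5,      # assumption
    base_wait_s: float = 4.0,   # assumption
    max_wait_s: float = 600.0,  # retry wait cap, from the commit message
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Embed chunks batch by batch, retrying only the batch that failed."""
    vectors: List[List[float]] = []
    for start in range(0, len(chunks), batch_size):
        batch = list(chunks[start:start + batch_size])
        for attempt in range(1, max_attempts + 1):
            try:
                vectors.extend(embed_batch(batch))
                break
            except RateLimitError:
                if attempt == max_attempts:
                    raise
                # Exponential backoff, capped at max_wait_s.
                sleep(min(base_wait_s * 2 ** (attempt - 1), max_wait_s))
        # Pause before the next batch, but not after the last one.
        if start + batch_size < len(chunks):
            sleep(delay_s)
    return vectors
```

Because earlier batches are already in `vectors` when a 429 arrives, a retry costs one call instead of restarting the whole embedding run.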
Move RetrieverGraph construction out of module-level import in
conversations.py and into a background thread spawned during the
FastAPI lifespan. This lets the server start instantly so the
Docker health-check passes within the reduced 30 s start_period
(instead of timing out after 22+ min waiting for FAISS embedding).

- Replace module-level rg = RetrieverGraph(...) with lazy singleton
  (get_graph / start_graph_init / reset_graph_state_for_testing)
- Add /conversations/ready readiness probe returning 'ready' or
  'initializing'
- Conversation endpoints return 503 / stream error when graph is
  not yet initialized
- Add readiness poll loop (30 min, 10 s intervals) before Run LLM CI
  step in ci-secret.yaml
- Reduce Docker healthcheck start_period default from 1200 s to 30 s
- Update streaming tests to use new public reset_graph_state_for_testing()

Signed-off-by: Jack Luar <jluar@precisioninno.com>
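
The lazy-singleton pattern described in this commit can be sketched with the standard library alone. The function names `get_graph`, `start_graph_init`, and `reset_graph_state_for_testing` come from the commit message, but their signatures and bodies here are assumptions. `readiness_status` and the `factory` parameter are hypothetical, and the FastAPI lifespan and route wiring are omitted.

```python
import threading
from typing import Any, Callable, Optional

_lock = threading.Lock()
_graph: Optional[Any] = None
_init_thread: Optional[threading.Thread] = None


def start_graph_init(factory: Callable[[], Any]) -> threading.Thread:
    """Build the graph in a daemon thread; repeated calls reuse the same thread."""
    global _init_thread

    def _build() -> None:
        global _graph
        graph = factory()  # slow step, e.g. embedding documents
        with _lock:
            _graph = graph

    with _lock:
        if _init_thread is None:
            _init_thread = threading.Thread(target=_build, daemon=True)
            _init_thread.start()
        return _init_thread


def get_graph() -> Optional[Any]:
    """Return the graph, or None while it is still being built."""
    with _lock:
        return _graph


def readiness_status() -> str:
    """Value a readiness probe could return: 'ready' or 'initializing'."""
    return "ready" if get_graph() is not None else "initializing"


def reset_graph_state_for_testing() -> None:
    """Forget the graph and the init thread so each test starts clean."""
    global _graph, _init_thread
    with _lock:
        _graph = None
        _init_thread = None
```

In this sketch an endpoint would check `get_graph()` and answer 503 when it returns None. A factory that raises leaves the status at 'initializing' indefinitely, so a fuller version would probably record the failure as well.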
@luarss luarss merged commit 052d03a into The-OpenROAD-Project:master Jun 5, 2026
6 checks passed
@luarss luarss deleted the fix-ciii2 branch June 5, 2026 17:23
luarss added a commit to luarss/ORAssistant that referenced this pull request Jun 6, 2026
* fix: defer graph init to background, prevent health-check timeout
