Skip to content

[Feat] Support "proxy" arch between coordinator and prover#1701

Merged
lispc merged 76 commits into
developfrom
coordinator_proxy
May 19, 2026
Merged

[Feat] Support "proxy" arch between coordinator and prover#1701
lispc merged 76 commits into
developfrom
coordinator_proxy

Conversation

@noel2004
Copy link
Copy Markdown
Member

@noel2004 noel2004 commented Jul 21, 2025

The PR has induced a "coordinator_proxy" app to support the provers cluster mode.

Designation: https://www.notion.so/scrollzkp/Almost-stateless-prover-proxy-2537792d22af80b08db3f8397e0824bd?d=29c7792d22af80ee98df001caf30b7e1#2537792d22af806880efc8cea92d0e19

Summary by CodeRabbit

  • New Features

    • Standalone coordinator proxy: proxy server for provers with smarter task routing, per-user session management, priority upstreams, and CLI entrypoint.
    • Proxy API & auth: JWT proxy login flow, per-client sessions, automatic token orchestration, /proxy endpoints for task retrieval and proof submission.
  • Chores

    • DB migrations, config, Docker/CI workflow, Makefile target, and build artifacts for proxy support.
  • Tests

    • Expanded proxy and session test coverage.
  • Improvements

    • Response payload decoding enhanced.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jul 21, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a coordinator proxy subsystem: CLI/server, upstream HTTP clients and managers, per-prover session management with optional persistence, proxy-aware auth and middleware, GET_TASK/SUBMIT_PROOF controllers, DB migrations, libzkp reorganizations, Response.DecodeData, tests, and Docker/CI assets.

Changes

Cohort / File(s) Summary
Build & CI / Config
coordinator/Makefile, coordinator/conf/config_proxy.json, common/go.mod, build/dockerfiles/coordinator-proxy.Dockerfile, build/dockerfiles/coordinator-proxy.Dockerfile.dockerignore, .github/workflows/docker.yml
Add coordinator_proxy build target, Dockerfile/dockerignore, CI job, proxy JSON config, and add mapstructure dependency.
Proxy CLI & Server
coordinator/cmd/proxy/main.go, coordinator/cmd/proxy/app/app.go, coordinator/cmd/proxy/app/flags.go
New proxy CLI entrypoint, Run() shim, CLI flags, and Gin server bootstrap with observability, controller init, routing, and graceful shutdown.
Proxy Controllers & Routing
coordinator/internal/route/route.go, coordinator/internal/controller/proxy/controller.go, coordinator/internal/controller/proxy/get_task.go, coordinator/internal/controller/proxy/submit_proof.go
Add proxy route group and InitController wiring; implement GetTask and SubmitProof controllers and priority upstream manager.
Auth & Middleware
coordinator/internal/controller/api/auth.go, coordinator/internal/controller/proxy/auth.go, coordinator/internal/middleware/challenge_jwt.go, coordinator/internal/middleware/login_jwt.go
Introduce proxy-aware AuthController, JWT payload/identity handlers; change middleware signatures to accept Auth, add ProxyLoginMiddleware, and adjust login flows (viaProxy handling).
Upstream Client & Manager
coordinator/internal/controller/proxy/client.go, coordinator/internal/controller/proxy/client_manager.go
Implement upstream HTTP client (upClient) and ClientManager including private-key derivation, login orchestration, token caching, compatibility-mode paths, and client factories.
Prover Session & Persistence
coordinator/internal/controller/proxy/prover_session.go, coordinator/internal/controller/proxy/persistent.go, database/migrate/migrations/proxy/0001_running_tables.sql
Add ProverManager and per-user proverSession with maintainLogin/ProxyLogin/GetTask/SubmitProof, GORM persistence for sessions and priority upstream, and corresponding SQL migration.
ORM / Migrations Helpers
database/migrate/migrate.go
Replace individual SQL embeds with directory embed and add MigrateModule, RollbackModule, and ResetModuleDB helpers.
libzkp Reorg & Mocks
coordinator/internal/logic/libzkp/message_types.go, coordinator/internal/logic/libzkp/universal_task.go, coordinator/internal/logic/libzkp/lib.go, coordinator/internal/logic/libzkp/lib_mock.go, coordinator/internal/logic/libzkp/mock_universal_task.go
Reorganize task-type mapping, add GenerateUniversalTask wrappers, implement mock_verifier build variant and mock implementations.
Login Logic / Dedup
coordinator/internal/logic/auth/login.go
LoginLogic now depends on VerifierConfig and an injectable ChallengeDeduplicator; split verification/version checks into VerifyMsg and CompatiblityCheck; add SimpleDeduplicator constructor.
Types & Response helper
common/types/response.go, coordinator/internal/types/prover.go, coordinator/internal/types/response_test.go
Add Response.DecodeData using mapstructure (JSON tags), add ProverProviderTypeProxy enum case and String mapping, and unit test for DecodeData.
Tests & Test Helpers
coordinator/test/*, coordinator/internal/controller/proxy/prover_session_test.go, coordinator/test/proxy_test.go, coordinator/test/mock_prover.go, coordinator/internal/orm/orm_test.go
Add extensive proxy integration/unit tests and scaffolding, mockProver token caching, test setup/launch helpers, and a minor test teardown guard fix.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant P as Prover (Client)
    participant PX as Proxy Server
    participant UP as Upstream Coordinator
    participant DB as DB/Persistence

    P->>PX: GET /coordinator/v1/challenge
    PX->>P: challenge

    P->>PX: POST /coordinator/v1/login (signed)
    PX->>UP: GET /coordinator/v1/challenge (upstream)
    UP->>PX: upstream challenge
    PX->>UP: POST /coordinator/v1/login (proxy-signed/forwarded)
    UP->>PX: login token
    PX->>DB: persist session/token (optional)
    PX->>P: return JWT

    P->>PX: POST /coordinator/v1/get_task (JWT)
    PX->>PX: choose upstream (priority or selection)
    PX->>UP: POST /coordinator/v1/get_task (with token)
    UP->>PX: task (annotated with upstream)
    PX->>DB: update priority mapping
    PX->>P: task

    P->>PX: POST /coordinator/v1/submit_proof (taskID includes upstream)
    PX->>PX: resolve upstream
    PX->>UP: POST /coordinator/v1/submit_proof (with token)
    UP->>PX: success
    PX->>DB: clear priority mapping
    PX->>P: success
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • lispc
  • georgehao
  • Thegaram

Poem

🐇 I hopped a proxy, quick and spry,

I fetched a challenge from the sky,
I signed and cached each little token,
Sessions hummed and tasks were woken,
I thumped my foot — the tests ran by!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is minimal and lacks detail on what, why, and how the changes are made; the PR title checkbox section is incomplete, and deployment/breaking change sections are not addressed. Complete the description template by explaining the purpose/design rationale, confirming the conventional commit type (feat), addressing deployment tag and breaking change status, and providing sufficient context for reviewers.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly describes the main change: adding support for a 'proxy' architecture between coordinator and prover components.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch coordinator_proxy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c2bc02 and 21326c2.

📒 Files selected for processing (1)
  • common/go.mod (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: tests

Comment thread common/go.mod
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@coordinator/internal/controller/proxy/client.go`:
- Around line 70-122: In upClient.Login, the HTTP requests ignore the provided
ctx; replace both calls to http.NewRequest with http.NewRequestWithContext using
the ctx passed into Login (the initial challenge request that creates req and
the later login request that reuses req/loginResp) so cancellations/timeouts
propagate; leave the existing error handling, headers
(Content-Type/Authorization) and subsequent calls to c.httpClient.Do and
handleHttpResp unchanged.

In `@coordinator/internal/controller/proxy/prover_session.go`:
- Around line 51-53: The log statements in prover_session.go that print
authentication material (specifically the calls referencing ret.proverToken in
the log.Debug invocations near the block where ret.persistent is set and the
other occurrence around lines 226-227) must stop exposing tokens: remove the
"token", ret.proverToken field from the log arguments or replace it with a
non-sensitive representation (e.g., a constant like "<redacted>" or a
truncated/hashed version) and keep the rest of the context (e.g., "key",
userKey, and persistent assignment). Update both occurrences (the Debug call
that currently includes "token", ret.proverToken and the matching later log) to
use the masked value or omit the token entirely.
- Around line 205-216: In ProxyLogin (method proverSession.ProxyLogin) guard the
nil persistent store before calling Update: check if c.persistent != nil (and
only then call c.persistent.Update(param.PublicKey, up, newtoken.LoginSchema))
to avoid panics in non‑persistent mode; keep the existing log on Update error
and optionally add a debug/log branch when c.persistent is nil to make behavior
explicit.
- Around line 45-55: The Get method's deferred restore block assumes
m.persistent is non-nil and calls m.persistent.Get, causing a panic when
NewProverManager leaves persistent nil; update the defer to first check if
m.persistent != nil before calling m.persistent.Get(userKey) and
logging/restoring ret.proverToken and ret.persistent, so the code only attempts
to access the persistent store when it exists (reference symbols: Get,
NewProverManager, m.persistent, userKey, ret, ret.proverToken, ret.persistent).
♻️ Duplicate comments (1)
coordinator/internal/controller/proxy/client.go (1)

140-158: Compatibility-mode signing derives key from a public key
buildPrivateKey([]byte(param.PublicKey)) uses a public key string as seed material, which will not reproduce the original signing key and can break upstream verification. Please reuse the proxy’s configured private key (or pass a signer into upClient) instead of re-deriving from PublicKey.

🧹 Nitpick comments (1)
coordinator/internal/controller/proxy/client.go (1)

125-137: Make handleHttpResp errors endpoint-agnostic
The messages say “login …” even for get_task/submit_proof, which can mislead logs. Consider generic wording.

♻️ Suggested tweak
-			return nil, fmt.Errorf("login parsing expected response failed: %v", err)
+			return nil, fmt.Errorf("parsing expected response failed: %w", err)
 ...
-	return nil, fmt.Errorf("login request failed with status: %d", resp.StatusCode)
+	return nil, fmt.Errorf("request failed with status: %d", resp.StatusCode)

Comment thread coordinator/internal/controller/proxy/client.go
Comment thread coordinator/internal/controller/proxy/prover_session.go
Comment thread coordinator/internal/controller/proxy/prover_session.go
Comment thread coordinator/internal/controller/proxy/prover_session.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@coordinator/internal/controller/proxy/prover_session.go`:
- Line 271: Update the log message in the SubmitProof method to correct the
copy-paste error: replace the "Get Task first-try failed for broken token" text
with an accurate message such as "SubmitProof first-try failed for broken token"
(leave the existing structured fields like "up" and "errcode" and the log call
using log.Debug unchanged) so the log reflects the SubmitProof operation instead
of Get Task.
♻️ Duplicate comments (3)
coordinator/internal/controller/proxy/prover_session.go (3)

45-55: Guard nil persistent store to avoid panic in non-persistent mode.

NewProverManager leaves persistent nil, but line 48 dereferences it without a nil check, which will cause a panic when Get is called on a ProverManager created without persistence.

🐛 Proposed fix
 	defer func() {
-		if ret == nil {
+		if ret == nil && m.persistent != nil {
 			var err error
 			ret, err = m.persistent.Get(userKey)
 			if err != nil {
 				log.Error("Get persistent layer for prover tokens fail", "error", err)
 			} else if ret != nil {
 				log.Debug("restore record from persistent", "key", userKey)
 				ret.persistent = m.persistent
 			}
 		}

211-216: Guard nil persistent store before Update().

c.persistent can be nil in non-persistent mode, causing a panic at line 213 when attempting to update the persistent layer.

🐛 Proposed fix
 	if newtoken.phase > existedToken.phase {
-		if err := c.persistent.Update(param.PublicKey, up, newtoken.LoginSchema); err != nil {
-			log.Error("Update persistent layer for prover tokens fail", "error", err)
+		if c.persistent != nil {
+			if err := c.persistent.Update(param.PublicKey, up, newtoken.LoginSchema); err != nil {
+				log.Error("Update persistent layer for prover tokens fail", "error", err)
+			}
 		}
 	}

224-227: Remove prover tokens from log output.

Line 225 logs the entire c.proverToken map, which exposes authentication material. This should be removed for security.

🔒 Proposed fix
 	c.RLock()
-	log.Debug("call get task", "up", up, "tokens", c.proverToken)
+	log.Debug("call get task", "up", up)
 	token := c.proverToken[up]
 	c.RUnlock()

Comment thread coordinator/internal/controller/proxy/prover_session.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
common/go.mod (1)

15-15: ⚠️ Potential issue | 🟠 Major

Avoid adding the archived mitchellh/mapstructure module directly.

This change promotes the old archived import path to a first-class dependency. The original repository has been read-only since July 22, 2024, and the Go vulnerability database tracks GHSA-fv92-fjc5-jj9h on the maintained fork, with fixed versions starting at github.com/go-viper/mapstructure/v2@v2.3.0; v2.4.0 is available. Please depend on the maintained v2 module and update the new imports instead of pinning v1.5.0 here. (github.com)

🔧 Suggested dependency change
-	github.com/mitchellh/mapstructure v1.5.0
+	github.com/go-viper/mapstructure/v2 v2.4.0
#!/bin/bash
# Verify remaining references to the archived module before/after migration.
rg -n --type=go '"github\.com/mitchellh/mapstructure"' .
rg -n 'github\.com/mitchellh/mapstructure' common/go.mod
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@common/go.mod` at line 15, The go.mod entry currently adds the archived
module "github.com/mitchellh/mapstructure v1.5.0"; replace that dependency with
the maintained v2 module "github.com/go-viper/mapstructure/v2" (prefer v2.4.0 or
at least v2.3.0) and update all import paths in the codebase from
"github.com/mitchellh/mapstructure" to "github.com/go-viper/mapstructure/v2" so
imports include the /v2 module suffix; after updating go.mod and imports, run
module resolution (go get / go mod tidy) to ensure the new v2 module is fetched
and no references to the archived path remain.
coordinator/internal/controller/proxy/prover_session.go (1)

280-282: ⚠️ Potential issue | 🔴 Critical

Stop logging the token cache.

c.proverToken contains upstream auth material, so this debug line leaks credentials into logs. Log only the upstream name or a redacted summary.

Safe logging change
-	log.Debug("call get task", "up", up, "tokens", c.proverToken)
+	log.Debug("call get task", "up", up)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@coordinator/internal/controller/proxy/prover_session.go` around lines 280 -
282, The debug statement in the get-task path currently logs the full token
cache (c.proverToken) which leaks upstream auth material; change the log.Debug
call in the critical section around c.RLock()/token := c.proverToken[up] to stop
printing c.proverToken and instead log only the upstream identifier (up) or a
redacted/summary value (e.g., token count or masked string) so no secrets are
emitted; update the call site that references c.proverToken in log.Debug to use
the safe representation.
🧹 Nitpick comments (1)
coordinator/internal/controller/proxy/controller.go (1)

10-17: Consider making controller initialization explicitly one-shot.

InitController rewrites the package-level controllers every time it runs. If the proxy is initialized twice in one process, the previous ClientManager instances are just dropped even though they still own cached completion state (coordinator/internal/controller/proxy/client_manager.go, Lines 29-40). A sync.Once or explicit shutdown/reset path would make that lifecycle safer.

Also applies to: 24-44

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@coordinator/internal/controller/proxy/controller.go` around lines 10 - 17,
InitController currently reassigns package-level variables GetTask, SubmitProof
and Auth each time it's called which leaks previous ClientManager instances (see
ClientManager in client_manager.go); change the initialization to be one-shot by
guarding InitController with a sync.Once (or add an explicit shutdown/reset that
calls a Close/Stop on existing ClientManager instances before reassigning) so
you don't drop live state. Locate InitController and the package-level vars
GetTask/SubmitProof/Auth and either wrap the init body with a sync.Once or
implement and call a shutdown() that cleans up ClientManager resources before
overwriting the controllers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@coordinator/internal/controller/proxy/controller.go`:
- Around line 31-33: The panic in the NewClientManager error path discards
context; locate the call to NewClientManager in controller.go (the variables nm,
cfg.ProxyManager.Client, upCfg) and replace the blind panic with returning or
logging the wrapped error that includes the coordinator name (nm) and the
original err (e.g., fmt.Errorf("NewClientManager failed for coordinator %s: %w",
nm, err)) or propagate the error up from the function instead of panicking so
the upstream-specific init error is preserved.

In `@coordinator/internal/controller/proxy/prover_session.go`:
- Around line 192-203: The maintainLogin defer currently only updates the
in-memory c.proverToken[up] and logs, so refreshes are lost on restart; modify
maintainLogin to also persist the refreshed login (call persistent.Update or the
coordinator's persistence API) inside the same critical section when
result.LoginSchema != nil, and ensure you copy the expiry from the
loginResult/login response into the LoginSchema you persist (preserve Token and
set Expired/expiry field) so persistent.Update receives the full token+expiry
instead of a zero value; apply the same change for the second similar block
(around the 249-254 area) so both refresh paths write the token+expiry to
persistent storage.
- Around line 101-117: The deferred restore/insert path can race and create
duplicate proverSession objects for the same userKey; before assigning
m.data[userKey] you must acquire the write lock and re-check whether an entry
already exists and use that existing session instead of unconditionally writing
ret. Concretely, inside the defer in prover_session.go (the block that calls
m.persistent.Get and then does m.Lock(); m.data[userKey] = ret; m.Unlock()),
change the flow to: Lock, check m.data[userKey] again, if present use the
existing session (and discard or merge ret as appropriate), otherwise set
m.data[userKey]=ret and set ret.persistent; then Unlock. Apply the same
re-check-under-lock pattern to the similar block around lines 129-149 that also
writes m.data.

---

Duplicate comments:
In `@common/go.mod`:
- Line 15: The go.mod entry currently adds the archived module
"github.com/mitchellh/mapstructure v1.5.0"; replace that dependency with the
maintained v2 module "github.com/go-viper/mapstructure/v2" (prefer v2.4.0 or at
least v2.3.0) and update all import paths in the codebase from
"github.com/mitchellh/mapstructure" to "github.com/go-viper/mapstructure/v2" so
imports include the /v2 module suffix; after updating go.mod and imports, run
module resolution (go get / go mod tidy) to ensure the new v2 module is fetched
and no references to the archived path remain.

In `@coordinator/internal/controller/proxy/prover_session.go`:
- Around line 280-282: The debug statement in the get-task path currently logs
the full token cache (c.proverToken) which leaks upstream auth material; change
the log.Debug call in the critical section around c.RLock()/token :=
c.proverToken[up] to stop printing c.proverToken and instead log only the
upstream identifier (up) or a redacted/summary value (e.g., token count or
masked string) so no secrets are emitted; update the call site that references
c.proverToken in log.Debug to use the safe representation.

---

Nitpick comments:
In `@coordinator/internal/controller/proxy/controller.go`:
- Around line 10-17: InitController currently reassigns package-level variables
GetTask, SubmitProof and Auth each time it's called which leaks previous
ClientManager instances (see ClientManager in client_manager.go); change the
initialization to be one-shot by guarding InitController with a sync.Once (or
add an explicit shutdown/reset that calls a Close/Stop on existing ClientManager
instances before reassigning) so you don't drop live state. Locate
InitController and the package-level vars GetTask/SubmitProof/Auth and either
wrap the init body with a sync.Once or implement and call a shutdown() that
cleans up ClientManager resources before overwriting the controllers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 911d9818-e443-4180-9a9e-6dbe41b75e8c

📥 Commits

Reviewing files that changed from the base of the PR and between feeac8f and d6783b4.

📒 Files selected for processing (6)
  • common/go.mod
  • common/observability/server.go
  • coordinator/cmd/proxy/app/app.go
  • coordinator/internal/controller/proxy/controller.go
  • coordinator/internal/controller/proxy/prover_session.go
  • coordinator/internal/controller/proxy/prover_session_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • coordinator/internal/controller/proxy/prover_session_test.go
  • coordinator/cmd/proxy/app/app.go

Comment thread coordinator/internal/controller/proxy/controller.go
Comment thread coordinator/internal/controller/proxy/prover_session.go
Comment thread coordinator/internal/controller/proxy/prover_session.go
lispc
lispc previously approved these changes May 19, 2026
Copy link
Copy Markdown
Contributor

@lispc lispc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost fully new codes. approved

- Fix gas_oracle/app.go: number-1 underflow when L1 block number is 0
- Fix bridge_test.go: use require.NoError instead of assert.NoError in
  setupEnv/setupDB/prepareContracts to prevent nil pointer panic when
  preceding assertions fail
lispc
lispc previously approved these changes May 19, 2026
Comment thread coordinator/cmd/proxy/app/app.go Outdated
lispc
lispc previously approved these changes May 19, 2026
@lispc lispc merged commit 71f7160 into develop May 19, 2026
15 checks passed
@lispc lispc deleted the coordinator_proxy branch May 19, 2026 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants