[Feat] Support "proxy" arch between coordinator and prover#1701
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a coordinator proxy subsystem: CLI/server, upstream HTTP clients and managers, per-prover session management with optional persistence, proxy-aware auth and middleware, GET_TASK/SUBMIT_PROOF controllers, DB migrations, libzkp reorganizations, Response.DecodeData, tests, and Docker/CI assets. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant P as Prover (Client)
participant PX as Proxy Server
participant UP as Upstream Coordinator
participant DB as DB/Persistence
P->>PX: GET /coordinator/v1/challenge
PX->>P: challenge
P->>PX: POST /coordinator/v1/login (signed)
PX->>UP: GET /coordinator/v1/challenge (upstream)
UP->>PX: upstream challenge
PX->>UP: POST /coordinator/v1/login (proxy-signed/forwarded)
UP->>PX: login token
PX->>DB: persist session/token (optional)
PX->>P: return JWT
P->>PX: POST /coordinator/v1/get_task (JWT)
PX->>PX: choose upstream (priority or selection)
PX->>UP: POST /coordinator/v1/get_task (with token)
UP->>PX: task (annotated with upstream)
PX->>DB: update priority mapping
PX->>P: task
P->>PX: POST /coordinator/v1/submit_proof (taskID includes upstream)
PX->>PX: resolve upstream
PX->>UP: POST /coordinator/v1/submit_proof (with token)
UP->>PX: success
PX->>DB: clear priority mapping
PX->>P: success
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
4c31d33 to
64ef0f4
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
common/go.mod(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: check
- GitHub Check: tests
- GitHub Check: tests
- GitHub Check: tests
- GitHub Check: check
- GitHub Check: check
- GitHub Check: tests
- GitHub Check: tests
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@coordinator/internal/controller/proxy/client.go`:
- Around line 70-122: In upClient.Login, the HTTP requests ignore the provided
ctx; replace both calls to http.NewRequest with http.NewRequestWithContext using
the ctx passed into Login (the initial challenge request that creates req and
the later login request that reuses req/loginResp) so cancellations/timeouts
propagate; leave the existing error handling, headers
(Content-Type/Authorization) and subsequent calls to c.httpClient.Do and
handleHttpResp unchanged.
In `@coordinator/internal/controller/proxy/prover_session.go`:
- Around line 51-53: The log statements in prover_session.go that print
authentication material (specifically the calls referencing ret.proverToken in
the log.Debug invocations near the block where ret.persistent is set and the
other occurrence around lines 226-227) must stop exposing tokens: remove the
"token", ret.proverToken field from the log arguments or replace it with a
non-sensitive representation (e.g., a constant like "<redacted>" or a
truncated/hashed version) and keep the rest of the context (e.g., "key",
userKey, and persistent assignment). Update both occurrences (the Debug call
that currently includes "token", ret.proverToken and the matching later log) to
use the masked value or omit the token entirely.
- Around line 205-216: In ProxyLogin (method proverSession.ProxyLogin) guard the
nil persistent store before calling Update: check if c.persistent != nil (and
only then call c.persistent.Update(param.PublicKey, up, newtoken.LoginSchema))
to avoid panics in non‑persistent mode; keep the existing log on Update error
and optionally add a debug/log branch when c.persistent is nil to make behavior
explicit.
- Around line 45-55: The Get method's deferred restore block assumes
m.persistent is non-nil and calls m.persistent.Get, causing a panic when
NewProverManager leaves persistent nil; update the defer to first check if
m.persistent != nil before calling m.persistent.Get(userKey) and
logging/restoring ret.proverToken and ret.persistent, so the code only attempts
to access the persistent store when it exists (reference symbols: Get,
NewProverManager, m.persistent, userKey, ret, ret.proverToken, ret.persistent).
♻️ Duplicate comments (1)
coordinator/internal/controller/proxy/client.go (1)
140-158: Compatibility-mode signing derives key from a public key
buildPrivateKey([]byte(param.PublicKey))uses a public key string as seed material, which will not reproduce the original signing key and can break upstream verification. Please reuse the proxy’s configured private key (or pass a signer intoupClient) instead of re-deriving fromPublicKey.
🧹 Nitpick comments (1)
coordinator/internal/controller/proxy/client.go (1)
125-137: MakehandleHttpResperrors endpoint-agnostic
The messages say “login …” even for get_task/submit_proof, which can mislead logs. Consider generic wording.♻️ Suggested tweak
- return nil, fmt.Errorf("login parsing expected response failed: %v", err) + return nil, fmt.Errorf("parsing expected response failed: %w", err) ... - return nil, fmt.Errorf("login request failed with status: %d", resp.StatusCode) + return nil, fmt.Errorf("request failed with status: %d", resp.StatusCode)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@coordinator/internal/controller/proxy/prover_session.go`:
- Line 271: Update the log message in the SubmitProof method to correct the
copy-paste error: replace the "Get Task first-try failed for broken token" text
with an accurate message such as "SubmitProof first-try failed for broken token"
(leave the existing structured fields like "up" and "errcode" and the log call
using log.Debug unchanged) so the log reflects the SubmitProof operation instead
of Get Task.
♻️ Duplicate comments (3)
coordinator/internal/controller/proxy/prover_session.go (3)
45-55: Guard nil persistent store to avoid panic in non-persistent mode.
NewProverManagerleavespersistentnil, but line 48 dereferences it without a nil check, which will cause a panic whenGetis called on a ProverManager created without persistence.🐛 Proposed fix
defer func() { - if ret == nil { + if ret == nil && m.persistent != nil { var err error ret, err = m.persistent.Get(userKey) if err != nil { log.Error("Get persistent layer for prover tokens fail", "error", err) } else if ret != nil { log.Debug("restore record from persistent", "key", userKey) ret.persistent = m.persistent } }
211-216: Guard nil persistent store before Update().
c.persistentcan be nil in non-persistent mode, causing a panic at line 213 when attempting to update the persistent layer.🐛 Proposed fix
if newtoken.phase > existedToken.phase { - if err := c.persistent.Update(param.PublicKey, up, newtoken.LoginSchema); err != nil { - log.Error("Update persistent layer for prover tokens fail", "error", err) + if c.persistent != nil { + if err := c.persistent.Update(param.PublicKey, up, newtoken.LoginSchema); err != nil { + log.Error("Update persistent layer for prover tokens fail", "error", err) + } } }
224-227: Remove prover tokens from log output.Line 225 logs the entire
c.proverTokenmap, which exposes authentication material. This should be removed for security.🔒 Proposed fix
c.RLock() - log.Debug("call get task", "up", up, "tokens", c.proverToken) + log.Debug("call get task", "up", up) token := c.proverToken[up] c.RUnlock()
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
common/go.mod (1)
15-15:⚠️ Potential issue | 🟠 MajorAvoid adding the archived
mitchellh/mapstructuremodule directly.This change promotes the old archived import path to a first-class dependency. The original repository has been read-only since July 22, 2024, and the Go vulnerability database tracks GHSA-fv92-fjc5-jj9h on the maintained fork, with fixed versions starting at
github.com/go-viper/mapstructure/v2@v2.3.0;v2.4.0is available. Please depend on the maintained v2 module and update the new imports instead of pinningv1.5.0here. (github.com)🔧 Suggested dependency change
- github.com/mitchellh/mapstructure v1.5.0 + github.com/go-viper/mapstructure/v2 v2.4.0#!/bin/bash # Verify remaining references to the archived module before/after migration. rg -n --type=go '"github\.com/mitchellh/mapstructure"' . rg -n 'github\.com/mitchellh/mapstructure' common/go.mod🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@common/go.mod` at line 15, The go.mod entry currently adds the archived module "github.com/mitchellh/mapstructure v1.5.0"; replace that dependency with the maintained v2 module "github.com/go-viper/mapstructure/v2" (prefer v2.4.0 or at least v2.3.0) and update all import paths in the codebase from "github.com/mitchellh/mapstructure" to "github.com/go-viper/mapstructure/v2" so imports include the /v2 module suffix; after updating go.mod and imports, run module resolution (go get / go mod tidy) to ensure the new v2 module is fetched and no references to the archived path remain.coordinator/internal/controller/proxy/prover_session.go (1)
280-282:⚠️ Potential issue | 🔴 CriticalStop logging the token cache.
c.proverTokencontains upstream auth material, so this debug line leaks credentials into logs. Log only the upstream name or a redacted summary.Safe logging change
- log.Debug("call get task", "up", up, "tokens", c.proverToken) + log.Debug("call get task", "up", up)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@coordinator/internal/controller/proxy/prover_session.go` around lines 280 - 282, The debug statement in the get-task path currently logs the full token cache (c.proverToken) which leaks upstream auth material; change the log.Debug call in the critical section around c.RLock()/token := c.proverToken[up] to stop printing c.proverToken and instead log only the upstream identifier (up) or a redacted/summary value (e.g., token count or masked string) so no secrets are emitted; update the call site that references c.proverToken in log.Debug to use the safe representation.
🧹 Nitpick comments (1)
coordinator/internal/controller/proxy/controller.go (1)
10-17: Consider making controller initialization explicitly one-shot.
InitControllerrewrites the package-level controllers every time it runs. If the proxy is initialized twice in one process, the previousClientManagerinstances are just dropped even though they still own cached completion state (coordinator/internal/controller/proxy/client_manager.go, Lines 29-40). Async.Onceor explicit shutdown/reset path would make that lifecycle safer.Also applies to: 24-44
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@coordinator/internal/controller/proxy/controller.go` around lines 10 - 17, InitController currently reassigns package-level variables GetTask, SubmitProof and Auth each time it's called which leaks previous ClientManager instances (see ClientManager in client_manager.go); change the initialization to be one-shot by guarding InitController with a sync.Once (or add an explicit shutdown/reset that calls a Close/Stop on existing ClientManager instances before reassigning) so you don't drop live state. Locate InitController and the package-level vars GetTask/SubmitProof/Auth and either wrap the init body with a sync.Once or implement and call a shutdown() that cleans up ClientManager resources before overwriting the controllers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@coordinator/internal/controller/proxy/controller.go`:
- Around line 31-33: The panic in the NewClientManager error path discards
context; locate the call to NewClientManager in controller.go (the variables nm,
cfg.ProxyManager.Client, upCfg) and replace the blind panic with returning or
logging the wrapped error that includes the coordinator name (nm) and the
original err (e.g., fmt.Errorf("NewClientManager failed for coordinator %s: %w",
nm, err)) or propagate the error up from the function instead of panicking so
the upstream-specific init error is preserved.
In `@coordinator/internal/controller/proxy/prover_session.go`:
- Around line 192-203: The maintainLogin defer currently only updates the
in-memory c.proverToken[up] and logs, so refreshes are lost on restart; modify
maintainLogin to also persist the refreshed login (call persistent.Update or the
coordinator's persistence API) inside the same critical section when
result.LoginSchema != nil, and ensure you copy the expiry from the
loginResult/login response into the LoginSchema you persist (preserve Token and
set Expired/expiry field) so persistent.Update receives the full token+expiry
instead of a zero value; apply the same change for the second similar block
(around the 249-254 area) so both refresh paths write the token+expiry to
persistent storage.
- Around line 101-117: The deferred restore/insert path can race and create
duplicate proverSession objects for the same userKey; before assigning
m.data[userKey] you must acquire the write lock and re-check whether an entry
already exists and use that existing session instead of unconditionally writing
ret. Concretely, inside the defer in prover_session.go (the block that calls
m.persistent.Get and then does m.Lock(); m.data[userKey] = ret; m.Unlock()),
change the flow to: Lock, check m.data[userKey] again, if present use the
existing session (and discard or merge ret as appropriate), otherwise set
m.data[userKey]=ret and set ret.persistent; then Unlock. Apply the same
re-check-under-lock pattern to the similar block around lines 129-149 that also
writes m.data.
---
Duplicate comments:
In `@common/go.mod`:
- Line 15: The go.mod entry currently adds the archived module
"github.com/mitchellh/mapstructure v1.5.0"; replace that dependency with the
maintained v2 module "github.com/go-viper/mapstructure/v2" (prefer v2.4.0 or at
least v2.3.0) and update all import paths in the codebase from
"github.com/mitchellh/mapstructure" to "github.com/go-viper/mapstructure/v2" so
imports include the /v2 module suffix; after updating go.mod and imports, run
module resolution (go get / go mod tidy) to ensure the new v2 module is fetched
and no references to the archived path remain.
In `@coordinator/internal/controller/proxy/prover_session.go`:
- Around line 280-282: The debug statement in the get-task path currently logs
the full token cache (c.proverToken) which leaks upstream auth material; change
the log.Debug call in the critical section around c.RLock()/token :=
c.proverToken[up] to stop printing c.proverToken and instead log only the
upstream identifier (up) or a redacted/summary value (e.g., token count or
masked string) so no secrets are emitted; update the call site that references
c.proverToken in log.Debug to use the safe representation.
---
Nitpick comments:
In `@coordinator/internal/controller/proxy/controller.go`:
- Around line 10-17: InitController currently reassigns package-level variables
GetTask, SubmitProof and Auth each time it's called which leaks previous
ClientManager instances (see ClientManager in client_manager.go); change the
initialization to be one-shot by guarding InitController with a sync.Once (or
add an explicit shutdown/reset that calls a Close/Stop on existing ClientManager
instances before reassigning) so you don't drop live state. Locate
InitController and the package-level vars GetTask/SubmitProof/Auth and either
wrap the init body with a sync.Once or implement and call a shutdown() that
cleans up ClientManager resources before overwriting the controllers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 911d9818-e443-4180-9a9e-6dbe41b75e8c
📒 Files selected for processing (6)
common/go.modcommon/observability/server.gocoordinator/cmd/proxy/app/app.gocoordinator/internal/controller/proxy/controller.gocoordinator/internal/controller/proxy/prover_session.gocoordinator/internal/controller/proxy/prover_session_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- coordinator/internal/controller/proxy/prover_session_test.go
- coordinator/cmd/proxy/app/app.go
lispc
left a comment
There was a problem hiding this comment.
almost fully new codes. approved
- Fix gas_oracle/app.go: number-1 underflow when L1 block number is 0 - Fix bridge_test.go: use require.NoError instead of assert.NoError in setupEnv/setupDB/prepareContracts to prevent nil pointer panic when preceding assertions fail
The PR has induced a "coordinator_proxy" app to support the provers cluster mode.
Designation: https://www.notion.so/scrollzkp/Almost-stateless-prover-proxy-2537792d22af80b08db3f8397e0824bd?d=29c7792d22af80ee98df001caf30b7e1#2537792d22af806880efc8cea92d0e19
Summary by CodeRabbit
New Features
Chores
Tests
Improvements