Integrate BNB tss-lib hardening#2
Conversation
The session-tagged ModChallenge path reduced a single 256-bit SHA512_256i_TAGGED output modulo Paillier N (~2^2048), producing challenges in [0, 2^256) instead of [0, N). The session-tagged path shares a verifier with the legacy HashToN path; the truncated challenges gave it a strictly weaker distribution than the path it replaced. Introduce HashToNTagged in common/hash_utils.go (tagged-hash analogue of HashToN) and switch the session-tagged branch of ModChallenge to it. The non-session path is unchanged. The iteration chaining of y[:i] into each y_i derivation is preserved. The session-tagged ModProof form was introduced by this PR and has not been deployed; no in-the-wild verifier is broken by this change. Add TestModChallenge_SessionPath_NotTruncated pinning y_i.BitLen() > N.BitLen()/2, and TestModChallenge_SessionPath_ChainsPreviousChallenges pinning the sequential-challenge invariant.
The signing constructors accepted a variadic fullBytesLen ...int and stored fullBytesLen[0] verbatim with no bounds check. A caller passing a negative value would later panic in make([]byte, fullBytesLen), and a caller passing a value smaller than the message's byte width would panic in (*big.Int).FillBytes with "insufficient length". Both panics fired inside a round-1 goroutine, crossing goroutine boundaries and bypassing the tss.Error reporting that the protocol state machine expects. Validate at constructor entry, before any state allocation: reject fullBytesLen < 0 and reject fullBytesLen > 0 with a message whose byte width exceeds it. Panic synchronously with a clear, parameter- named message at the caller's call site instead. The variadic shape is preserved (source-compat), so existing callers that omit fullBytesLen are unaffected. Mirror the fix in ECDSA NewLocalPartyWithKDD and EdDSA NewLocalParty. Add regression tests pinning both negative cases for both protocols. The valid positive path is covered by the existing E2E tests (which omit fullBytesLen); a future cherry-pick of upstream's TestE2EConcurrentWithLeadingZeroInMSG will exercise a valid fullBytesLen end-to-end.
Round 1 of both ECDSA and EdDSA signing fell back to SHA512_256(messageBytes) for the SSID nonce when the caller did not set one. Two concurrent ceremonies on the same canonical message therefore derived the same SSID, and after AppendUint64ToBytesSlice also the same per-party Fiat-Shamir context — enabling transcript splicing between the runs (CWE-294). Re-signing the same payload under the same committee is a common production pattern (e.g. tBTC-style sighash retries), so this silently broke the very property the BNB SSID hardening was meant to add. Return tss.Error from round 1 when SessionNonce is nil, naming the constraint and the setter the caller must call. Remove the now-dead messageSessionNonce + messageBytes helpers so the fallback can't be silently reintroduced. Keygen/resharing fall-back to zero is unchanged (tracked separately). Update the two ECDSA E2E tests and the EdDSA E2E test to call params.SetSessionNonce before constructing parties. Add TestSigning_Start_RequiresSessionNonce for each protocol pinning the fail-closed behavior. Document the new requirement on tss.Parameters.SetSessionNonce and flip the corresponding line in BNB_HARDENING_INTEGRATION.md from "signing defaults to message hash as nonce" to the new contract.
Upstream BNB commit fc38979 puts SSID on the wire in DGRound1Message and runs bytes.Equal(SSID, SSIDj) across all old-committee parties in the new committee's round 2. The PR strip-and-derive-locally lost this: each new-committee party derives SSID locally, so a corrupted old-committee party that broadcasts inconsistent SSIDs to different new-committee members (or whose SessionNonce diverges from the rest of the old committee) only surfaces as a downstream proof verification failure several rounds later, with no specific abort message naming the cause. Add `bytes ssid = 4` to DGRound1Message. Derive SSID in round 1 for both committees (using public inputs: party IDs, curve, round number, ssidNonce). Old-committee parties broadcast their SSID in the existing DGRound1Message; new-committee parties cross-check each received SSID against their locally-derived SSID before consuming any other field of the message, returning a tss.Error that names the sender if a mismatch is detected. SSID derivation is removed from round 2 since temp.ssid is set in round 1. ValidateBasic on DGRound1Message now requires the SSID field. EdDSA resharing has no SSID derivation today, so no changes are needed there. ecdsa-resharing.pb.go regenerated with protoc-gen-go v1.30.0 (the version pinned in the generated header); the only meaningful drift beyond the new field is the protoc binary version comment (v3.21.12 → v4.25.1) — both protoc versions emit identical wire format. Update ecdsa/resharing/local_party_test.go and eddsa/resharing/ local_party_test.go to call params.SetSessionNonce for the signing phase (made required by the preceding fail-closed fix). Add TestDGRound1Message_ValidateBasic_RequiresSsid pinning the message-format invariant. The E2E ceremony (TestE2EConcurrent) covers the broadcast → cross-verify round-trip end-to-end; a Byzantine-simulator regression test for the mismatch-detection branch is tracked under testing-plan Phase 5.
ECDSA keygen, EdDSA keygen, and ECDSA resharing all fell back to a zero ssidNonce when the caller did not set SessionNonce. The SSID derivation then collapsed to a single canonical value across any two ceremonies over otherwise-identical committees, neutralising the per-session proof-context binding that the BNB hardening was meant to add. Combined with the wire-format SSID broadcast added in the preceding commit, a zero-nonce fallback would also make the cross-old-committee SSID equality check trivially pass even when parties never explicitly agreed on a session. Return tss.Error from round 1 of all three protocols (ECDSA keygen, EdDSA keygen, ECDSA resharing) when SessionNonce is nil, naming the constraint and the setter the caller must call. EdDSA resharing has no SSID derivation today and is unchanged. Update the existing E2E ceremonies (ecdsa/keygen, ecdsa/resharing, eddsa/keygen) and the smaller keygen tests (TestStartRound1Paillier, TestFinishAndSaveH1H2, TestBadMessageCulprits) to call params.SetSessionNonce before constructing parties. Add TestKeygen_Start_RequiresSessionNonce for both protocols and TestResharing_Start_RequiresSessionNonce for ECDSA pinning the fail-closed behaviour. Update tss.Parameters.SetSessionNonce godoc and BNB_HARDENING_INTEGRATION.md so the disclosure reflects the new contract: keygen, resharing, and signing all require an explicit per-ceremony nonce.
Adds a regression test ensuring per-party Fiat-Shamir context derivation never collapses to the bare SSID, even for party index 0, and that distinct party indices produce distinct contexts. Guards against a future "skip leading zeros" optimization that would re-introduce the party-0 context collision the BNB hardening fixed.
|
Temporarily closing and reopening to trigger newly bootstrapped GitHub Actions checks. |
|
Reopened after bootstrapping Actions workflows on master to trigger CI for this head. |
# Conflicts: # .github/workflows/gofmt.yml
| if _, err := state.Write(tagBz); err != nil { | ||
| Logger.Errorf("SHA512_256i_TAGGED Write(tag) failed: %v", err) | ||
| return nil | ||
| } | ||
| if _, err := state.Write(tagBz); err != nil { | ||
| Logger.Errorf("SHA512_256i_TAGGED Write(tag) failed: %v", err) | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Not a bug, but a potential code robustness issue - state.Write should never return an error, so conditions inside the two ifs here are unreachable code. We do, however, suggest with them that SHA512_256i_TAGGED may return a nil value, and this path is never handled in the code using this function - in the DLN proof path, a direct .Bit() call would panic, and in the rejection sample path, the .Mod() call would panic.
That said, instead of suggesting SHA512_256i_TAGGED may return nil, I would panic here if state.Write returns an error which, by the Go spec, is not possible. Even if this somehow magically happens, we will panic in the right place, not later when doing .Bit() or .Mod().
| // SHA512_256i_TAGGED is a domain-separated variant of SHA512_256i. The tag is | ||
| // hashed and prepended twice, matching the tagged-hash construction used by BNB | ||
| // upstream for proof challenges. |
There was a problem hiding this comment.
| // SHA512_256i_TAGGED is a domain-separated variant of SHA512_256i. The tag is | |
| // hashed and prepended twice, matching the tagged-hash construction used by BNB | |
| // upstream for proof challenges. | |
| // SHA512_256i_TAGGED is a domain-separated variant of SHA512_256i. The tag is | |
| // hashed and prepended twice. |
Most of the code matches the BNB implementation, there is nothing special about this function. 😉
| if _, err := state.Write(data); err != nil { | ||
| Logger.Errorf("SHA512_256i_TAGGED Write(data) failed: %v", err) | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Similar comment here - state is a hash.Hash and state.Write should never return an error. I would panic here if that happens instead of having the downstream code panic because of a nil value returned.
| badZ := *proof | ||
| badZ.Z = big.NewInt(1) | ||
| assert.False(t, badZ.Verify(tss.EC(), pk, NTildei, h1i, h2i, c), "Z equal to one must fail") | ||
| } |
There was a problem hiding this comment.
We are missing a test to cross-verify the proofs. Here in BNB version, independent proof/key pairs are generated and verified that a proof from party 0 fails against party 1's parameters, and vice versa. We do test the happy path, session bindings, and malformed parameters, but we do not test the negative path with non-malformed parameters.
There was a problem hiding this comment.
I would also consider whether we do not want to copy existing tests from BNB, like TestProveRangeAliceBypassed, even if we add ours. We'll keep the two codebases closer and it will be easier to port any future changes between them.
| @@ -0,0 +1,79 @@ | |||
| # BNB Hardening Integration Report | |||
There was a problem hiding this comment.
Probably something unintentionally committed?
| func (pf *ProofBobWC) Verify(ec elliptic.Curve, pk *paillier.PublicKey, NTilde, h1, h2, c1, c2 *big.Int, X *crypto.ECPoint, session ...[]byte) bool { | ||
| Session := optionalProofSession(session) | ||
| if pf == nil || pf.ProofBob == nil || !pf.ProofBob.ValidateBasic() || | ||
| (X != nil && (pf.U == nil || !pf.U.ValidateBasic())) || |
There was a problem hiding this comment.
There is a confusing pattern coming from the original BNB implementation with ProofBobWC.Verify serving double duty. When X is nil, the call comes from ProofBob.Verify with U and X intentionally nil.
At the same time, we'd ideally call ProofBobWC.ValidateBasic() first, as it's a pattern used by other proofs, and frankly it makes sense to do basic validation first, just like we do with if pf == nil || pf.ProofBob == nil checks first. Also, ProofBobWC.ValidateBasic() already does pf.U != nil validation.
So what about...
if pf == nil || pf.ProofBob == nil {
return false
}
// ProofBob.Verify delegates here with X=nil and U=nil so only validate U
// when the call is for ProofBobWC so X is present.
if X != nil {
if !pf.ValidateBasic() {
return false
}
} else if !pf.ProofBob.ValidateBasic() {
return false
}
and then
func (pf *ProofBobWC) ValidateBasic() bool {
return pf.ProofBob.ValidateBasic() && pf.U != nil && pf.U.ValidateBasic()
}
And we might want to backport this change to the BNB repo.
| // – z_i^N = y_i for every i ∈ [m] | ||
| // – x_i^4 = (-1)^a_i * w^b_i * y_i mod N and a_i, b_i ∈ {0, 1} for every i ∈ [m]. | ||
| func (pf ModProof) ModVerify(N *big.Int) (bool, error) { | ||
| func (pf ModProof) ModVerify(N *big.Int, session ...[]byte) (bool, error) { |
There was a problem hiding this comment.
The BNB implementation has three positivity checks we are missing. I don't think it's directly exploitable as mod N operations will normalize them later, but doing the static range validation of inputs first feels safer. It also brings us closer to BNB implementation for any further cross-repo refinements.
So instead of just common.Lt(..., N) checks for pf.W, pf.X[i] and pf.Z[i], we would also perform > = 1 checks, like
if !common.Gt(pf.W, zero) || !common.Lt(pf.W, N) {
return false, fmt.Errorf("mod proof verify: w must be in [1, N), got %d", pf.W)
}
y := ModChallenge(N, pf.W, session...)
for i, yi := range y {
if !common.Gt(pf.X[i], zero) || !common.Lt(pf.X[i], N) {
return false, fmt.Errorf("mod proof verify: x_%d must be in [1, N), got %d", i, pf.X[i])
}
if !common.Gt(pf.Z[i], zero) || !common.Lt(pf.Z[i], N) {
return false, fmt.Errorf("mod proof verify: z_%d must be in [1, N), got %d", i, pf.Z[i])
}
(...)
| // – z_i^N = y_i for every i ∈ [m] | ||
| // – x_i^4 = (-1)^a_i * w^b_i * y_i mod N and a_i, b_i ∈ {0, 1} for every i ∈ [m]. | ||
| func (pf ModProof) ModVerify(N *big.Int) (bool, error) { | ||
| func (pf ModProof) ModVerify(N *big.Int, session ...[]byte) (bool, error) { |
There was a problem hiding this comment.
BNB version adds a GCD check to detect a case for fraudulent N and W. Our check yielding an error if big.Jacobi(pf.W, N) != -1 is stronger than their check yielding an error only if jacobi(W, N ) == 1 so we should be fine, but I'd port the test from this commit to ensure we will not introduce a regression by trying to pair with their == 1 check without adding an extra GCD condition.
| // N.BitLen() + 256 bits of entropy before reducing mod N. Reducing a single | ||
| // 256-bit SHA512_256i_TAGGED output mod ~2^2048 would emit challenges in | ||
| // [0, 2^256) instead of [0, N), giving the session-tagged path a strictly | ||
| // weaker challenge distribution than the legacy HashToN path it shares the |
There was a problem hiding this comment.
Is the HashToN path legacy, or is it just a path without a session - a different option, valid for specific use cases? The comment might be misleading if the latter is true.
| assert.Error(t, err) | ||
| assert.False(t, res, "proof verify result must be false") | ||
| }) | ||
| } |
There was a problem hiding this comment.
Nice addition of the FactorVerify function. We could cover more cases than just the Q field and test them on non-zero values as well.
There was a problem hiding this comment.
FWIW, there is a new upstream PR that may be relevant: bnb-chain#332
There was a problem hiding this comment.
Loose idea for follow-up work: Maybe a good move would be to remove all the code that is not used by keep-core, i.e. the whole eddsa and ecdsa/resharing packages? There are at least several issues in those packages that are inherited from upstream, for example:
- Resharing round 4 omits Paillier N and NTilde bit-length checks present in keygen, allowing weak-modulus key substitution ->
ecdsa/keygen/round_2.go:53-61rejects any peer-broadcast Paillier modulus or NTilde whose BitLen is not exactly paillierBitsLen (2048). The analogous loop inecdsa/resharing/round_4_new_step_2.goperforms DLN and mod-proof checks, h1!=h2, h1/h2 uniqueness — but NO BitLen validation of paiPK.N or NTildej. A malicious new-committee party can therefore broadcast a 1024-bit factorable N / a smooth-factor NTildej, generate valid DLN/Mod proofs over it (the proofs do not certify size), and have honest new-committee verifiers persist that weak key. This may open the door for a class of attacks that could lead to secret recovery given 1024-bit keys are considered vulnerable nowadays. - EdDSA resharing has no SSID-bound proof transcript in this port as the "Residual Risks" section of
BNB_HARDENING_INTEGRATION.mdmentions.
... and probably more. Getting rid of those packages greatly limits the potential attack surface, and will make the threshold-network/tss-lib far easier to maintain. Moreover, we avoid dead code with known problems to bite us one day or bite another protocols that will leverage the affected packages of this fork.
d898b16 to
f2a959a
Compare
Doc + small defensive-code follow-ups to the PR #2/#4/#5 review thread. None of these are security-blocking; they close out the cosmetic and pre-existing items that did not warrant their own PR earlier. - common/hash_utils.go: strengthen RejectionSample doc so the name does not mislead future readers — the function is modular reduction, not true rejection sampling, and the bias bound depends on q vs the hash width. - crypto/paillier/factor_proof.go: document that the tagged and legacy FactorChallenge paths emit different challenge distributions (positive-only vs signed), and note that FactorVerify's CmpAbs bounds exist to accommodate the legacy signed encoding. - tss/params.go: expand SetSessionNonceBytes docstring with the collision/uniqueness/entropy guidance reviewers asked for. - ecdsa/{keygen,signing}/rounds.go: document the round-1-capture invariant for getSSID so future refactors do not silently drift the hashed round.number domain separator. - crypto/ecpoint.go: flag SetCurve's in-place mutation in the doc comment; the chained-call style is a footgun on shared points. - crypto/vss/feldman_vss.go: reject nil shares, nil/zero share IDs, and duplicate IDs in ReConstruct before the Lagrange loop, so malformed inputs return an error instead of panicking through ModInverse(0). Add focused test coverage for each rejection path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BNB Hardening Integration Report
Scope
2e712689cfbeefede15f95a0ec7112227d86f7023f677ff761fcf692edb0243a5d812930844d879aafbe264b44b63155a864dbe0171040c66e442963ModProof/FactorProof.ae7075f3409e.Compatibility Notice
This is a protocol/wire compatibility break for proof transcripts. Proofs whose Fiat-Shamir challenges now use tagged hashing or session context will not verify across mixed old/new versions, even where the Go API remains source-compatible through variadic arguments. Operators should roll this out as a coordinated protocol upgrade rather than mixing parties from before and after this PR in the same keygen, signing, or resharing ceremony.
What This Ports
See
BNB_HARDENING_INTEGRATION.mdin this branch for the detailed commit-by-commit integration notes. In short, this PR manually adapts the applicable BNB hardening work for:ModProofandFactorProofhardening;tss.Parameters.SessionNonce/SetSessionNonce/SetSessionNonceBytes;Post-Review Tightening
Follow-up commits also address reviewer findings from pdyraga, Claude, and Codex:
SetSessionNoncenow rejects nil, zero, and negative nonces; round-1 guards also reject nonpositive nonce state.SetSessionNonceBytesnow requires at least 16 bytes before hashing an application session ID.fullBytesLenas a variadic argument for source compatibility, but exactly one positive value is required at runtime, it must fit the message, and it must not exceed the curve-order byte length. This rejects over-wide leading-zero-padded ECDSA messages before signing starts.DGRound1Message.ssidvalidation now requires the fixed 32-byte SSID width; SSID derivation returns fixed-width hashes.UnFlattenECPointserrors before applyingEightInvEight.Shares.ReConstructrejects an empty share slice before indexing.HashToNTagged,RejectionSample, empty session tags, fullBytesLen validation, nonce validation, and verifier bounds.Semantic Differences From BNB
ModProofandFactorProofremediation was retained. No BNB no-proof escape hatches were introduced.fullBytesLensource compatibility, but now require exactly one value at runtime so all signers agree on message byte width before the protocol starts.Parameters.SessionNonce(). Callers must provide a unique agreed positive nonce, for example viaSetSessionNonceBytes, for every ceremony.DGRound1Messageso the new committee can reject old-committee broadcasts whose SSID differs from the local protocol context.common.RejectionSamplekeeps BNB's function name for porting clarity, but this implementation is modular reduction rather than a looping rejection sampler.Validation
Earlier validation from the branch history:
go test ./crypto/... ./ecdsa/keygen ./ecdsa/signing ./eddsa/signinggo test ./eddsa/keygen ./eddsa/resharinggo test ./ecdsa/resharinggo test ./common ./crypto/paillier ./crypto/mta ./ecdsa/keygen ./ecdsa/resharing ./ecdsa/signing ./eddsa/keygen ./eddsa/signing ./eddsa/resharinggo test ./...go vet ./...Post-review validation on the final branch head:
go test -count=1 ./common ./tss ./crypto/dlnproof ./crypto/mta ./crypto/paillier ./crypto/vss ./ecdsa/signing ./ecdsa/resharing ./ecdsa/keygen ./eddsa/signing ./eddsa/keygen ./eddsa/resharingpassed. The EdDSA keygen fixture-save test failed only because fixture files already existed, while the other selected packages completed; see the narrower reruns below.go test -count=1 ./common ./tss ./crypto/dlnproof ./crypto/mta ./crypto/paillier ./crypto/vss ./ecdsa/signing ./ecdsa/resharing ./eddsa/signing ./eddsa/resharingpassed.go test -count=1 ./ecdsa/keygen -run 'TestSSIDIncludesSessionNonce|TestKeygen_Start_RequiresSessionNonce|TestE2EConcurrent'passed.go test -count=1 ./eddsa/keygen -run 'TestKeygen_Start_RequiresSessionNonce'passed.go test -count=1 -timeout 25m ./crypto/paillierpassed.git diff --checkpassed.Go fmt projectpassed andTestis running for the final branch head.Residual Risks
SetSessionNonceorSetSessionNonceBytesbefore keygen, signing, and ECDSA resharing; those protocols now fail closed without it.fullBytesLento every signing party in a ceremony.