Skip to content

fix(eth): make paired peer unregister instance-aware#2351

Open
gzliudan wants to merge 1 commit into
XinFinOrg:dev-upgradefrom
gzliudan:fix-peer-removal
Open

fix(eth): make paired peer unregister instance-aware#2351
gzliudan wants to merge 1 commit into
XinFinOrg:dev-upgradefrom
gzliudan:fix-peer-removal

Conversation

@gzliudan
Copy link
Copy Markdown
Collaborator

@gzliudan gzliudan commented May 16, 2026

Proposed changes

Handle paired eth peer teardown by instance so dropping the secondary connection clears pair state without removing the primary peer from the set.

Synchronize pairRw access to avoid torn interface reads during concurrent send and unregister paths, and cover the lifecycle with regression tests.

fix panic:

INFO [05-14|17:14:13.116] [voteHandler] Vote pool threashold reached: true, number of items in the pool: 21
INFO [05-14|17:14:13.117] Successfully commit and confirm block from continuous 3 blocks num=81,945,965 round=25662942   hash=f9a13f..81aa74
INFO [05-14|17:14:13.117] [setNewRound] new round and reset pools and workers round=25662945
INFO [05-14|17:14:13.117] Successfully processed the vote and produced QC! QcRound=25662944 QcNumOfSig=21 QcHash=2b4bbf..54f3bc QcNumber=81,945,967
INFO [05-14|17:14:13.117] [voteHandler] time cost from receive first vote under QC create elapsed=70.225291ms
INFO [05-14|17:14:14.780] Register peer                            id=f3c2f7d28018e67d conn=inbound nodeid=f3c2f7d28018e67d1e0fbc70eb4b539facf92e6d2c02ffdc98124e76e6ff8310bc8a4a14d58404dda2d9ca767c5f99ef354749133a68048d3727255170c404de version=100 addr=144.126.142.140:41902
INFO [05-14|17:14:15.075] [voteHandler] set vote pool time         round=25662945
INFO [05-14|17:14:15.078] Imported new chain segment               blocks=1   txs=0    mgas=0.000  elapsed=1.065ms         mgasps=0.000   number=81,945,968 hash=fefc22..b0e749 dirty=0.00B
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc083ab]

goroutine 69688380 [running]:
github.com/XinFinOrg/XDPoSChain/p2p.(*protoRW).WriteMsg(0x1?, {0x4, 0x80b, {0x19c2ca0, 0xc05385c630}, {0x0, 0x0, 0x0}, {{0x0, 0x0}, ...}, ...})
        github.com/XinFinOrg/XDPoSChain/p2p/peer.go:425 +0x2b
github.com/XinFinOrg/XDPoSChain/p2p.Send({0x7fa0b8085e60, 0x0}, 0x4, {0x132fa40?, 0xc057a16408?})
        github.com/XinFinOrg/XDPoSChain/p2p/message.go:100 +0x9d
github.com/XinFinOrg/XDPoSChain/eth.(*peer).SendBlockHeaders(0xc00005b320?, {0xc04a8b72d8, 0x1, 0x1})
        github.com/XinFinOrg/XDPoSChain/eth/peer.go:275 +0x16c
github.com/XinFinOrg/XDPoSChain/eth.(*ProtocolManager).handleMsg(0xc0169662c0, 0xc037563700)
        github.com/XinFinOrg/XDPoSChain/eth/handler.go:508 +0x3579
github.com/XinFinOrg/XDPoSChain/eth.(*ProtocolManager).handle(0xc0169662c0, 0xc037563700)
        github.com/XinFinOrg/XDPoSChain/eth/handler.go:405 +0xa7b
github.com/XinFinOrg/XDPoSChain/eth.NewProtocolManager.func1(0x1609318?, {0x19c95e8?, 0xc05f580380?})
        github.com/XinFinOrg/XDPoSChain/eth/handler.go:179 +0x1bc
github.com/XinFinOrg/XDPoSChain/p2p.(*Peer).startProtocols.func1()
        github.com/XinFinOrg/XDPoSChain/p2p/peer.go:391 +0x3d
sync.(*WaitGroup).Go.func1()
        sync/waitgroup.go:239 +0x4a
created by sync.(*WaitGroup).Go in goroutine 69688377
        sync/waitgroup.go:237 +0x73

Types of changes

What types of changes does your code introduce to XDC network?
Put an in the boxes that apply

  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Changes that don't change source code or tests
  • docs: Documentation only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that neither fixes a bug nor adds a feature
  • revert: Revert something
  • style: Changes that do not affect the meaning of the code
  • test: Adding missing tests or correcting existing tests

Impacted Components

Which parts of the codebase does this PR touch?
Put an in the boxes that apply

  • Consensus
  • Account
  • Network
  • Geth
  • Smart Contract
  • External components
  • Not sure (Please specify below)

Checklist

Put an in the boxes once you have confirmed below actions (or provide reasons on not doing so) that

  • This PR has sufficient test coverage (unit/integration test) OR I have provided reason in the PR description for not having test coverage
  • Tested on a private network from the genesis block and monitored the chain operating correctly for multiple epochs.
  • Provide an end-to-end test plan in the PR description on how to manually test it on the devnet/testnet.
  • Tested the backwards compatibility.
  • Tested with XDC nodes running this version co-exist with those running the previous version.
  • Relevant documentation has been updated as part of this PR
  • N/A

Copilot AI review requested due to automatic review settings May 16, 2026 17:26
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e915310d-62ff-4163-97a8-fba9eeb6ecb7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes cleanup for “paired” eth peers (two connections sharing the same peer ID) so that removing the secondary connection doesn’t accidentally remove the primary peer or leave a stale pairRw behind. It also adds regression tests to ensure paired-peer cleanup works and that header serving continues after the paired connection drops.

Changes:

  • Update ProtocolManager peer removal to unregister by peer instance (not just by ID), and only unregister from the downloader when removing the primary connection.
  • Extend peerSet removal logic to clear pairing state (pairRw and PairPeer) and add UnregisterPeer(*peer) to support instance-based removal.
  • Add regression tests for paired peer cleanup and for serving block headers after the paired connection disconnects.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
eth/peer.go Adds UnregisterPeer(*peer) and clears pairing state during unregister to avoid stale pairRw and incorrect removals.
eth/handler.go Switches removal to instance-based (removePeer(*peer)), adds removePeerByID wrapper for downloader/fetcher callbacks.
eth/peer_test.go Adds a unit test ensuring unregistering a paired instance keeps the primary and clears pair state.
eth/handler_test.go Adds an integration-style test ensuring headers are still served after the paired connection drops.

Comment thread eth/peer.go Outdated
Comment thread eth/peer.go Outdated
@gzliudan gzliudan changed the title fix(eth): clear stale pair writers on peer removal fix(eth): clean up paired peers by instance May 16, 2026
@gzliudan gzliudan force-pushed the fix-peer-removal branch from f18843c to 4926447 Compare May 16, 2026 17:57
@gzliudan gzliudan requested a review from Copilot May 16, 2026 17:57
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

eth/peer.go:583

  • Same race/tearing risk exists in this code path: current.pairRw = nil (and the earlier existPeer.pairRw = p.rw assignment in Register) are not synchronized with reads in Send* methods. Even if the peerSet lock is held, send paths don’t take that lock, so this doesn’t prevent concurrent access. Consider encapsulating pair-writer access behind atomic load/store helpers or a dedicated lock on peer to avoid data races and typed-nil panics.
	if current == p {
		current.pairRw = nil
		delete(ps.peers, p.id)
		return nil
	}
	if current.ClearPairPeer(p.Peer) {
		p.ClearPairPeer(current.Peer)
		current.pairRw = nil
		return nil

Comment thread eth/peer.go Outdated
@gzliudan gzliudan force-pushed the fix-peer-removal branch from 4926447 to b862d5a Compare May 16, 2026 18:23
@gzliudan gzliudan changed the title fix(eth): clean up paired peers by instance fix(eth): make paired peer unregister instance-aware May 16, 2026
@gzliudan gzliudan requested a review from Copilot May 16, 2026 18:26
@gzliudan gzliudan force-pushed the fix-peer-removal branch from b862d5a to f3fc79e Compare May 16, 2026 18:26
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Comment thread test_output.txt Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@gzliudan gzliudan force-pushed the fix-peer-removal branch 4 times, most recently from 92ac72a to 040bff6 Compare May 27, 2026 02:37
@gzliudan gzliudan requested a review from Copilot May 27, 2026 02:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@gzliudan gzliudan force-pushed the fix-peer-removal branch 16 times, most recently from 143b44e to 616c64b Compare May 27, 2026 10:21
@gzliudan gzliudan requested a review from Copilot May 27, 2026 10:30
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Handle paired eth peer teardown by instance so dropping the secondary connection clears pair state without removing the primary peer from the set.

Synchronize pairRw access to avoid torn interface reads during concurrent send and unregister paths, and cover the lifecycle with regression tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants