Protect addnode peers during IBD (p2p)

May 28, 2025

https://github.com/bitcoin/bitcoin/pull/32051

The PR branch HEAD was 93b07997e9a38523f5ab850aa32ca57983fd2552 at the time of this review club meeting.

Notes

Motivation

While doing initial block download (IBD) over a fluctuating and slow internet connection in El Salvador, I observed very frequent peer disconnections in the debug log, on the order of 100+ per hour. These disconnections were often of manually added “addnode” peers, and logged as Peer is stalling block download, disconnecting <peer>. Ping requests to these peers often took 20-100 seconds.

Even after IBD was completed, addnode peer disconnections still happened:

Timeout downloading block <hex>, disconnecting <peer>

Discussion

When an addnode peer is disconnected by the IBD headers/blocks download timeout or stalling logic, ThreadOpenAddedConnections attempts to immediately reconnect it – unless “onetry” was passed to the addnode RPC – up to the limit of 8 addnode connections. This limit is separate from the regular peer connection limits.

ThreadOpenAddedConnections will continue to attempt reconnection of the disconnected addnode peer until it succeeds.

When these disconnection/reconnection cycles happen frequently with addnode peers, it is likely network, resource and time intensive. This is particularly true for I2P peers, as these involve destroying and rebuilding 2 tunnels for each peer connection. It seems worth avoiding this if it is straightforward to do so.

Automatic (non-addnode) peers are also disconnected by the same logic, but they are a different category and case (non-protected peers, no immediate connection/reconnection) that would require monitoring over time to adjust the timeouts accordingly. Martin Zumsande was looking into optimizing this (see https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-01-22#1083993): “The challenge is to distinguish this situation from making things worse for fast/reliable connections that just have some slow peers which should be disconnected.”

The goal of this pull request is thus to avoid unnecessary frequent disconnections and immediate reconnections of addnode peers, both during IBD and afterwards.

Approach

The first commit, “p2p: protect addnode peers during IBD”, provides addnode peers the max BLOCK_STALLING_TIMEOUT_MAX value of 64 seconds for the IBD stalling logic (“Peer is stalling block download”) in src/net_processing.cpp.
The second commit, “p2p: don’t disconnect addnode peers for block download timeout”, proposes to protect addnode peers from disconnection. Review feedback suggested that we also clear their block requests, so that these blocks can be requested from other peers.
The third commit, “p2p: don’t disconnect addnode peers for slow initial-headers-sync”, proposes the same protection for addnode peers that we currently already provide to peers with NetPermissionFlags::NoBan permission.
The fourth commit, “rpc, doc: update addnode documentation”, updates the RPC addnode help documentation.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
What is an addnode peer? How can you specify them to your node?
Do you provide addnode peers to the node(s) that you run? Why or why not? What kind of peers do you choose?
What is a fundamental protection that addnode peers can provide to your node?
What do you think of the review suggestion to rate peers and use that to scale the number of blocks we request from them?
One reviewer suggested clearing block requests instead of disconnecting peers. How would you implement this?
How is the test coverage for the code affected by this change? Can you think of any tests that would be worthwhile adding?

Erratum

Attendee stringintech correctly identified that the addnode code was introduced by Satoshi, namely in commit e66ec79b.