Block-relay-only is a type of outbound P2P connection: the node initiates a P2P connection to
another peer with fRelay set to false in the version message, indicating that it does not wish
to receive unconfirmed transactions. The node will also not relay addrs with this peer.
By default, nodes maintain connections to two block-relay-only peers (introduced in PR #15759).
PR #19858 introduced a new routine to
regularly initiate temporary, block-relay-only connections with new peers and sync headers, in
an effort to make eclipse attacks more difficult. We discussed this PR in a previous review club,
#19858.
PR #23614 adds a unit test for the extra
block-relay-only eviction logic.
What is MINIMUM_CONNECT_TIME? Why do we wait a period of time before considering a peer for
eviction?
The unit test
checks
the fDisconnect values of peers in order to test eviction logic, and
prevents the peer from being disconnected
by setting fDisconnect to false. Why is this appropriate? Why don’t we wait for the peers to be
disconnected, like in the
p2p_eviction.py
functional test?
<glozow> first question: `EvictExtraOutboundPeers()` selects outbound peers to evict when the number of connections exceeds our limit. Is this normal? How can this happen?
<ccdle12> I think we add an extra peer that;s one beyond the limit, while we compare it with our existing outbound peers and drop it's not a better choice for eviction
<stickies-v> I suppose it also could happen if we change the number of allowed outbound peers, but I'm not sure if that's possible without restarting bitcoind?
<glozow> stale tip is a good example - if we haven't heard about anything for a while, we'll send out a feeler to see if somebody else knows about new blocks
<glozow> andrewtoth_: hm i don't think so but i'm not sure. i assume that we're not making those types of feeler connections with the intent that we might replace one of our current outbounds with them. idk 🤷
<stickies-v> These extra block-relay-only peers from #19858 are (initially) short-lived additional connections made to make eclipse attacks more difficult by quickly checking if this new peer has previously unheard of blocks, and then disconnecting if that's not the case (i.e. no eclipsing is happening)
<glozow> raj: you're right that they wouldn't disconnect us for a transaction unless they explicitly also told us not to relay them. but we don't relay transactions to block-relay-only peers
<stickies-v> raj I think that's exactly the point of first exchanging version and verack messages when connecting to another peer - it ensures that both parties know what the other node can and wants to do, so there can't be any confusion about who's block-relay-only and who isn't
<glozow> Ok, and we've already answered the other question, _extra_ block-relay-only peers are those additional connections made to help prevent eclipse attacks, from #19858
<ccdle12> we also make sure the existing node we are going to evict has been connected for at least the minimum amount of time connected and also that there are no current blocks in flight between us
<ccdle12> glozow: I think to give a connection a reasonable amount of time to gossip/send information? (could be a scenario where we would be essentially "thrashing" between connections and not necessarily sharing information)
<stickies-v> oh because in an eclipse attack you wouldn't have heard about new blocks that were produced in the past, so that's why you can just quickly connect to a new peer and verify if you're missing anything, and then if it's all looking good you just disconnect again
<stickies-v> yeah I think so. And I think there's a security aspect to having a relatively high minimum time (e.g. 30 seconds) as well, since if as an attacker you could somehow slow down your peers connections it would be easier to just have them all expire before new blocks were communicated?
<ccdle12> I suppose there could be an optimization on the time for block relay only, since 30 seconds is also applied to full outbound and they would be communicating msgs that we would be more frequent. But I guess hard to prove what would be optimum?
<glozow> i'm not sure if this should be different for full and block-relay-only. it's basically a time for "we assume if they had something useful to say they would've said it by now"
<glozow> The unit test checks the fDisconnect values of peers in order to test eviction logic, and prevents the peer from being disconnected by setting fDisconnect to false. Why is this appropriate? Why don’t we wait for the peers to be disconnected, like in the p2p_eviction.py functional test?
<raj> Because its unit test? We just need to check that eviction switch is triggered, and not wait for real eviction. Also there is no real nodes actually in unit tests i think?
<glozow> In a normal running node, how often is `CheckForStaleTipAndEvictPeers` executed? In unit tests, can we trigger it to run automatically by using `SetMockTime()` to fast-forward?
<glozow> how does `SetMockTime()` work? :) did anyone try modifying the test to setmocktime much further ahead, and see if CheckForStaleTipAndEvictPeers happened?