Add unit test for block-relay-only eviction (tests)

Dec 1, 2021

https://github.com/bitcoin/bitcoin/pull/23614

Host: glozow - PR author: mzumsande

Notes

Block-relay-only is a type of outbound P2P connection: the node initiates a P2P connection to another peer with fRelay set to false in the version message, indicating that it does not wish to receive unconfirmed transactions. The node will also not relay addrs with this peer.
- By default, nodes maintain connections to two block-relay-only peers (introduced in PR #15759).
- PR #19858 introduced a new routine to regularly initiate temporary, block-relay-only connections with new peers and sync headers, in an effort to make eclipse attacks more difficult. We discussed this PR in a previous review club, #19858.
PR #23614 adds a unit test for the extra block-relay-only eviction logic.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
EvictExtraOutboundPeers() selects outbound peers to evict when the number of connections exceeds our limit. Is this normal? How can this happen?
What are block-relay-only peers? What are extra block-relay-only peers?
Describe the eviction logic for extra block-relay-only peers.

4a. (Bonus) Compare the extra block-relay-only eviction logic with the extra full-relay eviction logic.

4b. (Bonus #2) Compare the extra outbound eviction logic with the normal outbound eviction logic. When do we care about block announcement recency, and when do we care about ping time? Why?
What is MINIMUM_CONNECT_TIME? Why do we wait a period of time before considering a peer for eviction?
The unit test checks the fDisconnect values of peers in order to test eviction logic, and prevents the peer from being disconnected by setting fDisconnect to false. Why is this appropriate? Why don’t we wait for the peers to be disconnected, like in the p2p_eviction.py functional test?
Why does the test need to call FinalizeNode()?
How often is CheckForStaleTipAndEvictPeers executed? In unit tests, can we trigger it to run automatically by using SetMockTime() to fast-forward?
Can a block-relay-only peer become a full-relay peer?

Meeting Log

17:00

<glozow> #startmeeting

17:00

<glozow> Welcome to PR review club!

17:01

<glozow> Feel free to say hi so I don't think I'm talking to myself!

17:01

<neha> hi!

17:01

<b10c> hi

17:01

<ccdle12> hi

17:01

<glozow> Today we're looking at PR #23614: Add unit test for block-relay-only eviction

17:01

<glozow> notes in the usual place: https://bitcoincore.reviews/23614

17:01

<svav> Hi

17:01

<stickies-v> hi everyone!

17:02

<glozow> woohoo! did anybody get a chance to review the PR/the notes? y/n

17:02

<stickies-v> 0.5y

17:03

<ccdle12> n (not in depth)

17:03

<raj_> Hello..

17:03

<raj_> y

17:03

<svav> Read them briefly

17:03

<neha> y

17:03

<neha> .5y

17:04

<glozow> ok well we've got lots of conceptual questions to warm ourselves up

17:04

<glozow> first question: `EvictExtraOutboundPeers()` selects outbound peers to evict when the number of connections exceeds our limit. Is this normal? How can this happen?

17:04

<andrewtoth_> hi

17:05

<gene> hi

17:05

<tr3xx> hi

17:05

<stickies-v> This is normal, we regularly initiate extra short-lived block-only connections to a wider set of peers to prevent eclipse attacks.

17:05

<ccdle12> I think we add an extra peer that;s one beyond the limit, while we compare it with our existing outbound peers and drop it's not a better choice for eviction

17:05

<raj_> Mostly when we detect a potential stale tip? At that time we try to increase our peer count by connecting to extra outbounds?

17:06

<stickies-v> I suppose it also could happen if we change the number of allowed outbound peers, but I'm not sure if that's possible without restarting bitcoind?

17:06

<glozow> stickies-v: ccdle12: raj_: yes! we'll sometimes initiate an extra outbound

17:06

<schmidty> hi

17:06

<glozow> stickies-v: yeah, I think the limit doesn't change while the node is running

17:06

<andrewtoth_> there are also feeler connections every few minutes right?

17:07

<willcl_ark> The stale tip interval is every 10 minutes

17:07

<raj_> noob question: what are filler connections?

17:07

<glozow> stale tip is a good example - if we haven't heard about anything for a while, we'll send out a feeler to see if somebody else knows about new blocks

17:08

<glozow> raj_: you misspelled _good_ question! a feeler is just a temporary outbound connection

17:08

<raj_> oh its "feeler" not "filler".

17:08

<raj_> ok thanks @glozow

17:09

<glozow> andrewtoth_: perhaps you are referring to feelers used to regularly verify whether an addr corresponds to a real node

17:09

<andrewtoth_> i was yes

17:09

<andrewtoth_> are those also evicted via EvictExtraOutboundPeers()?

17:10

<glozow> cool. and a third example is one relevant to this PR, introduced in PR #19858. can anyone describe what these extra connections are?

17:12

<glozow> andrewtoth_: hm i don't think so but i'm not sure. i assume that we're not making those types of feeler connections with the intent that we might replace one of our current outbounds with them. idk 🤷

17:12

<stickies-v> These extra block-relay-only peers from #19858 are (initially) short-lived additional connections made to make eclipse attacks more difficult by quickly checking if this new peer has previously unheard of blocks, and then disconnecting if that's not the case (i.e. no eclipsing is happening)

17:13

<glozow> stickies-v: yes! ding ding ding

17:13

<glozow> this leads us to the next question: what are block-relay-only peers?

17:13

<raj> peers who only relay blocks to us.

17:15

<glozow> raj: indeed. does anyone know how we make sure the peer only relays blocks to us?

17:15

<stickies-v> setting the fRelay flag to false in the version message

17:15

<glozow> bonus question: do we only relay blocks to them?

17:15

<glozow> stickies-v: correct! and what happens if they send us an unconfirmed transaction?

17:16

<stickies-v> disconnect I think?

17:16

<raj> Not necessarily I would guess.. They might not have us as block relay only?

17:16

<brunoerg> we discard it and disconnect?

17:16

<glozow> stickies-v: correct :) https://github.com/bitcoin/bitcoin/blob/1a369f006fd0bec373b95001ed84b480e852f191/src/net_processing.cpp#L3087-L3091

17:19

<glozow> raj: you're right that they wouldn't disconnect us for a transaction unless they explicitly also told us not to relay them. but we don't relay transactions to block-relay-only peers

17:19

<stickies-v> raj I think that's exactly the point of first exchanging version and verack messages when connecting to another peer - it ensures that both parties know what the other node can and wants to do, so there can't be any confusion about who's block-relay-only and who isn't

17:20

<glozow> Ok, and we've already answered the other question, _extra_ block-relay-only peers are those additional connections made to help prevent eclipse attacks, from #19858

17:20

<glozow> can anybody Describe the eviction logic for extra block-relay-only peers?

17:21

<raj> We check between the latest and next-latest connected peer, and evict whoever didn't send us a block for more time..

17:22

<ccdle12> we also make sure the existing node we are going to evict has been connected for at least the minimum amount of time connected and also that there are no current blocks in flight between us

17:23

<glozow> raj: what if the youngest and/or second-youngest hasn't given us a block?

17:25

<raj> Good point.. I don't know.. :D

17:25

<glozow> ccdle12: very good point. why do we wait a period of time before considering a peer for eviction?

17:25

<stickies-v> if the peer hasn't sent a block yet -> evict

17:26

<glozow> stickies-v: yep. and if both haven't sent any?

17:26

<raj> @glozow, I think then we remove the most youngest connection?

17:26

<glozow> raj: bingo

17:27

<glozow> any ideas why we care about longevity of connection?

17:27

<ccdle12> glozow: I think to give a connection a reasonable amount of time to gossip/send information? (could be a scenario where we would be essentially "thrashing" between connections and not necessarily sharing information)

17:27

<raj> longer connection == more reliable honest peer?

17:28

<neha> glozow: is 30 seconds long enough for blocks-relay-only peers?

17:28

<glozow> ccdle12: right, we should give the peer a chance to say something before we decide they haven't been useful enough :P

17:29

<glozow> raj: not necessarily

17:29

<glozow> neha: not sure. how long does it take for new peers to tell us what their tip is?

17:30

<glozow> i guess we include start height in the version message

17:31

<neha> glozow: why shouldn't we give them 10 minutes? do they send other things besides new block announcements?

17:32

<stickies-v> there's a DOS tradeoff there as well, dishonest nodes could keep your connection open for 10 minutes for no good reason

17:33

<neha> stickies-v: but how can you even tell if there's a "good reason" when there hasn't been a new block?

17:34

<stickies-v> oh because in an eclipse attack you wouldn't have heard about new blocks that were produced in the past, so that's why you can just quickly connect to a new peer and verify if you're missing anything, and then if it's all looking good you just disconnect again

17:34

<glozow> i guess we make extra connections with the explicit intention of discovering a new block

17:35

<neha> ah, ok. so the goal *is* to cycle through peers! but then not sure it's worth waiting 30 seconds?

17:35

<glozow> so maybe it's fair here to make a decision based on whether or not we learn new information from this peer

17:36

<stickies-v> yeah I think so. And I think there's a security aspect to having a relatively high minimum time (e.g. 30 seconds) as well, since if as an attacker you could somehow slow down your peers connections it would be easier to just have them all expire before new blocks were communicated?

17:37

<glozow> i guess it's a guesstimate upper bound of how long it takes to do a handshake and sync tips?

17:37

<ccdle12> I suppose there could be an optimization on the time for block relay only, since 30 seconds is also applied to full outbound and they would be communicating msgs that we would be more frequent. But I guess hard to prove what would be optimum?

17:39

<glozow> i'm not sure if this should be different for full and block-relay-only. it's basically a time for "we assume if they had something useful to say they would've said it by now"

17:40

<glozow> eviction for extra full relay is also based on block announcements: https://github.com/bitcoin-core-review-club/bitcoin/blob/93a0ec1a629af533bb21418a3e134c268bc57395/src/net_processing.cpp#L4002

17:41

<glozow> i'll throw out the next question, but feel free to continue discussing eviction logic choices

17:41

<glozow> The unit test checks the fDisconnect values of peers in order to test eviction logic, and prevents the peer from being disconnected by setting fDisconnect to false. Why is this appropriate? Why don’t we wait for the peers to be disconnected, like in the p2p_eviction.py functional test?

17:42

<glozow> link to the code: https://github.com/bitcoin-core-review-club/bitcoin/blob/4c449a55c29b4b382660852b20800d0ae2bc9e22/src/test/denialofservice_tests.cpp#L232-L234

17:42

<raj> Because its unit test? We just need to check that eviction switch is triggered, and not wait for real eviction. Also there is no real nodes actually in unit tests i think?

17:43

<glozow> raj: yeah exactly, in a unit test we can test much more granularly

17:44

<Kaizen_Kintsugi> gah, here

17:46

<glozow> In a normal running node, how often is `CheckForStaleTipAndEvictPeers` executed? In unit tests, can we trigger it to run automatically by using `SetMockTime()` to fast-forward?

17:46

<raj> 45 secs, if i remember correctly. and yes SetMockTime() should work too..

17:47

<stickies-v> yeah the default value for EXTRA_PEER_CHECK_INTERVAL is 45 seconds

17:48

<glozow> yep: https://github.com/bitcoin/bitcoin/blob/4633199cc8a466b8a2cfa14ba9d7793dd4c469f4/src/net_processing.cpp#L1452-L1457

17:49

<glozow> it's the min of `EXTRA_PEER_CHECK_INTERVAL` and `STALE_CHECK_INTERVAL`

17:49

<raj> are these values hardcoded, or they can be changed via some configs?

100

17:50

<glozow> how does `SetMockTime()` work? :) did anyone try modifying the test to setmocktime much further ahead, and see if CheckForStaleTipAndEvictPeers happened?

101

17:51

<stickies-v> raj it doesn't look like there's any config or cli override for EXTRA_PEER_CHECK_INTERVAL so I think the only way is to change the code

102

17:51

<glozow> why would a user want to change the extra peer check interval?

103

17:52

<raj> stickies-v, then I wonder why there is a check in https://github.com/bitcoin/bitcoin/blob/4633199cc8a466b8a2cfa14ba9d7793dd4c469f4/src/net_processing.cpp#L1456

104

17:52

<glozow> raj: it's a compile-time sanity check