Add unit test for block-relay-only eviction (tests)

https://github.com/bitcoin/bitcoin/pull/23614

Host: glozow  -  PR author: mzumsande

Notes

  • Block-relay-only is a type of outbound P2P connection: the node initiates a P2P connection to another peer with fRelay set to false in the version message, indicating that it does not wish to receive unconfirmed transactions. The node will also not relay addrs with this peer.

    • By default, nodes maintain connections to two block-relay-only peers (introduced in PR #15759).

    • PR #19858 introduced a new routine to regularly initiate temporary, block-relay-only connections with new peers and sync headers, in an effort to make eclipse attacks more difficult. We discussed this PR in a previous review club, #19858.

  • PR #23614 adds a unit test for the extra block-relay-only eviction logic.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. EvictExtraOutboundPeers() selects outbound peers to evict when the number of connections exceeds our limit. Is this normal? How can this happen?

  3. What are block-relay-only peers? What are extra block-relay-only peers?

  4. Describe the eviction logic for extra block-relay-only peers.

    4a. (Bonus) Compare the extra block-relay-only eviction logic with the extra full-relay eviction logic.

    4b. (Bonus #2) Compare the extra outbound eviction logic with the normal outbound eviction logic. When do we care about block announcement recency, and when do we care about ping time? Why?

  5. What is MINIMUM_CONNECT_TIME? Why do we wait a period of time before considering a peer for eviction?

  6. The unit test checks the fDisconnect values of peers in order to test eviction logic, and prevents the peer from being disconnected by setting fDisconnect to false. Why is this appropriate? Why don’t we wait for the peers to be disconnected, like in the p2p_eviction.py functional test?

  7. Why does the test need to call FinalizeNode()?

  8. How often is CheckForStaleTipAndEvictPeers executed? In unit tests, can we trigger it to run automatically by using SetMockTime() to fast-forward?

  9. Can a block-relay-only peer become a full-relay peer?

Meeting Log

  117:00 <glozow> #startmeeting
  217:00 <glozow> Welcome to PR review club!
  317:01 <glozow> Feel free to say hi so I don't think I'm talking to myself!
  417:01 <neha> hi!
  517:01 <b10c> hi
  617:01 <ccdle12> hi
  717:01 <glozow> Today we're looking at PR #23614: Add unit test for block-relay-only eviction
  817:01 <glozow> notes in the usual place: https://bitcoincore.reviews/23614
  917:01 <svav> Hi
 1017:01 <stickies-v> hi everyone!
 1117:02 <glozow> woohoo! did anybody get a chance to review the PR/the notes? y/n
 1217:02 <stickies-v> 0.5y
 1317:03 <ccdle12> n (not in depth)
 1417:03 <raj_> Hello..
 1517:03 <raj_> y
 1617:03 <svav> Read them briefly
 1717:03 <neha> y
 1817:03 <neha> .5y
 1917:04 <glozow> ok well we've got lots of conceptual questions to warm ourselves up
 2017:04 <glozow> first question: `EvictExtraOutboundPeers()` selects outbound peers to evict when the number of connections exceeds our limit. Is this normal? How can this happen?
 2117:04 <andrewtoth_> hi
 2217:05 <gene> hi
 2317:05 <tr3xx> hi
 2417:05 <stickies-v> This is normal, we regularly initiate extra short-lived block-only connections to a wider set of peers to prevent eclipse attacks.
 2517:05 <ccdle12> I think we add an extra peer that;s one beyond the limit, while we compare it with our existing outbound peers and drop it's not a better choice for eviction
 2617:05 <raj_> Mostly when we detect a potential stale tip? At that time we try to increase our peer count by connecting to extra outbounds?
 2717:06 <stickies-v> I suppose it also could happen if we change the number of allowed outbound peers, but I'm not sure if that's possible without restarting bitcoind?
 2817:06 <glozow> stickies-v: ccdle12: raj_: yes! we'll sometimes initiate an extra outbound
 2917:06 <schmidty> hi
 3017:06 <glozow> stickies-v: yeah, I think the limit doesn't change while the node is running
 3117:06 <andrewtoth_> there are also feeler connections every few minutes right?
 3217:07 <willcl_ark> The stale tip interval is every 10 minutes
 3317:07 <raj_> noob question: what are filler connections?
 3417:07 <glozow> stale tip is a good example - if we haven't heard about anything for a while, we'll send out a feeler to see if somebody else knows about new blocks
 3517:08 <glozow> raj_: you misspelled _good_ question! a feeler is just a temporary outbound connection
 3617:08 <raj_> oh its "feeler" not "filler".
 3717:08 <raj_> ok thanks @glozow
 3817:09 <glozow> andrewtoth_: perhaps you are referring to feelers used to regularly verify whether an addr corresponds to a real node
 3917:09 <andrewtoth_> i was yes
 4017:09 <andrewtoth_> are those also evicted via EvictExtraOutboundPeers()?
 4117:10 <glozow> cool. and a third example is one relevant to this PR, introduced in PR #19858. can anyone describe what these extra connections are?
 4217:12 <glozow> andrewtoth_: hm i don't think so but i'm not sure. i assume that we're not making those types of feeler connections with the intent that we might replace one of our current outbounds with them. idk 🤷
 4317:12 <stickies-v> These extra block-relay-only peers from #19858 are (initially) short-lived additional connections made to make eclipse attacks more difficult by quickly checking if this new peer has previously unheard of blocks, and then disconnecting if that's not the case (i.e. no eclipsing is happening)
 4417:13 <glozow> stickies-v: yes! ding ding ding
 4517:13 <glozow> this leads us to the next question: what are block-relay-only peers?
 4617:13 <raj> peers who only relay blocks to us.
 4717:15 <glozow> raj: indeed. does anyone know how we make sure the peer only relays blocks to us?
 4817:15 <stickies-v> setting the fRelay flag to false in the version message
 4917:15 <glozow> bonus question: do we only relay blocks to them?
 5017:15 <glozow> stickies-v: correct! and what happens if they send us an unconfirmed transaction?
 5117:16 <stickies-v> disconnect I think?
 5217:16 <raj> Not necessarily I would guess.. They might not have us as block relay only?
 5317:16 <brunoerg> we discard it and disconnect?
 5417:16 <glozow> stickies-v: correct :) https://github.com/bitcoin/bitcoin/blob/1a369f006fd0bec373b95001ed84b480e852f191/src/net_processing.cpp#L3087-L3091
 5517:19 <glozow> raj: you're right that they wouldn't disconnect us for a transaction unless they explicitly also told us not to relay them. but we don't relay transactions to block-relay-only peers
 5617:19 <stickies-v> raj I think that's exactly the point of first exchanging version and verack messages when connecting to another peer - it ensures that both parties know what the other node can and wants to do, so there can't be any confusion about who's block-relay-only and who isn't
 5717:20 <glozow> Ok, and we've already answered the other question, _extra_ block-relay-only peers are those additional connections made to help prevent eclipse attacks, from #19858
 5817:20 <glozow> can anybody Describe the eviction logic for extra block-relay-only peers?
 5917:21 <raj> We check between the latest and next-latest connected peer, and evict whoever didn't send us a block for more time..
 6017:22 <ccdle12> we also make sure the existing node we are going to evict has been connected for at least the minimum amount of time connected and also that there are no current blocks in flight between us
 6117:23 <glozow> raj: what if the youngest and/or second-youngest hasn't given us a block?
 6217:25 <raj> Good point.. I don't know.. :D
 6317:25 <glozow> ccdle12: very good point. why do we wait a period of time before considering a peer for eviction?
 6417:25 <stickies-v> if the peer hasn't sent a block yet -> evict
 6517:26 <glozow> stickies-v: yep. and if both haven't sent any?
 6617:26 <raj> @glozow, I think then we remove the most youngest connection?
 6717:26 <glozow> raj: bingo
 6817:27 <glozow> any ideas why we care about longevity of connection?
 6917:27 <ccdle12> glozow: I think to give a connection a reasonable amount of time to gossip/send information? (could be a scenario where we would be essentially "thrashing" between connections and not necessarily sharing information)
 7017:27 <raj> longer connection == more reliable honest peer?
 7117:28 <neha> glozow: is 30 seconds long enough for blocks-relay-only peers?
 7217:28 <glozow> ccdle12: right, we should give the peer a chance to say something before we decide they haven't been useful enough :P
 7317:29 <glozow> raj: not necessarily
 7417:29 <glozow> neha: not sure. how long does it take for new peers to tell us what their tip is?
 7517:30 <glozow> i guess we include start height in the version message
 7617:31 <neha> glozow: why shouldn't we give them 10 minutes? do they send other things besides new block announcements?
 7717:32 <stickies-v> there's a DOS tradeoff there as well, dishonest nodes could keep your connection open for 10 minutes for no good reason
 7817:33 <neha> stickies-v: but how can you even tell if there's a "good reason" when there hasn't been a new block?
 7917:34 <stickies-v> oh because in an eclipse attack you wouldn't have heard about new blocks that were produced in the past, so that's why you can just quickly connect to a new peer and verify if you're missing anything, and then if it's all looking good you just disconnect again
 8017:34 <glozow> i guess we make extra connections with the explicit intention of discovering a new block
 8117:35 <neha> ah, ok. so the goal *is* to cycle through peers! but then not sure it's worth waiting 30 seconds?
 8217:35 <glozow> so maybe it's fair here to make a decision based on whether or not we learn new information from this peer
 8317:36 <stickies-v> yeah I think so. And I think there's a security aspect to having a relatively high minimum time (e.g. 30 seconds) as well, since if as an attacker you could somehow slow down your peers connections it would be easier to just have them all expire before new blocks were communicated?
 8417:37 <glozow> i guess it's a guesstimate upper bound of how long it takes to do a handshake and sync tips?
 8517:37 <ccdle12> I suppose there could be an optimization on the time for block relay only, since 30 seconds is also applied to full outbound and they would be communicating msgs that we would be more frequent. But I guess hard to prove what would be optimum?
 8617:39 <glozow> i'm not sure if this should be different for full and block-relay-only. it's basically a time for "we assume if they had something useful to say they would've said it by now"
 8717:40 <glozow> eviction for extra full relay is also based on block announcements: https://github.com/bitcoin-core-review-club/bitcoin/blob/93a0ec1a629af533bb21418a3e134c268bc57395/src/net_processing.cpp#L4002
 8817:41 <glozow> i'll throw out the next question, but feel free to continue discussing eviction logic choices
 8917:41 <glozow> The unit test checks the fDisconnect values of peers in order to test eviction logic, and prevents the peer from being disconnected by setting fDisconnect to false. Why is this appropriate? Why don’t we wait for the peers to be disconnected, like in the p2p_eviction.py functional test?
 9017:42 <glozow> link to the code: https://github.com/bitcoin-core-review-club/bitcoin/blob/4c449a55c29b4b382660852b20800d0ae2bc9e22/src/test/denialofservice_tests.cpp#L232-L234
 9117:42 <raj> Because its unit test? We just need to check that eviction switch is triggered, and not wait for real eviction. Also there is no real nodes actually in unit tests i think?
 9217:43 <glozow> raj: yeah exactly, in a unit test we can test much more granularly
 9317:44 <Kaizen_Kintsugi> gah, here
 9417:46 <glozow> In a normal running node, how often is `CheckForStaleTipAndEvictPeers` executed? In unit tests, can we trigger it to run automatically by using `SetMockTime()` to fast-forward?
 9517:46 <raj> 45 secs, if i remember correctly. and yes SetMockTime() should work too..
 9617:47 <stickies-v> yeah the default value for EXTRA_PEER_CHECK_INTERVAL is 45 seconds
 9717:48 <glozow> yep: https://github.com/bitcoin/bitcoin/blob/4633199cc8a466b8a2cfa14ba9d7793dd4c469f4/src/net_processing.cpp#L1452-L1457
 9817:49 <glozow> it's the min of `EXTRA_PEER_CHECK_INTERVAL` and `STALE_CHECK_INTERVAL`
 9917:49 <raj> are these values hardcoded, or they can be changed via some configs?
10017:50 <glozow> how does `SetMockTime()` work? :) did anyone try modifying the test to setmocktime much further ahead, and see if CheckForStaleTipAndEvictPeers happened?
10117:51 <stickies-v> raj it doesn't look like there's any config or cli override for EXTRA_PEER_CHECK_INTERVAL so I think the only way is to change the code
10217:51 <glozow> why would a user want to change the extra peer check interval?
10317:52 <raj> stickies-v, then I wonder why there is a check in https://github.com/bitcoin/bitcoin/blob/4633199cc8a466b8a2cfa14ba9d7793dd4c469f4/src/net_processing.cpp#L1456
10417:52 <glozow> raj: it's a compile-time sanity check
10517:52 <raj> if its hardcoded, this should alsways be true right?
10617:53 <raj> glozow, oh ok. that makes sense. So just to check if the variables are set correctly at compile time?
10717:54 <glozow> they are not variables, they are constexprs - we're making sure we wrote the code correctly
10817:54 <glozow> some info on static asserts: https://stackoverflow.com/questions/1647895/what-does-static-assert-do-and-what-would-you-use-it-for
10917:54 <raj> thanks..
11017:54 <glozow> ok last, hopefully fun, question: Can a block-relay-only peer become a full-relay peer?
11117:55 <raj> yes it can, if we disconnect from them and connect back again but this time in full relay mode?
11217:56 <stickies-v> I didn't check, but what happens if peers just exchange version messages again?
11317:56 <glozow> oof you caught me. rephrase: Can a block-relay-only peer become a full-relay peer while connected?
11417:56 <raj> :D
11517:57 <stickies-v> Long running peers are also valuable so ideally you wouldn't just want to disconnect if it wasn't necessary
11617:57 <glozow> stickies-v: no, only 1 version message per connection
11717:58 <jnewbery> stickies-v: peers are only allowed to exchange version messages once per connection. See https://github.com/bitcoin/bitcoin/blob/4633199cc8a466b8a2cfa14ba9d7793dd4c469f4/src/net_processing.cpp#L2561-L2564
11817:58 <glozow> hint: fRelay was introduced with BIP37 https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki#extensions-to-existing-messages
11917:58 <jnewbery> sorry, wrong line numbers: https://github.com/bitcoin/bitcoin/blob/4633199cc8a466b8a2cfa14ba9d7793dd4c469f4/src/net_processing.cpp#L2494-L2498
12018:00 <glozow> alrighty we're out of time, left as exercise to the reader
12118:00 <glozow> #endmeeting