The PR branch HEAD was 32c6ce0be3 at the time of this review club meeting.
Notes
A node allows a limited number of connections, the default is 125. When this
limit is reached, the node reacts to the connection of a new inbound peer by
terminating the connection with an existing inbound peer (āevictionā).
Certain peers are protected from eviction by a heuristic algorithm, based on
various metrics such as netgroup, connection time, ping time, them having sent
us transactions or blocks. This is implemented in
CConnman::AttemptToEvictConnection()
and was added in PR #6374 and
extended in PR #8084.
The eviction logic is currently covered by neither functional nor unit tests
(Test Coverage), which was noticed in Issue #16660 .
This PR suggests a functional test for the eviction mechanism, covering
protection of fast responding peers and peers that have sent us transactions
and blocks. This is achieved by adding several mininode peers with different
properties and testing that after reaching the maximum number of connections,
a peer is being evicted, which is not among the peers assumed to be protected.
<jnewbery> Before we start though, I want to draw everyone's attention to the code of conduct that we added to the website last week: https://bitcoincore.reviews/code-of-conduct. Please read it!
<jnewbery> We want to make sure that PR Review Club is a safe, respectful, and productive environment for everyone. The Code of Conduct documents the behaviour we expect to ensure that's the case.
<amiti> it seems like we try to evict from the netgroup with the max number of connections of unprotected peers, so maybe as a method of ensuring peers across multiple netgroups?
<sipa> just in general: to try to improve the quality of inbound connections (a combination of peers that relay blocks quickly, don't misbehave, and come from a variety of networks across the world)
<michaelfolkson> So you kind of want to get best of both worlds or a happy medium. Go too far in either direction and you don't get benefits of both extremes
<gzhao408> this might be a big claim, but I actually think any eviction policy that doesn't protect certain characteristics (i.e. ones that are hard to fake like minping and longest connection time) would almost guarantee that an attacker could fill up your slots
<ecurrencyholder> It's preferable to keep a connection that has proven to be reputable. By evicting it, you are opening yourself up to potentially connect to a malicious one.
<lightlike> yes, i think it's a balance. we don't want to have our peers too static, but we also need to protect against attackers actively eviction other peers.
<decipher> having defined peering preferences gives an attacker the success criteria they need to meet to perform an attack -- regardless of what criteria are chosen. Our objective should be - choose the criteria which suite are desires best but also make it as costly as possible for an attacker
<jnewbery> I think amiti and decipher's point about having different criteria is very important. We obviously want to receive blocks quickly so a naive approach would be to just protect the peers that send us blocks. If we did that would it be easier or harder for an attacker to eclipse you?
<sipa> the network serves a variety of purposes; for blocks it's partition resistance and fast relay; for transactions it's original privacy; for addresses there are eclipse attack concerns, ...
<jnewbery> decipher: compared to not just protecting block-serving peers. I think the answer is easier because a determined adversary can be a better block-serving peer than an average peer.
<nehan> michaelfolkson: I think the issue is that if you measure nodes by how "good" they are and it's something an attacker can optimize, they can become better than average nodes and take up all the good nodes slots. I think :)
<raj_149> It seems focusing on any sinle character will by definition open up gamability. the only way is to have as many character as possible as criteria?
<gzhao408> lightlike: make a list of eviction candidates, sort them by characteristics and remove the top ones of each category, whittling the list down based on peers that you definitely want to protect. if you still have candidates left over, make an eviction decision?
<amiti> lightlike: start by selecting eviction candidates that are inbound peers that are not on the NO BAN list or already marked for disconnect. remove the top candidates from each of the desirable categories (netgroup, ping time, recently sent txns, recently sent blocks, and the longest connected). If there are any candidates left over, choose from the netgroup with the most connections
<raj_149> is it implicite that there will always be a candidate for eviction (assuming peercount limit reached)? or it might be the case that none of them gets evicted and new connection is rejected?
<nehan> so the idea is to optimize for metrics that are "proof of good and hard to fake behavior". like low ping times, bw usage for relaying blocks, etc. others? sort of like proof of work, ha
<gzhao408> I think it's important that it's not just favoring some but /protecting/ them - that's why we remove them from the list so even if a node is both a long-lasting connection and has short ping times we consider the same number regardless of overlap. and to nehan's point, the more expensive behavior comes at the end
<lightlike> raj_149: I'm not sure how we would not find a candidate - if we have maxconnections large enough that that there are more than the fixed 20 peers (4 block +4 tx + 4netgroup +8 ping).
<raj_149> IMO weighting might be problematic as an attacker then gets higher probabailty to get into protected sets by playing good just before the attack?
<lightlike> gzhao408: right. that's why maxconnections needed to be raised by two when the PR introducing two more outbound connections (blocks-only) was merged
<michaelfolkson> Is it testing a Core node by spinning up a bunch of regtest nodes that connect to that Core node? Or a separate regtest network and testing how that independent network works?
<raj_149> michaelfolkson: its making some peers satisfy the above criterias so to add them in protected group, then checking eviction is happening only from the non protected group, and protected peers are kept intact as intended.
<lightlike> I originally had another commit with a unit test (testing the sorting in more detail), but that was controversial because in order to make the methods from Boost accessible, quite some code needed to be moved from net.cpp to net.h
<troygiorshev> andrewtoth: a functional test should try and simulate the real world as closely as possible. forcing us to actually spin up multiple nodes and connect them could help us realize if we've made an incorrect assupmtion somewhere along the line
<lightlike> amiti: I think this should be possible. One problem is that due to the already mentioned impossibility to mock netgroup, there is some randomness involved (the first step kicks out 4 "random" peers which makes writing the tests harder)
<andrewtoth> on one hand, it does simulate a lot of different things which would test for incorrect assumptions. on the other, it should be testing exactly the behaviour we want and not more.
<andrewtoth> michaelfolkson nothing particularly but to simulate real world we should use defaults. Could be some incorrect assumptions about how eviction works that we are not considering with default peers
<troygiorshev> andrewtoth: good point. I think, if you have multiple peers left over after protecting the 20, the logic to choose which one to evict is very simple, so there isn't much benefit to testing that it picks the right one?
<jnewbery> it'd be great if we could somehow mock remote IP ranges in the test framework. There are quite a few differences between how we treat local peers and remote peers
<gzhao408> lightlike I wondered for a while why minping (best ping time ever) is used instead of pingtime (most recent ping time), but for the purposes of protection it seems like mingping would be much harder for an attacker to manipulate and thus...safer?
<sipa> gzhao408: ping times vary due to network conditions, node overloading, ...; but an attacker still can't fake minping (if minping is N ms, then you know the node is at most N*150 km away)
<MarcoFalke> michaelfolkson: I think having the tests call out to remote peers by default could be problematic, though if it is opt-in and can't be achieved in another way, why not
<jnewbery> Don't forget that tomorrow is the last of our BIP 157 specials, and probably the most interesting from a protocol-design perspective. Notes are here: https://bitcoincore.reviews/19070.html
<jnewbery> And next week we'll get into some nice crypto implementation. Fabian will host a meeting on sipa's Python implementation of MuHash. Notes and questions will be up soon.
<gzhao408> sipa: ah, that makes sense that it can serve as a measurement of physical distance. so this would fall under "hard for attacker to fake" category more than "favorable behavior" category? I was confused because I thought we used it because we cared about how fast our peers were responding