The PR branch HEAD was 32c6ce0be3 at the time of this review club meeting.
A node allows a limited number of connections, the default is 125. When this
limit is reached, the node reacts to the connection of a new inbound peer by
terminating the connection with an existing inbound peer (“eviction”).
Certain peers are protected from eviction by a heuristic algorithm, based on
various metrics such as netgroup, connection time, ping time, them having sent
us transactions or blocks. This is implemented in
and was added in PR #6374 and
extended in PR #8084.
The eviction logic is currently covered by neither functional nor unit tests
(Test Coverage), which was noticed in Issue #16660 .
This PR suggests a functional test for the eviction mechanism, covering
protection of fast responding peers and peers that have sent us transactions
and blocks. This is achieved by adding several mininode peers with different
properties and testing that after reaching the maximum number of connections,
a peer is being evicted, which is not among the peers assumed to be protected.
<gzhao408> this might be a big claim, but I actually think any eviction policy that doesn't protect certain characteristics (i.e. ones that are hard to fake like minping and longest connection time) would almost guarantee that an attacker could fill up your slots
<decipher> having defined peering preferences gives an attacker the success criteria they need to meet to perform an attack -- regardless of what criteria are chosen. Our objective should be - choose the criteria which suite are desires best but also make it as costly as possible for an attacker
<jnewbery> I think amiti and decipher's point about having different criteria is very important. We obviously want to receive blocks quickly so a naive approach would be to just protect the peers that send us blocks. If we did that would it be easier or harder for an attacker to eclipse you?
<nehan> michaelfolkson: I think the issue is that if you measure nodes by how "good" they are and it's something an attacker can optimize, they can become better than average nodes and take up all the good nodes slots. I think :)
<gzhao408> lightlike: make a list of eviction candidates, sort them by characteristics and remove the top ones of each category, whittling the list down based on peers that you definitely want to protect. if you still have candidates left over, make an eviction decision?
<amiti> lightlike: start by selecting eviction candidates that are inbound peers that are not on the NO BAN list or already marked for disconnect. remove the top candidates from each of the desirable categories (netgroup, ping time, recently sent txns, recently sent blocks, and the longest connected). If there are any candidates left over, choose from the netgroup with the most connections
<gzhao408> I think it's important that it's not just favoring some but /protecting/ them - that's why we remove them from the list so even if a node is both a long-lasting connection and has short ping times we consider the same number regardless of overlap. and to nehan's point, the more expensive behavior comes at the end
<raj_149> michaelfolkson: its making some peers satisfy the above criterias so to add them in protected group, then checking eviction is happening only from the non protected group, and protected peers are kept intact as intended.
<lightlike> I originally had another commit with a unit test (testing the sorting in more detail), but that was controversial because in order to make the methods from Boost accessible, quite some code needed to be moved from net.cpp to net.h
<troygiorshev> andrewtoth: a functional test should try and simulate the real world as closely as possible. forcing us to actually spin up multiple nodes and connect them could help us realize if we've made an incorrect assupmtion somewhere along the line
<lightlike> amiti: I think this should be possible. One problem is that due to the already mentioned impossibility to mock netgroup, there is some randomness involved (the first step kicks out 4 "random" peers which makes writing the tests harder)
<andrewtoth> michaelfolkson nothing particularly but to simulate real world we should use defaults. Could be some incorrect assumptions about how eviction works that we are not considering with default peers
<troygiorshev> andrewtoth: good point. I think, if you have multiple peers left over after protecting the 20, the logic to choose which one to evict is very simple, so there isn't much benefit to testing that it picks the right one?
<gzhao408> lightlike I wondered for a while why minping (best ping time ever) is used instead of pingtime (most recent ping time), but for the purposes of protection it seems like mingping would be much harder for an attacker to manipulate and thus...safer?
<gzhao408> sipa: ah, that makes sense that it can serve as a measurement of physical distance. so this would fall under "hard for attacker to fake" category more than "favorable behavior" category? I was confused because I thought we used it because we cared about how fast our peers were responding