The PR branch HEAD was 93a0ec1a at the time of this review club meeting.
Notes
An eclipse attack is
when an adversary is able to isolate a victim from the Bitcoin network by
taking over all of its connections. We covered eclipse attacks in the review
club meetings for PR 16702
and PR 17428.
Block-relay-only connections are a type of connection where nodes do not
participate in transaction or address relay and only relay blocks. An
effective way for a spy node to infer the network topology is to observe the
timing and details of transaction and address relay, so these block-relay-only
connections obfuscate network topology, helping to mitigate eclipse attacks.
Block-relay-only connections were introduced in
#15759. After these changes,
nodes by default open two outbound block-relay-only connections on startup.
We discussed this PR in a previous review
club. Refer to the notes and log from
that meeting if you’re interested in learning more.
This week’s PR, #19858,
proposes a more advanced use of block-relay-only connections to further
mitigate eclipse attacks. After this PR, the node will periodically initiate
an additional block-relay-only connection, and then sync headers to try to
learn about new blocks.
After the node opens an additional block-relay-only connection to a peer, it
begins to sync headers. If this reveals new blocks, the eviction logic will
rotate out an existing block-relay-only connection. If no new blocks are
discovered, the connection is closed. It’s important for this eviction logic to
be carefully reviewed.
<jonatack> y. i've been collecting data from a node running the branch with custom logging. need to analyze it further, but connectivity does seem improved.
<dhruvm> (1) Mitigation against eclipse attacks by periodically creating new relay-only connections to look for higher PoW elsewhere and (2) Honest peers helping bridge other trapped honest nodes.
<jnewbery> dappdever: yes, relaying transactions and addrs reveals information about the network topology, which could be used by a spy to map your connections
<elle> even if an attacker can map your connections, how can they actually perform the eclipse attack? how can the attacker get you to disconnect you other peers and connect to them instead?
<lightlike> I'm still a a bit sceptical: if your addrman has been corrupted, the additional periodic block-only connections don't seem to help much (because you likely won't connect to a good peer). But if your addrman is healthy, how did you get eclipsed in the first place?
<pinheadmz> elle i think a real eclipse involves filling up a peers addrman with "poinson" entries and just waiting and waiting for those to get rotated in, or possibly joinging with a DoS attack to force the node to restart ?
<michaelfolkson> lightlike, glozow: I think you are thinking in boolean rather than on a continuum. Your addrman isn't either perfectly pure or perfectly poisoined. It could be partially poisoned or under attack
<dhruvm> Since certain nodes are probably larger targets for eclipse attacks, small incremental benefits for them should have significant network benefits?
<jnewbery> We've already started touching on the next question: Describe a scenario in which periodic block-relay-only connections could help prevent an attack?
<sipa> one argument may be the symmetry: we create extra normal connections that may get promoted to normal ones... it feels natural to do the same with blockonly connections
<amiti> building on the conversation so far- I guess if an adversary has been trying to poison your addrman, has taken over some of it eg. 60%. you could happen to have all your outbound connections to them, but then rotating through some of the others connects you to the honest network?
<willcl_ark> If we are worried about having a poisoned addrman, and then with a feeler we detect that we're on a stale tip, would we not want to promote that new ("honest") peer to a NODE_NETWORK peer, rather than blocksonly?
<amiti> dhruvm: ah fair, but I think it would be harder to be eclipsed if you're also enabling inbounds? although maybe not that much harder if you were just spinning up 🤔
<jnewbery> Just so everyone is clear, -blocksonly mode and block-relay-only peers are two different concepts. Again, I think it's important to be precise with terminology because they mean different things and it's easy to get confused between them otherwise.
<michaelfolkson> How long does it take for a newly mined block to propagate across the network? You would want your block-relay connection to have the chance to give you the latest block rather than being cut off instantly?
<jnewbery> joelklabo: yes. In fact, you're connecting to them and saying "no transactions please". They may still gossip addresses to you but you'd ignore them
<willcl_ark> so what are the assumptions here: we are currently eclipsed, but we have _some_ honest addresses in our addrman. This PR will eventually rotate through enough block-relay-only peers that we will discover a more-work chain, then replace one of our current block-relay-only peers with the new peer?
<michaelfolkson> And all connections are taken from the addrman right? If your addrman is 100 percent poisoned and all connections are controlled by adversary you're done unless you restart
<sipa> willcl_ark: another way to put it is that with more connections, even just occasionally, the threshold of how much of addrman needs to be under attacker control increases
<jnewbery> joelklabo: I'd say mostly less leaking of network topology. The fact that they use less bandwidth/memory/CPU is a nice property that means we can potentially have more of them
<willcl_ark> sipa: thats an interesting way to look at it. To be honest, I am kind of suprised nodes don't handshake and request tip hash (or a getheaders) from more nodes before this PR
<jesseposner> There are many flavors of eclipse attacks, and I believe that the block-relay-only connections are trying to mitigate against attacks that target the connected peers directly. It doesn't protect against a 100% poisoned addrman state. It simply makes it more difficult to discover all the connected peers.
<jnewbery> sorry, that was a bit flippant. But using DNS seeds does have a downside. One is that you're potentially telling people that you're running a bitcoin node that you wouldn't otherwise be doing
<michaelfolkson> dhruvm: "the behavior introduced here is beneficial to not just the node doing it, but to the whole network, as a node already connected to the honest network that is periodically connecting to new peers to sync tips with others is helping bridge the entire network"
<sipa> joelklabo: tx relay and addr relay are relatively easy to use for identifying nodes and connectioms; there are various known techniques, and it's not so surprising... just construct a random tx or addr message, send it somewhere, and see where it appears
<ajonas> michaelfolkson dhruvm - well there is also a subtle upgrade to find new block-only peers that are "useful" as in they deliver blocks that our current peers aren't delivering
<jnewbery> We do make use of techniques to try to mask tx origination (e.g. randomized timer, batching invs), but I think those are really adding statistical noise. With a large enough sample set, I think it would still be possible to map the network
<lightlike> one downside would be that an attacker that we connect to and that can send us a new block would replace a potentially honest blocks-only connection - although this would be really costly for the attacker, because they would basically need to be a miner withholding a new block from the network just to eclipse us further.
<dhruvm> michaelfolkson: ajonas: How is a node already connected to an honest network, helping bridge the entire network by making new "feeler" block-relay-only connections?
<blanching> batching invs only hides data flows / timing analysis which in some domains can obfuscate some topology but weakly, probabilistically (high resolution: eve knows x was sent before y and can conclude A)
<amiti> dhruvm, michaelfolkson: block-relay-only connections also help with grander partition attacks than just eclipse attacks (eg. a malicious party trying to create a network split). which is more a network level concern (and everybody loses if that happens)
<michaelfolkson> How long does it take for a newly mined block to propagate across the network? You would want your block-relay connection to have the chance to give you the latest block rather than being cut off instantly?
<jnewbery> ajonas: if we receive a block from the additional block-relay-only peer while we're connected, will we _always_ disconnect the second youngest?
<jnewbery> willcl_ark: right, we see which of our two youngest block-relay-only peers (which will be one of the existing block-relay-only peers and the 'additional' block-relay-only peer) gave us the block least recently and consider it for disconnection
<glozow> oh, you mean like, there are fewer total available inbounds in the network? so it'd be harder for a new node to get a good set of outbound connections?
<lightlike> jonatack: In your test on mainnet, how often per hour would you actually succesfully connect to a new outbound peer and compare headers? In a short run on testnet yesterday with a fresh node (no peers.dat), that was rather rare, most of the time I'd get no response and the 5min poisson timer would start again.
<jonatack> lightlike: i've been busy with things for the 0.21 release and meaning to get back to the data, but IIRC, using this PR i did see greater peer diversity (e.g. higher peer id numbers) than usual. don't depend on that info though; i need to refresh and look at the data
<jnewbery> ok, so if we call EvictExtraOutboundPeers(), and we currently have an additional block-relay-only peer (i.e. 3 in total), and the youngest gave us a block more recently than the second youngest, what do we do?
<glozow> sipa: makes sense. i always thought the inbound : outbound ratio seemed really high, and people explained because not everyone enables inbounds, so the people who do have to make up for it.
<lightlike> jonatack: if you had net logging enabled, you could also grep for the new log entries (e.g. "keeping block-relay-only peer=X chosen for eviction")
<jnewbery> sipa: apparently so. The half-dose/full-dose oxford vaccine regime is more effective than the full-dose/full-dose regime and they only found out because they'd accidentally given the half dose to a bunch of people
<jnewbery> willcl_ark: youngest_peer.first refers to the peer id. (youngest_peer is a std::pair of peer id and the time of the latest block from that peer)
<michaelfolkson> sipa jnewbery: Misunderstandings are great for inventing things. Not so great for analyzing a robustness of a system and avoiding bugs ;)
<jnewbery> ajonas: exactly right. If the timer for EvictExtraOutboundPeers() pops a few seconds after connecting to the additional block-relay-only peer, then we haven't given it enough time to send us a block, so we don't disconnect
<jnewbery> That function is really long and really confusing, but it's essential to understanding how we manage our outbound connections, so definitly worth digging into if this stuff interests you