The PR branch HEAD was 78b2f9503 at the time of this review club meeting.
Notes
An eclipse attack is when an attacker is able to isolate a node from all honest
peers.
This PR seeks to mitigate a specific type of eclipse attack by introducing
anchor connections, which are peers that a node tries to reconnect to on
startup.
Background
PR 15759 changed the
peer-to-peer behavior to add two blocks-only connections. We discussed
this change in a previous Review Club
meeting. I recommend you read
the notes, questions and meeting log from that meeting if you’re not
already familiar with those changes.
Eclipse attacks occur
when an adversary is able to isolate a victim’s node from the rest of the
network. We discussed eclipse attacks in previous Review Club meetings,
including the discussion on PR
16702. If you’re unfamiliar with the
concept of eclipse attacks, the Optech topics page and the notes from that
meeting contain links to many resources on the subject.
A restart-based eclipse attack occurs when the adversary is able to add its
own addresses to the victim’s address manager and then force the victim to
restart. If the attack succeeds, the victim will make all of its connections
to the adversary’s addresses when it restarts.
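To get a rough feel for why poisoning the address manager matters, here is a toy model. It is not Bitcoin Core's actual peer selection logic, which uses bucketed new and tried tables. The sketch assumes each outbound slot is filled by an independent draw in which an attacker address is picked with probability `attacker_fraction`. The function name and the default of 10 outbound slots (8 full-relay plus the 2 block-relay-only connections) are assumptions made for illustration.

```python
def prob_all_outbound_malicious(attacker_fraction: float, outbound_slots: int = 10) -> float:
    """Probability that every outbound slot lands on an attacker address.

    Toy assumption: each slot is an independent draw in which an attacker
    address is picked with probability `attacker_fraction`.
    """
    if not 0.0 <= attacker_fraction <= 1.0:
        raise ValueError("attacker_fraction must be between 0 and 1")
    if outbound_slots < 0:
        raise ValueError("outbound_slots must be non-negative")
    return attacker_fraction ** outbound_slots


if __name__ == "__main__":
    # Show how quickly the success chance grows as the addrman is poisoned.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, prob_all_outbound_malicious(fraction))
```

In this simplified model the attack only becomes likely once the adversary controls nearly all of the addresses the victim might pick from.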
Issue 17326 proposed
persisting the node’s outbound connection list to disk, and on restart
reconnecting to the same peers. It’s worth reading the full discussion in
that issue, since there are a lot of subtle points around which peers
should be persisted.
This PR, PR 17428, adds
functionality to preserve outbound block-relay-only connections across
restarts.
Issue 17326
The most likely way for an eclipse attack to occur is by forcing a victim
to restart and then making it connect only to malicious peers.
In the 2015 implementation of Bitcoin Core, an attacker could carry out a
“connection starvation attack”. In this attack, the adversary fills up
all of the available inbound connection slots on the network and then forces
the victim to reboot. When the victim restarts, it is unable to connect to
any honest peers on the network since their inbound connection slots are
all filled up. Since anchor connections only help identify honest peers, not
connect to them, they are not an effective mitigation strategy against such an
attack.
Since then, Bitcoin Core has implemented logic to sometimes rotate through
incoming connections when the max number is hit. This means the victim has a
way to successfully evict an attacker’s connections to honest peers and
reconnect to those honest peers. While this means that it’s possible for an
attacker to eclipse a node without it rebooting, it also makes carrying out the
eclipse attack more complex.
Reconnecting to the same outgoing connections on restart makes the network graph more static. While
this helps maintain longstanding honest connections and prevent an eclipse
attack, this can have a negative effect on privacy since the network topology
is easier to map.
There are some other reasons that we might not want to anchor all connections:
A victim of an eclipse attack has no way of escaping if all of his connections
are anchored to the adversary.
Strong persistence can contribute to the network self-partitioning. For
example, if the longer distance connections are less reliable, nodes are
incentivized to connect locally and this could lead to subgraphs per continent.
An attacker could eclipse nodes with a small number of nodes but 100% uptime
and a large capacity for inbound connections. Every time a node adds a
connection to a new peer, if that peer is an adversary then it would be “locked in”
forever with anchoring logic.
The suggestion of using block-relay-only peers mitigates the concerns around
reduced outbound peer rotation negatively impacting transaction privacy,
while maintaining the benefit of mitigating eclipse attacks.
PR 17428
The 2 outbound block-relay-only connections are written to the anchors.dat
file. When a node restarts, it first tries establishing connections with the
anchors before attempting to connect to other peers.
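The persist-then-reconnect flow described above can be sketched roughly as follows. This is an illustration only: the real anchors.dat is not a JSON file, and the PR itself is C++. The function names `dump_anchors`, `read_anchors` and `connection_order` are hypothetical, and peers are represented as plain address strings for simplicity.

```python
import json
import os
from pathlib import Path
from typing import Iterable, List

MAX_ANCHORS = 2  # the PR anchors the 2 outbound block-relay-only connections


def dump_anchors(path: Path, block_relay_only_peers: List[str]) -> None:
    """Persist at most MAX_ANCHORS peer addresses (JSON is a stand-in format)."""
    anchors = block_relay_only_peers[:MAX_ANCHORS]
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(anchors))
    os.replace(tmp, path)  # swap the new file into place in one step


def read_anchors(path: Path) -> List[str]:
    """Load anchors; a missing or unreadable file yields no anchors."""
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return []
    if not isinstance(data, list):
        return []
    return [a for a in data if isinstance(a, str)][:MAX_ANCHORS]


def connection_order(anchors: Iterable[str], other_candidates: Iterable[str]) -> List[str]:
    """Try anchors first, then other candidates, skipping duplicates."""
    seen = set()
    ordered = []
    for addr in list(anchors) + list(other_candidates):
        if addr not in seen:
            seen.add(addr)
            ordered.append(addr)
    return ordered
```

On startup, a node following this sketch would call `read_anchors` and pass the result to `connection_order` together with candidates from its address manager, so the anchors are attempted before any other peer.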
Previously, a node that would quickly reattempt the same outgoing connections
was detected as a spy or mass connector. The new behavior in this PR means that honest
nodes will attempt the same outgoing connections more than once, which was
unlikely before.
The current state of this implementation introduces a new risk – if one of the
anchor peers is malicious and is able to exploit a remote crash vulnerability, they
can repeatedly crash the victim’s node. Every time the victim node restarts,
it would connect to the same malicious anchor peer, which could force another
remote crash.
The proposed fix is to forget the anchor peers on an unclean exit.
In reaction to these findings, the PR was closed.
The PR has subsequently been reopened since there are still advantages to
preserving connections when restarting a node. The implementation of the
proposed fix is yet to come.
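One way the proposed fix could work is sketched below, under the assumption that the node records a marker on clean shutdown and clears it on startup. The marker file name `clean_shutdown` and both function names are hypothetical, and anchors are stored one address per line purely for simplicity. The PR had not implemented the fix at the time of these notes, so this is not the author's method.

```python
from pathlib import Path
from typing import List

ANCHORS_FILE = "anchors.dat"     # file name used by the PR
CLEAN_MARKER = "clean_shutdown"  # hypothetical marker file for this sketch


def _remove_if_present(path: Path) -> None:
    """Delete a file, ignoring the case where it does not exist."""
    try:
        path.unlink()
    except FileNotFoundError:
        pass


def shutdown_cleanly(datadir: Path, anchors: List[str]) -> None:
    """Persist anchors first, then record that shutdown completed."""
    (datadir / ANCHORS_FILE).write_text("\n".join(anchors))
    (datadir / CLEAN_MARKER).touch()


def anchors_for_startup(datadir: Path) -> List[str]:
    """Return anchors only if the previous exit was clean; forget them otherwise."""
    marker = datadir / CLEAN_MARKER
    anchors_path = datadir / ANCHORS_FILE
    was_clean = marker.exists()
    # Clear the marker so a crash during this session counts as unclean.
    _remove_if_present(marker)
    anchors: List[str] = []
    if was_clean and anchors_path.exists():
        anchors = anchors_path.read_text().split()
    # Anchors are consumed either way; they are rewritten on the next clean exit.
    _remove_if_present(anchors_path)
    return anchors
```

With this approach a malicious anchor that crashes the node is dropped on the next start. The tradeoff is that an adversary who can force an unclean restart also wipes the anchors, so the node gets less protection in exactly that case.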
Questions
What attacks can an adversary execute on the victim?
How does this PR mitigate the attack? What are some limitations of the proposed fix?
How does the anchors.dat file work?
When are peers persisted and removed?
When does the node ignore the contents of anchors.dat?
What are some implications of the timing of adding peers?
What are the conditions for adding an anchor?
Why do we only consider adding blocks-only peers as anchor peers?
What would be the issue of using normal (transaction & block relay) peers
as anchors?
What are limitations of how this implementation chooses anchors?
Why limit to only 2 anchors?
With the current implementation of this PR, how could an adversary carry out
an attack? How is this different from what an attacker would have to do
currently?
What is the new risk that this implementation currently introduces? What are
the tradeoffs of the proposed fix (yet to be implemented)? Do you think the
changeset would still be worthwhile? Why?
<amiti> _andrewtoth_: I'm curious to learn more about this too. the addrman is populated based on addr messages received, but I'm unfamiliar with the exact logic of how
<amiti> pinheadmz: but there's also the case where a victim has one malicious connection, and then receives solicited addr messages and then over time the adversary poisons the addrman and takes over all the connections
<rockzombie2> How successful would an eclipse attack be? because ultimately, wouldn't the attacker need to do a 51% attack on the whole network before the "eclipsed" node realizes they are not on the main chain?
<ecurrencyhodler> rockzomebie2: You only need enough hashpower to find a block to serve a malicious one. If your node is eclipsed, then they can take their sweet time generating the block.
<_andrewtoth_> the blocks wouldn't really be malicious with double spends, they would still have to be valid blocks, but they would be on a lower proof of work chain
<amiti> there's a lot of different attacks possible on an eclipsed node... I found the notes on the optech topics page interesting for highlighting some at the txn level that I hadn't thought about: https://bitcoinops.org/en/topics/eclipse-attacks/
<pinheadmz> embark: no yore right i think, there is the chain-width attack. starting from an old block, the attacker creates a dsihonest chain with fake timestamps that force the difficulty down
<jnewbery> embark: yes, theoretically if you can eclipse a node over a difficulty retarget, you could decrease the difficulty for their retarget, but you'd still need to do a lot of work to get to that difficulty retarget height. I feel like we're getting a little in the weeds of an unlikely and very expensive attack though
<pinheadmz> amiti: by saving the addr of two peers to disk, then on restart, reconnecting to them instead of picking addrs from the list that might belong to an attacker
<amiti> when we get a message about new addresses, feelers are a way of testing out those connections and then moving them to a "tried" table if they were successful. this prevents us from filling up our address tables with bogus info
<amiti> ecurrencyhodler: great question! this is a point that has been discussed in the PR conversation. lets defer for a few minutes and come back to it
<embark> Maybe: If a node only uses outbound connections it increasing the costs an eclipse attacker would need to marshal to shunt the node to their nodes
<nehan_> anchors are a bit dangerous if you get stuck with evil anchors. and the real issue with being eclipsed is not seeing new blocks, transactions are less concerning. but i don't see why it couldn't be txn+block connections as well...
<lightlike> you will stay connected to evil anchors forever (unless they disconnect), and they know it because all blocks-only connections will be anchors.
<ecurrencyhodler> So the assumption is that if your node crashes, one reason is because it was crashed remotely on purpose. So now your node will look for new anchor outputs.
<amiti> but in the existing implementation (use anchors for unclean restarts as well), we would not be able to start our node in the worst case scenario
<nothingmuch> currently it seems that DumpAnchors is called eagerly. an alternative would be to only save on clean shutdown, but that is problematic if the user never restarts.
<nehan_> hebasto: then I think your comment is incorrect... it looks (to me) like it claims it helps with an eclipse attacker who can cause a power shutdown to do a restart. it does not, because that is an unclean restart, so the anchors would be wiped.
<amiti> I've been thinking about this.. and I wonder if its desirable to be able to start your node if you're the victim of an eclipse attack? I think I'd rather not be able to start up because that would force me to figure out what's wrong
<luke-jr> also, the worst-case scenario isn't inability to restart: it's being prevented from syncing because your node is killed too quickly, but not until RPC calls are made
<amiti> luke-jr: right. the two options of implementation are 1. use anchors on every restart & 2. use anchors only on clean restarts. #1 has the issue you pointed out where the node is unable to function. #2 has less protection under an eclipse attack. right?
<amiti> so, the question I'm floating is ... which is more desirable? to me, #1 seems to make more sense because I wouldn't want to be able to operate if I'm eclipsed anyways
<nothingmuch> amiti: i think there are more options (what i meant to imply by my question). for example the anchor file could be a write ahead log with commitments on successful restarts so that it's not all or nothing
<nothingmuch> hence my question above - i'm not sure sensible lead time from an outbound block only seeming non malicious, but my intuition is that it should not depend on the user's restart habits
<embark> even if the attack isn't common or known, we know it's possibly so we don't want to enable an attack vector with design choices that deny its possibility
<luke-jr> it might or might not be worth the extra logic to do this - perhaps better to save for another PR unless someone thinks of a reason it's critical
<amiti> pinheadmz: "use it against"? I think theres ways an eclipse attacker can take over the anchors, but it requires more work. do you think there's further exploits they could do with the anchors?