Delete anchors.dat after trying to connect to that peers (p2p)

https://github.com/bitcoin/bitcoin/pull/24034

Host: brunoerg  -  PR author: brunoerg

Notes

  • PR #17428 adds “anchors” to help prevent an eclipse attack where an attacker tricks a restarting node into connecting to adversarial peers. It saves block-relay-only connections to an anchors.dat file. When a node restarts, it tries to connect to the addresses from anchors.dat. We discussed this PR in a previous review club meeting.

  • Currently, anchors.dat is deleted right after it is read. However, if a node shuts down before trying to connect to that anchor peers, it will create an empty anchors.dat. Thus, an attacker could make a node shut down before it tries to connect to those peers, thereby clearing their anchors.dat and making the node connect to new peers (possible malicious ones) when starting again.

  • PR #24034 changes the behavior so that anchors.dat is only deleted after the node has tried to connect to the peers from it. So, if a node stops before trying to connect to anchor peers, the anchors.dat file will be preserved. Also, it avoids calling DumpAnchors if not all anchors.dat peers have not been tried.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What is an eclipse attack and how does #17428 help prevent it?

  3. Why do we delete anchors.dat? Why not just read and write?

  4. There are two ways to shutdown a node: a “clean” way and “unclean” one. What does that mean?

  5. On current master, what happens if we shut down (cleanly or uncleanly) before the node has tried to connect to the peers from anchors.dat?

  6. In this PR, in which scenarios do we delete anchors.dat?

  7. Avoiding deletion of anchors.dat file before trying to connect to all peers from it is not enough to preserve it in a clean shutdown. Why? How does the PR deal with this?

Meeting Log

  117:00 <brunoerg> #startmeeting
  217:00 <svav> Hi
  317:00 <erik-etsuji-kato> hi
  417:00 <brunoerg> hi
  517:00 <jaonoctus> hi
  617:00 <theStack> hi
  717:00 <Kaizen_Kintsugi_> hi
  817:00 <brunoerg> Hello, everyone! Today we’re looking at #24034 (Delete anchors.dat after trying to connect to that peers)!
  917:01 <brunoerg> Anyone here for the first time? :)
 1017:01 <larryruane> hi
 1117:01 <effexzi> Hi every1
 1217:01 <brunoerg> no one?
 1317:02 <bitcoin_1o1> hi all
 1417:02 <brunoerg> did anyone review the PR?
 1517:02 <brunoerg> y/n/partial?
 1617:02 <Kaizen_Kintsugi_> y
 1717:02 <erik-etsuji-kato> 0.76y
 1817:02 <jaonoctus> y
 1917:03 <larryruane> 0.5y
 2017:03 <glozow> hi
 2117:03 <bitplebpaul> conceptually Y
 2217:03 <svav> I read the notes and looked at some of the code
 2317:03 <theStack> y, concept ack
 2417:03 <brunoerg> ok, cool :)
 2517:03 <brunoerg> Before discussing about this PR, let's do a quick review!
 2617:04 <brunoerg> What is an eclipse attack? Could someone explain it?
 2717:04 <svav> In an eclipse attack, a malicious actor isolates a specific user or node within a peer-to-peer (P2P) network. The attacker’s goal is to obscure a user’s view of the P2P network in preparation for more complex attacks or to cause general disruption.
 2817:04 <bitcoin_1o1> y
 2917:04 <brunoerg> svav: great
 3017:04 <Kaizen_Kintsugi_> +1 svav
 3117:04 <svav> In an eclipse attack, an attacker tries to redirect the target network participant’s inbound and outbound connections from legitimate nodes to the attacker’s nodes. By doing so, the target is sealed off from the actual network.
 3217:04 <svav> Since the target is disconnected from the blockchain ledger, the isolated node can then be manipulated by the attacker. An eclipse attack can lead to block mining disruptions as well as illegitimate transaction confirmations.
 3317:05 <bitcoin_1o1> +1 svav
 3417:05 <larryruane> I've always been a little unclear on the difference between an eclipse attack and a sybil attack .. are these related?
 3517:05 <erik-etsuji-kato> + the attacker can see what transactions is been made by this peer, hurting privacy
 3617:05 <svav> One different is eclipse is one node and sybil is whole network
 3717:06 <svav> *difference*
 3817:06 <brunoerg> larryruane: great question
 3917:06 <larryruane> svav: do you mean one node is the victim in the eclipse attack? (or one node is the attacker?)
 4017:06 <santimena> #startmeeting hi
 4117:07 <erik-etsuji-kato> An eclipse attack is when most (if not all) of your peers are malicious and they basically prevent you from being well-connected to the network to obtain information about transactions you're interested in. An eclipse attack is particular useful when a payer has sent some bitcoins to you in some transaction, then decides to also doublespend the same bitcoins.
 4217:07 <bitplebpaul> one node is the victim
 4317:07 <bitplebpaul> i believe
 4417:07 <glozow> in my head i consider “sybil attacking” to refer to the technique used, ie making lots of sybils, and “eclipse” as the outcome to the victim of an attack.
 4517:08 <brunoerg> I think in Sybil the attacker tries to take control of the network by creating multiple peers
 4617:08 <glozow> but this is my imaginary distinction and not scientific haha
 4717:08 <ls55> I think eclipse attack implies isolate the victim in a subnet controlled by the attacker. In sybil attack, there isn't his concept.
 4817:08 <svav> Sybil Attack is a type of attack seen in peer-to-peer networks in which a node in the network operates multiple identities actively at the same time and undermines the authority/power in reputation systems. The main aim of this attack is to gain the majority of influence in the network to carry out illegal(with respect to rules and laws set in the
 4917:08 <svav> network) actions in the system.
 5017:08 <svav>  A single entity(a computer) has the capability to create and operate multiple identities(user accounts, IP address based accounts). To outside observers, these multiple fake identities appear to be real unique identities.
 5117:08 <sipa> I think they are distinct concepts.
 5217:08 <erik-etsuji-kato> A sybil attack on the other hand is where a malicious actor is trying to spam the network with nodes that they control attempting to subvert the network's reputation system.
 5317:08 <erik-etsuji-kato> https://bitcoin.stackexchange.com/questions/61151/eclipse-attack-vs-sybil-attack
 5417:08 <sipa> The goal of an eclipse attack is making a node isolated (i.e., no honest peers).
 5517:09 <sipa> The goal of a Sybil attack is exploiting a reputation mechanism where a node e.g. believes most of its peers.
 5617:09 <svav> So eclipse, one node is surrounded by malicious info, in sybil, one person impersonates lots of nodes
 5717:09 <ls55> Exact
 5817:09 <brunoerg> +1 sipa
 5917:09 <sipa> I think in early Bitcoin history, the two were often conflated.
 6017:10 <brunoerg> good explanations, any more questions about this?
 6117:10 <sipa> And the term Sybil attack was used to what should really be called eclipse attacks. Sybil attackd generally don't apply to Bitcoin's P2P network, because we never do something like believing the majority of our peers.
 6217:10 <sipa> It suffices to have one honest peer.
 6317:11 <Kaizen_Kintsugi_> So #17428 prevents this by using the anchors.dat file to reconnect to honest peers
 6417:11 <Kaizen_Kintsugi_> ?
 6517:11 <larryruane> sipa: +1 yes because of proof-of-work
 6617:11 <svav> larryruane: I mean one node is the victim in an eclipse attack.
 6717:11 <Kaizen_Kintsugi_> so when the node restarts it can't be automatically surrounded by the attacker?
 6817:12 <ls55> `So #17428 prevents this by using the anchors.dat file to reconnect to honest peers` How does the node consider a peer honest ? What are the criteria?
 6917:12 <Kaizen_Kintsugi_> Most work?
 7017:12 <bitplebpaul> +1 question of ls55
 7117:13 <Kaizen_Kintsugi_> or is it the the type of node
 7217:13 <Kaizen_Kintsugi_> only block relay nodes are saved as anchors?
 7317:13 <brunoerg> not sure if i agree with "honest"
 7417:13 <Kaizen_Kintsugi_> yea, "honest" is a vague term, I'm using it to describe a node that is acting in good faith
 7517:14 <brunoerg> Kaizen_Kintsugi_: yes, only block-relay-only
 7617:14 <erik-etsuji-kato> I think we take a long-lived and stable connection we had during operation
 7717:16 <brunoerg> we mentioned here "anchors.dat", Kaizen said that `#17428 prevents this by using the anchors.dat file to reconnect to honest peers`, but how it works? what we save into anchors.dat? when?
 7817:16 <Kaizen_Kintsugi_> Anchors are last known outgoing block-relay-only peers that are tried to re-connect to on startup
 7917:17 <larryruane> Really basic Q: The earlier PR that added anchors.dat, 17428, could it have included the changes that this PR (24034) is making, but didn't due to just an oversight (or ease of programming)? Or was 17428's behavior intentional, and then only later it was understood that 24034's behavior would be an improvement?
 8017:17 <svav> This PR helps prevent an eclipse attack because it prevents the possibility of there being an empty anchors.dat file on restart, that a malicious user could use as part of a a restart-based eclipse attack.
 8117:17 <ls55> anchors.dat is written when the node starts and shuts down
 8217:18 <Kaizen_Kintsugi_> and anchors seems to be simply last known block replays
 8317:18 <Kaizen_Kintsugi_> err *relays
 8417:19 <svav> This PR prevents this - A restart-based eclipse attack occurs when the adversary is able to add its own addresses to the victim’s address manager and then force the victim to restart. If the attack succeeds, the victim will make all of its connections to the adversary’s addresses when it restarts.
 8517:19 <brunoerg> larryruane: I didn't follow the discussions in 17428 but i think 24034's behavior would be an improvement
 8617:20 <larryruane> "... then force the victim to restart ..." That's something else unclear to me, how can an attacker for the victim to restart?
 8717:20 <brunoerg> larryduane: DoS?
 8817:20 <larryruane> *force
 8917:20 <brunoerg> sorry, larryruane*
 9017:21 <larryruane> ok but, not that we're aware of a specific way? Just that there *may* be a way (to force a restart)?
 9117:21 <brunoerg> larryruane: or maybe social engineering?
 9217:21 <Kaizen_Kintsugi_> what would the DoS be? Spamming the node with bad transactions? Or is there an assumption that their could be something out there?
 9317:21 <larryruane> oh right, good point brunoerg !
 9417:22 <Kaizen_Kintsugi_> ah okay okay, I get it know,
 9517:22 <Kaizen_Kintsugi_> attacks can be multi-pronged
 9617:22 <ls55> if a peer spams bad transactions, it will be blocked automatically.
 9717:22 <larryruane> or (to further answer my own question) maybe an attacker somehow can cut power to the victim?
 9817:23 <larryruane> (not any attacker could do that obviously, but maybe one can)
 9917:23 <svav> Just on my previous comment, which is slightly incorrect - I think currently the weakness is that an attacker could force node to shutdown before it attempts anchors.dat connection, creating blank anchors.dat, which can then be repopulated by attacker before node restarted - I think
10017:23 <brunoerg> larryruane: it's a possibility
10117:23 <theStack> the DoS doesn't have to be bitcoin-specific, it could target another service running on the same or the OS in general (e.g. ping flood or alike) that causes it to crash or needing to restart
10217:23 <erik-etsuji-kato> +ls55: yes, but what if you make it with tons of machines? Verifying transactions ain't cheap
10317:23 <larryruane> theStack: +1
10417:24 <brunoerg> theStack: +1, excellent
10517:24 <Kaizen_Kintsugi_> theStack: thx
10617:24 <bitcoin_1o1> theStack: +1
10717:24 <Kaizen_Kintsugi_> so this leads in to the next question of the clean and unclean shutdown?
10817:24 <Kaizen_Kintsugi_> an attack or interuption would be an unclean shutdown?
10917:25 <brunoerg> yes!
11017:25 <erik-etsuji-kato> yes, unclean
11117:25 <brunoerg> but before it, we can discuss why anchor.dat is deleted?
11217:25 <brunoerg> anchors.dat*
11317:25 <larryruane> So to me, even on a very superficial level, we can know that this PR deserves a concept ack because if it's a good idea to persist some data across restarts, then it's good to also do so in the case of a *quick* restart ... is that legitimate reasoning?
11417:25 <erik-etsuji-kato> Because if we have an unclean shutdown, we don't reuse anchors?
11517:25 <brunoerg> when a node starts, it deletes the anchors.dat file, why?
11617:26 <Kaizen_Kintsugi_> my only guess is to make sure that it you are starting from a clean anchors file
11717:26 <ls55> erik-etsuji-kato yes, ddos is more difficult to handle, but it is also more expensive for the attacker.
11817:26 <Kaizen_Kintsugi_> like if you reinstalled and there was a very old anchors.data with very old data in it
11917:27 <ls55> `when a node starts, it deletes the anchors.dat file, why?`To ensure the anchor peers are up to date
12017:27 <svav> This PR could do with a detailed explanation of the lifecycle of anchors.dat, i.e. when it is created, read/write and deleted.
12117:28 <brunoerg> larryruane: I think so. I discovered this 'issue' after a quick restart, I just noticed in my terminal that it read 2 anchors from anchors.dat and right after it dumped 0 ones
12217:28 <brunoerg> ls55: yes, to keep it up to date
12317:29 <brunoerg> So, couldn't we only read and write?
12417:29 <Kaizen_Kintsugi_> so node starts, reads anchors, deletes, then dumps anchors
12517:29 <Kaizen_Kintsugi_> on shut down
12617:29 <ls55> Exact
12717:30 <Kaizen_Kintsugi_> brunoerg, no because we want to clear the old anchors
12817:30 <Kaizen_Kintsugi_> ?
12917:30 <brunoerg> we could clear it by reading and writing... couldn't we?
13017:31 <brunoerg> we can clear a db or a file without deleting it
13117:31 <Kaizen_Kintsugi_> I guess we could, but I think that would open up to more complexity? I can't think of a specific reason? Is there something that reads the creation date of the anchor file?
13217:31 <brunoerg> clean*
13317:32 <Kaizen_Kintsugi_> is a clean file read as empty?
13417:32 <ls55> `So, couldn't we only read and write? `It is simpler to delete the file.
13517:32 <Kaizen_Kintsugi_> ls55, that is the logic that I am arriving at.
13617:32 <larryruane> ".. clear it by reading and writing .." There is a subtle difference between deleting a file and truncating it to zero-length ... if the user changes permissions on the file, then it gets deleted and re-created, its permissions will revert ... also hard-link behavior is slightly different (not that that matters much)
13717:33 <Kaizen_Kintsugi_> oh sweet
13817:33 <brunoerg> larryruane: nice!
13917:33 <jaonoctus> larryruane: +1
14017:33 <Kaizen_Kintsugi_> I didn't think of file permmissions coming into play
14117:33 <bitplebpaul> sorry, when would the permissions be changed? under what conditions
14217:33 <bitplebpaul> +1 Kaizen
14317:34 <Kaizen_Kintsugi_> I think the logic is being prepared for whatever condition could happen
14417:34 <Kaizen_Kintsugi_> we don't know why someone would change permissions of the file, but they can
14517:34 <ls55> The file is created by the `bitcoind` process. I think the permissions will be inherited.
14617:34 <larryruane> bitplebpaul: if the bitcoind node creates the file with (let's say) mode 0755, and the user changes it to 0777 for some reason, then bitcoind deletes and recreates the file, it goes back to 0755
14717:34 <Kaizen_Kintsugi_> so there is an open vunderability that needs to be addressed
14817:35 <Kaizen_Kintsugi_> damn I learn so much at these things damn
14917:35 <ls55> Why would that be a vulnerability?
15017:35 <larryruane> (or the owner or group or ACLs could be manually changed)
15117:36 <bitplebpaul> it sounds like resetting to 0755 is avoiding a vulnerability
15217:36 <Kaizen_Kintsugi_> yea
15317:36 <ls55> This file is made to be ephemeral
15417:36 <bitplebpaul> sorry ACLs?
15517:36 <Kaizen_Kintsugi_> yea if the person is on a computer that doesn't belong to them and the administrator does something to .dat files
15617:36 <larryruane> bitplebpaul: well that's just an example ... presumably if the user (owner of the machine) changes permissions, there's some good reason for that (i guess!)
15717:36 <bitplebpaul> access control list**
15817:37 <bitplebpaul> got it
15917:37 <bitplebpaul> thx
16017:38 <brunoerg> Kaizen_Kintsugi_: well, if someone can access my computer, he could replace anchors.dat, not sure if changing the permission would be the biggest problem
16117:39 <larryruane> the return value (indicating success or failure) from `fs::remove()` is ignored, which is ok (we're just making a best-effort), but maybe add a comment to that effect, so readers don't think it was just an oversight? (this is more of a comment i should leave on the PR)
16217:40 <Kaizen_Kintsugi_> brunoerg: agreed, under those conditions, thats probably the least of your worries :)
16317:41 <brunoerg> cool! can we go to the next question?
16417:42 <erik-etsuji-kato> y
16517:42 <theStack> larryruane: or as alternative, casting the return value to (void)? though i am not sure if this is in line with our coding guidelines, i didn't see it much
16617:42 <larryruane> theStack: +1 that does indicate that you're aware of the return value
16717:43 <jaonoctus> brunoerg: y
16817:43 <brunoerg> There are two ways to shutdown a node: a “clean” way and “unclean” one. What does that mean?
16917:43 <Kaizen_Kintsugi_> clean is uninterupted
17017:43 <svav> Clean is an operator requested shutdown where bitcoind can go through all its shutdown processes. Unclean is an inadvertent shutdown.
17117:43 <Kaizen_Kintsugi_> unclean is forced
17217:44 <theStack> svav: +1
17317:44 <Kaizen_Kintsugi_> I like svav's description better
17417:44 <Kaizen_Kintsugi_> +1 svav
17517:44 <brunoerg> svav: +1
17617:44 <erik-etsuji-kato> svav: +1
17717:45 <brunoerg> Now that we have a good definition about "clean" and "unclean" shutdown, we can discuss the next question...
17817:45 <ls55> "Clean" mode implies that all data is stored properly (UTXO Set, Blockchain, anchors.dat, etc ...) before shutdown
17917:46 <larryruane> good answers, i think of clean as "flushing in-memory data to disk" (unclean, not)
18017:46 <brunoerg> good ones
18117:46 <brunoerg> So...
18217:46 <brunoerg> On current master, what happens if we shut down (cleanly or uncleanly) before the node has tried to connect to the peers from anchors.dat?
18317:46 <svav> An empty anchors.dat is created.
18417:47 <brunoerg> svav: in both scenarios?
18517:47 <svav> Not sure
18617:47 <larryruane> brunoerg: if the node has had a chance to read `anchors.dat` then that file is deleted
18717:47 <ls55> Exact
18817:47 <bitcoin_1o1> svav: +1
18917:47 <erik-etsuji-kato> If we shutdown cleanly, then anchors.dat is created again
19017:48 <erik-etsuji-kato> I guess
19117:48 <brunoerg> erik-etsuji-kato: +1
19217:48 <brunoerg> larryruane: yes and it happens in both scenarios
19317:50 <bitplebpaul> how long after opening bitcoin core does it take to connect to the peers on anchors.dat
19417:50 <bitplebpaul> usually*
19517:50 <bitplebpaul> ie what time window are we discussing? 2-10 seconds?
19617:51 <larryruane> bitplebpaul: +1 I was wondering this too!
19717:51 <bitplebpaul> cool! i was worried o was only being tangentially relevant
19817:51 <brunoerg> bitplebpaul: i don't know exactly, but I don't think it takes more than 10 seconds
19917:51 <Kaizen_Kintsugi_> I think the old way, there is a window of opportunity to connect to the attackers nodes if that anchors.dat file is empty
20017:53 <brunoerg> We can go to the next question that is related to this one
20117:53 <Kaizen_Kintsugi_> It seems like nodes just start connecting
20217:53 <brunoerg> In this PR, in which scenarios do we delete anchors.dat?
20317:53 <erik-etsuji-kato> If we actually tried those peers
20417:54 <bitplebpaul> what if we only tried some peers and then force/unclean close?
20517:54 <theStack> after we read it, and after we tried all of its peers?
20617:54 <ls55> After `m_anchors` is no longer empty.
20717:55 <bitcoin_1o1> +1 theStack
20817:56 <erik-etsuji-kato> bitplebpaul: Then it's not deleted
20917:56 <theStack> ah no, my first part was wrong; only after we read it and it's empty
21017:56 <Kaizen_Kintsugi_> Yea I think it only gets deleted after it's read
21117:56 <brunoerg> theStack: +1
21217:56 <Kaizen_Kintsugi_> I'm reading around line 1975, cant find a delete
21317:56 <svav> So, node starts, anchors.dat exists, it is read, anchors.dat deleted - all ok. But if node starts, anchors.dat exists and populated, it does not get read, then node forced shut down, anchors.dat is recreated at startup as an empty file - vulnerable to start-up eclipse attack .... is this correct???
21417:57 <Kaizen_Kintsugi_> start > read > delete > connect
21517:58 <brunoerg> No, in this PR we do: start > read > try to connect to all peers from the file > delete
21617:58 <brunoerg> So, if we shut down our node before trying to connect to that peers, the file is preserved
21717:59 <Kaizen_Kintsugi_> I understand now
21817:59 <larryruane> quick question on testing... there is one small test change, but are you planning to add more tests? Or is it just very difficult to test these changes?
21917:59 <brunoerg> of course, if the file is empty as theStack said, we delete it right after reading the file
22018:00 <ls55> How does this PR know the anchor peers are connected ?
22118:00 <larryruane> also wondering, is the one small test change just fixing a timing window (does not depend on this PR)?
22218:00 <brunoerg> larryruane: Yes and i intend to use the stress test to test it
22318:00 <larryruane> brunoerg: +1 perfect, looking forward to seeing how to do that!
22418:00 <theStack> would be interested what happens if the file contains garbage (for example, due to power loss while writing). is it still deleted, as the deserialization leads to an empty vector, or is there some other exception thrown?
22518:00 <larryruane> (you mean fuzz?)
22618:01 <bitcoin_1o1> brunoerg: what about functional tests?
22718:01 <larryruane> theStack: great question
22818:01 <brunoerg> larryruane: no, see feature_init.py into functional tests
22918:01 <svav> Some diagrams about how anchors.dat currently works and will then work after this PR would have been nice ...
23018:02 <svav> or flow charts
23118:02 <brunoerg> our time is over :(
23218:02 <larryruane> brunoerg: thanks, this was great!
23318:02 <brunoerg> thanks everyone!
23418:03 <theStack> thanks for hosting brunoerg!
23518:03 <brunoerg> #endmeeting