Delete anchors.dat after trying to connect to that peers (p2p)

Mar 2, 2022

https://github.com/bitcoin/bitcoin/pull/24034

Host: brunoerg - PR author: brunoerg

Notes

PR #17428 adds “anchors” to help prevent an eclipse attack where an attacker tricks a restarting node into connecting to adversarial peers. It saves block-relay-only connections to an anchors.dat file. When a node restarts, it tries to connect to the addresses from anchors.dat. We discussed this PR in a previous review club meeting.
Currently, anchors.dat is deleted right after it is read. However, if a node shuts down before trying to connect to that anchor peers, it will create an empty anchors.dat. Thus, an attacker could make a node shut down before it tries to connect to those peers, thereby clearing their anchors.dat and making the node connect to new peers (possible malicious ones) when starting again.
PR #24034 changes the behavior so that anchors.dat is only deleted after the node has tried to connect to the peers from it. So, if a node stops before trying to connect to anchor peers, the anchors.dat file will be preserved. Also, it avoids calling DumpAnchors if not all anchors.dat peers have not been tried.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
What is an eclipse attack and how does #17428 help prevent it?
Why do we delete anchors.dat? Why not just read and write?
There are two ways to shutdown a node: a “clean” way and “unclean” one. What does that mean?
On current master, what happens if we shut down (cleanly or uncleanly) before the node has tried to connect to the peers from anchors.dat?
In this PR, in which scenarios do we delete anchors.dat?
Avoiding deletion of anchors.dat file before trying to connect to all peers from it is not enough to preserve it in a clean shutdown. Why? How does the PR deal with this?

Meeting Log

17:00

<brunoerg> #startmeeting

17:00

<svav> Hi

17:00

<erik-etsuji-kato> hi

17:00

<brunoerg> hi

17:00

<jaonoctus> hi

17:00

<theStack> hi

17:00

<Kaizen_Kintsugi_> hi

17:00

<brunoerg> Hello, everyone! Today we’re looking at #24034 (Delete anchors.dat after trying to connect to that peers)!

17:01

<brunoerg> Anyone here for the first time? :)

17:01

<larryruane> hi

17:01

<effexzi> Hi every1

17:01

<brunoerg> no one?

17:02

<bitcoin_1o1> hi all

17:02

<brunoerg> did anyone review the PR?

17:02

<brunoerg> y/n/partial?

17:02

<Kaizen_Kintsugi_> y

17:02

<erik-etsuji-kato> 0.76y

17:02

<jaonoctus> y

17:03

<larryruane> 0.5y

17:03

<glozow> hi

17:03

<bitplebpaul> conceptually Y

17:03

<svav> I read the notes and looked at some of the code

17:03

<theStack> y, concept ack

17:03

<brunoerg> ok, cool :)

17:03

<brunoerg> Before discussing about this PR, let's do a quick review!

17:04

<brunoerg> What is an eclipse attack? Could someone explain it?

17:04

<svav> In an eclipse attack, a malicious actor isolates a specific user or node within a peer-to-peer (P2P) network. The attacker’s goal is to obscure a user’s view of the P2P network in preparation for more complex attacks or to cause general disruption.

17:04

<bitcoin_1o1> y

17:04

<brunoerg> svav: great

17:04

<Kaizen_Kintsugi_> +1 svav

17:04

<svav> In an eclipse attack, an attacker tries to redirect the target network participant’s inbound and outbound connections from legitimate nodes to the attacker’s nodes. By doing so, the target is sealed off from the actual network.

17:04

<svav> Since the target is disconnected from the blockchain ledger, the isolated node can then be manipulated by the attacker. An eclipse attack can lead to block mining disruptions as well as illegitimate transaction confirmations.

17:05

<bitcoin_1o1> +1 svav

17:05

<larryruane> I've always been a little unclear on the difference between an eclipse attack and a sybil attack .. are these related?

17:05

<erik-etsuji-kato> + the attacker can see what transactions is been made by this peer, hurting privacy

17:05

<svav> One different is eclipse is one node and sybil is whole network

17:06

<svav> *difference*

17:06

<brunoerg> larryruane: great question

17:06

<larryruane> svav: do you mean one node is the victim in the eclipse attack? (or one node is the attacker?)

17:06

<santimena> #startmeeting hi

17:07

<erik-etsuji-kato> An eclipse attack is when most (if not all) of your peers are malicious and they basically prevent you from being well-connected to the network to obtain information about transactions you're interested in. An eclipse attack is particular useful when a payer has sent some bitcoins to you in some transaction, then decides to also doublespend the same bitcoins.

17:07

<bitplebpaul> one node is the victim

17:07

<bitplebpaul> i believe

17:07

<glozow> in my head i consider “sybil attacking” to refer to the technique used, ie making lots of sybils, and “eclipse” as the outcome to the victim of an attack.

17:08

<brunoerg> I think in Sybil the attacker tries to take control of the network by creating multiple peers

17:08

<glozow> but this is my imaginary distinction and not scientific haha

17:08

<ls55> I think eclipse attack implies isolate the victim in a subnet controlled by the attacker. In sybil attack, there isn't his concept.

17:08

<svav> Sybil Attack is a type of attack seen in peer-to-peer networks in which a node in the network operates multiple identities actively at the same time and undermines the authority/power in reputation systems. The main aim of this attack is to gain the majority of influence in the network to carry out illegal(with respect to rules and laws set in the

17:08

<svav> network) actions in the system.

17:08

<svav> A single entity(a computer) has the capability to create and operate multiple identities(user accounts, IP address based accounts). To outside observers, these multiple fake identities appear to be real unique identities.

17:08

<sipa> I think they are distinct concepts.

17:08

<erik-etsuji-kato> A sybil attack on the other hand is where a malicious actor is trying to spam the network with nodes that they control attempting to subvert the network's reputation system.

17:08

<erik-etsuji-kato> https://bitcoin.stackexchange.com/questions/61151/eclipse-attack-vs-sybil-attack

17:08

<sipa> The goal of an eclipse attack is making a node isolated (i.e., no honest peers).

17:09

<sipa> The goal of a Sybil attack is exploiting a reputation mechanism where a node e.g. believes most of its peers.

17:09

<svav> So eclipse, one node is surrounded by malicious info, in sybil, one person impersonates lots of nodes

17:09

<ls55> Exact

17:09

<brunoerg> +1 sipa

17:09

<sipa> I think in early Bitcoin history, the two were often conflated.

17:10

<brunoerg> good explanations, any more questions about this?

17:10

<sipa> And the term Sybil attack was used to what should really be called eclipse attacks. Sybil attackd generally don't apply to Bitcoin's P2P network, because we never do something like believing the majority of our peers.

17:10

<sipa> It suffices to have one honest peer.

17:11

<Kaizen_Kintsugi_> So #17428 prevents this by using the anchors.dat file to reconnect to honest peers

17:11

<Kaizen_Kintsugi_> ?

17:11

<larryruane> sipa: +1 yes because of proof-of-work

17:11

<svav> larryruane: I mean one node is the victim in an eclipse attack.

17:11

<Kaizen_Kintsugi_> so when the node restarts it can't be automatically surrounded by the attacker?

17:12

<ls55> `So #17428 prevents this by using the anchors.dat file to reconnect to honest peers` How does the node consider a peer honest ? What are the criteria?

17:12

<Kaizen_Kintsugi_> Most work?

17:12

<bitplebpaul> +1 question of ls55

17:13

<Kaizen_Kintsugi_> or is it the the type of node

17:13

<Kaizen_Kintsugi_> only block relay nodes are saved as anchors?

17:13

<brunoerg> not sure if i agree with "honest"

17:13

<Kaizen_Kintsugi_> yea, "honest" is a vague term, I'm using it to describe a node that is acting in good faith

17:14

<brunoerg> Kaizen_Kintsugi_: yes, only block-relay-only

17:14

<erik-etsuji-kato> I think we take a long-lived and stable connection we had during operation

17:16

<brunoerg> we mentioned here "anchors.dat", Kaizen said that `#17428 prevents this by using the anchors.dat file to reconnect to honest peers`, but how it works? what we save into anchors.dat? when?

17:16

<Kaizen_Kintsugi_> Anchors are last known outgoing block-relay-only peers that are tried to re-connect to on startup

17:17

<larryruane> Really basic Q: The earlier PR that added anchors.dat, 17428, could it have included the changes that this PR (24034) is making, but didn't due to just an oversight (or ease of programming)? Or was 17428's behavior intentional, and then only later it was understood that 24034's behavior would be an improvement?

17:17

<svav> This PR helps prevent an eclipse attack because it prevents the possibility of there being an empty anchors.dat file on restart, that a malicious user could use as part of a a restart-based eclipse attack.

17:17

<ls55> anchors.dat is written when the node starts and shuts down

17:18

<Kaizen_Kintsugi_> and anchors seems to be simply last known block replays

17:18

<Kaizen_Kintsugi_> err *relays

17:19

<svav> This PR prevents this - A restart-based eclipse attack occurs when the adversary is able to add its own addresses to the victim’s address manager and then force the victim to restart. If the attack succeeds, the victim will make all of its connections to the adversary’s addresses when it restarts.

17:19

<brunoerg> larryruane: I didn't follow the discussions in 17428 but i think 24034's behavior would be an improvement

17:20

<larryruane> "... then force the victim to restart ..." That's something else unclear to me, how can an attacker for the victim to restart?

17:20

<brunoerg> larryduane: DoS?

17:20

<larryruane> *force

17:20

<brunoerg> sorry, larryruane*

17:21

<larryruane> ok but, not that we're aware of a specific way? Just that there *may* be a way (to force a restart)?

17:21

<brunoerg> larryruane: or maybe social engineering?

17:21

<Kaizen_Kintsugi_> what would the DoS be? Spamming the node with bad transactions? Or is there an assumption that their could be something out there?

17:21

<larryruane> oh right, good point brunoerg !

17:22

<Kaizen_Kintsugi_> ah okay okay, I get it know,

17:22

<Kaizen_Kintsugi_> attacks can be multi-pronged

17:22

<ls55> if a peer spams bad transactions, it will be blocked automatically.

17:22

<larryruane> or (to further answer my own question) maybe an attacker somehow can cut power to the victim?

17:23

<larryruane> (not any attacker could do that obviously, but maybe one can)

17:23

<svav> Just on my previous comment, which is slightly incorrect - I think currently the weakness is that an attacker could force node to shutdown before it attempts anchors.dat connection, creating blank anchors.dat, which can then be repopulated by attacker before node restarted - I think

100

17:23

<brunoerg> larryruane: it's a possibility

101

17:23

<theStack> the DoS doesn't have to be bitcoin-specific, it could target another service running on the same or the OS in general (e.g. ping flood or alike) that causes it to crash or needing to restart

102

17:23

<erik-etsuji-kato> +ls55: yes, but what if you make it with tons of machines? Verifying transactions ain't cheap

103

17:23

<larryruane> theStack: +1

104

17:24

<brunoerg> theStack: +1, excellent

105

17:24

<Kaizen_Kintsugi_> theStack: thx

106

17:24

<bitcoin_1o1> theStack: +1

107

17:24

<Kaizen_Kintsugi_> so this leads in to the next question of the clean and unclean shutdown?

108

17:24

<Kaizen_Kintsugi_> an attack or interuption would be an unclean shutdown?

109

17:25

<brunoerg> yes!

110

17:25

<erik-etsuji-kato> yes, unclean

111

17:25

<brunoerg> but before it, we can discuss why anchor.dat is deleted?

112

17:25

<brunoerg> anchors.dat*

113

17:25

<larryruane> So to me, even on a very superficial level, we can know that this PR deserves a concept ack because if it's a good idea to persist some data across restarts, then it's good to also do so in the case of a *quick* restart ... is that legitimate reasoning?

114

17:25

<erik-etsuji-kato> Because if we have an unclean shutdown, we don't reuse anchors?

115

17:25

<brunoerg> when a node starts, it deletes the anchors.dat file, why?

116

17:26

<Kaizen_Kintsugi_> my only guess is to make sure that it you are starting from a clean anchors file

117

17:26

<ls55> erik-etsuji-kato yes, ddos is more difficult to handle, but it is also more expensive for the attacker.

118

17:26

<Kaizen_Kintsugi_> like if you reinstalled and there was a very old anchors.data with very old data in it

119

17:27

<ls55> `when a node starts, it deletes the anchors.dat file, why?`To ensure the anchor peers are up to date

120

17:27

<svav> This PR could do with a detailed explanation of the lifecycle of anchors.dat, i.e. when it is created, read/write and deleted.

121

17:28

<brunoerg> larryruane: I think so. I discovered this 'issue' after a quick restart, I just noticed in my terminal that it read 2 anchors from anchors.dat and right after it dumped 0 ones

122

17:28

<brunoerg> ls55: yes, to keep it up to date

123

17:29

<brunoerg> So, couldn't we only read and write?

124

17:29

<Kaizen_Kintsugi_> so node starts, reads anchors, deletes, then dumps anchors

125

17:29

<Kaizen_Kintsugi_> on shut down

126

17:29

<ls55> Exact

127

17:30

<Kaizen_Kintsugi_> brunoerg, no because we want to clear the old anchors

128

17:30

<Kaizen_Kintsugi_> ?

129

17:30

<brunoerg> we could clear it by reading and writing... couldn't we?

130

17:31

<brunoerg> we can clear a db or a file without deleting it

131

17:31

<Kaizen_Kintsugi_> I guess we could, but I think that would open up to more complexity? I can't think of a specific reason? Is there something that reads the creation date of the anchor file?

132

17:31

<brunoerg> clean*

133

17:32

<Kaizen_Kintsugi_> is a clean file read as empty?

134

17:32

<ls55> `So, couldn't we only read and write? `It is simpler to delete the file.

135

17:32

<Kaizen_Kintsugi_> ls55, that is the logic that I am arriving at.

136

17:32

<larryruane> ".. clear it by reading and writing .." There is a subtle difference between deleting a file and truncating it to zero-length ... if the user changes permissions on the file, then it gets deleted and re-created, its permissions will revert ... also hard-link behavior is slightly different (not that that matters much)

137

17:33

<Kaizen_Kintsugi_> oh sweet

138

17:33

<brunoerg> larryruane: nice!

139

17:33

<jaonoctus> larryruane: +1

140

17:33

<Kaizen_Kintsugi_> I didn't think of file permmissions coming into play

141

17:33

<bitplebpaul> sorry, when would the permissions be changed? under what conditions

142

17:33

<bitplebpaul> +1 Kaizen

143

17:34

<Kaizen_Kintsugi_> I think the logic is being prepared for whatever condition could happen

144

17:34

<Kaizen_Kintsugi_> we don't know why someone would change permissions of the file, but they can

145

17:34

<ls55> The file is created by the `bitcoind` process. I think the permissions will be inherited.

146

17:34

<larryruane> bitplebpaul: if the bitcoind node creates the file with (let's say) mode 0755, and the user changes it to 0777 for some reason, then bitcoind deletes and recreates the file, it goes back to 0755

147

17:34

<Kaizen_Kintsugi_> so there is an open vunderability that needs to be addressed

148

17:35

<Kaizen_Kintsugi_> damn I learn so much at these things damn

149

17:35

<ls55> Why would that be a vulnerability?

150

17:35

<larryruane> (or the owner or group or ACLs could be manually changed)

151

17:36

<bitplebpaul> it sounds like resetting to 0755 is avoiding a vulnerability

152

17:36

<Kaizen_Kintsugi_> yea

153

17:36

<ls55> This file is made to be ephemeral

154

17:36

<bitplebpaul> sorry ACLs?

155

17:36

<Kaizen_Kintsugi_> yea if the person is on a computer that doesn't belong to them and the administrator does something to .dat files

156

17:36

<larryruane> bitplebpaul: well that's just an example ... presumably if the user (owner of the machine) changes permissions, there's some good reason for that (i guess!)

157

17:36

<bitplebpaul> access control list**

158

17:37

<bitplebpaul> got it

159

17:37

<bitplebpaul> thx

160

17:38

<brunoerg> Kaizen_Kintsugi_: well, if someone can access my computer, he could replace anchors.dat, not sure if changing the permission would be the biggest problem

161

17:39

<larryruane> the return value (indicating success or failure) from `fs::remove()` is ignored, which is ok (we're just making a best-effort), but maybe add a comment to that effect, so readers don't think it was just an oversight? (this is more of a comment i should leave on the PR)

162

17:40

<Kaizen_Kintsugi_> brunoerg: agreed, under those conditions, thats probably the least of your worries :)

163

17:41

<brunoerg> cool! can we go to the next question?

164

17:42

<erik-etsuji-kato> y

165

17:42

<theStack> larryruane: or as alternative, casting the return value to (void)? though i am not sure if this is in line with our coding guidelines, i didn't see it much

166

17:42

<larryruane> theStack: +1 that does indicate that you're aware of the return value

167

17:43

<jaonoctus> brunoerg: y

168

17:43

<brunoerg> There are two ways to shutdown a node: a “clean” way and “unclean” one. What does that mean?

169

17:43

<Kaizen_Kintsugi_> clean is uninterupted

170

17:43

<svav> Clean is an operator requested shutdown where bitcoind can go through all its shutdown processes. Unclean is an inadvertent shutdown.

171

17:43

<Kaizen_Kintsugi_> unclean is forced

172

17:44

<theStack> svav: +1

173

17:44

<Kaizen_Kintsugi_> I like svav's description better

174

17:44

<Kaizen_Kintsugi_> +1 svav

175

17:44

<brunoerg> svav: +1

176

17:44

<erik-etsuji-kato> svav: +1

177

17:45

<brunoerg> Now that we have a good definition about "clean" and "unclean" shutdown, we can discuss the next question...

178

17:45

<ls55> "Clean" mode implies that all data is stored properly (UTXO Set, Blockchain, anchors.dat, etc ...) before shutdown

179

17:46

<larryruane> good answers, i think of clean as "flushing in-memory data to disk" (unclean, not)

180

17:46

<brunoerg> good ones

181

17:46

<brunoerg> So...

182

17:46

<brunoerg> On current master, what happens if we shut down (cleanly or uncleanly) before the node has tried to connect to the peers from anchors.dat?

183

17:46

<svav> An empty anchors.dat is created.

184

17:47

<brunoerg> svav: in both scenarios?

185

17:47

<svav> Not sure

186

17:47

<larryruane> brunoerg: if the node has had a chance to read `anchors.dat` then that file is deleted

187

17:47

<ls55> Exact

188

17:47

<bitcoin_1o1> svav: +1

189

17:47

<erik-etsuji-kato> If we shutdown cleanly, then anchors.dat is created again

190

17:48

<erik-etsuji-kato> I guess

191

17:48

<brunoerg> erik-etsuji-kato: +1

192

17:48

<brunoerg> larryruane: yes and it happens in both scenarios

193

17:50

<bitplebpaul> how long after opening bitcoin core does it take to connect to the peers on anchors.dat

194

17:50

<bitplebpaul> usually*

195

17:50

<bitplebpaul> ie what time window are we discussing? 2-10 seconds?

196

17:51

<larryruane> bitplebpaul: +1 I was wondering this too!

197

17:51

<bitplebpaul> cool! i was worried o was only being tangentially relevant

198

17:51

<brunoerg> bitplebpaul: i don't know exactly, but I don't think it takes more than 10 seconds

199

17:51

<Kaizen_Kintsugi_> I think the old way, there is a window of opportunity to connect to the attackers nodes if that anchors.dat file is empty

200

17:53

<brunoerg> We can go to the next question that is related to this one

201

17:53

<Kaizen_Kintsugi_> It seems like nodes just start connecting

202

17:53

<brunoerg> In this PR, in which scenarios do we delete anchors.dat?

203

17:53

<erik-etsuji-kato> If we actually tried those peers

204

17:54

<bitplebpaul> what if we only tried some peers and then force/unclean close?

205

17:54

<theStack> after we read it, and after we tried all of its peers?

206

17:54

<ls55> After `m_anchors` is no longer empty.

207

17:55

<bitcoin_1o1> +1 theStack

208

17:56

<erik-etsuji-kato> bitplebpaul: Then it's not deleted

209

17:56

<theStack> ah no, my first part was wrong; only after we read it and it's empty

210

17:56

<Kaizen_Kintsugi_> Yea I think it only gets deleted after it's read

211

17:56

<brunoerg> theStack: +1

212

17:56

<Kaizen_Kintsugi_> I'm reading around line 1975, cant find a delete

213

17:56

<svav> So, node starts, anchors.dat exists, it is read, anchors.dat deleted - all ok. But if node starts, anchors.dat exists and populated, it does not get read, then node forced shut down, anchors.dat is recreated at startup as an empty file - vulnerable to start-up eclipse attack .... is this correct???

214

17:57

<Kaizen_Kintsugi_> start > read > delete > connect

215

17:58

<brunoerg> No, in this PR we do: start > read > try to connect to all peers from the file > delete

216

17:58

<brunoerg> So, if we shut down our node before trying to connect to that peers, the file is preserved

217

17:59

<Kaizen_Kintsugi_> I understand now

218

17:59

<larryruane> quick question on testing... there is one small test change, but are you planning to add more tests? Or is it just very difficult to test these changes?

219

17:59

<brunoerg> of course, if the file is empty as theStack said, we delete it right after reading the file

220

18:00

<ls55> How does this PR know the anchor peers are connected ?

221

18:00

<larryruane> also wondering, is the one small test change just fixing a timing window (does not depend on this PR)?

222

18:00

<brunoerg> larryruane: Yes and i intend to use the stress test to test it

223

18:00

<larryruane> brunoerg: +1 perfect, looking forward to seeing how to do that!

224

18:00

<theStack> would be interested what happens if the file contains garbage (for example, due to power loss while writing). is it still deleted, as the deserialization leads to an empty vector, or is there some other exception thrown?

225

18:00

<larryruane> (you mean fuzz?)

226

18:01

<bitcoin_1o1> brunoerg: what about functional tests?

227

18:01

<larryruane> theStack: great question

228

18:01

<brunoerg> larryruane: no, see feature_init.py into functional tests