Treat handshake misbehavior like unknown message (p2p)

Nov 18, 2020

https://github.com/bitcoin/bitcoin/pull/20079

Host: MarcoFalke - PR author: MarcoFalke

The PR branch HEAD was fac8e9a at the time of this review club meeting.

Notes

The Bitcoin P2P network is open for anyone to join and peers are generally assumed to be untrusted. A node can function as long as it has at least one well-behaved connection to the network to participate in block and transaction relay.
Bitcoin Core has several options for how to treat peers that violate the rules of the P2P protocol:
- ignore the individual message, but continue processing other messages from that peer;
- increment the peer’s “misbehavior” score, and punish the peer once its score goes above a certain amount
- disconnect from the peer
- disconnect from the peer and prevent any later connections from that peer’s address (discouragement)
Today’s pull request 20079 was created as a follow up to PR 19723, which changed how non-verack messages are treated when the peer hasn’t yet sent a verack message.

Questions

Did you review the PRs? Concept ACK, approach ACK, ACK, or NACK?
What message types can be sent/received during a handshake? Which are optional? In what order can they be sent?
What happens if the peer doesn’t complete the handshake?
How are incorrect messages from peers dealt with during the handshake? How does the pull request change that?
What are the risks or benefits of disconnecting a peer?
What are your thoughts on balancing the risks and benefits of disconnecting peers? What should be used as criteria to disconnect a peer?

Meeting Log

18:00

<MarcoFalke> #startmeeting

18:00

<jnewbery> hi!

18:00

<emzy> hi

18:00

<carlakc> hello!

18:00

<willcl_ark> hi

18:00

<tonyfabeen> hi

18:00

<pinheadmz> hi

18:00

<stacie> hi

18:00

<elle> hi!

18:00

<felixweis> hi

18:00

<dappdever> hi

18:00

<dhruvm> hi

18:00

<michaelfolkson> hi

18:00

<sergei-t> hi

18:01

<jonatack> hi

18:01

<nehan> hi

18:01

<hmrawal> hi

18:02

<MarcoFalke> Today we'll chat about peer handshake, and peer misbehavior. Reminder: Don't ask to ask, just ask your question :)

18:02

<MarcoFalke> Anyones first review club today?

18:02

<drainable> hi

18:03

<amiti> hi

18:03

<MarcoFalke> Ok, lets get started. Did you review the PRs (y/n)?

18:03

<elle> y

18:03

<dhruvm> y

18:03

<sergei-t> y

18:03

<nehan> y

18:03

<amiti> y

18:03

<felixweis> y

18:03

<carlakc> y

18:03

<stacie> y

18:03

<dappdever> y

18:03

<jnewbery> y

18:03

<MarcoFalke> nice. A lot of yes

18:03

<emzy> n

18:03

<pinheadmz> y

18:03

<jonatack> y, would like to propose peers wear a mask and avoid handshakes

18:04

<MarcoFalke> heh

18:04

<MarcoFalke> What message types can be sent/received during a handshake? Which are optional? In what order can they be sent?

18:04

<pinheadmz> 🤝 😷

18:04

<hmrawal> jonatack: :-D good one

18:04

<pinheadmz> MarcoFalke VERSION and VERACK; WTXIDRELAY is optional

18:05

<sergei-t> version - (verack+version) - verack. Not sure about the order in (version + verack)...

18:05

<dhruvm> Both peers in the connection must send a VERSION and receive a VERACK before other messages are accepted.

18:05

<MarcoFalke> The handshake is done by several functions, but here is a good start to begin your search: https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net_processing.cpp#L2274

18:05

<sergei-t> What is wtxidrelay?

18:05

<pinheadmz> sergei-t it's new!

18:05

<carlakc> and WTXIDRELAY needs to come before verack iiuc

18:05

<hmrawal> does verack mean version acknowledged ?

18:06

<felixweis> its a new feature where nodes relay tx via wtxid instead of txid

18:06

<pinheadmz> sergei-t peers can relay transactions by their witness txid

18:06

<pinheadmz> a tx hash that includes witness data

18:06

<pinheadmz> hmrawal yup

18:06

<felixweis> the hash over the full transaction not just the base version without the segwit stuff

18:06

<MarcoFalke> Indeed, wtxidrelay is a newly introduced protocol feature

18:06

<sergei-t> may be off topic, but why is wtxidrelay useful / better than regular txid relay?

18:07

<sipa> sergei-t: BIP 339

18:07

<pinheadmz> MarcoFalke pretty funny "start your search here"... link to 3,000 line function :-)

18:07

<felixweis> wtxidrelay is a marker to annouce to your peer, to negociate this ability for the connection

18:07

<MarcoFalke> pinheadmz: I wish we still had main.cpp. You could find everything in one file ;)

18:08

<felixweis> lolz

18:08

<pinheadmz> ha "yeah just check out the code here" ....main.cpp answers all your questions

18:08

<felixweis> no need to figure out how to exit vim all the time

18:08

<dhruvm> Why is wtxisrelay a message and not a service bit?

18:08

<dhruvm> wtxidrelay*

18:09

<carlakc> do SENDHEADERS and SENDCMPCT count as part of the handshake? they're optionally sent once we receive VERACK

18:09

<sipa> dhruvm: service bits are useful for feature discovery; he it's just a negotiation

18:09

<jnewbery> sergei-t dhruvm : it's probably a bit offtopic for this meeting. There's lots more information and context about wtxid relay in https://github.com/bitcoin/bitcoin/pull/18044

18:09

<dhruvm> jnewbery: thanks

18:09

<MarcoFalke> dhruvm: Good question. service bits are used when peering behavior is changed. wtxidrelay is just a toggle for tx relay and shouldn't influence peering

18:10

<jnewbery> and the issues that it solves are documented in https://github.com/bitcoin/bitcoin/issues/8279

18:10

<elle> carlakc: those are send only once handshake is complete i think. ie: after VERACK as you said

18:10

<elle> *sent

18:10

<dhruvm> MarcoFalke: i see. thanks.

18:10

<pinheadmz> also stuff like GETADDR and SENDCMPT kinda stuff

18:10

<pinheadmz> comes after handshake but is kinda part of the initial connection

18:11

<felixweis> you can't "wtxidrelay" it to older nodes too because you risk get disconnected (Misbehaving)

18:11

<carlakc> elle: got it, thanks!

18:11

<jonatack> dhruvm: there's even a review club for that https://bitcoincore.reviews/18044

18:11

<hmrawal> what's the order ? first Version followed by VERACK ?

18:11

<pinheadmz> hmrawal yeah VERSION-> VERACK<- VERSION<- VERACK-> I think

18:11

<sergei-t> If I receive a Version, do I first send my Version and then Verack or vice versa? Or t doesn't matter?

18:12

<hmrawal> and that's multiple VERSION/VERACK for multiples nodes I guess

18:12

<MarcoFalke> version must be the first message sent/received

18:12

<felixweis> the connecting node sends the first message

18:13

<sipa> for a lot of behavior around initiao connection (and the p2p protocol in general, especially parts that are old), there is no clear distinction between "what is part of the protocol" and "what bitcoin core does"

18:13

<sergei-t> MarcoFalke: so it's strictly Version1 - Version2 - Verack2 - Verack1?

18:13

<sipa> as the initial protocol was simply defined by how the reference client behaved

18:13

<sipa> and what it accepted

18:13

<MarcoFalke> So who wants to answer the full question? Hint: There are more message types than version/verack

18:14

<elle> sergei-t: i dont think the connection needs to happen both ways. that is optional. a peer may decide to connect to me but then i dont make an outgoing connection to them. i think

18:14

<dappdever> MarcoFalke: how do you define exactly what is part of the handshake?

18:14

<stacie> Do the rest of the message types handled in ProcessMessage() count as part of the handshake? Because there is a long list 😅

18:15

<michaelfolkson> Not formally defined in code dappdever. I would say any data sent once connection is formalized isn't part of handshake

18:15

<nehan> There is also SENDADDRV2

18:16

<pinheadmz> stacie hint: search for erros that throw a message with "...before version handshake"

18:16

<jonatack> nehan: yes, BIP155

18:17

<MarcoFalke> the handshake includes anything that involves feature negotiation

18:17

<dappdever> michaelfolkson: and a connection is formalized when fSuccessfullyConnected becomes true?

18:17

<MarcoFalke> so inv or tx wouldn't be part of the handshake (even if they might be sent out during one)

100

18:17

<sipa> MarcoFalke: also sendheaders and sendcmpct which carlakc mentiomed earlier in that case

101

18:18

<elle> Surely the only other optional message is wtxidrelay?

102

18:18

<MarcoFalke> sipa: I'd count those as features negotiation, so part of the handshake

103

18:19

<jnewbery> elle: the initial handshake does require a version message in both directions and a verack message in both directions. You may be getting confused with how the functional tests open multiple connections between nodes

104

18:19

<elle> ok cool, my bad. Sorry sergei-t and carlakc! im spreading false info here

105

18:20

<MarcoFalke> Ok, I think we covered all message types during the handshake

106

18:20

<sipa> MarcoFalke: yeah; i think people here earlier suggested that it wasn't because those are after verack

107

18:20

<sipa> just a semantics issue

108

18:20

<sergei-t> So what's the correct answer? to question 2

109

18:20

<carlakc> are sendheaders and sendcmpct also optional? because we only send them if the peer version supports them

110

18:20

<nehan> carlakc: yes

111

18:20

<MarcoFalke> Let me try to summarise: version (must be first message sent/received, required), wtxidrelay (sent/received after version, before verack, optional), verack (sent/received after version,required), sendaddrv2/sendheaders/sendcmpct (sent/received after verack,optional)

112

18:20

<pinheadmz> cant SENDCMPCT be sent at any time ?

113

18:21

<MarcoFalke> pinheadmz: Right

114

18:21

<pinheadmz> so its an early message but to me the "handshake" is just the two required messages

115

18:21

<MarcoFalke> though, must be after verack

116

18:21

<pinheadmz> er, and then wtxidrelay yeah

117

18:22

<nehan> MarcoFalke: so are you defining the "handshake" as first VERSION to last VERACK, or first VERSION to complete feature negotiation?

118

18:22

<MarcoFalke> ok, I think this question wasn't well defined because handshake is unclear

119

18:22

<MarcoFalke> nehan: I'd say until feature negotiation is complete

120

18:22

<nehan> ok, I think that was the confusion.

121

18:22

<MarcoFalke> It is a long and awkward handshake

122

18:22

<elle> my assumption was that once pfrom.fSuccessfullyConnected is true, then the handshake is complete?

123

18:23

<anir> that's what I understood^

124

18:23

<felixweis> thats expected, the nodes are just starting to get to know each other

125

18:23

<michaelfolkson> You can define handshake however you want. It is a word. It isn't code :)

126

18:23

<MarcoFalke> Ok, so my understanding of handshake is wrong ;). Moving on ...

127

18:23

<MarcoFalke> What happens if the peer doesn’t complete the handshake?

128

18:23

<sipa> maybe we can call the steps until verack "handshake" and the colection of all messages that determine features "feature negotiation", and these two only partially overlap (wtxidrelay)

129

18:24

<MarcoFalke> Oh, I see. This question only makes sense when the handshake is done after the verack msg

130

18:24

<willcl_ark> It looks like, after the PR, we just fall back to timing out after `DEFAULT_PEER_CONNECT_TIMEOUT = 60` seconds?

131

18:24

<pinheadmz> we ignore any other messages / bump misbehave score befoer this PR, and eventually the peer probably times out

132

18:24

<stacie> Per comments in PR #19723 they are disconnected after 60 seconds/99 unsupported/non-handshake messages (the messages part of the previous question). Can someone help direct me to where this is in the code?

133

18:24

<hmrawal> they are disconnected

134

18:24

<MarcoFalke> can anyone link to the code?

135

18:25

<willcl_ark> src/net.cpp#CConnman::InactivityCheck

136

18:26

<nehan> pre-PR, if nVersion hasn't been set, add to the misbehaving score. if the nVersion was set but fSuccessfullyConnected wasn't true yet, it looks like it just logs.

137

18:26

<MarcoFalke> willcl_ark: There are several inactivity checks. Which one would be hit?

138

18:26

<pinheadmz> stacie i thikn the old 99 value comes from the misbehavior score getting +1'ed and then we ban at 100

139

18:27

<willcl_ark> I think `else if (!pnode->fSuccessfullyConnected)`

140

18:27

<dhruvm> https://github.com/bitcoin/bitcoin/blob/5bcae7967f73353aff5e6d2f696bbf47ec6fdbb3/src/net.cpp#L1257

141

18:27

<stacie> pinheadmz oh! that makes sense. Especially since incrementing misbehaving score is what was removed from this PR

142

18:28

<MarcoFalke> willcl_ark: dhruvm: right

143

18:28

<pinheadmz> and theres a few other Misbehaving() calls, e.g. "non-connecting headers" is a score of 20 etc

144

18:28

<MarcoFalke> How are incorrect message from peers dealt with during the handshake? How does the pull request change that?

145

18:29

<dhruvm> prior to PR incorrect messages would contribute to themisbehavior score, now they won't

146

18:29

<hmrawal> earlier it used to add to the misbehaving score after PR it will jus tlog

147

18:29

<emzy> How can tor nodes be baned? As I understand they have no source address.

148

18:29

<nehan> i think my answer above addresses the first part. the PR changes it by no longer adding to the misbehaving score. I don't see how timeouts are involved though...

149

18:29

<dhruvm> however, since the 60 seconds timeout will disconnect the peer anyway, it seems unnecessary to punish them

150

18:30

<felixweis> emzy: you can still get disconnected if your score reaches 100

151

18:30

<dappdever> it is more likely that the peer would timeout before having a score reach 100

152

18:30

<sipa> emzy: they're just disconnected

153

18:31

<jnewbery> Does everyone see that? In the InactivityCheck() function: https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L1235-L1257

154

18:31

<jnewbery> That top if statement with nTimeConnected means we don't do any of these checks until we've been connected to the peer for 60 seconds. The final else if statement with fSuccessfullyConnected means that if the handshake hasn't completed in that time then we'll disconnect.

155

18:31

<michaelfolkson> emzy sipa: Tor nodes aren't subject to the these misbehaving scores? Just disconnected at first sign of misbehaving?

156

18:32

<nehan> jnewbery: thanks!

157

18:32

<MarcoFalke> The InactivityCheck is called by Connman for every node in a loop

158

18:32

<MarcoFalke> jnewbery: thanks for clarifying

159

18:32

<sergei-t> How often in the inactivity check performed?

160

18:33

<sipa> michaelfolkson: they're subject to misbehavior and disconnection, not discouraging

161

18:33

<MarcoFalke> m_peer_connect_timeout is 60 seconds by default

162

18:33

<jnewbery> sergei-t: every loop of the socket handler thread

163

18:33

<sipa> (discouraging is the new name of the auto-banning mechanism)

164

18:33

<MarcoFalke> every loop for every peer

165

18:33

<carlakc> jnewbery: this one https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L1544?

166

18:33

<jnewbery> look for where InactivityCheck() is called

167

18:34

<emzy> sipa: so only option would be a tarpit. To slow them down.

168

18:34

<felixweis> https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L1544

169

18:34

<sergei-t> I mean, how often is a loop in seconds? Does this question make sense?

170

18:34

<felixweis> everytime there is some activity on the socket. can be multiple times in a second

171

18:35

<jnewbery> carlakc: exactly!

172

18:35

<pinheadmz> sergei-t yeah because the check starts with GetSystemTimeInSeconds() so that loop can run infintiely, it just checks against the clock

173

18:35

<pinheadmz> ultimately there is a while(true) loop that iterates through all peers as long as bitcoind is alive

174

18:36

<dhruvm> sergei-t: If I am reading this right, it is called whenever anything is received on the socket?

175

18:36

<sergei-t> if there is no activity in the socket, the inactivity check must still run, right? Otherwise it fails to detect inactivity :)

176

18:37

<dhruvm> sergei-t: Sorry read the indentation wrong: It's run just cycling through the peers.

177

18:37

<dhruvm> https://github.com/bitcoin/bitcoin/blob/5bcae7967f73353aff5e6d2f696bbf47ec6fdbb3/src/net.cpp#L1544

178

18:37

<felixweis> at some point there will be a new incomming connection or another peer sends us an inv or something

179

18:37

<sergei-t> felixweis: aa, and this would trigger inactivity checks for _all_ peers, not only the one that sent us something?

180

18:38

<jnewbery> sergei-t: the inactivity check will happen for all peers in every loop, not just ones that are active

181

18:38

<felixweis> thats how I currently understand CConnman::SocketHandler, pls correct me

182

18:38

<sergei-t> jnewbery: that makes it much clearer, thanks! and what if all peers are inactive?

183

18:39

<willcl_ark> Looks like CConnman::ThreadSocketHandler just loops continuously running SocketHandler with no interval time?

184

18:39

<pinheadmz> yup

185

18:39

<dhruvm> willcl_ark: that's how i understand it now as well

186

18:39

<pinheadmz> checks each peer one at a time for incoming messages, or if we need to send an outgoing message back

187

18:39

<sergei-t> willcl_ark: loops continiously means _many_ times per second? isn't it a bit... wasteful?

188

18:40

<felixweis> if all the peers are inactive and there is no attempt for a new peer to connect (very unlikly), there is no cpu wasted so no need to disco

189

18:40

<jnewbery> willcl_ark: yes, I can't see any sleeps in the socket handler loop

190

18:40

<pinheadmz> sergei-t pretty sure it gets slowed down by all the things it does for each peer

191

18:40

<sipa> there is a 10ms delay somewhere iirc

192

18:40

<pinheadmz> sipa oh!

193

18:40

<MarcoFalke> I can't see any sleeps either

194

18:40

<sipa> maybe not anymore? that would be strange

195

18:40

<MarcoFalke> Ideed. I presumed there was one

196

18:40

<pinheadmz> i just figured a lot of the networking stuff is blocking, getting chain data etc

197

18:40

<pinheadmz> slows down the loop

198

18:41

<felixweis> doesnt it work on kqueue / epoll

199

18:41

<felixweis> ?

200

18:41

<sipa> the select() call has a timeout

201

18:41

<jnewbery> oh, maybe it's in SocketEvents()

202

18:41

<sipa> so SocketHandler runs whenever there is socket activity on any socket, or that timeoit pazssz

203

18:41

<sipa> yeah, in SocketEvents()

204

18:41

<sipa> sorry, phome typing

205

18:42

<MarcoFalke> https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L1316

206

18:42

<jnewbery> SELECT_TIMEOUT_MILLISECONDS which is 50ms

207

18:42

<MarcoFalke> phew, glad we found that

208

18:42

<willcl_ark> :D

209

18:42

<sipa> we'd notice

210

18:42

<sipa> bitcoind would be >100% cpu usage all the time otherwise

211

18:42

<willcl_ark> the thread would max out at 100% cpu otherwise

212

18:43

<MarcoFalke> ok, any other questions before we move on?

213

18:43

<willcl_ark> probably what sipa said

214

18:43

<michaelfolkson> But handshakes/serving data to peers can be done concurrently right. You don't need to complete a handshake with one peer before starting a handshake with another etc?

215

18:43

<michaelfolkson> I don't know what pinheadmz meant exactly by "blocking"

216

18:43

<jnewbery> and if you're interested, our other main loop is in the message handler thread, which sleeps here between loops: https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L2249

217

18:43

<MarcoFalke> michaelfolkson: Yes concurrently logically, but there is only one thread handling the bytes on the wire

218

18:44

<michaelfolkson> Ok thanks

219

18:44

<amiti> something I want to clarify about the previous convo with the timeout: prior to this PR, if a peer sends a non-version message before completing their handshake, this would bump their misbehaving score by 1 & disconnect if they sent 100 non-version messages. so, this case would probably hit the timeout (by the time they send over 100s of messages). there might be a slight difference though, in the case

220

18:44

<amiti> where they just send over a couple non-version messages and then finish connecting. prior to PR this could have incremented the score that reached the 100 number with other behavior too. (but if I remember correctly, many of the misbehaviors increment +100, so a couple extra points would be irrelevant)

221

18:44

<amiti> so what I'm saying is pre-pr incrementing misbehavior wouldnt *always* hit the timeout. but pre-pr disconnection would almost certainly disconnect via timeout. right?

222

18:45

<sipa> one message handler thread (which processes incoming messages and decides what to send), and one network thread (which actually receives/sends data over the wire)

223

18:46

<sipa> amiti: in theory that sounds right, but i think hitting score 100 through 100 score-1 faults isn't ever going to happen accept with very weird broken peers

224

18:46

<willcl_ark> amiti: do scores persist between disconnects?

225

18:46

<pinheadmz> michaelfolkson ^ yeah the single threadedness is what i meant by blocking -- while we are processing a peer the other peers are "waiting"

226

18:46

<MarcoFalke> amiti: Depends on how many non-version messages the peer can send in 60 seconds

227

18:46

<sipa> willcl_ark: no

228

18:46

<sipa> *except

229

18:46

<amiti> sipa: haha yeah, would be weird.

230

18:46

<michaelfolkson> So the network thread could be dealing with one peer and the message thread could be dealing with a different peer

231

18:46

<sipa> also we should just remove misbehavior scores

232

18:46

<amiti> +1

233

18:46

<willcl_ark> well if scores persisted, then the cumulateive behaviour in amiti's example might be different

234

18:47

<sipa> michaelfolkson: indeed

235

18:47

<MarcoFalke> sipa: Why not remove them? *hides*

236

18:48

<michaelfolkson> What's stopping us from having multiple message handler threads and multiple network threads? :)

237

18:48

<jonatack> I'm curious to hear people's thoughts on the last two questions.

238

18:48

<MarcoFalke> michaelfolkson: cs_main

239

18:48

<michaelfolkson> Ah

240

18:48

<nehan> cs_main doesn't _prevent_ it, it just limits the benefit

241

18:49

<pinheadmz> cs_main you mean - the lock ?

242

18:49

<nehan> and might actually make things slower if many threads are contending for one lock

243

18:49

<pinheadmz> side Q: is `cs` a hungarian-notation abbreviation for something ?

244

18:49

<sipa> critical section

245

18:49

<pinheadmz> of course!!!!!!! /runs to google

246

18:49

<MarcoFalke> To elaborate a bit more: Most traffic is from tx messages. Each tx message (assuming it is a previously unseen tx) will call ATMP (AcceptToMemoryPool), which takes the cs_main lock

247

18:50

<jnewbery> pinheadmz: mutexes that have been added more recently are more often named thing_mutex

248

18:50

<MarcoFalke> cs_main is the validation lock. It needs to be taken in addition to the mempool lock when adding txs to the mempoll

249

18:50

<MarcoFalke> *mempool

250

18:50

<sipa> michaelfolkson: it's also inherently limited in many ways; a lot of messages require processing that affects many other peers (e.g. incoming, put in queue for all other peers to announce)

251

18:51

<MarcoFalke> Ok, let's move on with the next questions. Only 10 minutes left

252

18:51

<MarcoFalke> What are the risks or benefits of disconecting a peer?

253

18:51

<sipa> so net_processing (at peast parts of it) are about interaction between peers, and not just "handle message for one peer, move on to the next"

254

18:51

<pinheadmz> i think disconnecting peers, like selecitng peers, is sensistive because of eclipse attacks

255

18:51

<nehan> risks: potential for eclipse attacks

256

18:51

<willcl_ark> well surely one risk is in someway eclipsing yourself

257

18:52

<felixweis> benefit would be keeping our connection slots open for non-misbehaving nodes

258

18:52

<amiti> a benefit is you don't want your slots to be taken up by broken or malicious peers

259

18:52

<sergei-t> Risk: wrongly disconnect an honest peer that did a weird thing on accident

260

18:52

<nehan> benefit: open up connections for useful peers

261

18:52

<sergei-t> Benefit: prevent potential attacks (?)

262

18:52

<amiti> a risk is that over-eager disconnection logic in bitcoin core could lead to a network partition

263

18:52

<michaelfolkson> Yeah depends on whether if the disconnecting was rational or not.

264

18:53

<michaelfolkson> If rational great, cleared a slot. If not rational you might replace honest peer with a malicious peer

265

18:53

<pinheadmz> also its very impolite. not everyone speaks the same dialect *cough* bcoin

266

18:53

<MarcoFalke> all good answers

267

18:53

<carlakc> even rationally connecting could be a problem if we have different logic across versions?

268

18:53

<MarcoFalke> pinheadmz: I think in some cases it is polite to disconnect

269

18:53

<sipa> i'd say disconnecting is very polite

270

18:54

<felixweis> pinheadmz: what are some examples of bcoin dialect?

271

18:54

<sipa> it can prevwnt the peer from wasting bandwidth on you

272

18:54

<pinheadmz> felixweis still uses getblocks and doesnt headers-first yet, although PR is in review to update

273

18:54

<pinheadmz> not really bannable offenses, i was joking

274

18:54

<felixweis> thanks, good to know

275

18:54

<jnewbery> pinheadmz: no headers-first?! :-O

276

18:55

<michaelfolkson> Yeah I wanted to ask about this carlakc. What happens when a connected peer changes their version? They just send you new version and you understand that they've upgraded?

277

18:55

<MarcoFalke> Say there is a light client connecting to you, asking for data you can't provide. You could keep and waste the connection or just disconnect

278

18:55

<sipa> michaelfolkson: you can't upgrade a connectiom from one version to another

279

18:55

<felixweis> yeah people tend to overlook the importance of "p2p consensus" when reimplementing bitcoin

280

18:55

<pinheadmz> jnewbery we worked on it all last year, it led to the chain-width-expansion attack paper Braydon released, but never merged the fix, still wanted to figure out a way to mitigate timewarp attacks in conjunciton

281

18:55

<michaelfolkson> Oh so you disconnect sipa?

282

18:55

<jnewbery> michaelfolkson: there's no way to re-version a connection

283

18:55

<sipa> michaelfolkson: how do you upgrade your software without disconnecting....?

284

18:56

<sipa> you need to stop the running version and start the new one

285

18:56

<michaelfolkson> Good point

286

18:56

<nehan> michaelfolkson: there's no mechanism for nodes to update their version without restarting...

287

18:56

<michaelfolkson> Haha

288

18:56

<MarcoFalke> double version messages are also ignored

289

18:56

<MarcoFalke> Last question: What are your thoughts on balancing the risks and benefits of disconnecting peers? What should be used as criteria to disconnect a peer?

290

18:57

<MarcoFalke> This one doesn't have a textbook answer

291

18:57

<sergei-t> Would it be a good idea to introduce some kind of config setting so that peer can set how strict it wants to be in its disconnection policy?

292

18:57

<felixweis> cue erlang bitcoin implementation hot version upgrade

293

18:57

<sipa> sergei-t: what advice would you give to people on what to set it to?

294

18:58

<willcl_ark> I guess you might want to be more liberal in disconnecting if your slots are full, but if you have many empty ones then less so

295

18:58

<pinheadmz> hard question, we want to mitigate DoS but keep the network healthy

296

18:58

<nehan> felixweis: if the kernel can do it... https://ksplice.oracle.com/

297

18:58

<pinheadmz> i think it takes research to determine what is meaningful attack vector based on message size and procesing requirement

298

18:58

<sergei-t> sipa: e.g., if you run a professional node that hodls lots of coins (exchange), you'd better be stringent

299

18:58

<willcl_ark> also if you are bandwidth-constrained then you don't want peers wasting your bandwidth

300

18:58

<sipa> (this is my test: if you introduce a configuration knob, you must know when it's useful to change ot)

301

18:58

<felixweis> nehan: exactly!!1

302

18:58

<MarcoFalke> willcl_ark: Good point making it dependend on how many slots are used

303

18:58

<sipa> sergei-t: how does being more stringent imply being more safe?

304

18:59

<sergei-t> sipa: it does though I can't prove it's true :) otoh, what are the benefits of _not_ being stringent?.. what do I gain by being more liberal?

305

18:59

<hmrawal> willcl_ark peers hardly take a few KBs right ? so if at all the bandwith is wasted it's not so much a peer can't afford

306

19:00

<MarcoFalke> sergei-t: If the connection can offer you value (eg. valid txs or blocks), you might be better off keeping it

307

19:00

<sipa> sergei-t: the problem is that the protocol is not exactly specified, and changing, and not all software behaves exactly the same

308

19:00

<felixweis> sergei-t: more implementations, more different bitcoin versions, easier to introduce new features without the risk of network partitioning, ...

309

19:00

<willcl_ark> hmrawal: I was doing some work previously on mesh networks where message size was 210 bytes...

310

19:00

<sipa> being more stringent may mean disallowing some (honest but weird) peers

311

19:00

<sipa> or in the worst case, promoting partitioning attacks

312

19:01

<nehan> also upgraded nodes with new features might look "weird" to un-upgraded nodes which is bad

313

19:01

<MarcoFalke> And accidents do happen. Banilla Bitcoin Core itself might accidentally misbehave

314

19:01

<MarcoFalke> *Vanilla

315

19:01

<MarcoFalke> #endmeeting

316

19:01

<MarcoFalke> time :)

317

19:01

<carlakc> tangent, but has there ever been an attempt to write out the current P2P protocol as implemented in core?

318

19:01

<felixweis> thanks MarcoFalke! :)

319

19:01

<jonatack> MarcoFalke: I think you've just named the new release

320

19:01

<nehan> thanks!

321

19:01

<willcl_ark> thanks MarcoFalke

322

19:01

<carlakc> thanks MarkoFalke!

323

19:01

<theStack> hi

324

19:01

<dhruvm> thank you MarcoFalke

325

19:01

<anir> thanks MarcoFalke!

326

19:01

<MarcoFalke> thanks everyone. Today I learned the difference between version handshake and feature negotiation

327

19:01

<willcl_ark> Bitcoin Core 22.0 "Banilla"

328

19:01

<pinheadmz> good jam ya'll!

329

19:01

<emzy> Thank you MarcoFalke and jnewbery!

330

19:02

<amiti> thanks MarcoFalke! thanks everyone :)

331

19:02

<michaelfolkson> Missed it theStack

332

19:02

<sergei-t> thank you everyone and welcome to people who confused the time zones!

333

19:02

<elle> thanks everyone!

334

19:02

<felixweis> theStack: yeah, DST sucks

335

19:02

<stacie> Thanks all!

336

19:02

<michaelfolkson> carlakc: You mean docs?

337

19:02

<theStack> michaelfolkson: ohnoez... guess i have to read the logs tomorrow :p

338

19:02

<jonatack> thanks MarcoFalke and everyone!

339

19:02

<jnewbery> sergei-t: A bit more background on removing the -banscore option, which did something similar to what you suggested: https://github.com/bitcoin/bitcoin/pull/19219#issuecomment-652684340 https://github.com/bitcoin/bitcoin/pull/19219#issuecomment-652699592

340

19:03

<carlakc> michaelfolkson: I mean formal specification of the protocol, message formats, exchange order etc

341

19:03

<jnewbery> thanks MarcoFalke. That was a great meeting :)

342

19:03

<MarcoFalke> carlakc: there is https://btcinformation.org/en/developer-reference#p2p-network

343

19:03

<thomasb06> Nothing to do with the review today but to break things up, Q[X]/(X^3-3X-1) in a field of dimension 9 over rationals and is non-commutative. It's considered a simple one: https://math.stackexchange.com/questions/133790/an-example-of-noncommutative-division-algebra-over-q-other-than-quaternion-alg

344

19:03

<willcl_ark> It's pretty much "the code is the spec" these days, right?

345

19:04

<carlakc> MarkoFalke: awesome, never seen that before

346

19:04

<michaelfolkson> Always has been. Always will be willcl_ark

347

19:04

<willcl_ark> I guess P2P could perhaps be codified though :S

348

19:04

<MarcoFalke> carlakc: Might be outdated by now. It was mainly written by harding some years ago

349

19:04

<sergei-t> jnewbery: thanks, will have a look! just seemed weird to me that misbehaving score is currently "one size fits all"...

350

19:04

<sipa> also like consensus changes... the changes to P2P are documented in BIPs

351

19:04

<MarcoFalke> Though, new p2p messages are covered in BIPs

352

19:05

<sipa> sergei-t: it's a terrible idea and we shouldn't ever had it

353

19:05

<sipa> ;)

354

19:05

<michaelfolkson> Suhas said he'd withdrawn a BIP. Which BIP was that?

355

19:05

<pinheadmz> MarcoFalke how is https://en.bitcoin.it/wiki/Protocol_documentation ?

356

19:05

<MarcoFalke> pinheadmz: Works too. Should cover the same stuff

357

19:06

<michaelfolkson> https://github.com/bitcoin/bitcoin/pull/19723#issuecomment-679137633

358

19:06

<MarcoFalke> michaelfolkson: The feature negotiation BIP?

359

19:06

<felixweis> michaelfolkson: not the bip, the big amendment

360

19:06

<sipa> michaelfolkson: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2020-August/018084.html

361

19:06

<michaelfolkson> Ta

362

19:07

<sipa> sergei-t: maybe a counterpoint... you *can* change the misbehavior threshold from 100 to another value using a cmdline option

363

19:07

<sipa> but imho, it's pointless to do so

364

19:08

<MarcoFalke> sipa: Not anymore

365

19:08

<MarcoFalke> That setting was finally removed

366

19:08

<sipa> oh, good.

367

19:13

<jonatack> #19464 net: remove -banscore configuration option

368

19:16

<sipa> dank.

369

19:17

<pinheadmz> haha spoken like a true bay area kid

370

19:17

<jnewbery> *flemish person

371

19:34

<michaelfolkson> How do Bcoin nodes connect to Core nodes pinheadmz? Core nodes don't recognize Bcoin versions right?

372

19:34

<pinheadmz> on the internet, no one knows youre a firdge...

373

19:34

<sipa> why would they care about version numbers?

374

19:34

<pinheadmz> bcoin speaks all the same messages and message tpyes

375

19:34

<sipa> clients don't even tell each other their client version, apart from a free-form user agent string

376

19:35

<pinheadmz> protocol version numbers are still in sync, as they must be to send and receive certain messages like compact blocks

377

19:35

<pinheadmz> same as btcd and other fullnode implementations

378

19:36

<pinheadmz> however, if bitcoin core deprecates getblocks we'll need to merge our headers first branch with more urgency ;-)

379

19:36

<sipa> that'd break a lot of things, i think

380

19:36

<pinheadmz> how old is headers first? any nodes out there predate that ?

381

19:36

<pinheadmz> (bitcoin core nodes out there)

382

19:37

<sipa> 0.10, 2015

383

19:39

<michaelfolkson> I rather dumbly thought Bcoin nodes would exchange Bcoin version messages

384

19:40

<michaelfolkson> So Bcoin versions map to particular Core versions

385

19:40

<sipa> the VERSION message doesn't send the client version; it sends the protocol version

386

19:40

<pinheadmz> two versions: software version like releases and protocl version :-)

387

19:41

<sipa> protocol versions have been disentangled from client versions since 0.6 or so

388

19:41

<pinheadmz> https://github.com/bcoin-org/bcoin/blob/master/lib/net/common.js#L23

389

19:42

<michaelfolkson> Ha TIL

390

19:42

<michaelfolkson> Just assumed they were the same

391

19:43

<michaelfolkson> Protocol versions only updated when they are changed?

392

19:43

<sipa> yes

393

19:43

<sipa> current protocol version is 70002

394

19:44

<sipa> eh

395

19:44

<sipa> 70016

396

19:45

<michaelfolkson> The reason for disentangling to not confuse versions with other implementations?

397

19:46

<michaelfolkson> I can't think of another reason. No harm in just using Core versions otherwise

398

19:46

<sipa> core doesn't define the protocol

399

19:46

<pinheadmz> michaelfolkson yeah and like, compact blocks was added in 70014, comcpact blocks +swegwit in 70015, etc

400

19:46

<sipa> it's an implementation of it

401

19:46

<pinheadmz> so your node knows what words the peer understnads

402

19:47

<MarcoFalke> there is also a wallet version ;)

403

19:47

<pinheadmz> heh and bcoin wallet is totally different from bitcoind

404

19:48

<sipa> the wallet version isn't a protocol thing

405

19:48

<sipa> it's just stored in wallet.dat, which is obviously core-only

406

19:50

<michaelfolkson> "Core doesn't define the protocol" is surely a can of worms. Without a spec a so called reference implementation is all we have

407

19:52

<michaelfolkson> But disentangling P2P protocol from consensus protocol makes sense

408

19:52

<sipa> i think that's the wrong way to think about it

409

19:52

<sipa> bitcoin core never sends GETBLOCKS messages since 0.10, but they're clearly still part of the protocol as removing it would break other software

410

19:53

<sipa> ultimately the protocol is defined by what actually deployed software expects

411

19:53

<sipa> and that's a very fuzzy thing, but it's not the same thing as "defined by a reference implementation"

412

19:56

<michaelfolkson> I guess if Core is the "reference implementation" then the next question is which version of Core is the "reference implementation". And which P2P protocol version is the reference P2P protocol implementation. Can't be answered

413

19:56

<michaelfolkson> The limitations of words....

414

19:58

<jnewbery> The getblocks/hashContinue IBD method should be deprecated. We don't even have any tests for it.

415

19:59

<sipa> jnewbery: perhaps we should add tests for it :)

416

19:59

<jnewbery> perhaps, or perhaps we should just remove it :)

417

20:00

<sipa> i don't think it's reasonable to require all peer software to switch to headers-first based IBD

418

20:01

<pinheadmz> actualy bcoin *does sync headers first for IBD only, once it hits the last checkpoint though its getblocks

419

20:01

<pinheadmz> and even then it doesn't download blocks out of order or from >1 peer

420

20:02

<pinheadmz> i think this is how btcd does as well

421

20:03

<jnewbery> pinheadmz: does it use the hashContinue method? ie does it need to be retriggered to request blocks by a BLOCK message for the tip after receiving 2000 blocks?

422

20:04

<pinheadmz> yeah

423

20:04

<pinheadmz> https://github.com/bcoin-org/bcoin/blob/master/lib/net/pool.js#L1652

424

20:04

<pinheadmz> with some tweaks

425

20:05

<jnewbery> yuck

426

20:05

<pinheadmz> ha tell me about it

427

20:06

<pinheadmz> the headers first PR is here: https://github.com/bcoin-org/bcoin/pull/875 and was accompanied by the chain-expansion attack paper

428

20:06

<jnewbery> I dislike any method where you're using state on a remote peer for your book-keeping

429

20:06

<pinheadmz> but after discussing on the ML braydon realized time warp attacks werent really accounted for and so like a perfectionist, he clsoed the branch

430

20:06

<pinheadmz> then we all got laid off

431

20:06

<pinheadmz> so.

432

20:06

<pinheadmz> yeah great point