Treat handshake misbehavior like unknown message (p2p)

https://github.com/bitcoin/bitcoin/pulls/20079

Host: MarcoFalke  -  PR author: MarcoFalke

The PR branch HEAD was fac8e9a at the time of this review club meeting.

Notes

  • The Bitcoin P2P network is open for anyone to join and peers are generally assumed to be untrusted. A node can function as long as it has at least one well-behaved connection to the network to participate in block and transaction relay.

  • Bitcoin Core has several options for how to treat peers that violate the rules of the P2P protocol:

    • ignore the individual message, but continue processing other messages from that peer;
    • increment the peer’s “misbehavior” score, and punish the peer once its score goes above a certain amount
    • disconnect from the peer
    • disconnect from the peer and prevent any later connections from that peer’s address (discouragement)
  • Today’s pull request 20079 was created as a follow up to PR 19723, which changed how non-verack messages are treated when the peer hasn’t yet sent a verack message.

Questions

  1. Did you review the PRs? Concept ACK, approach ACK, ACK, or NACK?

  2. What message types can be sent/received during a handshake? Which are optional? In what order can they be sent?

  3. What happens if the peer doesn’t complete the handshake?

  4. How are incorrect messages from peers dealt with during the handshake? How does the pull request change that?

  5. What are the risks or benefits of disconnecting a peer?

  6. What are your thoughts on balancing the risks and benefits of disconnecting peers? What should be used as criteria to disconnect a peer?

Meeting Log

  118:00 <MarcoFalke> #startmeeting
  218:00 <jnewbery> hi!
  318:00 <emzy> hi
  418:00 <carlakc> hello!
  518:00 <willcl_ark> hi
  618:00 <tonyfabeen> hi
  718:00 <pinheadmz> hi
  818:00 <stacie> hi
  918:00 <elle> hi!
 1018:00 <felixweis> hi
 1118:00 <dappdever> hi
 1218:00 <dhruvm> hi
 1318:00 <michaelfolkson> hi
 1418:00 <sergei-t> hi
 1518:01 <jonatack> hi
 1618:01 <nehan> hi
 1718:01 <hmrawal> hi
 1818:02 <MarcoFalke> Today we'll chat about peer handshake, and peer misbehavior. Reminder: Don't ask to ask, just ask your question :)
 1918:02 <MarcoFalke> Anyones first review club today?
 2018:02 <drainable> hi
 2118:03 <amiti> hi
 2218:03 <MarcoFalke> Ok, lets get started. Did you review the PRs (y/n)?
 2318:03 <elle> y
 2418:03 <dhruvm> y
 2518:03 <sergei-t> y
 2618:03 <nehan> y
 2718:03 <amiti> y
 2818:03 <felixweis> y
 2918:03 <carlakc> y
 3018:03 <stacie> y
 3118:03 <dappdever> y
 3218:03 <jnewbery> y
 3318:03 <MarcoFalke> nice. A lot of yes
 3418:03 <emzy> n
 3518:03 <pinheadmz> y
 3618:03 <jonatack> y, would like to propose peers wear a mask and avoid handshakes
 3718:04 <MarcoFalke> heh
 3818:04 <MarcoFalke> What message types can be sent/received during a handshake? Which are optional? In what order can they be sent?
 3918:04 <pinheadmz> 🤝 😷
 4018:04 <hmrawal> jonatack: :-D good one
 4118:04 <pinheadmz> MarcoFalke VERSION and VERACK; WTXIDRELAY is optional
 4218:05 <sergei-t> version - (verack+version) - verack. Not sure about the order in (version + verack)...
 4318:05 <dhruvm> Both peers in the connection must send a VERSION and receive a VERACK before other messages are accepted.
 4418:05 <MarcoFalke> The handshake is done by several functions, but here is a good start to begin your search: https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net_processing.cpp#L2274
 4518:05 <sergei-t> What is wtxidrelay?
 4618:05 <pinheadmz> sergei-t it's new!
 4718:05 <carlakc> and WTXIDRELAY needs to come before verack iiuc
 4818:05 <hmrawal> does verack mean version acknowledged ?
 4918:06 <felixweis> its a new feature where nodes relay tx via wtxid instead of txid
 5018:06 <pinheadmz> sergei-t peers can relay transactions by their witness txid
 5118:06 <pinheadmz> a tx hash that includes witness data
 5218:06 <pinheadmz> hmrawal yup
 5318:06 <felixweis> the hash over the full transaction not just the base version without the segwit stuff
 5418:06 <MarcoFalke> Indeed, wtxidrelay is a newly introduced protocol feature
 5518:06 <sergei-t> may be off topic, but why is wtxidrelay useful / better than regular txid relay?
 5618:07 <sipa> sergei-t: BIP 339
 5718:07 <pinheadmz> MarcoFalke pretty funny "start your search here"... link to 3,000 line function :-)
 5818:07 <felixweis> wtxidrelay is a marker to annouce to your peer, to negociate this ability for the connection
 5918:07 <MarcoFalke> pinheadmz: I wish we still had main.cpp. You could find everything in one file ;)
 6018:08 <felixweis> lolz
 6118:08 <pinheadmz> ha "yeah just check out the code here" ....main.cpp answers all your questions
 6218:08 <felixweis> no need to figure out how to exit vim all the time
 6318:08 <dhruvm> Why is wtxisrelay a message and not a service bit?
 6418:08 <dhruvm> wtxidrelay*
 6518:09 <carlakc> do SENDHEADERS and SENDCMPCT count as part of the handshake? they're optionally sent once we receive VERACK
 6618:09 <sipa> dhruvm: service bits are useful for feature discovery; he it's just a negotiation
 6718:09 <jnewbery> sergei-t dhruvm : it's probably a bit offtopic for this meeting. There's lots more information and context about wtxid relay in https://github.com/bitcoin/bitcoin/pull/18044
 6818:09 <dhruvm> jnewbery: thanks
 6918:09 <MarcoFalke> dhruvm: Good question. service bits are used when peering behavior is changed. wtxidrelay is just a toggle for tx relay and shouldn't influence peering
 7018:10 <jnewbery> and the issues that it solves are documented in https://github.com/bitcoin/bitcoin/issues/8279
 7118:10 <elle> carlakc: those are send only once handshake is complete i think. ie: after VERACK as you said
 7218:10 <elle> *sent
 7318:10 <dhruvm> MarcoFalke: i see. thanks.
 7418:10 <pinheadmz> also stuff like GETADDR and SENDCMPT kinda stuff
 7518:10 <pinheadmz> comes after handshake but is kinda part of the initial connection
 7618:11 <felixweis> you can't "wtxidrelay" it to older nodes too because you risk get disconnected (Misbehaving)
 7718:11 <carlakc> elle: got it, thanks!
 7818:11 <jonatack> dhruvm: there's even a review club for that https://bitcoincore.reviews/18044
 7918:11 <hmrawal> what's the order ? first Version followed by VERACK ?
 8018:11 <pinheadmz> hmrawal yeah VERSION-> VERACK<- VERSION<- VERACK-> I think
 8118:11 <sergei-t> If I receive a Version, do I first send my Version and then Verack or vice versa? Or t doesn't matter?
 8218:12 <hmrawal> and that's multiple VERSION/VERACK for multiples nodes I guess
 8318:12 <MarcoFalke> version must be the first message sent/received
 8418:12 <felixweis> the connecting node sends the first message
 8518:13 <sipa> for a lot of behavior around initiao connection (and the p2p protocol in general, especially parts that are old), there is no clear distinction between "what is part of the protocol" and "what bitcoin core does"
 8618:13 <sergei-t> MarcoFalke: so it's strictly Version1 - Version2 - Verack2 - Verack1?
 8718:13 <sipa> as the initial protocol was simply defined by how the reference client behaved
 8818:13 <sipa> and what it accepted
 8918:13 <MarcoFalke> So who wants to answer the full question? Hint: There are more message types than version/verack
 9018:14 <elle> sergei-t: i dont think the connection needs to happen both ways. that is optional. a peer may decide to connect to me but then i dont make an outgoing connection to them. i think
 9118:14 <dappdever> MarcoFalke: how do you define exactly what is part of the handshake?
 9218:14 <stacie> Do the rest of the message types handled in ProcessMessage() count as part of the handshake? Because there is a long list 😅
 9318:15 <michaelfolkson> Not formally defined in code dappdever. I would say any data sent once connection is formalized isn't part of handshake
 9418:15 <nehan> There is also SENDADDRV2
 9518:16 <pinheadmz> stacie hint: search for erros that throw a message with "...before version handshake"
 9618:16 <jonatack> nehan: yes, BIP155
 9718:17 <MarcoFalke> the handshake includes anything that involves feature negotiation
 9818:17 <dappdever> michaelfolkson: and a connection is formalized when fSuccessfullyConnected becomes true?
 9918:17 <MarcoFalke> so inv or tx wouldn't be part of the handshake (even if they might be sent out during one)
10018:17 <sipa> MarcoFalke: also sendheaders and sendcmpct which carlakc mentiomed earlier in that case
10118:18 <elle> Surely the only other optional message is wtxidrelay?
10218:18 <MarcoFalke> sipa: I'd count those as features negotiation, so part of the handshake
10318:19 <jnewbery> elle: the initial handshake does require a version message in both directions and a verack message in both directions. You may be getting confused with how the functional tests open multiple connections between nodes
10418:19 <elle> ok cool, my bad. Sorry sergei-t and carlakc! im spreading false info here
10518:20 <MarcoFalke> Ok, I think we covered all message types during the handshake
10618:20 <sipa> MarcoFalke: yeah; i think people here earlier suggested that it wasn't because those are after verack
10718:20 <sipa> just a semantics issue
10818:20 <sergei-t> So what's the correct answer? to question 2
10918:20 <carlakc> are sendheaders and sendcmpct also optional? because we only send them if the peer version supports them
11018:20 <nehan> carlakc: yes
11118:20 <MarcoFalke> Let me try to summarise: version (must be first message sent/received, required), wtxidrelay (sent/received after version, before verack, optional), verack (sent/received after version,required), sendaddrv2/sendheaders/sendcmpct (sent/received after verack,optional)
11218:20 <pinheadmz> cant SENDCMPCT be sent at any time ?
11318:21 <MarcoFalke> pinheadmz: Right
11418:21 <pinheadmz> so its an early message but to me the "handshake" is just the two required messages
11518:21 <MarcoFalke> though, must be after verack
11618:21 <pinheadmz> er, and then wtxidrelay yeah
11718:22 <nehan> MarcoFalke: so are you defining the "handshake" as first VERSION to last VERACK, or first VERSION to complete feature negotiation?
11818:22 <MarcoFalke> ok, I think this question wasn't well defined because handshake is unclear
11918:22 <MarcoFalke> nehan: I'd say until feature negotiation is complete
12018:22 <nehan> ok, I think that was the confusion.
12118:22 <MarcoFalke> It is a long and awkward handshake
12218:22 <elle> my assumption was that once pfrom.fSuccessfullyConnected is true, then the handshake is complete?
12318:23 <anir> that's what I understood^
12418:23 <felixweis> thats expected, the nodes are just starting to get to know each other
12518:23 <michaelfolkson> You can define handshake however you want. It is a word. It isn't code :)
12618:23 <MarcoFalke> Ok, so my understanding of handshake is wrong ;). Moving on ...
12718:23 <MarcoFalke> What happens if the peer doesn’t complete the handshake?
12818:23 <sipa> maybe we can call the steps until verack "handshake" and the colection of all messages that determine features "feature negotiation", and these two only partially overlap (wtxidrelay)
12918:24 <MarcoFalke> Oh, I see. This question only makes sense when the handshake is done after the verack msg
13018:24 <willcl_ark> It looks like, after the PR, we just fall back to timing out after `DEFAULT_PEER_CONNECT_TIMEOUT = 60` seconds?
13118:24 <pinheadmz> we ignore any other messages / bump misbehave score befoer this PR, and eventually the peer probably times out
13218:24 <stacie> Per comments in PR #19723 they are disconnected after 60 seconds/99 unsupported/non-handshake messages (the messages part of the previous question). Can someone help direct me to where this is in the code?
13318:24 <hmrawal> they are disconnected
13418:24 <MarcoFalke> can anyone link to the code?
13518:25 <willcl_ark> src/net.cpp#CConnman::InactivityCheck
13618:26 <nehan> pre-PR, if nVersion hasn't been set, add to the misbehaving score. if the nVersion was set but fSuccessfullyConnected wasn't true yet, it looks like it just logs.
13718:26 <MarcoFalke> willcl_ark: There are several inactivity checks. Which one would be hit?
13818:26 <pinheadmz> stacie i thikn the old 99 value comes from the misbehavior score getting +1'ed and then we ban at 100
13918:27 <willcl_ark> I think `else if (!pnode->fSuccessfullyConnected)`
14018:27 <dhruvm> https://github.com/bitcoin/bitcoin/blob/5bcae7967f73353aff5e6d2f696bbf47ec6fdbb3/src/net.cpp#L1257
14118:27 <stacie> pinheadmz oh! that makes sense. Especially since incrementing misbehaving score is what was removed from this PR
14218:28 <MarcoFalke> willcl_ark: dhruvm: right
14318:28 <pinheadmz> and theres a few other Misbehaving() calls, e.g. "non-connecting headers" is a score of 20 etc
14418:28 <MarcoFalke> How are incorrect message from peers dealt with during the handshake? How does the pull request change that?
14518:29 <dhruvm> prior to PR incorrect messages would contribute to themisbehavior score, now they won't
14618:29 <hmrawal> earlier it used to add to the misbehaving score after PR it will jus tlog
14718:29 <emzy> How can tor nodes be baned? As I understand they have no source address.
14818:29 <nehan> i think my answer above addresses the first part. the PR changes it by no longer adding to the misbehaving score. I don't see how timeouts are involved though...
14918:29 <dhruvm> however, since the 60 seconds timeout will disconnect the peer anyway, it seems unnecessary to punish them
15018:30 <felixweis> emzy: you can still get disconnected if your score reaches 100
15118:30 <dappdever> it is more likely that the peer would timeout before having a score reach 100
15218:30 <sipa> emzy: they're just disconnected
15318:31 <jnewbery> Does everyone see that? In the InactivityCheck() function: https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L1235-L1257
15418:31 <jnewbery> That top if statement with nTimeConnected means we don't do any of these checks until we've been connected to the peer for 60 seconds. The final else if statement with fSuccessfullyConnected means that if the handshake hasn't completed in that time then we'll disconnect.
15518:31 <michaelfolkson> emzy sipa: Tor nodes aren't subject to the these misbehaving scores? Just disconnected at first sign of misbehaving?
15618:32 <nehan> jnewbery: thanks!
15718:32 <MarcoFalke> The InactivityCheck is called by Connman for every node in a loop
15818:32 <MarcoFalke> jnewbery: thanks for clarifying
15918:32 <sergei-t> How often in the inactivity check performed?
16018:33 <sipa> michaelfolkson: they're subject to misbehavior and disconnection, not discouraging
16118:33 <MarcoFalke> m_peer_connect_timeout is 60 seconds by default
16218:33 <jnewbery> sergei-t: every loop of the socket handler thread
16318:33 <sipa> (discouraging is the new name of the auto-banning mechanism)
16418:33 <MarcoFalke> every loop for every peer
16518:33 <carlakc> jnewbery: this one https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L1544?
16618:33 <jnewbery> look for where InactivityCheck() is called
16718:34 <emzy> sipa: so only option would be a tarpit. To slow them down.
16818:34 <felixweis> https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L1544
16918:34 <sergei-t> I mean, how often is a loop in seconds? Does this question make sense?
17018:34 <felixweis> everytime there is some activity on the socket. can be multiple times in a second
17118:35 <jnewbery> carlakc: exactly!
17218:35 <pinheadmz> sergei-t yeah because the check starts with GetSystemTimeInSeconds() so that loop can run infintiely, it just checks against the clock
17318:35 <pinheadmz> ultimately there is a while(true) loop that iterates through all peers as long as bitcoind is alive
17418:36 <dhruvm> sergei-t: If I am reading this right, it is called whenever anything is received on the socket?
17518:36 <sergei-t> if there is no activity in the socket, the inactivity check must still run, right? Otherwise it fails to detect inactivity :)
17618:37 <dhruvm> sergei-t: Sorry read the indentation wrong: It's run just cycling through the peers.
17718:37 <dhruvm> https://github.com/bitcoin/bitcoin/blob/5bcae7967f73353aff5e6d2f696bbf47ec6fdbb3/src/net.cpp#L1544
17818:37 <felixweis> at some point there will be a new incomming connection or another peer sends us an inv or something
17918:37 <sergei-t> felixweis: aa, and this would trigger inactivity checks for _all_ peers, not only the one that sent us something?
18018:38 <jnewbery> sergei-t: the inactivity check will happen for all peers in every loop, not just ones that are active
18118:38 <felixweis> thats how I currently understand CConnman::SocketHandler, pls correct me
18218:38 <sergei-t> jnewbery: that makes it much clearer, thanks! and what if all peers are inactive?
18318:39 <willcl_ark> Looks like CConnman::ThreadSocketHandler just loops continuously running SocketHandler with no interval time?
18418:39 <pinheadmz> yup
18518:39 <dhruvm> willcl_ark: that's how i understand it now as well
18618:39 <pinheadmz> checks each peer one at a time for incoming messages, or if we need to send an outgoing message back
18718:39 <sergei-t> willcl_ark: loops continiously means _many_ times per second? isn't it a bit... wasteful?
18818:40 <felixweis> if all the peers are inactive and there is no attempt for a new peer to connect (very unlikly), there is no cpu wasted so no need to disco
18918:40 <jnewbery> willcl_ark: yes, I can't see any sleeps in the socket handler loop
19018:40 <pinheadmz> sergei-t pretty sure it gets slowed down by all the things it does for each peer
19118:40 <sipa> there is a 10ms delay somewhere iirc
19218:40 <pinheadmz> sipa oh!
19318:40 <MarcoFalke> I can't see any sleeps either
19418:40 <sipa> maybe not anymore? that would be strange
19518:40 <MarcoFalke> Ideed. I presumed there was one
19618:40 <pinheadmz> i just figured a lot of the networking stuff is blocking, getting chain data etc
19718:40 <pinheadmz> slows down the loop
19818:41 <felixweis> doesnt it work on kqueue / epoll
19918:41 <felixweis> ?
20018:41 <sipa> the select() call has a timeout
20118:41 <jnewbery> oh, maybe it's in SocketEvents()
20218:41 <sipa> so SocketHandler runs whenever there is socket activity on any socket, or that timeoit pazssz
20318:41 <sipa> yeah, in SocketEvents()
20418:41 <sipa> sorry, phome typing
20518:42 <MarcoFalke> https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L1316
20618:42 <jnewbery> SELECT_TIMEOUT_MILLISECONDS which is 50ms
20718:42 <MarcoFalke> phew, glad we found that
20818:42 <willcl_ark> :D
20918:42 <sipa> we'd notice
21018:42 <sipa> bitcoind would be >100% cpu usage all the time otherwise
21118:42 <willcl_ark> the thread would max out at 100% cpu otherwise
21218:43 <MarcoFalke> ok, any other questions before we move on?
21318:43 <willcl_ark> probably what sipa said
21418:43 <michaelfolkson> But handshakes/serving data to peers can be done concurrently right. You don't need to complete a handshake with one peer before starting a handshake with another etc?
21518:43 <michaelfolkson> I don't know what pinheadmz meant exactly by "blocking"
21618:43 <jnewbery> and if you're interested, our other main loop is in the message handler thread, which sleeps here between loops: https://github.com/bitcoin/bitcoin/blob/50e019a97a5b49caee867ec7d630fca908caed9d/src/net.cpp#L2249
21718:43 <MarcoFalke> michaelfolkson: Yes concurrently logically, but there is only one thread handling the bytes on the wire
21818:44 <michaelfolkson> Ok thanks
21918:44 <amiti> something I want to clarify about the previous convo with the timeout: prior to this PR, if a peer sends a non-version message before completing their handshake, this would bump their misbehaving score by 1 & disconnect if they sent 100 non-version messages. so, this case would probably hit the timeout (by the time they send over 100s of messages). there might be a slight difference though, in the case
22018:44 <amiti> where they just send over a couple non-version messages and then finish connecting. prior to PR this could have incremented the score that reached the 100 number with other behavior too. (but if I remember correctly, many of the misbehaviors increment +100, so a couple extra points would be irrelevant)
22118:44 <amiti> so what I'm saying is pre-pr incrementing misbehavior wouldnt *always* hit the timeout. but pre-pr disconnection would almost certainly disconnect via timeout. right?
22218:45 <sipa> one message handler thread (which processes incoming messages and decides what to send), and one network thread (which actually receives/sends data over the wire)
22318:46 <sipa> amiti: in theory that sounds right, but i think hitting score 100 through 100 score-1 faults isn't ever going to happen accept with very weird broken peers
22418:46 <willcl_ark> amiti: do scores persist between disconnects?
22518:46 <pinheadmz> michaelfolkson ^ yeah the single threadedness is what i meant by blocking -- while we are processing a peer the other peers are "waiting"
22618:46 <MarcoFalke> amiti: Depends on how many non-version messages the peer can send in 60 seconds
22718:46 <sipa> willcl_ark: no
22818:46 <sipa> *except
22918:46 <amiti> sipa: haha yeah, would be weird.
23018:46 <michaelfolkson> So the network thread could be dealing with one peer and the message thread could be dealing with a different peer
23118:46 <sipa> also we should just remove misbehavior scores
23218:46 <amiti> +1
23318:46 <willcl_ark> well if scores persisted, then the cumulateive behaviour in amiti's example might be different
23418:47 <sipa> michaelfolkson: indeed
23518:47 <MarcoFalke> sipa: Why not remove them? *hides*
23618:48 <michaelfolkson> What's stopping us from having multiple message handler threads and multiple network threads? :)
23718:48 <jonatack> I'm curious to hear people's thoughts on the last two questions.
23818:48 <MarcoFalke> michaelfolkson: cs_main
23918:48 <michaelfolkson> Ah
24018:48 <nehan> cs_main doesn't _prevent_ it, it just limits the benefit
24118:49 <pinheadmz> cs_main you mean - the lock ?
24218:49 <nehan> and might actually make things slower if many threads are contending for one lock
24318:49 <pinheadmz> side Q: is `cs` a hungarian-notation abbreviation for something ?
24418:49 <sipa> critical section
24518:49 <pinheadmz> of course!!!!!!! /runs to google
24618:49 <MarcoFalke> To elaborate a bit more: Most traffic is from tx messages. Each tx message (assuming it is a previously unseen tx) will call ATMP (AcceptToMemoryPool), which takes the cs_main lock
24718:50 <jnewbery> pinheadmz: mutexes that have been added more recently are more often named thing_mutex
24818:50 <MarcoFalke> cs_main is the validation lock. It needs to be taken in addition to the mempool lock when adding txs to the mempoll
24918:50 <MarcoFalke> *mempool
25018:50 <sipa> michaelfolkson: it's also inherently limited in many ways; a lot of messages require processing that affects many other peers (e.g. incoming, put in queue for all other peers to announce)
25118:51 <MarcoFalke> Ok, let's move on with the next questions. Only 10 minutes left
25218:51 <MarcoFalke> What are the risks or benefits of disconecting a peer?
25318:51 <sipa> so net_processing (at peast parts of it) are about interaction between peers, and not just "handle message for one peer, move on to the next"
25418:51 <pinheadmz> i think disconnecting peers, like selecitng peers, is sensistive because of eclipse attacks
25518:51 <nehan> risks: potential for eclipse attacks
25618:51 <willcl_ark> well surely one risk is in someway eclipsing yourself
25718:52 <felixweis> benefit would be keeping our connection slots open for non-misbehaving nodes
25818:52 <amiti> a benefit is you don't want your slots to be taken up by broken or malicious peers
25918:52 <sergei-t> Risk: wrongly disconnect an honest peer that did a weird thing on accident
26018:52 <nehan> benefit: open up connections for useful peers
26118:52 <sergei-t> Benefit: prevent potential attacks (?)
26218:52 <amiti> a risk is that over-eager disconnection logic in bitcoin core could lead to a network partition
26318:52 <michaelfolkson> Yeah depends on whether if the disconnecting was rational or not.
26418:53 <michaelfolkson> If rational great, cleared a slot. If not rational you might replace honest peer with a malicious peer
26518:53 <pinheadmz> also its very impolite. not everyone speaks the same dialect *cough* bcoin
26618:53 <MarcoFalke> all good answers
26718:53 <carlakc> even rationally connecting could be a problem if we have different logic across versions?
26818:53 <MarcoFalke> pinheadmz: I think in some cases it is polite to disconnect
26918:53 <sipa> i'd say disconnecting is very polite
27018:54 <felixweis> pinheadmz: what are some examples of bcoin dialect?
27118:54 <sipa> it can prevwnt the peer from wasting bandwidth on you
27218:54 <pinheadmz> felixweis still uses getblocks and doesnt headers-first yet, although PR is in review to update
27318:54 <pinheadmz> not really bannable offenses, i was joking
27418:54 <felixweis> thanks, good to know
27518:54 <jnewbery> pinheadmz: no headers-first?! :-O
27618:55 <michaelfolkson> Yeah I wanted to ask about this carlakc. What happens when a connected peer changes their version? They just send you new version and you understand that they've upgraded?
27718:55 <MarcoFalke> Say there is a light client connecting to you, asking for data you can't provide. You could keep and waste the connection or just disconnect
27818:55 <sipa> michaelfolkson: you can't upgrade a connectiom from one version to another
27918:55 <felixweis> yeah people tend to overlook the importance of "p2p consensus" when reimplementing bitcoin
28018:55 <pinheadmz> jnewbery we worked on it all last year, it led to the chain-width-expansion attack paper Braydon released, but never merged the fix, still wanted to figure out a way to mitigate timewarp attacks in conjunciton
28118:55 <michaelfolkson> Oh so you disconnect sipa?
28218:55 <jnewbery> michaelfolkson: there's no way to re-version a connection
28318:55 <sipa> michaelfolkson: how do you upgrade your software without disconnecting....?
28418:56 <sipa> you need to stop the running version and start the new one
28518:56 <michaelfolkson> Good point
28618:56 <nehan> michaelfolkson: there's no mechanism for nodes to update their version without restarting...
28718:56 <michaelfolkson> Haha
28818:56 <MarcoFalke> double version messages are also ignored
28918:56 <MarcoFalke> Last question: What are your thoughts on balancing the risks and benefits of disconnecting peers? What should be used as criteria to disconnect a peer?
29018:57 <MarcoFalke> This one doesn't have a textbook answer
29118:57 <sergei-t> Would it be a good idea to introduce some kind of config setting so that peer can set how strict it wants to be in its disconnection policy?
29218:57 <felixweis> cue erlang bitcoin implementation hot version upgrade
29318:57 <sipa> sergei-t: what advice would you give to people on what to set it to?
29418:58 <willcl_ark> I guess you might want to be more liberal in disconnecting if your slots are full, but if you have many empty ones then less so
29518:58 <pinheadmz> hard question, we want to mitigate DoS but keep the network healthy
29618:58 <nehan> felixweis: if the kernel can do it... https://ksplice.oracle.com/
29718:58 <pinheadmz> i think it takes research to determine what is meaningful attack vector based on message size and procesing requirement
29818:58 <sergei-t> sipa: e.g., if you run a professional node that hodls lots of coins (exchange), you'd better be stringent
29918:58 <willcl_ark> also if you are bandwidth-constrained then you don't want peers wasting your bandwidth
30018:58 <sipa> (this is my test: if you introduce a configuration knob, you must know when it's useful to change ot)
30118:58 <felixweis> nehan: exactly!!1
30218:58 <MarcoFalke> willcl_ark: Good point making it dependend on how many slots are used
30318:58 <sipa> sergei-t: how does being more stringent imply being more safe?
30418:59 <sergei-t> sipa: it does though I can't prove it's true :) otoh, what are the benefits of _not_ being stringent?.. what do I gain by being more liberal?
30518:59 <hmrawal> willcl_ark peers hardly take a few KBs right ? so if at all the bandwith is wasted it's not so much a peer can't afford
30619:00 <MarcoFalke> sergei-t: If the connection can offer you value (eg. valid txs or blocks), you might be better off keeping it
30719:00 <sipa> sergei-t: the problem is that the protocol is not exactly specified, and changing, and not all software behaves exactly the same
30819:00 <felixweis> sergei-t: more implementations, more different bitcoin versions, easier to introduce new features without the risk of network partitioning, ...
30919:00 <willcl_ark> hmrawal: I was doing some work previously on mesh networks where message size was 210 bytes...
31019:00 <sipa> being more stringent may mean disallowing some (honest but weird) peers
31119:00 <sipa> or in the worst case, promoting partitioning attacks
31219:01 <nehan> also upgraded nodes with new features might look "weird" to un-upgraded nodes which is bad
31319:01 <MarcoFalke> And accidents do happen. Banilla Bitcoin Core itself might accidentally misbehave
31419:01 <MarcoFalke> *Vanilla
31519:01 <MarcoFalke> #endmeeting
31619:01 <MarcoFalke> time :)
31719:01 <carlakc> tangent, but has there ever been an attempt to write out the current P2P protocol as implemented in core?
31819:01 <felixweis> thanks MarcoFalke! :)
31919:01 <jonatack> MarcoFalke: I think you've just named the new release
32019:01 <nehan> thanks!
32119:01 <willcl_ark> thanks MarcoFalke
32219:01 <carlakc> thanks MarkoFalke!
32319:01 <theStack> hi
32419:01 <dhruvm> thank you MarcoFalke
32519:01 <anir> thanks MarcoFalke!
32619:01 <MarcoFalke> thanks everyone. Today I learned the difference between version handshake and feature negotiation
32719:01 <willcl_ark> Bitcoin Core 22.0 "Banilla"
32819:01 <pinheadmz> good jam ya'll!
32919:01 <emzy> Thank you MarcoFalke and jnewbery!
33019:02 <amiti> thanks MarcoFalke! thanks everyone :)
33119:02 <michaelfolkson> Missed it theStack
33219:02 <sergei-t> thank you everyone and welcome to people who confused the time zones!
33319:02 <elle> thanks everyone!
33419:02 <felixweis> theStack: yeah, DST sucks
33519:02 <stacie> Thanks all!
33619:02 <michaelfolkson> carlakc: You mean docs?
33719:02 <theStack> michaelfolkson: ohnoez... guess i have to read the logs tomorrow :p
33819:02 <jonatack> thanks MarcoFalke and everyone!
33919:02 <jnewbery> sergei-t: A bit more background on removing the -banscore option, which did something similar to what you suggested: https://github.com/bitcoin/bitcoin/pull/19219#issuecomment-652684340 https://github.com/bitcoin/bitcoin/pull/19219#issuecomment-652699592
34019:03 <carlakc> michaelfolkson: I mean formal specification of the protocol, message formats, exchange order etc
34119:03 <jnewbery> thanks MarcoFalke. That was a great meeting :)
34219:03 <MarcoFalke> carlakc: there is https://btcinformation.org/en/developer-reference#p2p-network
34319:03 <thomasb06> Nothing to do with the review today but to break things up, Q[X]/(X^3-3X-1) in a field of dimension 9 over rationals and is non-commutative. It's considered a simple one: https://math.stackexchange.com/questions/133790/an-example-of-noncommutative-division-algebra-over-q-other-than-quaternion-alg
34419:03 <willcl_ark> It's pretty much "the code is the spec" these days, right?
34519:04 <carlakc> MarkoFalke: awesome, never seen that before
34619:04 <michaelfolkson> Always has been. Always will be willcl_ark
34719:04 <willcl_ark> I guess P2P could perhaps be codified though :S
34819:04 <MarcoFalke> carlakc: Might be outdated by now. It was mainly written by harding some years ago
34919:04 <sergei-t> jnewbery: thanks, will have a look! just seemed weird to me that misbehaving score is currently "one size fits all"...
35019:04 <sipa> also like consensus changes... the changes to P2P are documented in BIPs
35119:04 <MarcoFalke> Though, new p2p messages are covered in BIPs
35219:05 <sipa> sergei-t: it's a terrible idea and we shouldn't ever had it
35319:05 <sipa> ;)
35419:05 <michaelfolkson> Suhas said he'd withdrawn a BIP. Which BIP was that?
35519:05 <pinheadmz> MarcoFalke how is https://en.bitcoin.it/wiki/Protocol_documentation ?
35619:05 <MarcoFalke> pinheadmz: Works too. Should cover the same stuff
35719:06 <michaelfolkson> https://github.com/bitcoin/bitcoin/pull/19723#issuecomment-679137633
35819:06 <MarcoFalke> michaelfolkson: The feature negotiation BIP?
35919:06 <felixweis> michaelfolkson: not the bip, the big amendment
36019:06 <sipa> michaelfolkson: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2020-August/018084.html
36119:06 <michaelfolkson> Ta
36219:07 <sipa> sergei-t: maybe a counterpoint... you *can* change the misbehavior threshold from 100 to another value using a cmdline option
36319:07 <sipa> but imho, it's pointless to do so
36419:08 <MarcoFalke> sipa: Not anymore
36519:08 <MarcoFalke> That setting was finally removed
36619:08 <sipa> oh, good.
36719:13 <jonatack> #19464 net: remove -banscore configuration option
36819:16 <sipa> dank.
36919:17 <pinheadmz> haha spoken like a true bay area kid
37019:17 <jnewbery> *flemish person
37119:34 <michaelfolkson> How do Bcoin nodes connect to Core nodes pinheadmz? Core nodes don't recognize Bcoin versions right?
37219:34 <pinheadmz> on the internet, no one knows youre a firdge...
37319:34 <sipa> why would they care about version numbers?
37419:34 <pinheadmz> bcoin speaks all the same messages and message tpyes
37519:34 <sipa> clients don't even tell each other their client version, apart from a free-form user agent string
37619:35 <pinheadmz> protocol version numbers are still in sync, as they must be to send and receive certain messages like compact blocks
37719:35 <pinheadmz> same as btcd and other fullnode implementations
37819:36 <pinheadmz> however, if bitcoin core deprecates getblocks we'll need to merge our headers first branch with more urgency ;-)
37919:36 <sipa> that'd break a lot of things, i think
38019:36 <pinheadmz> how old is headers first? any nodes out there predate that ?
38119:36 <pinheadmz> (bitcoin core nodes out there)
38219:37 <sipa> 0.10, 2015
38319:39 <michaelfolkson> I rather dumbly thought Bcoin nodes would exchange Bcoin version messages
38419:40 <michaelfolkson> So Bcoin versions map to particular Core versions
38519:40 <sipa> the VERSION message doesn't send the client version; it sends the protocol version
38619:40 <pinheadmz> two versions: software version like releases and protocl version :-)
38719:41 <sipa> protocol versions have been disentangled from client versions since 0.6 or so
38819:41 <pinheadmz> https://github.com/bcoin-org/bcoin/blob/master/lib/net/common.js#L23
38919:42 <michaelfolkson> Ha TIL
39019:42 <michaelfolkson> Just assumed they were the same
39119:43 <michaelfolkson> Protocol versions only updated when they are changed?
39219:43 <sipa> yes
39319:43 <sipa> current protocol version is 70002
39419:44 <sipa> eh
39519:44 <sipa> 70016
39619:45 <michaelfolkson> The reason for disentangling to not confuse versions with other implementations?
39719:46 <michaelfolkson> I can't think of another reason. No harm in just using Core versions otherwise
39819:46 <sipa> core doesn't define the protocol
39919:46 <pinheadmz> michaelfolkson yeah and like, compact blocks was added in 70014, comcpact blocks +swegwit in 70015, etc
40019:46 <sipa> it's an implementation of it
40119:46 <pinheadmz> so your node knows what words the peer understnads
40219:47 <MarcoFalke> there is also a wallet version ;)
40319:47 <pinheadmz> heh and bcoin wallet is totally different from bitcoind
40419:48 <sipa> the wallet version isn't a protocol thing
40519:48 <sipa> it's just stored in wallet.dat, which is obviously core-only
40619:50 <michaelfolkson> "Core doesn't define the protocol" is surely a can of worms. Without a spec a so called reference implementation is all we have
40719:52 <michaelfolkson> But disentangling P2P protocol from consensus protocol makes sense
40819:52 <sipa> i think that's the wrong way to think about it
40919:52 <sipa> bitcoin core never sends GETBLOCKS messages since 0.10, but they're clearly still part of the protocol as removing it would break other software
41019:53 <sipa> ultimately the protocol is defined by what actually deployed software expects
41119:53 <sipa> and that's a very fuzzy thing, but it's not the same thing as "defined by a reference implementation"
41219:56 <michaelfolkson> I guess if Core is the "reference implementation" then the next question is which version of Core is the "reference implementation". And which P2P protocol version is the reference P2P protocol implementation. Can't be answered
41319:56 <michaelfolkson> The limitations of words....
41419:58 <jnewbery> The getblocks/hashContinue IBD method should be deprecated. We don't even have any tests for it.
41519:59 <sipa> jnewbery: perhaps we should add tests for it :)
41619:59 <jnewbery> perhaps, or perhaps we should just remove it :)
41720:00 <sipa> i don't think it's reasonable to require all peer software to switch to headers-first based IBD
41820:01 <pinheadmz> actualy bcoin *does sync headers first for IBD only, once it hits the last checkpoint though its getblocks
41920:01 <pinheadmz> and even then it doesn't download blocks out of order or from >1 peer
42020:02 <pinheadmz> i think this is how btcd does as well
42120:03 <jnewbery> pinheadmz: does it use the hashContinue method? ie does it need to be retriggered to request blocks by a BLOCK message for the tip after receiving 2000 blocks?
42220:04 <pinheadmz> yeah
42320:04 <pinheadmz> https://github.com/bcoin-org/bcoin/blob/master/lib/net/pool.js#L1652
42420:04 <pinheadmz> with some tweaks
42520:05 <jnewbery> yuck
42620:05 <pinheadmz> ha tell me about it
42720:06 <pinheadmz> the headers first PR is here: https://github.com/bcoin-org/bcoin/pull/875 and was accompanied by the chain-expansion attack paper
42820:06 <jnewbery> I dislike any method where you're using state on a remote peer for your book-keeping
42920:06 <pinheadmz> but after discussing on the ML braydon realized time warp attacks werent really accounted for and so like a perfectionist, he clsoed the branch
43020:06 <pinheadmz> then we all got laid off
43120:06 <pinheadmz> so.
43220:06 <pinheadmz> yeah great point