Bitcoin Core PR Review Club

Immediately disconnect on invalid net message checksum (p2p)

May 20, 2020

https://github.com/bitcoin/bitcoin/pull/15206

Host: jnewbery - PR author: jonasschnelli

The PR branch HEAD was 1d9bc6a8c at the time of this review club meeting.

Note: an earlier version of this PR moved hash finalization from the Message Handler thread to the Socket Handler thread. That actually happened in PR 16202, so you can ignore discussion about moving the hash calculation.

Notes

Peer-to-peer message headers include a checksum for message integrity. If any of the message gets corrupted, then the calculated checksum from the payload won’t match the checksum in the header. See the developer notes for full details of the message header format.
The checksum is the first four bytes of the double-SHA256 of the message payload.
SHA256() is a Merkle–Damgård hash function. In this hash function construction, the pre-image is split into blocks, which are fed one-by-one into a compression function.
Because of the serial way that the hash is constructed, we can do some of the hashing work in the Socket Handler thread as we receive the bytes of the wire. See the readData() function, which calls into CSHA256::Write().
The final round of hashing is done in GetMessageHash(), which calls through to CSHA256::Finalize().
Prior to PR 16202, GetMessageHash() was called by ProcessMessages(), which is executed by the Message Handler thread. After that PR, the hash finalization is done in the Socket Hander thread, and the Message Handler thread just checks the saved hash data member.
If the calculated checksum does not match the checksum in the message header, then the message is dropped.
This PR proposes removing any checksum logic from net_processing, and moving it to net. It also proposes immediately disconnecting peers that send us a message with a bad checksum.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
What are the benefits of handling bad checksums in the net layer instead of in net processing?
Are there any other checks that should be moved from net processing into net?
What do you think about the behaviour change in this PR? Is it better to simply drop the message or should we disconnect from the peer?

Meeting Log

13:00

<jnewbery> #startmeeing

13:00

<MarcoFalke> hi

13:00

<troygiorshev> hi

13:00

<ariard> hi

13:00

<nehan> hi

13:00

<theStack> hi

13:00

<kanzure> hi

13:00

<thomasb06> hi

13:00

<jnewbery> greetings earthlings (and other universe dwellers)

13:00

<raj_149> Hi..

13:00

<jnewbery> Notes and questions: https://bitcoincore.reviews/15206.html

13:00

<michaelfolkson> hi

13:00

<emzy> Hi

13:00

<felixweis> hi

13:00

<jkczyz> hi

13:00

<sipa> hi

13:00

<lightlike> hi

13:01

<pinheadmz> hi

13:01

<jnewbery> Normal reminders: all questions are welcome! If you're confused about something, other people probably are too, so ask! No need to ask to ask, just ask!

13:01

<andrewtoth> hi

13:01

<jnewbery> who had a chance to review this week? (y/n)

13:01

<pinheadmz> y

13:01

<ariard> y

13:01

<raj_149> y

13:01

<felixweis> n

13:01

<MarcoFalke> y

13:01

<theStack> n

13:01

<lightlike> y

13:01

<troygiorshev> y

13:02

<vasild> Hi, n

13:02

<gimballock> Hi

13:02

<nehan> y

13:02

<andrewtoth> y

13:02

<jkczyz> y

13:02

<jnewbery> Great, anyone want to summarize the PR? Are you concept ACKs/NACKs?

13:02

<emzy> n

13:03

<pinheadmz> it moves the p2p message checksum check up earlier in the message processing prrocess

13:04

<troygiorshev> Concept ACK. Two parts: 1) Move checksum check to net layer, 2) change behavior upon check failure to disconnecting the peer, from ignoring the message

13:04

<pinheadmz> and the reviewers are trying to decide if it should ban a peer that sends a malformed message

13:04

<jonasschnelli> Hi (only partially here)

13:04

<jnewbery> troygiorshev pinheadmz: exactly. This isn't a pure refactor, it does change behaviour

13:04

<raj_149> Move hashing op into network layer. Move it from message processing to socket Handler. .

13:04

<jonasschnelli> The current behavior of the PR is not a ban

13:05

<jnewbery> we'll go into that in the next questions

13:05

<jnewbery> hi jonasschnelli! Thanks for joining us

13:05

<nehan> also changes which thread does the Finalize, I think?

13:05

<sipa> only the finalization moves

13:06

<sipa> hashing for bulk of messages was already in net

13:06

<jonasschnelli> Only the finalize operation changes...

13:06

<jonasschnelli> Though we have a lot of small messages

13:06

<jnewbery> I don't think this PR actually moves which thread does the finalize

13:06

<pinheadmz> yeah looked to me like originally the checksum was checked and stored in a bool but not /acted/ upon until way later

13:06

<jonasschnelli> I think it does... but can’t say for sure. It been a while

13:07

<jnewbery> When the PR was opened it did, but since then, PR16202 was merged, which moves the finalize operation into the socket handler thread

13:07

<sipa> oh, ok!

13:07

<pinheadmz> oh interesting. yeah i had a pre-game quesiton about the timing of this one

13:07

<pinheadmz> has there been a recent flood of malformed messages or something?

13:07

<pinheadmz> what rises this PR from its year of sleep?

13:07

<jonasschnelli> Oh. Thanks for that info jnewbery

13:08

<michaelfolkson> I think some of the review comments touched upon this but what are the likely reasons for sending an incorrect checksum (other than maliciousness)?

13:08

<jonasschnelli> Some BIp324 discussion

13:08

<nehan> jnewbery: oh right, that's the first sentence of the review notes

13:08

<jnewbery> pinheadmz: I just saw it was rebased recently and started receiving some review attention and thought net/net_processing separation would be interesting to talk about at review club

13:09

<jnewbery> ok, so first question. What are the benefits of handling bad checksums in the net layer instead of in net processing?

13:09

<vasild> michaelfolkson: implementation bug, memory corruption, also memory corruption at the receiving end.

13:09

<sipa> just broken network infrastructure

13:10

<vasild> we can rely on TCP to not corrupt the data?

13:10

<nehan> jnewbery: get to it sooner, use fewer resources processing bad messages

13:10

<jnewbery> (I'll be a bit loose with language and use "net layer" == "socket handler thread" and "net processing layer" == "message handler thread")

13:10

<michaelfolkson> vasild: Which could all be honest failures right? We're disconnecting for our own benefit as a node rather than because we think the incorrect checksum was sent due to maliciousness

13:11

<emzy> For network problems there is l a TCP checksum. Ok could be manipulatet bei firewall stuff.

13:11

<jnewbery> vasild: great question. TCP should take care of errors for us. So how else could header checksum be wrong?

13:11

<theStack> jnewbery: i'd say it makes generally sense to detect failures at the lowest layer possible

13:11

<vasild> I mean we can rely that TCP will detect and re-transmit corrupted data. So there is no way to get corrupted data due to a malfuctioning router in between?

13:11

<vasild> michaelfolkson: yes

13:12

<ariard> vasild: TCP does integrity check ?

13:12

<emzy> vasild: yes, normaly.

13:12

<jonasschnelli> Would firewall randomly manipulate a single message? Or constant?

13:12

<sipa> jonasschnelli: intentionally it wouldn't change anything at all

13:12

<jonasschnelli> (Sry phone typing)

13:12

<lightlike> While reviewing I wondered if the changed disconnect behavior is a consequence of a change done mostly for architectural reasons (not so easy to imitate the old behavior after moving from net_processing to net) or something desirable in itself.

13:12

<vasild> ariard: yes

13:13

<jonasschnelli> Is the firewall a hypothetical issue or did someone report / monitor that?

13:13

<emzy> jonasschnelli: you never know. there is deep packet inspection.

13:13

<sipa> but routers can reassemble TCP packets

13:13

<jnewbery> nehan theStack: in the original version of this PR, I'd agree with you. I expect there was a performance benefit from moving the finalize operation to the socket handler thread, since the message handler thread is our bottleneck. Since 16202 was merged, there isn't a performance benefit ...

13:13

<felixweis> tcp has a 16 bit checksum

13:13

<sipa> so the TCP checksum is only an point-to-point checksum, not end-to-end

13:13

<jonasschnelli> Which is relatively weak

13:13

<jnewbery> ... but I think from a architecture/layering perspective, this makes sense

13:13

<MarcoFalke> lightlike: I posted a patch that keeps the check in net_processing, so no

13:13

<ariard> right but your NAT firewall may recompute the checksum after modification IIRC

13:13

<sipa> ariard: yes

13:13

<sipa> it will

13:14

<troygiorshev> jnewbery: agreement on the architecture/layering reason

13:14

<nehan> jnewbery: what is the architecture/layering reason?

13:15

<jnewbery> anyone want to answer nehan's question about layering?

13:15

<sipa> it's ugly that the checksum checking is in net_processing, because it's a network protocol level thing

13:15

* MarcoFalke move the checksum check to the wallet. *hides

13:16

<troygiorshev> nehan: The header wraps around some data. (Just like how, say, an ethernet header wraps around some data). Ideally you do the checksums, check the header, and then hand the data to the layer above you. Then that layer doesn't have to worry about anythign

13:16

<sipa> in particular when facing BIP324, where the checksum because dependent on the transport used... it's strange that the network processing layer would even know which transport is used

13:16

<nehan> maybe you could share a bit about how you see hte split between net/net_processing

13:16

<nehan> net is network protocol; net_processing is more application (node) semantics?

13:17

<felixweis> stream vs message handling?

13:17

<nehan> and you consider the "header" part of the network protocol? that's application-specific, right? it's not the packet header

13:17

<troygiorshev> nehan: yes exactly

13:17

<jnewbery> nehan: exactly

13:17

<sipa> net is interaction with socket layer, and P2P transport layer

13:17

<lightlike> the PR description gives another reason: It would be desirable to implement alternative transport layer protocols such as BIP155 in net, without having to change anything in net_processing

13:17

<sipa> the P2P transport layer transports messages, which are handled by net_processing

13:18

<jnewbery> The distinction on our software is not perfect. Take a look a CNode in net.h and CNodeState in net_processing.cpp. Ideally, CNode would just be about the connection to another node and CNodeState would just be about application-level stuff (ie inventory of transactions and blocks, address gossip, etc)

13:19

<jnewbery> Greg's point here: https://github.com/bitcoin/bitcoin/pull/15206#issuecomment-460082894 is about firewalls trying to be application aware, and messing around with IP addresses inside application messages. That sometimes happens when firewalls are trying to deal with things like NAT traversal.

13:19

<sipa> nehan: i guess you'd have to see the bitcoin p2p protocol as consisting of two layers itself

13:19

<jnewbery> In a previous life I worked in telecoms, and we used to have lots of similar issues, where firewalls would mess around with IP addresses inside SIP and SDP messages and things would stop working.

13:19

<gzhao408> jnewbery When you say that "message handler thread is our bottleneck" it seems like we want to protect that thread from DoS/wasted resources. Is this a concern here/in general?

13:20

<jnewbery> gzhao408: that's a concern everywhere!

13:20

<sipa> it depends whether you're talking about honest overload or DoS attack

13:20

<gzhao408> jnewbery er sorry, I mean is it something we care /especially/ about e.g. because it's a common dos vector?

13:20

<jnewbery> but yes, here is a good example. In general, like nehan and theStack said earlier, it's best to deal with failures and bad messages as early and as low in the stack as possible.

13:21

<ariard> sipa: do we have a way to dissociate both right now ?

13:21

<lightlike> if these corruption issues can happen spuriously to otherwise good peers in some cases, I would prefer not to disconnect.

13:21

<ariard> like when downloading too much blocks become a DoS ?

13:21

<theStack> would a proper analogy for net vs. net_processing be be ethernet (layer 2) vs ip (layer 3)? the ethernet frame has a checksum, which is also checked on layer 2, and layer3 never gets to see it

13:21

<sipa> ariard: sure it can, but we have no protection against that

13:22

<jnewbery> gzhao408: probably not. This isn't a big dos concern. We always have to do the checksum calculations, so a node trying to waste our resources gains no advantage by sending a message with a bad checksum.

13:22

<ariard> I think there is another point by moving this from net_processing to net, it's happen in 2 different threads right now

13:22

<emzy> What about Erlay BIP 330? Will it also bettter to move the checksum out to net for that?

13:22

<felixweis> could ASMAP be used to limit upload rates per network?

13:22

<ariard> and lets say you have a single-threaded system, you may have some buffer growing quickly before switch

13:22

<troygiorshev> theStack: I think so. An interesting parellel is that just as the ethernet header contains a section saying "IP", the bitcoin header contains a sections specifying the command name

13:22

<jnewbery> theStack: I'm not sure that analogy is useful. You can point out some similarities, but I don't think it's particularly illuminating

13:22

<sipa> emzy: completely orthogonal; erlay is entirely net_processing

13:23

<ariard> felixweis: I would fear someone in your ASN wasting resources for you, like some kind of impersonation

13:24

<sipa> felixweis: the general solution to resource wasting attack is just keep track of how much resources every peer is using; if things get tight, slow down processing of the worst one

13:24

<sipa> felixweis: that's the first, and hard, step

13:24

<jnewbery> lightlike: To understand whether this is actually a concern, it'd be useful to look at real-world data.

13:24

<sipa> doing it per asmap seems like a nice optimization on top of that

13:24

<jnewbery> Has anyone grepped their debug log for "CHECKSUM ERROR"?

13:24

<sipa> felixweis: sorry, how many resources you're consuming on behalf of every peer

13:24

<sipa> jnewbery: yes

13:25

<MarcoFalke> net_processing is similar to the rpc server I guess. They both can send data structures to validation, and they both read raw bytes from a socket or so

13:25

<troygiorshev> sipa: What did you generally find?

13:25

<felixweis> ariad, sipa: interesting points!

13:25

<MarcoFalke> I found one error

13:25

<sipa> troygiorshev: only one completely broken peer, which sends bogus messages with checksum all 0

13:25

<MarcoFalke> 2020-03-03T23:14:54Z ProcessMessages(inv, 397 bytes): CHECKSUM ERROR peer=1849951

13:25

<troygiorshev> sipa: bogus message with SMBr as the type?

13:25

<sipa> indeed

13:25

<MarcoFalke> 2020-03-03T22:49:52Z receive version message: /Satoshi:0.18.0/: version 70015, blocks=620054, us=35.247.102.9:8333, peer=1849951

13:25

<sipa> or 0-byte or -1 messages

13:26

<troygiorshev> Looks like that's spam from some group advertizing their coin

13:26

<jnewbery> I found none (although I only had a couple of weeks' worth of debug logs)

13:26

<MarcoFalke> So according to the user agent, it is a Bitcoin Core node

13:26

<jonasschnelli> MarcoFalke: what does that peer does during its session... can you grep?

13:26

<MarcoFalke> Normal relay, it was an inv message

13:26

<jnewbery> sipa: how long does your debug log go back?

13:27

<sipa> only 15 days

13:27

<sipa> i have an older one though

13:28

<sipa> i don't think this means much though - people on broken network infrastructure will see checksum errors and others won't

13:28

<jnewbery> So it seems that checksum errors are very rare.

13:28

<jnewbery> Did people see Nicolas Dorier's comment here: https://github.com/bitcoin/bitcoin/pull/15206#issuecomment-460024940. Any thoughts about that?

13:29

<sipa> jnewbery: i don't think we can conclude that (i agree it's likely the case, but it's hard to know)

13:29

<emzy> can't find a "CHECKSUM ERROR" on like 5 nodes/

13:30

<jnewbery> sipa: right, there's at least one cow in scotland, and one side of it appears to be brown

13:30

<sipa> ...

13:30

<nehan> jnewbery: gmaxwell's response seemed reasonable

13:30

<sipa> jnewbery: my point is that you wouldn't see problems if you're on a non-broken network

13:30

<jnewbery> (https://mathworld.wolfram.com/AtLeastOne.html)

13:30

<sipa> that doesn't mean it's not a problem for others

13:31

<sipa> it may be fairly common even, for users we care about, but we wouldn't know as long as we are using good networks

13:31

<sipa> (especially old home routers...)

13:31

<lightlike> emzy: all nodes with debug=net enabled?

13:31

<ariard> and there is no way to measure which nodes are on good-vs-bad network

13:31

<troygiorshev> should we try and ensure that bitcoin can be used over an unreliable transport layer?

13:32

<raj_149> I didn't find any in my node..

13:32

<emzy> lightlike: no.

13:32

<sipa> troygiorshev: that is a better question, i think

13:32

<vasild> If I am sitting on a broken network infrastructure, checksum errors will be common for me.

13:32

<sipa> to what extent can be guarantee reliable operation in such cases

13:32

<lightlike> emzy: then it wouldn't appear in the logs even if it happened

13:32

<MarcoFalke> Wouldn't a bad router cause disconnects anyway because the network magic is corrupted? I mean it is less likely because the magic is short compared to the message, but still

13:32

<sipa> BIP324 would break these nodes anyway

13:32

<troygiorshev> it certainly seems, uh, romantic at least...

13:32

<ariard> troygiorshev: shilling this https://github.com/bitcoin/bitcoin/pull/18988 ;)

13:33

<emzy> lightlike: I see. I will enable it on one of my nodes.

13:33

<jnewbery> If the network infrastructure is breaking the version message, then those nodes won't be able to connect to peers

13:33

<sipa> jnewbery: agree

13:33

<troygiorshev> sipa: good point bringing up BIP324. Maybe it overrides this discussion

13:33

<sipa> further, the protocol is stateful

13:34

<sipa> if something happens in the middle of say a tx response, the peer may keep waiting, and time out

13:34

<jnewbery> so there seems to be no downside to disconnecting peers that have a bad checksum in a version message (since the alternative is that they'll timeout their connection)

13:34

<troygiorshev> ariard: i love it :D

13:34

<sipa> jnewbery: why would they time out?

13:34

<jnewbery> And if the version message is the one most likely to be corrupted by the firewall (which seems likely since it contains IP addresses), then if it's uncorrupted, then there are unlikely to be problems with subsequent messages

13:35

<jonasschnelli> Good point

13:35

<jnewbery> Do we not drop a connection if we don't receive a version within a time limit?

13:35

<sipa> jnewbery: sure

13:35

<sipa> but there are cases where a checksum failure would not result in any problems at all

13:35

<jonasschnelli> MarcoFalke: is that checksum error you witnessed in a version message?

13:35

<ariard> jnewbery: can you still have a bad checksum for version message, i.e switch IP address but this IP address being valid

13:35

<ariard> for receiver

13:35

<MarcoFalke> no

13:35

<sipa> the message would just be ignored and skipped, and that'd be it

13:35

<jonasschnelli> (Or did you say in a inv)

13:35

<MarcoFalke> It was in an inv message

13:36

<jnewbery> so the peer would send us a version with a bad checksum, we'd drop it, and then we'd eventually time out the connection because we never saw a version

13:36

<troygiorshev> sipa: (Other than a single bit flip in the checksum field I assume?)

13:36

<ariard> if it's a NATed address it's routable

13:36

<sipa> jnewbery: if it's in the version message, sure

13:36

<MarcoFalke> I was connected to the node for days

13:36

<troygiorshev> jonasschnelli: that's a good point

13:37

<MarcoFalke> No, only 15 minutes :sweat-smile:

13:37

<jnewbery> sipa: right that was my point. If the version message is corrupted, then there's no downside to disconnecting

13:37

<sipa> jnewbery: agree!

13:37

<jnewbery> right, next question. Are there any other checks that should be moved from net processing into net?

13:38

<jnewbery> (feel free to continue discussing/asking questions about disconnect. I just want to make sure we have a chance to discuss all the questions before the end of the hour)

13:38

<troygiorshev> jnewbery: MarcoFalke brought up a great point in the PR that the checksum should be treated the same as the header check and netmagic. Maybe all of them should be moved to net?

13:39

<jnewbery> troygiorshev: I agree!

13:40

<MarcoFalke> I think both should live in the same place (whether that is net_processing due to historic reasons or net because we decided that is the better place)

13:41

<MarcoFalke> But splitting them up felt a bit weird

13:41

<jnewbery> MarcoFalke: I don't think it's all-or-nothing. There are plenty of examples where things live in the wrong layer because they haven't moved yet

13:42

<ariard> its far better in net IMO, its really confusing while reviewing bip324 where these values are changed by deserializer and you have to go in net_processing to understand semantic

13:42

<jonasschnelli> jnewbery: we might want to disconnect nodes on invalid netmagic (v1 only).

13:42

<pinheadmz> jnewbery how about m_valid_netmagic ?

13:42

<pinheadmz> ha

13:42

<jonasschnelli> (in the socket thread)

13:42

<pinheadmz> i was looking for something like message size etc

13:42

<troygiorshev> pinheadmz: look at m_valid_header too

13:42

<jonasschnelli> Isn't #15197 doing that? (netmagic)

13:43

<jonasschnelli> https://github.com/bitcoin/bitcoin/pull/15197

13:43

<pinheadmz> troygiorshev ah good call for some reason i saw that and thought block header, but ofc that was wrong contet

13:44

<jnewbery> jonasschnelli: ah, I hadn't seen 15197. Thanks

13:45

<MarcoFalke> Is there software that uses the net magic to seek to the next message in the stream?

13:45

<jonasschnelli> MarcoFalke: that would be super weak

13:46

<MarcoFalke> (Asking because btcinformation claimed so)

13:46

<jonasschnelli> what if those 4 bytes match a hash of something sent over the wire?

13:46

<MarcoFalke> jonasschnelli: Agree

13:46

<jonasschnelli> we can't help someone doing that

13:47

<MarcoFalke> I am asking because the meeting notes linked to btcinformation, which suggest this is good practice

13:47

<MarcoFalke> Someone should probably edit that

13:47

<troygiorshev> Is it maybe historical?

13:48

<vasild> The decision whether to disconnect from a peer from whom we receive a bad checksum maybe should be based on how many other checksum errors am I seeing from other peers. For example if I get bad checksums from 1 peer but have plenty of other healthy peers from which I never get a bad checksum, then I would disconnect from the "bad" peer. But if I am getting checksum errors every few minutes

13:48

<vasild> from e.g 80% of my peers then maybe I wouldn't be so eager to disconnect (from everybody).

13:48

<jnewbery> MarcoFalke: you've missed out a part of the sentence "used to seek to next message _when stream state is unknown._"

13:49

<jnewbery> I'm still not sure whether that's true, but it makes a bit more sense

13:49

<jonasschnelli> vasild: that would be an option. But is it worth the effort?

13:49

<vasild> dunno :)

13:49

<MarcoFalke> But why would the stream state be unknown? (We disconnect when the header can't be deserialized)

13:50

<vasild> jonasschnelli: I think definitely out of the scope of that PR, maybe not worth the effort at all.

13:50

<michaelfolkson> Agree vasild. The danger here is that majority of your peers are sending you bad checksums and you rotate through iterations of new peers?

13:50

<felixweis> is it known how much CPU time is spent in SHA256 in a typical bitcoin node during normal operation (not IBD)?

13:51

<nehan> vasild: hmm this makes me wonder if this changes threatmodels at all. now an attacker could corrupt all incoming headers and force me to disconnect from all nodes

13:51

<troygiorshev> nehan: this is discussed in the PR

13:51

<vasild> michaelfolkson: yes, needlessly disconnecting from good peers due to bricked network router, but you also use the same router to connect to new peers...

13:51

<nehan> but could an attacker with network control essentially do that anyway?

13:51

<sipa> nehan: that kind of attacker can already just disconnect you

13:51

<troygiorshev> nehan: yup that's it :)

13:51

<nehan> troygiorshe: ah, must have missed it. thanks!

13:51

<MarcoFalke> felixweis: Depends whether you use hardware hashing or do it in software, I guess

13:51

<jnewbery> felixweis: I'm also curious about this. Does anyone know?

13:52

<troygiorshev> felixweis: Do you mean in this checksum or SHA256 in general?

13:52

<sipa> none of the changes in this PR change anything about attack models, as far as i can tell

13:52

<emzy> There is also an attack vector. If you can insert one TCP packet into the stram with wrong checksum. Then you can disconnect all peers. Seeing from a network level attacker.

13:52

<sipa> the only question is whether disconnecting honest but broken peers is preferable

13:52

<jonasschnelli> well... there is a CVE.

13:52

<felixweis> well all i know is there's a lot of sh256 hashing going, only 2 of them are used to verify the next block header.

13:53

<michaelfolkson> The scenario where honest peers are sending you bad checksums. A dishonest node identifies this, sends you good checksums and controls all your connections

13:53

<michaelfolkson> Unlikely but possible?

13:53

<jnewbery> is the concern about bad firewalls about my local firewall or the remote peer's firewall?

13:53

<ariard> emzy: network level attacker with such capabilities can just feed invalid blocks and get ban all your peers

13:54

<ariard> jnewbery: good question, I think both

13:54

<ariard> like can you know which one is faultive ?

13:54

<sipa> felixweis: txid and merkle root computation are probably the majority

13:54

<jnewbery> if it's about the remote's firewall, then I'd expect to see CHECKSUM ERROR messages in all node's debug logs

13:54

<sipa> and p2p checksums...

13:54

<pinheadmz> jonasschnelli CVE for what ?

13:55

<jnewbery> because we'd probably connect to at least one peer with a bad firewall

13:55

<sipa> jnewbery: it could be rare

13:55

<emzy> ariard: You only have to hit the right TCP SEQ. number. That's not sp hard anymore.

13:55

<jonasschnelli> There is a publicly revealed CVE that the PR fixes... but I don't think that CVE has reasonable weight

13:55

<jnewbery> 5 minutes left!

13:55

<MarcoFalke> jonasschnelli: Pretty sure anyone can come up with a DOS vector that is more severe and does not depend on the checksum disconnect logic

13:55

<jonasschnelli> The PR focuses on layering issues and it "also fixes that CVE"

13:55

<jnewbery> If you've been shy so far, now's your chance to ask your question

13:56

<ariard> emzy: what do you mean by sp hard ? IIRC there is ban on mutated consensus data, you don't need hashrate for this

13:57

<pinheadmz> jonasschnelli if possible id love to see the cve

13:57

<lightlike> emzy, ariard: maybe there are types of attackers who could manage to flip a bit every now and then, but cannot carefully replace whole msgs with constructed ones? Or does that make no sense?

13:57

<nehan> well, now i want to know about the CVE

13:57

<troygiorshev> +!

13:57

<troygiorshev> +1

13:57

<jonasschnelli> MarcoFalke: probably... I guess the difference on attacking the checksum is, that the attacker doesn't have to calculate a valid SHA256 hash where the victim needs to do for the validation

13:57

<ariard> jnewbery: it's a private node it may not have that much connections and not being seen that much

13:57

<emzy> ariard: sorry typo. It's not so hard to hit the right TCP sequence number.

13:57

<jonasschnelli> pinheadmz: the CVE has been publicly announced by the Bitcoin SV people...

13:57

<pinheadmz> ha!

13:57

<jonasschnelli> I'm only revealing it because its already public available on the web

13:58

<nehan> ah. it's a pretty obvious one

13:58

<pinheadmz> https://bitcoinsv.io/2019/03/01/denial-of-service-vulnerabilities-repaired-in-bitcoin-sv-version-0-1-1/

13:59

<jonasschnelli> It is public since a year and it seems that no-one could explot it

13:59

<sipa> heh, they can just send valid double spending transactions too

13:59

<nehan> jonasschnelli: why not?

13:59

<emzy> lightlike: yes, It would make an attack more easy, if you only have to get in the TCP stream and send someting random. Insted of getting the hash right.

14:00

<jnewbery> That's time. Thanks everyone! Special thanks to jonasschnelli for dropping in.

14:00

<jonasschnelli> nehan: don't know... because its probably an inefficient attack?

14:00

<ariard> lightlike: I can't think about real-world infrastructure with this kind of capabilites, see Erebus paper for discussion on infrastructure attacker model

14:00

<jnewbery> I have a call now so I have to run.

14:00

<nehan> thanks!

14:00

<jnewbery> Reminder that we have a BIP 157 special meeting at the same time tomorrow. I'll push the notes and questions for https://github.com/bitcoin/bitcoin/pull/19010 this afternoon.

14:00

<troygiorshev> thanks jnewbery!

14:00

<jnewbery> #endmeeting

14:00

<sipa> jonasschnelli: there are dozens of ways to achieve the same outcome

14:00

<jonasschnelli> thanks, Thanks for calling me in jnewbery!

14:00

<pinheadmz> ty all!

14:00

<jonasschnelli> sipa: Yes. I think so.

14:00

<emzy> tnx everyone!

14:00

<lightlike> thanks!

14:00

<theStack> thanks!

14:00

<vasild> Thanks everyone!

14:00

<sipa> jonasschnelli: sending N bytes to cause the victim to waste N bytes of network + N bytes of hashing...

14:01

<felixweis> thanks everyone!

14:01

<pinheadmz> sipa jonasschnelli you mean just sending any type of invalid mesasge

14:01

<jonasschnelli> indeed... yeah

14:01

<thomasb06> thanks!

14:01

<thomasb06>

14:01

<pinheadmz> is as severe as bad checksum

14:01

<sipa> pinheadmz: right

14:01

<sipa> that too

14:01

<pinheadmz> i.e. a ddos where entire 4MB block is valid.... except the last tx

14:01

<nehan> pinheadz: that requires a bunch of pow

14:01

<pinheadmz> ah good point

14:01

<jonasschnelli> but for an invalid message (to pass the checksum test), the attacker needs a valid sha256 hash?

14:01

<pinheadmz> well then a TX max-size where the last sig is invalid

14:02

<MarcoFalke> jonasschnelli: Only needs to calculate once

14:02

<jonasschnelli> But I guess you can send invalid blocks to make the victim hash-check that

14:02

<nehan> don't nodes drop peers that send a lot of invalid messages/

14:02

<jonasschnelli> MarcoFalke: yeah. We don't ignore dups

14:02

<jonasschnelli> I guess that CVE is total bullshit

14:02

<MarcoFalke> right

14:02

<pinheadmz> even if its not severe, I do appreciate that altcoins look at the code from a different perspective and have an opportunity to contribute back to bitcoin

14:03

<jonasschnelli> However, the optimization in layering (checksum belong to message transport and not to message processing) is real

14:03

<jonasschnelli> I guess the SV people where happy they could create some CVE at all. :)

14:04

<raj_149> Curious on estimate of performance optimization by this layering.

14:05

<jonasschnelli> raj_149: near to zero probably

14:06

<emzy> Again. I think the checksum prevents that an attacker could inject just one packet into the TCP stream to ban a node. I'ts far to easy to spoof the IP address of a peer and flood with TCP packets (random SEQ. number) the target node.

14:07

<emzy> Would be better to just dissconnet the node not ban it.

14:08

<sipa> who says anything about banning?

14:08

<raj_149> Are we also banning the node along with disconnection?

14:08

<sipa> no

14:08

<raj_149> Ya. Right.

14:08

<emzy> I think I got it wrong. It is only disconnect.

14:08

<emzy> Sorry, that's fine then.

14:10

<raj_149> Also is it possible to apply some kind of heuristics to assertain resource draining intention from a node? May be that way we can safeguard against disconecting honest nodes just because of bad transport?

14:11

<sipa> raj_149: that's hard, as it's probably a cat and mouse game

14:11

<sipa> there are certainly some behaviours that are easily detectable, but if someone really wants to exploit things, there are dozens of ways

14:12

<sipa> i believe the real solution is just keeping track of resources used on behalf of every peer, and slow down/disconnect the worst offenders if you run low

14:12

<desolate> raj_149 does the heuristic determine how much resources the attack is draining on the heuristic function? :P

14:12

<sipa> with perhaps exceptions like giving you a valid block

14:20

<desolate> I second sipa's KISS approach when receiving incompatible data: "turtle" strategy rather than attempting to build a highly optimized, hopefully omnipotent strategy