transport abstraction (p2p)

https://github.com/bitcoin/bitcoin/pull/28165

Host: glozow  -  PR author: sipa

Notes

  • Bitcoin nodes “speak” a P2P protocol, which includes both a way of interpreting the bytes sent over connections and expectations for the contents of the messages peers send to each other. Within Bitcoin Core, the implementations of these two parts of the protocol live in largely separate areas of the code. At a high level:

    • The “net processing” layer includes application logic such as responding to ping with a pong containing the same nonce and disconnecting a peer that sends a transaction after we told them not to.

    • The “net” layer “below” net processing abstracts away the details of converting messages to/from bytes that are sent and received on the connection.

  • Within the ProcessMessage function, which responds to pings with pongs, the ping is a CNetMessage retrieved from CNode::PollMessage, and the pong is sent by calling CConnman::PushMessage.

  • CConnman::SocketSendData is the function that actually sends data over the connection. It is called by the socket handler thread, and sometimes by the message handler thread from PushMessage (an “opportunistic write”) if the message queue was empty.

  • This PR is a prerequisite of PR #28196 implementing BIP324. The complete specification of BIP324 is out of scope for this Review Club, but please make sure to read the Introduction and Goals.

  • BIP324 introduces a new transport protocol version. While nodes still send the same types of messages and respond to pings with pongs in the same way, the way messages are converted to/from bytes is very different.

  • This PR creates an abstract Transport class, combining the current TransportDeserializer and TransportSerializer classes into a single interface for converting bytes to/from messages.

  • A combined interface means that V2Transport can have state shared between the serializer and deserializer, such as the BIP324Cipher for encrypting/decrypting messages. It may be helpful to see how V2Transport is implemented in that PR. Also see a previous approach, which kept the serializer and deserializer separate and stored state in the CNode object. A small sketch of this shared-state idea follows these notes.
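
The sketch below is a minimal, self-contained illustration (not the actual Bitcoin Core classes) of why a combined interface is convenient: when one object handles both directions, shared state such as a cipher only needs to live in one place. The toy XOR “cipher” merely stands in for something like BIP324Cipher, and all names here are invented for illustration.

```cpp
// Toy transport: the "serialize" and "deserialize" directions share the same
// key, just as V2Transport shares a BIP324Cipher between sending and
// receiving. Purely illustrative; none of these names exist in Bitcoin Core.
#include <cstdint>
#include <string>
#include <vector>

class ToyTransport
{
    uint8_t m_key; // state shared by the send and receive paths

public:
    explicit ToyTransport(uint8_t key) : m_key{key} {}

    // Message -> wire bytes (the "serializer" half).
    std::vector<uint8_t> ToWire(const std::string& payload) const
    {
        std::vector<uint8_t> out;
        out.reserve(payload.size());
        for (char c : payload) {
            out.push_back(static_cast<uint8_t>(static_cast<uint8_t>(c) ^ m_key));
        }
        return out;
    }

    // Wire bytes -> message (the "deserializer" half).
    std::string FromWire(const std::vector<uint8_t>& wire) const
    {
        std::string out;
        out.reserve(wire.size());
        for (uint8_t b : wire) out.push_back(static_cast<char>(b ^ m_key));
        return out;
    }
};

int main()
{
    ToyTransport transport{0x5a};
    const auto wire = transport.ToWire("ping");
    return transport.FromWire(wire) == "ping" ? 0 : 1; // round-trips successfully
}
```

With two separate classes, the shared key would either have to be duplicated or stored somewhere else (as the earlier approach did with CNode), which is exactly the coupling this PR removes.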

Questions

We’ll be covering the General questions in the first meeting on Wednesday, and the PR-specific questions in the second meeting on Thursday.

General

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. What is the distinction between “net” and “net processing”? What data structures and tasks might we associate with each of them?

  3. Does BIP324 require changes to the “net” layer, the “net processing” layer, or both? Does it affect policy or consensus?

  4. Is {CNetMessage, CMessageHeader, CSerializedNetMsg, BytesToSend} used in sending, receiving, or both?

  5. CNetMsgMaker and Transport both “serialize” messages. What is the difference in what they do?

  6. In the process of turning an application object like a CTransactionRef into bytes sent over the wire, what needs to happen? What data structures are involved? (Hint: start here, where a transaction is sent in response to a getdata, and trace the calls until this line calling Sock::Send).

  7. The RPC getpeerinfo returns a map of bytessent_per_msg and bytesrecv_per_msg. Add a print(self.nodes[0].getpeerinfo()[0]['bytessent_per_msg']) to one of the subtests in test/functional/p2p_sendtxrcncl.py after peers send sendtxrcncl to each other, e.g. here. BIP 330 specifies that peers send only 1 “sendtxrcncl” message to each other, and each consists of a 4B version and an 8B salt, but getpeerinfo will report a number much higher than 12B. What is the number, and what is included in those bytes? (Hint: see AccountForSentBytes. Where is it called?) A byte-accounting sketch follows these questions.

  8. After PushMessage returns (e.g. on this line), have we sent the bytes corresponding to this message to the peer already (yes/no/maybe)? Why?

  9. Which threads access CNode::vSendMsg? (Hint: start by searching for vSendMsg within src/net.cpp, then search for the functions which access it, then the functions which call those functions… until you find where the std::threads are created).
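
For question 7, the answer can be worked out from the v1 message format discussed in the meeting log below: every v1 message carries a 24-byte header in front of its payload. The arithmetic is spelled out in this standalone sketch; the field widths mirror the constants named in src/protocol.h (MESSAGE_START_SIZE, COMMAND_SIZE, MESSAGE_SIZE_SIZE, CHECKSUM_SIZE) but are written as plain integers here so the example compiles on its own.

```cpp
// Byte accounting for a v1 "sendtxrcncl" message as reported by getpeerinfo.
#include <cstdio>

int main()
{
    // v1 message header (CMessageHeader::HEADER_SIZE == 24 bytes):
    const int magic = 4;    // network magic (MESSAGE_START_SIZE)
    const int command = 12; // message type, NUL-padded (COMMAND_SIZE)
    const int length = 4;   // payload length (MESSAGE_SIZE_SIZE)
    const int checksum = 4; // payload checksum (CHECKSUM_SIZE)
    const int header = magic + command + length + checksum;

    // BIP 330 sendtxrcncl payload:
    const int version = 4;  // 4-byte version
    const int salt = 8;     // 8-byte salt
    const int payload = version + salt;

    std::printf("header %d + payload %d = %d bytes on the wire\n",
                header, payload, header + payload); // 24 + 12 = 36
    return 0;
}
```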

PR-specific

  1. Can you summarize what this PR is doing?

  2. What does it mean for CNetMessage to be “transport protocol agnostic”?

  3. Prior to this PR, why are TransportSerializer and TransportDeserializer different classes? Why should they be combined? What are the downsides to combining them, if any?

  4. What is the difference in approach with #24545? Do you think this approach is preferable?

  5. Commit 27f9ba23 adds an internal m_recv_mutex for protecting receiving state. The commit message also mentions that an external mutex is needed. Why? (Hint: is it ok for vRecv to change in between calling ReceivedMessageComplete and GetReceivedMessage? See also the sketch after these questions.)

  6. At a high level, what does the added fuzz test do? What kinds of bugs would it catch? What kinds of bugs would it not catch?

  7. Commit bb4aab90 moves the code using m_transport to convert the message to bytes from PushMessage to SocketSendData. What are the behavior changes in this commit?

  8. Commit bb4aab90 mentions “removing the assumption that a message can always immediately be converted to wire bytes.” What happens in SocketSendData if m_transport isn’t able to convert messages?

  9. Commit bb4aab90 mentions one thread being “more of a computational bottleneck” than the other. Which thread is this, and what runs on it?
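
For question 5, the sketch below illustrates the locking pattern at issue; it is not the PR’s code, and ToyReceiver and its members are invented for illustration (only the method names echo those in the question). An internal mutex makes each call safe on its own, but nothing stops the receive state from changing between a ReceivedMessageComplete check and the following GetReceivedMessage call, so a caller that needs the two calls to observe the same state must hold its own lock across both.

```cpp
// Internal vs. external locking: each method locks m_recv_mutex, yet the
// two-call sequence "complete? then get" is only atomic if the caller also
// holds an external lock across both calls.
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <utility>

class ToyReceiver
{
    mutable std::mutex m_recv_mutex;    // internal: protects m_complete
    std::queue<std::string> m_complete; // fully received messages

public:
    void ReceivedBytes(std::string msg) // e.g. called by the socket thread
    {
        std::lock_guard<std::mutex> lock(m_recv_mutex);
        m_complete.push(std::move(msg));
    }

    bool ReceivedMessageComplete() const
    {
        std::lock_guard<std::mutex> lock(m_recv_mutex);
        return !m_complete.empty();
    }

    std::optional<std::string> GetReceivedMessage()
    {
        std::lock_guard<std::mutex> lock(m_recv_mutex);
        if (m_complete.empty()) return std::nullopt; // another consumer won the race
        std::string msg = std::move(m_complete.front());
        m_complete.pop();
        return msg;
    }
};

int main()
{
    ToyReceiver recv;
    recv.ReceivedBytes("ping");

    // Caller-side pattern: one external lock held across both calls, so no
    // other consumer can drain the queue in between.
    std::mutex external_mutex;
    std::lock_guard<std::mutex> lock(external_mutex);
    if (recv.ReceivedMessageComplete()) {
        const auto msg = recv.GetReceivedMessage();
        return (msg && *msg == "ping") ? 0 : 1;
    }
    return 1;
}
```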

Meeting Log

Meeting 1

17:00 <glozow> #startmeeting
17:00 <abubakarsadiq> hello
17:00 <dberkelmans> Hi
17:01 <effexzi> Hi every1
17:01 <glozow> hello, welcome to PR review club!
17:01 <lightlike> Hi
17:01 <mayrf> Hi
17:01 <sipa> hi
17:01 <glozow> This week's PR is #28165, notes are in the usual place: https://bitcoincore.reviews/28165
17:01 <BrandonOdiwuor> Hello
17:01 <glozow> Did anybody get a chance to review the PR or look at the notes? how about a y/n
17:02 <mayrf> n
17:02 <abubakarsadiq> read the notes recently, but did not review the PR
17:02 <dberkelmans> No read half
17:03 <BrandonOdiwuor> y
17:03 <glozow> If you didn't have a chance to review the PR, today's questions are more general and you should be able to figure them out while we're going through the questions. So don't worry :) Tomorrow's questions will be PR-specific
17:03 <lightlike> y
17:03 <sipa> y
17:04 <michaelfolkson> hi
17:04 <michaelfolkson> y
17:04 <glozow> great, let's get started with the questions. And feel free to ask your own questions at any time!
17:04 <glozow> What is the distinction between “net” and “net processing”? What data structures and tasks might we associate with each of them?
17:05 <BrandonOdiwuor> Net sits at the bottom of the networking stack and handles low-level communication between peers while net_processing builds on top of the net layer and handles the processing and validation of messages from net layer
17:06 <glozow> BrandonOdiwuor: great answer!
17:07 <glozow> To make this more concrete, can anybody name an example of a class or function that we'd associate with net processing and not net?
17:08 <glozow> And can somebody name a class or function that we'd associate with net, and not net processing?
17:08 <abubakarsadiq> peermanager is associated with net_processing
17:08 <glozow> abubakarsadiq: yup I agree!
17:10 <glozow> Anyone wanna name something in net?
17:11 <abubakarsadiq> classes like CNode
17:11 <BrandonOdiwuor29> ReceiveMsgBytes and MarkReceivedMsgForProcessing by net
17:11 <glozow> Great answers!
17:11 <michaelfolkson> CNetCleanup
17:11 <instagibbs> CConnman\
17:13 <BrandonOdiwuor29> ProcessMessages and PollMessages in net_processing
17:13 <glozow> I think it's also worth pointing out here that the line is somewhat blurry. `CConnman::AttemptToEvictConnection` (which is in connman/net) uses "application" logic like whether a peer provided us a valid transaction recently to decide whether we should consider evicting a peer.
17:15 <glozow> Next question: Does BIP324 require changes to the “net” layer, the “net processing” layer, or both? Does it affect policy or consensus?
17:17 <BrandonOdiwuor29> I think it mostly requires changes in the net layer which deals with communication between peers
17:18 <michaelfolkson> net_processing not touched in either PR
17:18 <michaelfolkson> (.cpp)
17:18 <glozow> BrandonOdiwuor29: I agree!
17:19 <abubakarsadiq> I dont think this is a consensus change
17:19 <instagibbs> hopefully not :)
17:19 <glozow> abubakarsadiq: correct. at least we hope so
17:20 <glozow> ooh, fun question: what kind of implementation bug could result in the PR being an (accidental) consensus change?
17:20 <abubakarsadiq> consensus change will require all nodes to upgrade to the new version
17:20 <glozow> ah, maybe not this PR. I mean "bug in the implementation of BIP324"
17:22 <michaelfolkson> It has got to be creative surely, this doesn't impact validation at all
17:22 <instagibbs> bug which restricts max message size less than 4MB, resulting in a blockweight softfork
17:22 <instagibbs> (assuming your only connections are v2)
17:22 <sipa> instagibbs: bing bing bing
17:22 <instagibbs> or if the bug infected v1
17:22 <glozow> instagibbs: bingo. You could imagine a bug in deserialization of a block where you'd reject something consensus-valid
17:22 <instagibbs> I was checking that specific logic today :)
17:23 <glozow> Is {CNetMessage, CMessageHeader, CSerializedNetMsg, BytesToSend} used in sending, receiving, or both?
17:23 <BrandonOdiwuor29> CNetMessage is used in Receiving while CSerializedNetMsg is mostly used in sending. BytesToSend is also used in sending
17:23 <BrandonOdiwuor29> CMessageHeader is used in both sending and receiving
17:24 <sipa> CMessageHeader is really only used directly for V1, though some constants are reused in V2 too
17:24 <glozow> BrandonOdiwuor29: nice prep :D
17:25 <glozow> CNetMsgMaker and Transport both “serialize” messages. What is the difference in what they do?
17:26 <michaelfolkson> For that bug it would need to be a ~4MB transaction to trigger it?
17:26 <sipa> no
17:26 <instagibbs> michaelfolkson just a block that's above the size of the buggy limit
17:26 <sipa> if 4M block messages can't get through, that'd be a problem
17:27 <sipa> though compact blocks can partially mitigate it
17:27 <instagibbs> OOB *rdinals
17:27 <_aj_> would be fun for IBD
17:30 <glozow> Hint: `CNetMsgMaker` creates a `CSerializedNetMsg` (declared here https://github.com/bitcoin/bitcoin/blob/cf421820f50abcbd4f2709f200d3a78fb69fc698/src/net.h#L107)
17:30 <glozow> What does `Transport` do?
17:32 <_aj_> being a bit pedantic: Transport encodes something that's already serialized? (it takes a Span<uint8_t> which as already been serialized)
17:32 <lightlike> CNetMsgMaker: Performs the serialization of data structures into bytes, Transport adds the header and actually sends it
17:33 <sipa> i've tried to get the "serialize" terminology out of most places related to the transport-network part
17:33 <_aj_> i guess serializednetmsg has a message and payload which isn't completely serialized
17:33 <glozow> lightlike: thank you!
17:33 <glozow> Yeah I added this question when I saw https://github.com/bitcoin/bitcoin/pull/28165/files#r1301596054
17:33 <sipa> i think at some point in the codebase the serialization of data structure to bytes, and the addition of message headers, were done at the same time, so it was all call "serialization"
17:33 <sipa> *called
17:34 <sipa> but using "serialization" to refer to the "turn message type/payloads into network packets" is pretty confused
17:35 <_aj_> "class CNode: { public: /** Transport serializer/deserializer"
17:35 <sipa> sigh!
17:35 <_aj_> :D
17:36 <instagibbs> taking this as approach NACK
17:36 <glozow> The next question is about this process (and mostly an exercise to the reader): in the process of turning an application object like a `CTransactionRef` into bytes / network packets, what happens? What data structures does it turn into in the process?
17:37 <glozow> The exercise is to grep/ctags your way from https://github.com/bitcoin/bitcoin/blob/ab42b2ebdbf61225e636e4c00068fd29b2790d41/src/net_processing.cpp#L2334-L2335 to https://github.com/bitcoin/bitcoin/blob/ab42b2ebdbf61225e636e4c00068fd29b2790d41/src/net.cpp#L949
17:39 <glozow> (feel free to post your answer at any time but I'll move on to the next question)
17:40 <glozow> Next question is also an exercise. The RPC getpeerinfo returns a map of bytessent_per_msg and bytesrecv_per_msg. Add a print(self.nodes[0].getpeerinfo()[0]['bytessent_per_msg']) to one of the subtests in test/functional/p2p_sendtxrcncl.py after peers send sendtxrcncl to each other. What is the number of bytes sent for "sendtxrcncl" ?
17:40 <glozow> e.g. you could add the line here: https://github.com/bitcoin/bitcoin/blob/ab42b2ebdbf61225e636e4c00068fd29b2790d41/test/functional/p2p_sendtxrcncl.py#L75
17:41 <instagibbs> 36?
17:41 <lightlike> msgMaker.Make() serializes the CTransactionRef message, calling SerializeTransaction(), then PushMessage puts the serialized msg into the vSendMsg queue, then SocketSendData adds a header/checksum (after the changes from this PR) and asks transport for the next package to send, and finally calls m_sock->Send
17:41 <glozow> instagibbs: I got the same thing!
17:42 <glozow> But BIP330 specifies that sendtxrcncl just has a 4B version and 8B salt! Where does the other 24B come from?
17:42 <sipa> 36 bytes sounds correct
17:42 <glozow> lightlike: 👑 you dropped this
17:43 <sipa> hint: what is CMessageHeader::HEADER_SIZE?
17:43 <glozow> (hint: `CMessageHeader` is defined here: https://github.com/bitcoin/bitcoin/blob/d2ccca253f4294b8f480e1d192913f4985a1af08/src/protocol.h#L26)
17:44 <glozow> sipa: nice hint
17:44 <BrandonOdiwuor> CMessageHeader::HEADER_SIZE is 24 bytes
17:44 <sipa> BrandonOdiwuor: correct
17:45 <glozow> correct! And what makes up those 24 bytes?
17:45 <BrandonOdiwuor> MESSAGE_START_SIZE(4) + COMMAND_SIZE(12) + MESSAGE_SIZE_SIZE(4) + CHECKSUM_SIZE(4)
17:46 <glozow> BrandonOdiwuor: yep!
17:46 <sipa> so what are the 36 bytes of a sendtxrcncl on the wire?
17:49 <glozow> I guess `print(msg)` in `P2PConnection::_on_data`?
17:50 <sipa> way too practical
17:50 <instagibbs> making stuff up instead of checking: network magic + "sendtxrcncl"(padded to 12 bytes?) + 4 byte size of payload + 4 byte checksum + 4 byte version + 8 byte salt
17:51 <sipa> ding ding ding
17:51 <glozow> I was assuming you wanted the actual bytes 😂
17:51 <sipa> oh.
17:51 <sipa> Sure!
17:52 <instagibbs> header followed by the payload of the command itself, in order
17:52 <glozow> anyway next question
17:52 <glozow> After `PushMessage` returns, have we sent the bytes corresponding to this message to the peer already (yes/no/maybe)? Why?
17:53 <sipa> nice question
17:54 <instagibbs> maybe, if the queue was empty
17:55 <_aj_> yes: we(net_processing) don't have to do anything else to get it to go; no: it's extremely unlikely to have been received by the recipient by the time that function returns; maybe: if all the queues are empty it will have made it to the kernel socket layer; but if some of the queues arent, then it will still be waiting on those to drain further before getting to the OS
17:55 <sipa> _aj_: nice, all 3
17:55 <_aj_> (i also have "can you repeat the question" playing in my head)
17:56 <sipa> one nit: even if all the queues were empty, but the message size exceeds the OS's send buffer size, only the part that fits will make it to the socket layer
17:56 <glozow> nice! I was going for "maybe" for the scenarios described
17:57 <glozow> last question: Which threads access `CNode::vSendMsg`?
18:00 <lightlike> ThreadMessageHandler if it gets sent "optimistically", ThreadSocketHandler if it gets queued and picked up later
18:00 <glozow> lightlike: yes!
18:01 <glozow> This will come in handy tomorrow when we go through questions about the PR 🧠
18:01 <sipa> and i think that's all; i don't think RPC or GUI or so even access those
18:01 <glozow> Thanks everyone for coming today, we managed to get through all the questions (yay!)
18:02 <_aj_> sipa: hmm, bump net.core.wmem_max and net.core.wmem_default up to 5M or something then?
18:02 <glozow> Remember we're back tomorrow at the same time (17UTC), and we'll dig a bit deeper into the PR
18:02 <glozow> #endmeeting

Meeting 2

Meeting 2 was cancelled due to low attendance