💡

Bitcoin Core PR Review Club

BIP 157: Serve cfcheckpt requests (p2p)

May 7, 2020

https://github.com/bitcoin/bitcoin/pull/18877

Host: jnewbery - PR authors: jimpo , jnewbery

The PR branch HEAD was 967e2b10 at the time of this review club meeting.

This was a special review club to dive into the implementation of BIP 157. We specifically reviewed the first PR from that implementation: PR 18877.

There were no notes and questions for this meeting. See the notes and questions from PR 16442 for additional background information.

Meeting Log

13:00

<jnewbery> #startmeeting

13:00

<jnewbery> hi folks, welcome to Review Club special BIP 157 edition :)

13:00

<theStack> hi!

13:00

<fjahr> hi

13:00

<gleb> hi

13:01

<nehan> hi

13:01

<jonatack_> ja: i think that was added to /topic because the very first nothing was not logged, and then when it was decided to add the meeting logs to the website, the warning was added.

13:01

<jnewbery> Like I metioned yesterday, I'm making an effort to get the BIP 157 work done by jimpo merged. Tracking PR is here: https://github.com/bitcoin/bitcoin/pull/18876

13:01

<jonatack> hi

13:02

<jnewbery> I've split it up into more easily reviewable chunks. The first is here: https://github.com/bitcoin/bitcoin/pull/18877

13:02

<jnewbery> This is slightly different from other review club meetings. I haven't written notes and questions, but I wanted to make a space available for everyone to ask questions and really get deep into the implementation.

13:03

<jnewbery> We did review the full PR about 6 months ago. pinheadmz did a great job hosting: https://bitcoincore.reviews/16442.html

13:03

<jnewbery> so if you need a refresher on what the full PR is doing, I recommend re-reading those notes and logs

13:04

<jnewbery> The first PR I split out is just serving cfcheckpts, and doesn't signal NODE_COMPACT_FILTERS in the version message

13:04

<jnewbery> it's a very limited subset of the functionality, but it can be used for testing and serving checkpts to pre-configured clients

13:05

<jnewbery> Does anyone have any questions about the full PR or the sub-PR?

13:05

<theStack> jnewbery: any concrete reason why you chose the handling of (get)cfcheckpts as first reviewable chunk? is it the simplest one?

13:06

<nehan> jnewbery: i am also interested in how you decided to break of this piece

13:06

<nehan> *off

13:06

<jnewbery> theStack: to some extent yes. I expect the normal client operation will be: 1. fetch cfcheckpts, 2. fetch all other cfheaders, 3. fetch cfilters.

13:07

<jnewbery> That's how the test is structured at least

13:07

<jnewbery> so in that order of operations, supporting getcfcheckpt and sending the cfcheckpt message is the first thing to do

13:07

<fjahr> It's kind of nit but I am a bit annoyed that it's sometimes block filter and sometimes compact filter. Is it too late to change naming of the flags? I think in the code it's ok because devs can make the connection. But for users I may be confusing. Maybe it should be `-peerblockfilter` to match `-blockfilterindex`?

13:09

<jnewbery> fjahr: definitely not too late, and definitely a good question to ask. Once things are merged, we're stuck with bad names forever, so we should get it right first time

13:10

<jnewbery> -peerblockfilter seems reasonable. I'll think about it some more and maybe update the flag name

13:11

<fjahr> sounds good!

13:11

<jnewbery> Something that I wondered about as I was reviewing, was the synchronization across threads. Is that something that anyone here thought about?

13:12

<jnewbery> There are two threads to think about. The message hander thread (https://github.com/bitcoin/bitcoin/blob/f54753293fe7355e4280944d766f22054b560ba1/src/net.cpp#L2017). This is the thread that reads messages from the receive buffer and processes them for each peer.

13:12

<jonatack> jnewbery: i'm personally undecided whether to review this PR or not (and for the same reason did not do so for the review club last fall). mind giving an elevator pitch for this PR?

13:13

<jonatack> if this is the wrong moment, not a problem ;)

13:13

<jnewbery> When we receive a GETCFCHECKPT message from a peer, the message handler thread is the one which is calling into ProcessMessages() and doing the work to serve the CFCHECKPT message

13:14

<jonatack> (because if the why isn't clear, the how doesn't matter, and the PR description does not address the why)

13:15

<jnewbery> The other thread we have to think about is the scheduler thread (https://github.com/bitcoin/bitcoin/blob/f54753293fe7355e4280944d766f22054b560ba1/src/init.cpp#L1312), which services async callback in the validationinterface (https://github.com/bitcoin/bitcoin/blob/f54753293fe7355e4280944d766f22054b560ba1/src/validationinterface.h#L84-L161)

13:15

<nehan> do all messages use the message handler thread? how is that handled across messages?

13:16

<fjahr> jonatack: Is it about block filters in general or this PR in particular? i.e. are you more referring to #18876 or #18877?

13:16

<nehan> i guess i'm asking if there's some guidelines on "don't tie up these threads ever"

13:16

<jnewbery> when we receive a block and it's connected, validation will send a BlockConnected signal

13:17

<jnewbery> the scheduler thread will later service that by calling the BlockConnected callback on all the components that subscribed. Here's where the indexer's callback is: https://github.com/bitcoin/bitcoin/blob/f54753293fe7355e4280944d766f22054b560ba1/src/index/base.cpp#L191

13:17

<jnewbery> nehan: yes, all messages from the network are handled by the message handler thread

13:18

<jnewbery> nehan: that's a good question. I haven't seen that guideline anywhere, but it makes sense, right? We shouldn't block the message handler thread, since it's what's doing all the important work

13:18

<theStack> am i right in the assumption that GETCFCHECKPT is just a convenience feature and the same could be achieved by request multiple single GETCFHEADERS?

13:18

<theStack> *requesting

13:18

<nehan> jnewebery: let me clarify my question which was phrased poorly. where is the bulk of the "work" for a message usually handled? is the idea spawn a thread to complete the work for this message, or is it add a bunch of things to different queues that other threads will handle?

13:18

<theStack> (needing much more single network messages and communication rounds of course)

13:18

<jnewbery> jonatack: good question about motivation. I'll get to it in a bit :)

13:19

<sipa> nehan: message is received from network by network thread, put on the received messages queue, where it is processed in the message handler thread

13:19

<jnewbery> nehan: no, the bulk of the work is done by the message handler thread. The only place we farm out work to other threads is script checking for blocks

13:20

<jnewbery> message processing in bitcoin core is almost entirely single-threaded

13:20

<sipa> (for block validation there are separate but fixed script verification threads, which verify all the scripts in parallel for the whole block; for all other things the bulk of the work is in the message handler thread)

13:20

<nehan> sipa: jnewbery: thanks! so to make sure i'm clear: messages are never processed in parallel, because there is only one message handler thread?

13:21

<jnewbery> nehan: correct!

13:21

<sipa> correct; and lots of assumptions (in Core, but also other software) depend on serial processing of messages

13:21

<nehan> jnewbery: sipa: cool. helpful invariant to know

13:21

<jnewbery> there's some low-level preprocessing done by the socket thread, eg we verify the checksum in the socket thread, but all application-level processing is in the message handler thread

13:21

<sipa> there used to be this approach of sending a ping after sending another message to know whether the previous one was finished processing or not

13:22

<jnewbery> someday it'd be nice to have separate threads for net_processing and validation, but we're not there yet

13:22

<sipa> by some software

13:22

<jnewbery> also that assumption is all over our functional tests. We synchronize messages with a ping

13:23

<jnewbery> theStack: The nice thing about the cfcheckpts is that because the headers form a hash chain, you can get your checkpts from somewhere trusted, and then fetch the other headers trustlessly and verify that they connect

13:25

<jonatack> fjahr: yes, the larger direction, beyond being an improvement over bip37

13:25

<jnewbery> you could get any cfheaders from a trusted source, but having the checkpts as a standard in the protocol is nice, I think

13:26

<theStack> jnewbery: oh, i just see now that there is actually a difference between "filter header" and "filter hash"; and that CFHEADERS includes more than just headers

13:26

<jnewbery> they're close enough (1000 blocks), that you can fill in the other headers with a single cfheaders request, and since it's a standard sequence of block heights, the cfcheckpt can be cached and delivered from anywhere

13:28

<jnewbery> theStack: yes, the definition of the header is here: https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki#filter-headers. It's the hash of the filter and previous header.

13:28

<jnewbery> so the cfheaders message just provides the first header, and then the hashes of the subsequent filters, so you can reconstuct the headers yourself (and verify that they connect to your checkpoints)

13:29

<jnewbery> ok, back to synchronization. My two questions when thinking about this were: is there the possibility of a data race, and is there the possibility of the message handler thread being blocked?

13:29

<jnewbery> anyone have thoughts about those?

13:31

<fjahr> I did not think about this before but maybe during IBD?

13:32

<jnewbery> fjahr: what about IBD?

13:32

<fjahr> could it be blocked if we get a request during ibd?

13:32

<theStack> fjahr: hmm wouldn't the call to BlockRequestAllowed() just return false if the requested block is not existing yet on the node?

13:33

<theStack> (or maybe before, on LookupBlockIndex)

13:34

<jnewbery> fjahr: perhaps. IBD is when the most messages are going through the validation interface.

13:34

<theStack> s/block/stop block/

13:35

<jnewbery> but I haven't done any testing. The indexer's BlockConnected callback seems to return pretty quickly, so perhaps it isn't a concern

13:35

<jnewbery> theStack: but you could ask for a cfilter for an early block during IBD

13:36

<jnewbery> perhaps we should just not serve filters if we're not out of IBD?

13:38

<theStack> what exactly defines the state of "during IBD"? isn't it semantically pretty much the same, just with an (indefinitely ;-)) smaller amount of time between fetching blocks?

13:38

<sipa> during IBD means "while IsInitialBlockDownload() returns true"

13:38

<jnewbery> For the data race concern, you should see that the message handler thread is only reading from leveldb (https://github.com/jnewbery/bitcoin/blob/2a15f9943c065547e896aa221bcf26e7db8bd318/src/index/blockfilterindex.cpp#L390). The schduler thread writes to that database.

13:39

<sipa> it's a heuristic, but it affects lots of things in Bitcoin Core's behavior

13:39

<jnewbery> leveldb's own locking should stop any data races (although I haven't actually looked into that)

13:39

<michaelfolkson> Sorry. Basic question.... Do we forward on transactions/blocks that we know about to other peers while we are still in IBD?

13:40

<jnewbery> michaelfolkson: good question! I think the answer is no, but I'd need to double check that that's always true

13:41

<sipa> jnewbery: https://github.com/google/leveldb/blob/master/doc/index.md#concurrency

13:41

<theStack> sipa: ah, good to know. i didn't expect that

13:41

<jnewbery> thanks sipa!

13:42

<jnewbery> the next PR in the sequence adds an in-memory cache, so if we have multiple threads reading/writing that, we'd need our own locking. See https://github.com/bitcoin/bitcoin/pull/18876/commits/4f159aa8e29c059586bfea51b99074c2b76d196e

13:42

<theStack> because i always saw it as permanent IBD with just extended time in-between blocks :)

13:43

<jnewbery> theStack: no! IBD is very different. It's an in-memory latch. We start off in IBD, when we are at the tip we consider that we're no longer in IBD, and then we never re-enter IBD (unless you restart the node)

13:44

<michaelfolkson> If the answer to transactions/blocks is no then I would guess serving filters during IBD should also be no

13:44

<sipa> i think the most important impact is that during IBD we prefer faster synchronization over partition resistance (we'll kick peers that are too slow to feed us blocks)

13:45

<jnewbery> jonatack: the motivation is that BIP 157 is a better mechanism for light clients, including LN clients, than other protocols

13:45

<theStack> jnewbery: so if a node has finished IBD, the network connections gets -- for whatever reason -- lost for a long time, then the catching up of blocks has a different behaviour?

13:46

<jnewbery> theStack: that is correct

13:47

<sipa> unless you're talking about days... weeks... it doesn't matter much

13:48

<theStack> jnewbery: sipa: okay, that's very interesting. i saw that there is a function for IBD state but assumed it was only used as information for the user and otherwise didn't influence the behaviour

13:48

<sipa> no, i believe it's actually not even exposed to the user

13:48

<jnewbery> I think it's in getblockchaininfo?

13:48

<sipa> ah yes, it's there

13:49

<sipa> added in 0.16

13:49

<jnewbery> (mostly for testing and developers)

13:50

<jnewbery> 10 minutes!

13:50

<jonatack> jnewbery: (heh i began reviewing) right, it is much better than BIP 37, I'm not yet sure if that is a good thing(tm) yet.

13:50

<jnewbery> I have a call at 2, so I'll need to get going promptly

13:51

<jonatack> not to bikeshed on that here; i'll have to work that out for myself

13:51

<jnewbery> I think a good standard for thinking about these things is:

13:52

<jnewbery> 1. Does it have a clear use case (I think in this case, the answer is yes because lightning people want it)

13:52

<jonatack> (HW people too, it seems?)

13:52

<jnewbery> 2. Does it impost a significant or unreasonable resource burden on our software (I think the answer is no here)

13:53

<jnewbery> 3. Is the code correct and maintainable (that's what I need your help for)

13:53

<theStack> just a last note to the IBD topic: i'd agree then to not serve filters during IBD -- not even block headers are served during IBD as i just saw

13:53

<sipa> I think there is a 0. Does this belong in the P2P protocol, or can it be done better in another way - which I think is where some of the controversy is

13:53

<jnewbery> From a software maintainability standpoint, Jimpo's implementation is far better than a lot of the other legacy functionality we have to support

13:54

<michaelfolkson> You mean outside of Core sipa?

13:54

<jonatack> jnewbery: agree, am nevertheless zooming out further: should we be doing this, and if yes, this way?

13:54

<jonatack> sipa: yes

13:55

<sipa> michaelfolkson: no; i mean in the P2P protocol at all, by whatever software

13:55

<michaelfolkson> Ah ok thanks

13:55

<sipa> (of course, we don't control what other people do in their software - but clearly we shouldn't add things that we don't believe belong there)

13:55

<fjahr> jonatack: there is a lot more detail to it but in short I think that if people want to not run a node and don't care about privacy too much then I rather have them run a light client than they keep their money on sth. like coinbase. No doing BIP157 would not make more people run a full node.

13:56

<jnewbery> sipa: I think as long as we satisfy (2) and (3), then (0) doesn't matter so much. Or maybe I should say, 'belongs' in (0) is very subjective. (2) and (3) are also subjective, but not as much.

13:56

<fjahr> To me it is providing an alternative that hopefully will attract more people for who it is an improvement, not a 'downgrade' :)

13:57

<jonatack> yes. i have to work that out. anyway, am reviewing. thanks fjahr jnewbery sipa.

13:57

<sipa> jnewbery: i disagree, but let's keep it at that

13:58

<jonatack> jnewbery: do you want to add this log to the website?

13:58

<jnewbery> ok, I've gotta go. Thanks everyone for coming. I hope you found it useful!

13:59

<jonatack> i can maybe append it to the previous bip157 session page

13:59

<michaelfolkson> Yup very interesting, thanks jnewbery

13:59

<jnewbery> If the first one gets merged, then perhaps we can do another meeting for the next PR in the sequence

13:59

<fjahr> thanks jnewbery, will try to think a little more about threads

13:59

<theStack> thanks for hosting jnerbery! will there be another bip157 review meeting next thursday?

13:59

<theStack> *jnewbery

13:59

<michaelfolkson> haha

13:59

<jonatack> thanks everyone, this is helpful

14:00

<jnewbery> theStack: only if 18877 is merged. No point in reviewing further ahead

14:00

<jnewbery> ok, really going now. Bye!

14:00

<jnewbery> #endmeeting