BIP 157: Serve cfcheckpt requests (p2p)

https://github.com/bitcoin/bitcoin/pull/18877

Host: jnewbery - PR authors: jimpo, jnewbery

The PR branch HEAD was 967e2b10 at the time of this review club meeting.

This was a special review club to dive into the implementation of BIP 157. We specifically reviewed the first PR from that implementation: PR 18877.

There were no notes and questions for this meeting. See the notes and questions from PR 16442 for additional background information.

Meeting Log

13:00 <jnewbery> #startmeeting
13:00 <jnewbery> hi folks, welcome to Review Club special BIP 157 edition :)
13:00 <theStack> hi!
13:00 <fjahr> hi
13:00 <gleb> hi
13:01 <nehan> hi
13:01 <jonatack_> ja: i think that was added to /topic because the very first meeting was not logged, and then when it was decided to add the meeting logs to the website, the warning was added.
13:01 <jnewbery> Like I mentioned yesterday, I'm making an effort to get the BIP 157 work done by jimpo merged. Tracking PR is here: https://github.com/bitcoin/bitcoin/pull/18876
13:01 <jonatack> hi
13:02 <jnewbery> I've split it up into more easily reviewable chunks. The first is here: https://github.com/bitcoin/bitcoin/pull/18877
13:02 <jnewbery> This is slightly different from other review club meetings. I haven't written notes and questions, but I wanted to make a space available for everyone to ask questions and really get deep into the implementation.
13:03 <jnewbery> We did review the full PR about 6 months ago. pinheadmz did a great job hosting: https://bitcoincore.reviews/16442.html
13:03 <jnewbery> so if you need a refresher on what the full PR is doing, I recommend re-reading those notes and logs
13:04 <jnewbery> The first PR I split out is just serving cfcheckpts, and doesn't signal NODE_COMPACT_FILTERS in the version message
13:04 <jnewbery> it's a very limited subset of the functionality, but it can be used for testing and serving checkpts to pre-configured clients
13:05 <jnewbery> Does anyone have any questions about the full PR or the sub-PR?
13:05 <theStack> jnewbery: any concrete reason why you chose the handling of (get)cfcheckpts as first reviewable chunk? is it the simplest one?
13:06 <nehan> jnewbery: i am also interested in how you decided to break of this piece
13:06 <nehan> *off
13:06 <jnewbery> theStack: to some extent yes. I expect the normal client operation will be: 1. fetch cfcheckpts, 2. fetch all other cfheaders, 3. fetch cfilters.
13:07 <jnewbery> That's how the test is structured at least
13:07 <jnewbery> so in that order of operations, supporting getcfcheckpt and sending the cfcheckpt message is the first thing to do
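The client flow jnewbery describes, sketched as Python-style pseudocode. The `peer` object and its helper methods are assumptions for illustration only; they are not names from the PR.

```python
# Rough, hypothetical sketch of the expected light-client sync order:
# checkpoints first, then the remaining headers, then the filters themselves.
FILTER_TYPE_BASIC = 0       # BIP 158 basic filter type
CHECKPOINT_INTERVAL = 1000  # cfcheckpt entries are spaced 1000 blocks apart


def sync_filters(peer, stop_hash):
    # 1. getcfcheckpt: filter headers at every 1000th block up to stop_hash.
    checkpoints = peer.getcfcheckpt(FILTER_TYPE_BASIC, stop_hash)

    # 2. getcfheaders: fill in the headers between consecutive checkpoints
    #    and verify that they hash-chain into the (possibly pre-trusted) checkpoints.
    headers = peer.getcfheaders_between_checkpoints(FILTER_TYPE_BASIC, checkpoints)

    # 3. getcfilters: download the filters and check each against its verified header.
    filters = peer.getcfilters(FILTER_TYPE_BASIC, headers)
    return filters
```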
13:07 <fjahr> It's kind of a nit but I am a bit annoyed that it's sometimes block filter and sometimes compact filter. Is it too late to change the naming of the flags? I think in the code it's ok because devs can make the connection. But for users it may be confusing. Maybe it should be `-peerblockfilter` to match `-blockfilterindex`?
13:09 <jnewbery> fjahr: definitely not too late, and definitely a good question to ask. Once things are merged, we're stuck with bad names forever, so we should get it right first time
13:10 <jnewbery> -peerblockfilter seems reasonable. I'll think about it some more and maybe update the flag name
13:11 <fjahr> sounds good!
13:11 <jnewbery> Something that I wondered about as I was reviewing was the synchronization across threads. Is that something that anyone here thought about?
13:12 <jnewbery> There are two threads to think about. The first is the message handler thread (https://github.com/bitcoin/bitcoin/blob/f54753293fe7355e4280944d766f22054b560ba1/src/net.cpp#L2017). This is the thread that reads messages from the receive buffer and processes them for each peer.
13:12 <jonatack> jnewbery: i'm personally undecided whether to review this PR or not (and for the same reason did not do so for the review club last fall). mind giving an elevator pitch for this PR?
13:13 <jonatack> if this is the wrong moment, not a problem ;)
13:13 <jnewbery> When we receive a GETCFCHECKPT message from a peer, the message handler thread is the one which is calling into ProcessMessages() and doing the work to serve the CFCHECKPT message
13:14 <jonatack> (because if the why isn't clear, the how doesn't matter, and the PR description does not address the why)
13:15 <jnewbery> The other thread we have to think about is the scheduler thread (https://github.com/bitcoin/bitcoin/blob/f54753293fe7355e4280944d766f22054b560ba1/src/init.cpp#L1312), which services async callbacks in the validationinterface (https://github.com/bitcoin/bitcoin/blob/f54753293fe7355e4280944d766f22054b560ba1/src/validationinterface.h#L84-L161)
13:15 <nehan> do all messages use the message handler thread? how is that handled across messages?
13:16 <fjahr> jonatack: Is it about block filters in general or this PR in particular? i.e. are you more referring to #18876 or #18877?
13:16 <nehan> i guess i'm asking if there's some guideline on "don't tie up these threads ever"
13:16 <jnewbery> when we receive a block and it's connected, validation will send a BlockConnected signal
13:17 <jnewbery> the scheduler thread will later service that by calling the BlockConnected callback on all the components that subscribed. Here's where the indexer's callback is: https://github.com/bitcoin/bitcoin/blob/f54753293fe7355e4280944d766f22054b560ba1/src/index/base.cpp#L191
13:17 <jnewbery> nehan: yes, all messages from the network are handled by the message handler thread
13:18 <jnewbery> nehan: that's a good question. I haven't seen that guideline anywhere, but it makes sense, right? We shouldn't block the message handler thread, since it's what's doing all the important work
13:18 <theStack> am i right in the assumption that GETCFCHECKPT is just a convenience feature and the same could be achieved by request multiple single GETCFHEADERS?
13:18 <theStack> *requesting
13:18 <nehan> jnewbery: let me clarify my question which was phrased poorly. where is the bulk of the "work" for a message usually handled? is the idea to spawn a thread to complete the work for this message, or is it to add a bunch of things to different queues that other threads will handle?
13:18 <theStack> (needing many more network messages and communication rounds of course)
13:18 <jnewbery> jonatack: good question about motivation. I'll get to it in a bit :)
13:19 <sipa> nehan: a message is received from the network by the network thread, put on the received messages queue, where it is processed in the message handler thread
13:19 <jnewbery> nehan: no, the bulk of the work is done by the message handler thread. The only place we farm out work to other threads is script checking for blocks
13:20 <jnewbery> message processing in bitcoin core is almost entirely single-threaded
13:20 <sipa> (for block validation there are separate but fixed script verification threads, which verify all the scripts in parallel for the whole block; for all other things the bulk of the work is in the message handler thread)
13:20 <nehan> sipa: jnewbery: thanks! so to make sure i'm clear: messages are never processed in parallel, because there is only one message handler thread?
13:21 <jnewbery> nehan: correct!
13:21 <sipa> correct; and lots of assumptions (in Core, but also other software) depend on serial processing of messages
13:21 <nehan> jnewbery: sipa: cool. helpful invariant to know
13:21 <jnewbery> there's some low-level preprocessing done by the socket thread, eg we verify the checksum in the socket thread, but all application-level processing is in the message handler thread
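To make that invariant concrete, here is a toy model in plain Python. It is not how Bitcoin Core is structured internally; it only illustrates why messages are never processed in parallel: the socket/network role just enqueues, and a single message handler thread drains the queue serially.

```python
# Toy illustration of the "one message handler thread" invariant.
import queue
import threading

recv_queue = queue.Queue()   # filled by the socket/network thread


def socket_thread(raw_messages):
    # Low-level work only (framing, checksum), then hand off to the queue.
    for msg in raw_messages:
        recv_queue.put(msg)
    recv_queue.put(None)     # sentinel: no more messages


def process_message(msg):
    print("processing", msg)


def message_handler_thread():
    # Single consumer: all application-level processing happens here,
    # one message at a time, so processing is strictly serial.
    while (msg := recv_queue.get()) is not None:
        process_message(msg)


net = threading.Thread(target=socket_thread, args=(["getcfcheckpt", "ping"],))
handler = threading.Thread(target=message_handler_thread)
net.start(); handler.start()
net.join(); handler.join()
```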
13:21 <sipa> there used to be this approach of sending a ping after sending another message to know whether the previous one was finished processing or not
13:22 <jnewbery> someday it'd be nice to have separate threads for net_processing and validation, but we're not there yet
13:22 <sipa> by some software
13:22 <jnewbery> also that assumption is all over our functional tests. We synchronize messages with a ping
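Because processing is serial, a ping round trip acts as a barrier: once the pong comes back, every message sent before the ping has been fully processed. A hedged sketch of the pattern as it might appear inside a functional test, assuming the test framework's P2PInterface and sync_with_ping helpers and a msg_getcfcheckpt message class along the lines of the PR's new test (names like `node`, `FILTER_TYPE_BASIC` and `stop_hash` would come from the surrounding test):

```python
# Fragment from within a functional test; treat helper and message names
# as assumptions, not a verbatim excerpt from the PR.
peer = node.add_p2p_connection(P2PInterface())
peer.send_message(msg_getcfcheckpt(filter_type=FILTER_TYPE_BASIC,
                                   stop_hash=stop_hash))
# Serial processing means the pong can only arrive after the getcfcheckpt
# above has been handled, so any cfcheckpt reply is already recorded.
peer.sync_with_ping()
response = peer.last_message.get("cfcheckpt")
```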
13:23 <jnewbery> theStack: The nice thing about the cfcheckpts is that because the headers form a hash chain, you can get your checkpts from somewhere trusted, and then fetch the other headers trustlessly and verify that they connect
13:25 <jonatack> fjahr: yes, the larger direction, beyond being an improvement over bip37
13:25 <jnewbery> you could get any cfheaders from a trusted source, but having the checkpts as a standard in the protocol is nice, I think
13:26 <theStack> jnewbery: oh, i just see now that there is actually a difference between "filter header" and "filter hash"; and that CFHEADERS includes more than just headers
13:26 <jnewbery> they're close enough (1000 blocks) that you can fill in the other headers with a single cfheaders request, and since it's a standard sequence of block heights, the cfcheckpt can be cached and delivered from anywhere
13:28 <jnewbery> theStack: yes, the definition of the header is here: https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki#filter-headers. It's the hash of the filter and previous header.
13:28 <jnewbery> so the cfheaders message just provides the first header, and then the hashes of the subsequent filters, so you can reconstruct the headers yourself (and verify that they connect to your checkpoints)
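Per the BIP 157 definition linked above, each filter header is the double-SHA256 of the filter hash concatenated with the previous header, so the reconstruction can be sketched as follows (plain Python, not code from the PR; serialization and byte-order details are glossed over):

```python
# Reconstruct filter headers from a cfheaders response and check that they
# connect to the trusted checkpoints, per BIP 157:
#   header_i = double_sha256(filter_hash_i || header_{i-1})
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def verify_cfheaders(prev_checkpoint: bytes, next_checkpoint: bytes,
                     filter_hashes: list) -> bool:
    header = prev_checkpoint
    for filter_hash in filter_hashes:
        header = double_sha256(filter_hash + header)
    # If the reconstructed chain ends on the next checkpoint, every header
    # in between is implicitly authenticated by that checkpoint.
    return header == next_checkpoint
```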
13:29 <jnewbery> ok, back to synchronization. My two questions when thinking about this were: is there the possibility of a data race, and is there the possibility of the message handler thread being blocked?
13:29 <jnewbery> anyone have thoughts about those?
13:31 <fjahr> I did not think about this before but maybe during IBD?
13:32 <jnewbery> fjahr: what about IBD?
13:32 <fjahr> could it be blocked if we get a request during ibd?
13:32 <theStack> fjahr: hmm wouldn't the call to BlockRequestAllowed() just return false if the requested block doesn't exist yet on the node?
13:33 <theStack> (or maybe before, on LookupBlockIndex)
13:34 <jnewbery> fjahr: perhaps. IBD is when the most messages are going through the validation interface.
13:34 <theStack> s/block/stop block/
13:35 <jnewbery> but I haven't done any testing. The indexer's BlockConnected callback seems to return pretty quickly, so perhaps it isn't a concern
13:35 <jnewbery> theStack: but you could ask for a cfilter for an early block during IBD
13:36 <jnewbery> perhaps we should just not serve filters if we're not out of IBD?
13:38 <theStack> what exactly defines the state of "during IBD"? isn't it semantically pretty much the same, just with an (indefinitely ;-)) smaller amount of time between fetching blocks?
13:38 <sipa> during IBD means "while IsInitialBlockDownload() returns true"
13:38 <jnewbery> For the data race concern, you should see that the message handler thread is only reading from leveldb (https://github.com/jnewbery/bitcoin/blob/2a15f9943c065547e896aa221bcf26e7db8bd318/src/index/blockfilterindex.cpp#L390). The scheduler thread writes to that database.
13:39 <sipa> it's a heuristic, but it affects lots of things in Bitcoin Core's behavior
13:39 <jnewbery> leveldb's own locking should stop any data races (although I haven't actually looked into that)
13:39 <michaelfolkson> Sorry. Basic question.... Do we forward on transactions/blocks that we know about to other peers while we are still in IBD?
13:40 <jnewbery> michaelfolkson: good question! I think the answer is no, but I'd need to double check that that's always true
13:41 <sipa> jnewbery: https://github.com/google/leveldb/blob/master/doc/index.md#concurrency
13:41 <theStack> sipa: ah, good to know. i didn't expect that
13:41 <jnewbery> thanks sipa!
13:42 <jnewbery> the next PR in the sequence adds an in-memory cache, so if we have multiple threads reading/writing that, we'd need our own locking. See https://github.com/bitcoin/bitcoin/pull/18876/commits/4f159aa8e29c059586bfea51b99074c2b76d196e
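The concern is the classic one: once both the scheduler thread (writing newly indexed headers) and the message handler thread (serving requests) touch a shared in-memory cache, access must be synchronized. A rough Python analogue of the shape of such a cache; the follow-up PR itself is C++ and uses Bitcoin Core's Mutex/LOCK primitives:

```python
# Rough analogue of a cache shared between a writer thread (index sync on
# the scheduler thread) and a reader thread (message handler).
import threading


class HeaderCache:
    def __init__(self, max_size=2000):
        self._lock = threading.Lock()
        self._cache = {}            # block_hash -> filter header
        self._max_size = max_size

    def put(self, block_hash, header):
        # Called from the writer (index/scheduler) side.
        with self._lock:
            if len(self._cache) < self._max_size:
                self._cache[block_hash] = header

    def get(self, block_hash):
        # Called from the reader (message handler) side.
        with self._lock:
            return self._cache.get(block_hash)
```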
13:42 <theStack> because i always saw it as permanent IBD with just extended time in-between blocks :)
13:43 <jnewbery> theStack: no! IBD is very different. It's an in-memory latch. We start off in IBD, when we are at the tip we consider that we're no longer in IBD, and then we never re-enter IBD (unless you restart the node)
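A minimal sketch of that latch behaviour (illustrative only; the real heuristic lives in IsInitialBlockDownload() in validation.cpp):

```python
# Illustration of the IBD "latch": once the node decides it has caught up,
# it never flips back to IBD until restart. The "caught up" test here is a
# stand-in for the real heuristic.
class IBDLatch:
    def __init__(self):
        self._done = False        # start out in IBD

    def is_initial_block_download(self, tip_is_recent: bool) -> bool:
        if self._done:
            return False          # latched: never re-enter IBD
        if tip_is_recent:
            self._done = True     # caught up to the tip once -> latch
            return False
        return True
```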
13:44 <michaelfolkson> If the answer to transactions/blocks is no then I would guess serving filters during IBD should also be no
13:44 <sipa> i think the most important impact is that during IBD we prefer faster synchronization over partition resistance (we'll kick peers that are too slow to feed us blocks)
13:45 <jnewbery> jonatack: the motivation is that BIP 157 is a better mechanism for light clients, including LN clients, than other protocols
13:45 <theStack> jnewbery: so if a node has finished IBD, the network connection gets -- for whatever reason -- lost for a long time, then the catching up of blocks has a different behaviour?
13:46 <jnewbery> theStack: that is correct
13:47 <sipa> unless you're talking about days... weeks... it doesn't matter much
13:48 <theStack> jnewbery: sipa: okay, that's very interesting. i saw that there is a function for IBD state but assumed it was only used as information for the user and otherwise didn't influence the behaviour
13:48 <sipa> no, i believe it's actually not even exposed to the user
13:48 <jnewbery> I think it's in getblockchaininfo?
13:48 <sipa> ah yes, it's there
13:49 <sipa> added in 0.16
13:49 <jnewbery> (mostly for testing and developers)
13:50 <jnewbery> 10 minutes!
13:50 <jonatack> jnewbery: (heh i began reviewing) right, it is much better than BIP 37, I'm not yet sure if that is a good thing(tm).
13:50 <jnewbery> I have a call at 2, so I'll need to get going promptly
13:51 <jonatack> not to bikeshed on that here; i'll have to work that out for myself
13:51 <jnewbery> I think a good standard for thinking about these things is:
13:52 <jnewbery> 1. Does it have a clear use case (I think in this case, the answer is yes because lightning people want it)
13:52 <jonatack> (HW people too, it seems?)
13:52 <jnewbery> 2. Does it impose a significant or unreasonable resource burden on our software (I think the answer is no here)
13:53 <jnewbery> 3. Is the code correct and maintainable (that's what I need your help for)
13:53 <theStack> just a last note to the IBD topic: i'd agree then to not serve filters during IBD -- not even block headers are served during IBD as i just saw
13:53 <sipa> I think there is a 0. Does this belong in the P2P protocol, or can it be done better in another way - which I think is where some of the controversy is
13:53 <jnewbery> From a software maintainability standpoint, jimpo's implementation is far better than a lot of the other legacy functionality we have to support
13:54 <michaelfolkson> You mean outside of Core sipa?
13:54 <jonatack> jnewbery: agree, am nevertheless zooming out further: should we be doing this, and if yes, this way?
13:54 <jonatack> sipa: yes
13:55 <sipa> michaelfolkson: no; i mean in the P2P protocol at all, by whatever software
13:55 <michaelfolkson> Ah ok thanks
13:55 <sipa> (of course, we don't control what other people do in their software - but clearly we shouldn't add things that we don't believe belong there)
13:55 <fjahr> jonatack: there is a lot more detail to it but in short I think that if people want to not run a node and don't care about privacy too much then I'd rather have them run a light client than keep their money on sth. like Coinbase. Not doing BIP157 would not make more people run a full node.
13:56 <jnewbery> sipa: I think as long as we satisfy (2) and (3), then (0) doesn't matter so much. Or maybe I should say, 'belongs' in (0) is very subjective. (2) and (3) are also subjective, but not as much.
13:56 <fjahr> To me it is providing an alternative that hopefully will attract more people for whom it is an improvement, not a 'downgrade' :)
13:57 <jonatack> yes. i have to work that out. anyway, am reviewing. thanks fjahr jnewbery sipa.
13:57 <sipa> jnewbery: i disagree, but let's keep it at that
13:58 <jonatack> jnewbery: do you want to add this log to the website?
13:58 <jnewbery> ok, I've gotta go. Thanks everyone for coming. I hope you found it useful!
13:59 <jonatack> i can maybe append it to the previous bip157 session page
13:59 <michaelfolkson> Yup very interesting, thanks jnewbery
13:59 <jnewbery> If the first one gets merged, then perhaps we can do another meeting for the next PR in the sequence
13:59 <fjahr> thanks jnewbery, will try to think a little more about threads
13:59 <theStack> thanks for hosting jnerbery! will there be another bip157 review meeting next thursday?
13:59 <theStack> *jnewbery
13:59 <michaelfolkson> haha
13:59 <jonatack> thanks everyone, this is helpful
14:00 <jnewbery> theStack: only if 18877 is merged. No point in reviewing further ahead
14:00 <jnewbery> ok, really going now. Bye!
14:00 <jnewbery> #endmeeting