BIP 157: Serve cfcheckpt requests (
p2p) May 7, 2020
The PR branch HEAD was 967e2b10 at the time of this review club meeting.
This was a special review club to dive into
the implementation of BIP
157. We specifically reviewed
the first PR from that implementation: PR
There were no notes and questions for this meeting. See the
questions from PR 16442 for additional background information. Meeting Log
1 13:00 <jnewbery> #startmeeting
2 13:00 <jnewbery> hi folks, welcome to Review Club special BIP 157 edition :)
7 13:01 <jonatack_> ja: i think that was added to /topic because the very first nothing was not logged, and then when it was decided to add the meeting logs to the website, the warning was added.
11 13:02 <jnewbery> This is slightly different from other review club meetings. I haven't written notes and questions, but I wanted to make a space available for everyone to ask questions and really get deep into the implementation.
13 13:03 <jnewbery> so if you need a refresher on what the full PR is doing, I recommend re-reading those notes and logs
14 13:04 <jnewbery> The first PR I split out is just serving cfcheckpts, and doesn't signal NODE_COMPACT_FILTERS in the version message
15 13:04 <jnewbery> it's a very limited subset of the functionality, but it can be used for testing and serving checkpts to pre-configured clients
16 13:05 <jnewbery> Does anyone have any questions about the full PR or the sub-PR?
17 13:05 <theStack> jnewbery: any concrete reason why you chose the handling of (get)cfcheckpts as first reviewable chunk? is it the simplest one?
18 13:06 <nehan> jnewbery: i am also interested in how you decided to break of this piece
20 13:06 <jnewbery> theStack: to some extent yes. I expect the normal client operation will be: 1. fetch cfcheckpts, 2. fetch all other cfheaders, 3. fetch cfilters.
21 13:07 <jnewbery> That's how the test is structured at least
22 13:07 <jnewbery> so in that order of operations, supporting getcfcheckpt and sending the cfcheckpt message is the first thing to do
23 13:07 <fjahr> It's kind of nit but I am a bit annoyed that it's sometimes block filter and sometimes compact filter. Is it too late to change naming of the flags? I think in the code it's ok because devs can make the connection. But for users I may be confusing. Maybe it should be `-peerblockfilter` to match `-blockfilterindex`?
24 13:09 <jnewbery> fjahr: definitely not too late, and definitely a good question to ask. Once things are merged, we're stuck with bad names forever, so we should get it right first time
25 13:10 <jnewbery> -peerblockfilter seems reasonable. I'll think about it some more and maybe update the flag name
26 13:11 <fjahr> sounds good!
27 13:11 <jnewbery> Something that I wondered about as I was reviewing, was the synchronization across threads. Is that something that anyone here thought about?
29 13:12 <jonatack> jnewbery: i'm personally undecided whether to review this PR or not (and for the same reason did not do so for the review club last fall). mind giving an elevator pitch for this PR?
30 13:13 <jonatack> if this is the wrong moment, not a problem ;)
31 13:13 <jnewbery> When we receive a GETCFCHECKPT message from a peer, the message handler thread is the one which is calling into ProcessMessages() and doing the work to serve the CFCHECKPT message
32 13:14 <jonatack> (because if the why isn't clear, the how doesn't matter, and the PR description does not address the why)
34 13:15 <nehan> do all messages use the message handler thread? how is that handled across messages?
35 13:16 <fjahr> jonatack: Is it about block filters in general or this PR in particular? i.e. are you more referring to #18876 or #18877?
36 13:16 <nehan> i guess i'm asking if there's some guidelines on "don't tie up these threads ever"
37 13:16 <jnewbery> when we receive a block and it's connected, validation will send a BlockConnected signal
39 13:17 <jnewbery> nehan: yes, all messages from the network are handled by the message handler thread
40 13:18 <jnewbery> nehan: that's a good question. I haven't seen that guideline anywhere, but it makes sense, right? We shouldn't block the message handler thread, since it's what's doing all the important work
41 13:18 <theStack> am i right in the assumption that GETCFCHECKPT is just a convenience feature and the same could be achieved by request multiple single GETCFHEADERS?
42 13:18 <theStack> *requesting
43 13:18 <nehan> jnewebery: let me clarify my question which was phrased poorly. where is the bulk of the "work" for a message usually handled? is the idea spawn a thread to complete the work for this message, or is it add a bunch of things to different queues that other threads will handle?
44 13:18 <theStack> (needing much more single network messages and communication rounds of course)
45 13:18 <jnewbery> jonatack: good question about motivation. I'll get to it in a bit :)
46 13:19 <sipa> nehan: message is received from network by network thread, put on the received messages queue, where it is processed in the message handler thread
47 13:19 <jnewbery> nehan: no, the bulk of the work is done by the message handler thread. The only place we farm out work to other threads is script checking for blocks
48 13:20 <jnewbery> message processing in bitcoin core is almost entirely single-threaded
49 13:20 <sipa> (for block validation there are separate but fixed script verification threads, which verify all the scripts in parallel for the whole block; for all other things the bulk of the work is in the message handler thread)
50 13:20 <nehan> sipa: jnewbery: thanks! so to make sure i'm clear: messages are never processed in parallel, because there is only one message handler thread?
51 13:21 <jnewbery> nehan: correct!
52 13:21 <sipa> correct; and lots of assumptions (in Core, but also other software) depend on serial processing of messages
53 13:21 <nehan> jnewbery: sipa: cool. helpful invariant to know
54 13:21 <jnewbery> there's some low-level preprocessing done by the socket thread, eg we verify the checksum in the socket thread, but all application-level processing is in the message handler thread
55 13:21 <sipa> there used to be this approach of sending a ping after sending another message to know whether the previous one was finished processing or not
56 13:22 <jnewbery> someday it'd be nice to have separate threads for net_processing and validation, but we're not there yet
57 13:22 <sipa> by some software
58 13:22 <jnewbery> also that assumption is all over our functional tests. We synchronize messages with a ping
59 13:23 <jnewbery> theStack: The nice thing about the cfcheckpts is that because the headers form a hash chain, you can get your checkpts from somewhere trusted, and then fetch the other headers trustlessly and verify that they connect
60 13:25 <jonatack> fjahr: yes, the larger direction, beyond being an improvement over bip37
61 13:25 <jnewbery> you could get any cfheaders from a trusted source, but having the checkpts as a standard in the protocol is nice, I think
62 13:26 <theStack> jnewbery: oh, i just see now that there is actually a difference between "filter header" and "filter hash"; and that CFHEADERS includes more than just headers
63 13:26 <jnewbery> they're close enough (1000 blocks), that you can fill in the other headers with a single cfheaders request, and since it's a standard sequence of block heights, the cfcheckpt can be cached and delivered from anywhere
65 13:28 <jnewbery> so the cfheaders message just provides the first header, and then the hashes of the subsequent filters, so you can reconstuct the headers yourself (and verify that they connect to your checkpoints)
66 13:29 <jnewbery> ok, back to synchronization. My two questions when thinking about this were: is there the possibility of a data race, and is there the possibility of the message handler thread being blocked?
67 13:29 <jnewbery> anyone have thoughts about those?
68 13:31 <fjahr> I did not think about this before but maybe during IBD?
69 13:32 <jnewbery> fjahr: what about IBD?
70 13:32 <fjahr> could it be blocked if we get a request during ibd?
71 13:32 <theStack> fjahr: hmm wouldn't the call to BlockRequestAllowed() just return false if the requested block is not existing yet on the node?
72 13:33 <theStack> (or maybe before, on LookupBlockIndex)
73 13:34 <jnewbery> fjahr: perhaps. IBD is when the most messages are going through the validation interface.
74 13:34 <theStack> s/block/stop block/
75 13:35 <jnewbery> but I haven't done any testing. The indexer's BlockConnected callback seems to return pretty quickly, so perhaps it isn't a concern
76 13:35 <jnewbery> theStack: but you could ask for a cfilter for an early block during IBD
77 13:36 <jnewbery> perhaps we should just not serve filters if we're not out of IBD?
78 13:38 <theStack> what exactly defines the state of "during IBD"? isn't it semantically pretty much the same, just with an (indefinitely ;-)) smaller amount of time between fetching blocks?
79 13:38 <sipa> during IBD means "while IsInitialBlockDownload() returns true"
81 13:39 <sipa> it's a heuristic, but it affects lots of things in Bitcoin Core's behavior
82 13:39 <jnewbery> leveldb's own locking should stop any data races (although I haven't actually looked into that)
83 13:39 <michaelfolkson> Sorry. Basic question.... Do we forward on transactions/blocks that we know about to other peers while we are still in IBD?
84 13:40 <jnewbery> michaelfolkson: good question! I think the answer is no, but I'd need to double check that that's always true
86 13:41 <theStack> sipa: ah, good to know. i didn't expect that
87 13:41 <jnewbery> thanks sipa!
89 13:42 <theStack> because i always saw it as permanent IBD with just extended time in-between blocks :)
90 13:43 <jnewbery> theStack: no! IBD is very different. It's an in-memory latch. We start off in IBD, when we are at the tip we consider that we're no longer in IBD, and then we never re-enter IBD (unless you restart the node)
91 13:44 <michaelfolkson> If the answer to transactions/blocks is no then I would guess serving filters during IBD should also be no
92 13:44 <sipa> i think the most important impact is that during IBD we prefer faster synchronization over partition resistance (we'll kick peers that are too slow to feed us blocks)
93 13:45 <jnewbery> jonatack: the motivation is that BIP 157 is a better mechanism for light clients, including LN clients, than other protocols
94 13:45 <theStack> jnewbery: so if a node has finished IBD, the network connections gets -- for whatever reason -- lost for a long time, then the catching up of blocks has a different behaviour?
95 13:46 <jnewbery> theStack: that is correct
96 13:47 <sipa> unless you're talking about days... weeks... it doesn't matter much
97 13:48 <theStack> jnewbery: sipa: okay, that's very interesting. i saw that there is a function for IBD state but assumed it was only used as information for the user and otherwise didn't influence the behaviour
98 13:48 <sipa> no, i believe it's actually not even exposed to the user
99 13:48 <jnewbery> I think it's in getblockchaininfo?
100 13:48 <sipa> ah yes, it's there
101 13:49 <sipa> added in 0.16
102 13:49 <jnewbery> (mostly for testing and developers)
103 13:50 <jnewbery> 10 minutes!
104 13:50 <jonatack> jnewbery: (heh i began reviewing) right, it is much better than BIP 37, I'm not yet sure if that is a good thing(tm) yet.
105 13:50 <jnewbery> I have a call at 2, so I'll need to get going promptly
106 13:51 <jonatack> not to bikeshed on that here; i'll have to work that out for myself
107 13:51 <jnewbery> I think a good standard for thinking about these things is:
108 13:52 <jnewbery> 1. Does it have a clear use case (I think in this case, the answer is yes because lightning people want it)
109 13:52 <jonatack> (HW people too, it seems?)
110 13:52 <jnewbery> 2. Does it impost a significant or unreasonable resource burden on our software (I think the answer is no here)
111 13:53 <jnewbery> 3. Is the code correct and maintainable (that's what I need your help for)
112 13:53 <theStack> just a last note to the IBD topic: i'd agree then to not serve filters during IBD -- not even block headers are served during IBD as i just saw
113 13:53 <sipa> I think there is a 0. Does this belong in the P2P protocol, or can it be done better in another way - which I think is where some of the controversy is
114 13:53 <jnewbery> From a software maintainability standpoint, Jimpo's implementation is far better than a lot of the other legacy functionality we have to support
115 13:54 <michaelfolkson> You mean outside of Core sipa?
116 13:54 <jonatack> jnewbery: agree, am nevertheless zooming out further: should we be doing this, and if yes, this way?
117 13:54 <jonatack> sipa: yes
118 13:55 <sipa> michaelfolkson: no; i mean in the P2P protocol at all, by whatever software
119 13:55 <michaelfolkson> Ah ok thanks
120 13:55 <sipa> (of course, we don't control what other people do in their software - but clearly we shouldn't add things that we don't believe belong there)
121 13:55 <fjahr> jonatack: there is a lot more detail to it but in short I think that if people want to not run a node and don't care about privacy too much then I rather have them run a light client than they keep their money on sth. like coinbase. No doing BIP157 would not make more people run a full node.
122 13:56 <jnewbery> sipa: I think as long as we satisfy (2) and (3), then (0) doesn't matter so much. Or maybe I should say, 'belongs' in (0) is very subjective. (2) and (3) are also subjective, but not as much.
123 13:56 <fjahr> To me it is providing an alternative that hopefully will attract more people for who it is an improvement, not a 'downgrade' :)
124 13:57 <jonatack> yes. i have to work that out. anyway, am reviewing. thanks fjahr jnewbery sipa.
125 13:57 <sipa> jnewbery: i disagree, but let's keep it at that
126 13:58 <jonatack> jnewbery: do you want to add this log to the website?
127 13:58 <jnewbery> ok, I've gotta go. Thanks everyone for coming. I hope you found it useful!
128 13:59 <jonatack> i can maybe append it to the previous bip157 session page
129 13:59 <michaelfolkson> Yup very interesting, thanks jnewbery
130 13:59 <jnewbery> If the first one gets merged, then perhaps we can do another meeting for the next PR in the sequence
131 13:59 <fjahr> thanks jnewbery, will try to think a little more about threads
132 13:59 <theStack> thanks for hosting jnerbery! will there be another bip157 review meeting next thursday?
133 13:59 <theStack> *jnewbery
134 13:59 <michaelfolkson> haha
135 13:59 <jonatack> thanks everyone, this is helpful
136 14:00 <jnewbery> theStack: only if 18877 is merged. No point in reviewing further ahead
137 14:00 <jnewbery> ok, really going now. Bye!
138 14:00 <jnewbery> #endmeeting