💡

Bitcoin Core PR Review Club

BIP 157: Add support for getcfheaders (p2p)

May 21, 2020

https://github.com/bitcoin/bitcoin/pull/19010

Host: jnewbery - PR authors: jnewbery , jimpo

The PR branch HEAD was 4376797 at the time of this review club meeting.

This is the third PR in our special series on the BIP 157 implementation. See the previous meetings on PR 18877 and PR 18960.

Notes

This PR adds support for the getcfheaders P2P message. You should familiarise yourself with the format of that message and the cfheaders response before reviewing the code changes. Both of those messages are defined in BIP 157.
The getcfheaders message contains two fields: StartHeight (the height of the first block in the requested range) and StopHash (the hash of the last block in the requested range). The message handling code starts by deserializing those fields into uint_32t and uint256 local variables, respectively.
The PrepareBlockFilterRequest() helper function has its interface changed to accept start_height and a max_height_diff parameters.
The ProcessGetCFHeaders() function calls a new method on the block filter indexer class called BlockFilterTypeName().
New tests are added to the p2p_blockfilters.py test file.

Questions

What happens if the getcfheaders message is malformed and we’re not able to deserialize it? Try testing this for yourself and looking at the debug logs.
What changes do we need to make to the ProcessGetCFCheckPt() function in this PR? Why?
What data is the LookupFilterHashRange() function returning? Why don’t we just return the filter headers to the peer?
How is the new functionality tested? Should any other tests be added?

Meeting Log

13:00

<jnewbery> #startmeeting

13:00

<jnewbery> hi!

13:00

<theStack> hi!

13:00

<jkczyz> hi

13:00

<troygiorshev> hi

13:00

<jnewbery> fjahr: Thanks! I've updated #18876

13:00

<fjahr> hi

13:00

<michaelfolkson> hi

13:01

<jnewbery> Notes/questions: https://bitcoincore.reviews/19010.html

13:01

<jnewbery> who had a chance to review the PR? (y/n)

13:01

<troygiorshev> n

13:01

<jkczyz> y

13:01

<fjahr> n

13:01

<theStack> y

13:02

<michaelfolkson> 0.5y

13:02

<jnewbery> Let's get into the questions. What happens if the getcfheaders message is malformed and we’re not able to deserialize it?

13:03

<theStack> concluding from my log, CDataStream::read() seems to throw an exception

13:04

<michaelfolkson> I didn't do this. But what was the best approach here to malform it? Randomly tweak it or deliberately construct a message that didn't follow the definition?

13:04

<theStack> [msghand] ProcessMessages(getcfheaders, 11 bytes): Exception 'CDataStream::read(): end of data: iostream error' (NSt8ios_base7failureB5cxx11E) caught

13:04

<jnewbery> theStack: exactly right. The deserialization is here: https://github.com/jnewbery/bitcoin/blob/befc63d5b0937ca24fd48ed2580349740b87b0ec/src/net_processing.cpp#L2061

13:05

<theStack> michaelfolkson: i just modified the method msg_getcfheaders.serialize() in test_framework/messages.py to create some garbage data

13:05

<michaelfolkson> theStack: Ah nice, makes sense

13:05

<jnewbery> but just like all messages being deserialized, error handling is done in the ProcessMessages() try-except catch: https://github.com/jnewbery/bitcoin/blob/befc63d5b0937ca24fd48ed2580349740b87b0ec/src/net_processing.cpp#L3611

13:06

<jnewbery> We don't need to add custom deserialization error handling for new message types

13:07

<jnewbery> Next question: What changes do we need to make to the ProcessGetCFCheckPt() function in this PR? Why?

13:09

<theStack> were there ever any discussions among bitcoin core devs about whether to use exceptions in general? there seem to be very divided views about that in general... same say they are evil because they basically behave like goto/longjmp, and some languages like golang don't even include them (sorry for slight off-topic :))

13:10

<jkczyz> Adding stop_hash and max_height_dif params to the PrepareBlockFilterRequest call

13:10

<jnewbery> theStack: not off-topic. It's a good question!

13:11

<theStack> PrepareBlockFilterRequst was extended to test the preconditions more detailled, by also checking the allowed block range

13:12

<jnewbery> My opinion: using try-except for control flow is to be discouraged (since like you say, it's basically a goto). I'd argue that the code here is doing that: https://github.com/bitcoin/bitcoin/blob/fed1a9043fdf802c7cf7eab1c8aab25ca1c90e8d/src/bitcoin-cli.cpp#L503-L531

13:12

<jnewbery> if it's to handle errors from input that you don't control (as is the case in ProcessMessage()), then it makes sense

13:13

<theStack> jnewbery: sounds reasonable. in this case it's definitely a good thing that a deserialization return code doesn't need to be checked every time

13:14

<jnewbery> jkczyz: correct, although I see you have a suggestion for how to improve that here: https://github.com/bitcoin/bitcoin/pull/19010#discussion_r428737628

13:15

<jnewbery> Next question: What data is the LookupFilterHashRange() function returning? Why don’t we just return the block headers to the peer?

13:16

<theStack> jnewbery: i think this is supposed to be "filter headers" in the second question? (i submitted a PR shortly before the start of the meeting)

13:17

<theStack> the idea is that only the one filter header is part of the cfheaders response, the rest are filter _hashes_, and the headers can be computed on the client-side from that information

13:18

<jnewbery> theStack: you're right. I mean filter headers.

13:18

<jkczyz> which allows them to verify they are linked correctly

13:18

<theStack> jkczyz: +1

13:19

<jnewbery> jkczyz: exactly

13:19

<jnewbery> meaning you can fetch the headers trustlessly if you have the checkpts

13:20

<jnewbery> jkzyz: I don't know if you had anything to say about error handling based on your comment here: https://github.com/bitcoin/bitcoin/pull/16202#issuecomment-535152710

13:21

<jnewbery> ok, final question. How is the new functionality tested? Should any other tests be added?

13:22

<jkczyz> jnewbery: ah, error handling in relation to PrepareBlockFilterRequest? Or the exception handling discussion?

13:23

<jnewbery> the exception handling discussion (or any other wisdom you want to share with us)

13:24

<fjahr> basically the test does what the client should do as described above, it asks for cfheaders and then validates the hashes it receives by computing that it ends up at the right header

13:25

<jkczyz> I came from a world where C++ exception handling was discourage / outright banned. I tend to agree with the reasons behind that. I've come to like variant types where the return value is either the result or an error. Then you never have uninitialized or partially initialized objects around

13:25

<jkczyz> Rust has a Result enum type for this. I believe newer version of C++ have something similar

13:26

<jnewbery> yeah, that comment struck me as quite rust-ish

13:26

<michaelfolkson> There's no test that the node doesn't respond to getcfheaders with an unsupported filter type

13:26

<theStack> some languages go the way of returning two values, one for the actual result and one for an error code (i think golang also does this)

13:27

<michaelfolkson> That was outlined in the BIP

13:28

<fjahr> theStack: I think that's more of a style pattern? I mean you can write code this way in any language.

13:29

<jkczyz> jnewbery: Wrote the comment before I learned Rust but happy to see the idiom is prevalent in that world :)

13:29

<michaelfolkson> There is only one filter type currently though right?

13:31

<jnewbery> michaelfolkson: there's already a test that requesting a getcfcheckpt with an invalid filter type fails: https://github.com/bitcoin/bitcoin/pull/19010/files#diff-7cb56395a1bd1037534080986f33bd9dR158

13:31

<theStack> fjahr: yes, you are right. let's say then some languages propose this style pattern in their guidelines _and_ also provide the syntactic sugar for it (to not need an out-parameter, for example), more than others :)

13:31

<jnewbery> jkczyz commented that our behaviour regarding invalid requests is different from the BIP: https://github.com/bitcoin/bitcoin/pull/19010#pullrequestreview-416266764

13:32

<theStack> i think one of the most ugliest methods of error handling is global variables, like errno in the C standard library

13:33

<fjahr> I guess there could be a test that the client is disconnected if it requests invalid or too many filter headers

13:33

<jnewbery> michaelfolkson: yes, currently only one filter type. BIP 158 initially defined two filter types, but that was reduced to just basic filter types

13:33

<jnewbery> fjahr: seems reasonable

13:34

<michaelfolkson> Thanks jnewbery. What was the motivation for the second (now ditched) filter type?

13:35

<jkczyz> theStack: I soured on out params once I saw the result type idiom. Out params seem to encourage uninitialized objects and null ptrs

13:35

<jnewbery> michaelfolkson: I'm not sure. I didn't follow early discussion of BIP 158 very closely

13:35

<michaelfolkson> I know Pieter said something about needing a soft fork to make this more trust minimized. Maybe it was related to that

13:36

<jnewbery> jkczyz: when you say newer C++ has a Result enum type, do you know where I could find out more?

13:37

<jnewbery> jkczyz: maybe this: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4015.pdf ?

13:38

<sipa> michaelfolkson: i argued against the other filter because i thought it was useless

13:39

<michaelfolkson> Ok thanks sipa

13:40

<fjahr> michaelfolkson: https://github.com/bitcoin/bips/commit/4a85759f0229450f4a63b9404dfe8e8c10bdc92e#diff-367f1ff45d7efb5c6774e4112cd17489

13:40

<jkczyz> jnewbery: Looks like it from a brief glance though I was thinking of std::variant

13:40

<jkczyz> https://en.cppreference.com/w/cpp/utility/variant

13:40

<theStack> jnewbery: i also couldn't find much about this idiom, at least there seems to be a wiki article: https://en.m.wikipedia.org/wiki/Result_type

13:40

<sipa> haskell has an (actual data or error string) type

13:40

<sipa> iirc

13:41

<sipa> variant looks like a crazy hammer to achieve that

13:41

<michaelfolkson> Re the cfheaders cache size being big enough until 2047. I'm assuming this is a direct trade-off with OOM concerns. Could make it a bit bigger but then you are also raising OOM concerns

13:42

<jnewbery> unfortunately discussion about basic and extended filters was before the optech newsletter, so you'll just need to go back to the primary source material!

13:42

<michaelfolkson> Where are the Optech minus versions?!

13:43

<jnewbery> the paper I linked to talks about Haskell monads

13:43

<theStack> michaelfolkson: i think the OOM concerns were primarily about it being unlimited

13:43

<sipa> the extended filter was including all data elements pushed in scriptSigs i believe, which i think was a huge mistake in bip37

13:43

<sipa> or sPKs, i don't remember

13:44

<michaelfolkson> So could bump it theStack? Not a big deal but might as well push it out as far as possible if no downsides?

13:44

<jnewbery> This might be an interesting talk if anyone's interested in error handling: https://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C

13:44

<fjahr> The extended filter contains extra data that is meant to enable applications

13:44

<fjahr> with more advanced smart contracts. An extended filter MUST contain exactly the

13:44

<fjahr> following items for each transaction in a block ''except the coinbase'':

13:44

<fjahr> * Each item within the witness stack of each input (if the input has a witness)

13:44

<fjahr> * Each data push in the scriptSig of each input

13:44

<jnewbery> (I haven't watched it, but it looks interesting)

13:45

<fjahr> (the removed part of BIP158 on extended filter type)

13:45

<theStack> sipa: putting all scriptSigs in a filter sounds crazy... would there ever be any use to match signatures?

13:46

<theStack> michaelfolkson: in my opinion it could be bumped, the memory usage is neglictible, whether it is 2000 or e.g. 4000...

13:46

<jkczyz> sipa: Not sure if you were around when something like this became widespread: https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/stubs/statusor.h

13:47

<sipa> jkczyz: i can't remember seeing that

13:47

<sipa> theStack: the idea was presumably that it allows matching for the full pubkey

13:48

<jnewbery> we've covered all the questions from https://bitcoincore.reviews/19010.html. Any final thoughts/questions before we wrap up?

13:50

<michaelfolkson> One final one. I know we are going through these PRs pretty quickly week by week. Do they need longer time to be reviewed? Or are they pretty much ready to all be merged?

13:50

<theStack> sipa: yes matching only the public key part of scriptSig data sounds reasonable... the signatures themselves not i think (don't know though how easy it is to distinguish)

13:51

<michaelfolkson> There should be a few weeks on the first one, a few weeks on the second one etc etc?

13:51

<jnewbery> michaelfolkson: they're ready to be merged once they've been reviewed and the maintainers consider them safe to merge.

13:51

<sipa> theStack: imho the only thing you should ever care about is the entire sPK as a whole

13:53

<jnewbery> This code has been PR'ed since July: https://github.com/bitcoin/bitcoin/pull/16442 so leaving the PR open for additional weeks doesn't seem to give any benefit

13:53

<jnewbery> (and just means that there may be conflicts that require rebase and rereview)

13:54

<jnewbery> The next PR in the sequence is "Serve cfilters requests", which is pretty much more of the same, so I suggest we don't do a review club on that and reconvene when "Signal NODE_COMPACT_FILTERS support" is open.

13:55

<theStack> michaelfolkson: concerns about too quick merging on the first BIP 157 PR were expressed, see last two comments on #18877

13:55

<michaelfolkson> Makes sense. Doesn't seem applicable here but generally there is a trade-off of waiting for more reviews versus getting the merging over with.

13:56

<jnewbery> michaelfolkson: it's down to the maintainer's judgement. If they think it's received enough reviews, then they'll merge it

13:56

<fjahr> I also found it a bit unusual that there were merge target date. On the other hand I agree, the code has been out there for a long time, just in different commits

13:56

<jnewbery> There's also nothing to stop people from reviewing a PR after it's been merged.

13:57

<michaelfolkson> Not asking because I'm critical, just asking to understand the thought process better ;)

13:57

<michaelfolkson> But thanks

13:57

<sipa> i also find it strange to have a merge target date

13:58

<jnewbery> I put a target date to hold myself accountable to make progress. I'm trying to turn around all review comments on these PRs in less than 24 hours.

13:59

<michaelfolkson> I suppose this set of PRs has a unique history of being held up for years due to a couple of Concept NACKs too. If the code has been sitting there for all that time...

14:00

<jnewbery> It seems like a monumental waste of effort to have a PR open for 9 months where review happens, there's some delay before the author responds, the reviewer gets bored, the PR needs to be rebased, etc

14:00

<michaelfolkson> Yup

14:01

<theStack> sipa: hmm if you only put sPK data in a block filter you couldn't detect if a block spent an utxo of your interest, or did you mean something different?

14:01

<jnewbery> I hope it's obvious that I'm not proposing that these things get merged on those dates if there hasn't been enough review

14:02

<sipa> theStack: BIP158 includes the sPK of UTXOs when they're created *and* when they're spent

14:02

<fjahr> jnewbery, Bitcoin Scrum Master :p

14:02

<michaelfolkson> Yeah that was clear to me jnewbery

14:03

<theStack> sipa: ah, that makes sense then

14:03

<jnewbery> jfahr: haha. I'll pass on that title if you don't mind

14:03

<fjahr> ok :D

14:04

<jnewbery> alright that's time. Thanks everyone!

14:05

<jnewbery> #endmeeting

14:05

<fjahr> Thanks!

14:05

<theStack> so far i didn't have the impression that something got merged without a good amount of ACKs -- but i guess it's always a good thing if people stay watchful

14:05

<theStack> thanks!

14:05

<michaelfolkson> Thanks. Great work on progressing these jnewbery

14:05

<jkczyz> thanks jnewbery

14:08

<jnewbery> michaelfolkson: I don't think it's true that this was 'held up for years by a couple of concept NACKs'. There were no concept NACKs on the original PR: https://github.com/bitcoin/bitcoin/pull/16442

14:08

<jnewbery> (until a couple of weeks ago when I rebased it)

14:10

<michaelfolkson> Or general understanding that there was some limited opposition at a conceptual level?

14:10

<michaelfolkson> PRs languish due to lack of interest or opposition right? It didn't seem like the former to me but I could be wrong

14:13

<jnewbery> Certainly there's been disagreement, but the opposition to it has changed over time. Luke originally seemed to be fine with adding BIP 157 support behind a flag: https://github.com/bitcoin/bitcoin/pull/16442#issuecomment-522805991

14:17

<michaelfolkson> Yeah interesting. I don't envy the maintainers. We had a Socratic with Jeremy Rubin on CTV and it seems like the design for that proposal has benefitted from the perceived opposition over time

14:18

<michaelfolkson> Seems like a much stronger proposal now than if it had been rushed through years ago. But knowing when to delay or oppose versus supporting it being merged seems like a fiendish responsibility

14:19

<michaelfolkson> Above my pay grade :)

14:19

<theStack> my assumption is that the potential for opposition or controversy would be way larger if it was enabled by default