The PR branch HEAD was e734228 at the time of this review club meeting.
Notes
Compact block filters (not to be
confused with compact blocks)
were introduced in BIP 158 as a
more privacy-friendly and incentive-compatible alternative to Bloom filters.
The main purpose of PR #24832 is to improve
performance when loading filters from disk by bypassing unnecessarily expensive checks. To verify
this, the authors (this builds on the work done in
#19280) also introduced new benchmark tests.
Benchmarks are located in /src/bench, and you can also read the usage
instructions on how to compile
and run the benchmarks.
Once you’ve compiled bench_bitcoin, you can run the entire benchmark test suite, or just limit
your tests to the tests related to this PR by using the filter option:
When are compact block filters actually loaded from disk? Is this a frequent process?
This PR introduces a new skip_decode_check bool parameter in the GCSFilter
constructor
that controls whether we check the size of the filter. In which scenarios will this parameter
actually be true? Could we remove it?
We usually don’t pass (u)int type variables by reference. Why is that different for the const
uint256& hash parameter in
BlockFilterIndex::ReadFilterFromDisk?
In
GCSFilterDecodeSkipCheck(),
is it reasonable to construct GCSFilter filter with {0, 0} as the first two of the Params
arguments? In your own words, what do the 4 Params parameters represent?
Were you able to compile the bench_bitcoin benchmark tool?
Were you able to run the benchmarks? In terms of ns/op (nanoseconds per operation), what are
the results are you getting for GCSFilterDecode and for GCSFilterDecodeSkipCheck? Is that in
line with expectations, and would you say there is sufficient benefit to warrant merging this PR?
What are the risks and downsides of this PR? Are there any new attack vectors?
<stickies-v> welcome everyone! On the menu this week is a PR by kcalvinalvin (building on earlier work by pstratem) that improves the performance of loading compact block filters from disk. In addition to those changes, we'll also look at the benchmarking tests and suite.
<stickies-v> when reviewing a PR, you need to be comfortable about understanding the *changes* it introduces. In this PR, we don't really change how we use GCS, so I'd say it's not really super important. Of course, sometimes side effects can be difficult to understand, and having enough people with deep expertise of that part of the codebase is important too.
<svav> Compact block filters - are a condensed representation of the contents of a block that allow wallets to determine whether the block contains any transactions involving the user’s keys.
<ls55> Compact block filters are a condensed representation of the contents of a block that allow wallets to determine whether the block contains any transactions involving the user’s keys.
<stickies-v> lots of good answers already - I'd say the main thing to add is that they're a compact representation of all the scriptPubKeys in a block - both the inputs and outputs
<willcl_ark> They are used when clients don't want to have to download full-size blocks, but still want to check for transactions related to themselves (e.g. "light clients")
<stickies-v> we already had bloom filters to achieve a similar goal, but they were leaking quite a bit of privacy and weren't incentive compatible (put a lot of load on full nodes), and that's fixed with compact block filters (at cost of higher bandwidth and CPU requirements for light nodes)
<willcl_ark> When you find a match against the filter, you then request the full block to get full transaction information, without leaking too much privacy to the "server"
<willcl_ark> With BIP157 theres a single filter per block, shared to all clients. With BIP37 each client can ask for matches against custom filters (and there's no DOS protection to requesting matching against many filters).
<sipa> GCS is also smaller than Bloom (approximately 1.4x for the same false positive rate), but that's just an incremental change. The fundamental difference is that the client does not just tell the server what they care about anymore (a huge privacy leak).
<stickies-v> the tunable false positive rate is also something we'll cover later on in the questions. Awesome discussion, I'll move on to the next question. As always, this discussion is async so feel free to continue discussing earlier questions
<sipa> ls55: Well, if they want the block, yes. There is no way to only download just the tx they care about (because that would reveal which tx they are interested in).
<stickies-v> ls55: this is also not consensus or P2P protocol or anything, I think any dev could implement this for their application however they want to, nothing's holding you back from querying individual TXs
<sipa> Oh, I guess BIP152 compact block relay also has a mechanism for transferring just a subset of transactions, but that's in a very different context.
<stickies-v> so the compact block filters are only loaded on request, through RPC, through REST or through P2P networking with the GETCFILTERS message (if `NODE_COMPACT_FILTERS` service bit is set)
<sipa> (That's assuming the lightweight wallet communicates over the P2P protocol; they don't have to, e.g. Electrum uses its own protocol, talking to Electrum servers, not Bitcoin P2P nodes - in that case BIP157 is obviously irrelevant)
<stickies-v> next question: this PR introduces a new `skip_decode_check` bool parameter in the `GCSFilter` constructor that controls whether we check the size of the filter. In which scenarios will this parameter actually be `true`? Could we remove it?
<stickies-v> lightlike: good point. since Core doesn't have a "light client mode", there is this asymmetry where it only sends but never consumes compact block filters
<lightlike> stickies-v: I think skip_decode_check can be removed (and I think I suggested that in the PR some time ago). skip_decode_check=false is only used when deserializing blockfilters we get from others, and since core does not request filters from others, it doesn't need to do that (outside of tests)
<lightlike> though I'm not sure if we should remove Unserialize completely, as was also suggested. I'd be fine with just removing the extra check code.
<stickies-v> alright next Q: we usually don't pass `(u)int` type variables by reference. Why is that different for the `const uint256& hash` parameter in `BlockFilterIndex::ReadFilterFromDisk`?
<stickies-v> since we already store the hash of each compact block filter in the db, we just compare that hash with the filter that we just loaded from disk
<sipa> nit: references don't have a size (from the perspective of the source code - since the reference is a perfect stand-in for what it is referencing). typically references are compiled to pointers, but the compiler isn't required to do so
<sipa> My point is just that "reference" is a concept that only exists in the source code, and its size is meaningless. If you'd ask sizeof() on a reference, you'd just get the size of what it is referencing.
<sipa> What you care about is what the overhead of passing an argument by reference or by pointer is, and while the compiler is free to do anything that works, typically both pointer and reference arguments result in 1 extra register being passed to the callee.