This PR is a re-attempt of PR 15845
from 2019, which was closed without being merged. PR 15845 was the
subject of an earlier review club.
Its notes apply here as well.
This PR is a performance improvement (no functional difference).
BIP 157
(see also review club)
adds the P2P support (light client protocol) for block filters, while
BIP 158
specifies the filters themselves. This PR uses the BIP 158 filters to skip blocks that cannot contain wallet transactions during a rescan.
One difference between this PR and 15845 is that this PR works only with
descriptor wallets, a more recent type of wallet (output descriptors were introduced in v0.17 in 2018; descriptor wallets followed in v0.21).
(See doc/descriptors.md and
Andrew Chow’s video)
To review this PR, you will need to create a descriptor wallet. This requires
building your node with sqlite; see the
build instructions
for your environment (search for “sqlite”).
bitcoind does not automatically create a descriptor wallet
(or any wallet). To create a wallet, run the
createwallet RPC.
You don’t need to specify any arguments except wallet name, such as my_wallet
(the default is to create a descriptor wallet).
It’s probably best to also use -signet=1, since the signet chain is small enough to run a non-pruned node.
You can get some coins to play with at the Signet Faucet.
When your node is finished syncing, run and time the rescanblockchain RPC.
Then restart with block filters enabled using -blockfilterindex=1, and
run rescanblockchain again to use the optimization.
The getindexinfo RPC will show you if block filter index is enabled.
The listreceivedbyaddress RPC will show you received transactions; this
list should be the same with and without -blockfilterindex=1 (and with and
without running this PR’s branch).
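The steps above can be scripted. The following is a minimal sketch, assuming `bitcoin-cli` is on the PATH, the node is running on signet, and the wallet is named my_wallet. The helper names (`cli_args`, `call`, `compare_rescan`) are hypothetical, and the key used to detect the index in the getindexinfo output is an assumption to verify against your own node.

```python
import json
import subprocess
import time


def cli_args(rpc, *params, wallet=None, signet=True):
    """Build the argv for one bitcoin-cli call (nothing is executed here)."""
    args = ["bitcoin-cli"]
    if signet:
        args.append("-signet")
    if wallet is not None:
        args.append(f"-rpcwallet={wallet}")
    return args + [rpc, *map(str, params)]


def call(rpc, *params, wallet=None, run=subprocess.run):
    """Run one RPC through bitcoin-cli; return (parsed JSON, seconds elapsed)."""
    start = time.monotonic()
    done = run(cli_args(rpc, *params, wallet=wallet),
               capture_output=True, text=True, check=True)
    elapsed = time.monotonic() - start
    return json.loads(done.stdout), elapsed


def compare_rescan(wallet="my_wallet", run=subprocess.run):
    """Time one rescan and record what the wallet reports afterwards."""
    indexes, _ = call("getindexinfo", run=run)
    _, seconds = call("rescanblockchain", wallet=wallet, run=run)
    received, _ = call("listreceivedbyaddress", wallet=wallet, run=run)
    return {
        # Assumed key name for the block filter index in getindexinfo output.
        "blockfilterindex": "basic block filter index" in indexes,
        "rescan_seconds": seconds,
        "received": received,
    }
```

Running `compare_rescan()` once without and once with -blockfilterindex=1 gives two timings to compare. The two `received` lists should be identical.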
Why would a node operator enable BIP 158 filters (-blockfilterindex=1)? Does the motivation
make sense?
What downsides, if any, are there to enabling BIP 158 filters?
Were you able to set up and run the PR on signet as described in the notes?
Did you see a difference in performance with and without -blockfilterindex?
What is the advantage of descriptor wallets compared to legacy wallets,
especially in the creation of the filter set?
(Hint: what exact type of data do we need to put into the filter set?)
On a new descriptor wallet with default settings (i.e. ‘keypool=1000’), how many elements would we need to put into the filter set?
(Hint: the
listdescriptors RPC
can be used to count the number of descriptors created)
<larryruane_> This week's PR is 25957: "wallet: fast rescan with BIP157 block filters for descriptor wallets". Notes and questions at https://bitcoincore.reviews/25957.html
<larryruane_> willcl_ark: interesting point, are there some other indices the wallet should have? use of existing ones, or new ones? (I know slightly off-topic)
<theStack> Kaizen_Kintsugi_: ad "reading over BIP158": i think understanding how exactly the filters are constructed in detail (BIP158) is not mandatory for reviewing this PR; knowing the basic idea should be sufficient
<larryruane_> Kaizen_Kintsugi_: yes, and also to identify transactions that pay TO the wallet (or more precisely, that the wallet has watch-only addresses of, or spending keys to)
<larryruane_> willcl_ark: yes, that's how I think about it, if we have this index to benefit light client peers of ours, why not use it to benefit ourselves? no extra cost (other than a little more code)
<willcl_ark> A few reasons: to offer better privacy to light clients connected to you, lower resource usage for yourself (as the server) and no ability for clients to DoS the server by requesting you monitor many unique filters (like BIP37 can do), and now faster rescans for yourself too!
<larryruane_> willcl_ark: good answer, before this PR, I would say it's providing a community service, not sure if there's any reason other than altruism (before this PR)
<furszy> larryruane_, theStack: small add: not only txes that are sent from or received on the wallet are important. The wallet can watch scripts as well.
<larryruane_> side question, is it possible to enable building and maintaining this index (`-blockfilterindex=1`) but not provide the BIP 157 peer-to-peer service?
<theStack> side-note: for people wanting to learn more details about block filters and BIP 157/158, there has been a row of interesting PR review clubs about that in 2020 (i think https://bitcoincore.reviews/18877 was the first one)
<larryruane_> I think conceptually BIP 158 filter is similar to a bloom filter, but better for this use case (more efficient), but I don't know the details
<larryruane_> and one more side note, the BIP 37 bloom filter had the light client provide the bloom filter to its server (the full node), and that was different for each light client (so the server had to remember a bunch of them), whereas with BIP 157/158, the server generates just one for each block, and can send it (the same filter) to ALL of its light clients
<larryruane_> Kaizen_Kintsugi_: I think the term rescan is specific to the wallet (?) ... but yes, enabling txindex, or the block filters, requires reading all the blocks again
<larryruane_> should we move on? (feel free to continue previous discussions) ... question 4, Why would a node operator enable BIP 158 filters (-blockfilterindex=1)? Does the motivation make sense?
<larryruane_> oh I'm sorry, that copy-paste was wrong, question 4 is: Were you able to set up and run the PR on signet as described in the notes? Did you see a difference in performance with and without -blockfilterindex?
<larryruane_> it ends up using the block filter to check each block (rather than checking each block directly), but using the filter seems to take longer than checking an empty (or near-empty) block!
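The idea of checking the filter before reading the block can be sketched as follows. This is an illustration only, not the PR's code: the function name `fast_rescan` is hypothetical, blocks are reduced to lists of scriptPubKeys, and an exact `frozenset` stands in for the probabilistic BIP 158 filter (a real filter can report false positives, which only cost an extra block read).

```python
def fast_rescan(blocks, block_filters, wallet_scripts):
    """Return (matching outputs, number of blocks actually read).

    blocks:         {height: list of scriptPubKeys (bytes) touched by the block}
    block_filters:  {height: per-block filter; a frozenset stands in for the GCS}
    wallet_scripts: set of scriptPubKeys the wallet watches
    """
    found, blocks_read = [], 0
    for height in sorted(blocks):
        block_filter = block_filters.get(height)
        # A missing filter cannot rule the block out, so the block is scanned.
        if block_filter is not None and block_filter.isdisjoint(wallet_scripts):
            continue  # the filter says nothing here is ours: skip the block
        blocks_read += 1
        found += [(height, spk) for spk in blocks[height] if spk in wallet_scripts]
    return found, blocks_read
```

The saving comes from the `continue` branch. When blocks are empty or near-empty, as on signet, the filter check itself can cost more than reading the block, which matches the inverted timing described above.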
<theStack> Kaizen_Kintsugi_: even if we know the transaction count, it's a bad metric to determine how long a block takes to rescan. has anyone an idea why?
<larryruane_> Kaizen_Kintsugi_: that's a really good point.. and technically the transaction count isn't enough, it depends on the number of tx inputs and outputs
<larryruane_> or at least you'd need to know how many inputs and outputs there are to examine in a block ... which you don't really have easy access to
<theStack> Kaizen_Kintsugi_: yes. as an extreme example, i've seen blocks every now and then that only consist of 10 txs but are still full (each one takes 100kvbytes, which is a policy limit IIRC)
<larryruane_> personally I'd say it's not worth optimizing ... this inverted performance behavior wouldn't occur on mainnet, which is all we really care about
<larryruane_> Kaizen_Kintsugi_: +1 ... the block header does include the transaction count but that's always zero (this is why block headers are 81 bytes serialized, not 80)
<sipa> RE BIP158's GCS filter: it is indeed similar to a Bloom filter (no false negatives, a controllable rate of false positives), but more compact (iirc around 1.3x-1.4x). The downsides are that GCSs are write-once (you can't update them once created), and querying is much slower. Bloom filters are effectively O(n) for finding n elements in them. GCS are O(m+n) for finding n elements in a filter of size m.
<sipa> So Bloom filters are way faster if you're only going to do one or a few queries. But as you're querying for larger and larger number of elements, the relative downside of a GCS's performance goes down.
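The O(m+n) cost can be seen in a simplified sketch of a GCS-style filter. It assumes several simplifications: SHA-256 replaces the SipHash that BIP 158 uses, the Golomb-Rice bit encoding is omitted (the sorted values are kept as plain integer deltas), and the function names are hypothetical. The constant M = 784931 is the value BIP 158 specifies for the basic filter.

```python
import hashlib

M = 784931  # false-positive parameter of the BIP 158 basic filter


def hash_to_range(item, key, f):
    """Map item to [0, f). SHA-256 stands in for BIP 158's SipHash here."""
    digest = hashlib.sha256(key + item).digest()
    return (int.from_bytes(digest[:8], "big") * f) >> 64


def build_filter(items, key):
    """Return (n, deltas): sorted hashed values stored as successive differences."""
    items = set(items)
    n = len(items)
    values = sorted(hash_to_range(i, key, n * M) for i in items)
    deltas, prev = [], 0
    for v in values:
        deltas.append(v - prev)
        prev = v
    return n, deltas


def match_any(block_filter, key, queries):
    """True if any query may be in the filter, using one merge pass."""
    n, deltas = block_filter
    targets = sorted(hash_to_range(q, key, n * M) for q in queries)
    value, i = 0, 0
    for delta in deltas:  # the filter can only be decoded front to back
        value += delta
        while i < len(targets) and targets[i] < value:
            i += 1  # advance through the sorted queries
        if i == len(targets):
            return False
        if targets[i] == value:
            return True
    return False
```

Because the values are stored as differences, even a single lookup has to walk the filter from the start. One pass can serve all n queries at once, which is why the relative cost drops as the query set grows.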
<larryruane_> question 6: What is the advantage of descriptor wallets compared to legacy wallets, especially in the creation of the filter set? (Hint: what exact type of data do we need to put into the filter set?)
<sipa> Yeah BIP37 offered a way to download just the matching transactions in blocks. BIP157 does not, as the server just doesn't know what it'd need to give. This is an advantage on its own, as it avoids gratuitously revealing which transactions are interesting to the client (BIP37 has terrible privacy for this reason)
<larryruane_> sipa: so the only privacy leak is that the server knows that a particular light client is interested in *something* within this block (but not which tx(s))
<willcl_ark> Do we create an SPKM for each pubkey in legacy wallets, for each address type, resulting in hundreds (thousands?) whereas for descriptor wallets we have 8 SPKMans, 2 for each of 4 address types, receive and change?
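Taking the descriptor-wallet figures in that message at face value, the size of the filter set for a fresh wallet can be estimated with a little arithmetic. This sketch assumes each descriptor contributes exactly one scriptPubKey per keypool entry and nothing else. The address type names and the function name are illustrative.

```python
ADDRESS_TYPES = ["legacy", "p2sh-segwit", "bech32", "bech32m"]
CHAINS = ["receive", "change"]
KEYPOOL = 1000  # default keypool size


def filter_set_size(address_types=ADDRESS_TYPES, chains=CHAINS, keypool=KEYPOOL):
    """Return (number of descriptors, number of scriptPubKeys to match against)."""
    descriptors = len(address_types) * len(chains)
    return descriptors, descriptors * keypool


print(filter_set_size())  # (8, 8000)
```

Under these assumptions, 8 descriptors with 1000 pre-derived scripts each give 8000 elements.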
<larryruane_> we're almost out of time, there are a few questions remaining (7-10) sorry we didn't get to them, any comments on those questions? or anything else?