BIP 158
defines compact block filters: compact representations of data in a block.
Compact block filters use Golomb-Rice coding for compression and can give a
probabilistic answer to the question "does the block contain x?" There are no
false negatives (if x is in the block, the answer will never be "no"), but there are
false positives (if x is not in the block, the answer may be "yes"). The false
positive rate is a parameter of the filter construction.
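To make the encoding concrete, here is a minimal sketch of Golomb-Rice coding with the Basic filter's parameter P=19. In a real filter the elements are hashed into a numeric range, sorted, and it is the deltas between consecutive values that get coded this way (Bitcoin Core's actual implementation lives in src/blockfilter.cpp); this sketch only shows the bit-level encoding and decoding of single values:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Append one Golomb-Rice coded value to a bit stream with parameter p:
// the quotient (value >> p) in unary (q ones followed by a zero), then
// the low p bits of the value in binary, most significant bit first.
void GolombRiceEncode(std::vector<bool>& bits, uint64_t value, int p)
{
    uint64_t quotient = value >> p;
    for (uint64_t i = 0; i < quotient; ++i) bits.push_back(true);
    bits.push_back(false); // terminates the unary part
    for (int i = p - 1; i >= 0; --i) bits.push_back((value >> i) & 1);
}

// Read one value back from the bit stream, advancing pos.
uint64_t GolombRiceDecode(const std::vector<bool>& bits, size_t& pos, int p)
{
    uint64_t quotient = 0;
    while (bits[pos++]) ++quotient;
    uint64_t remainder = 0;
    for (int i = 0; i < p; ++i) remainder = (remainder << 1) | (bits[pos++] ? 1 : 0);
    return (quotient << p) | remainder;
}

int main()
{
    const int P = 19; // BIP 158 Basic filters use P = 19
    const std::vector<uint64_t> deltas{7, 523131, 1048576}; // example deltas
    std::vector<bool> bits;
    for (uint64_t d : deltas) GolombRiceEncode(bits, d, P);

    size_t pos = 0;
    for (size_t i = 0; i < deltas.size(); ++i) {
        std::cout << GolombRiceDecode(bits, pos, P) << "\n"; // 7, 523131, 1048576
    }
}
```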
BIP 158 also specifies a single filter type, the Basic block filter, which
encodes the scriptPubKeys of all the UTXOs spent in the block and the
scriptPubKeys of all the new UTXOs created in the block.
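As a rough sketch of which elements end up in a Basic filter (the structs below are hypothetical stand-ins, not Bitcoin Core's types; the real logic is BasicFilterElements() in src/blockfilter.cpp, which also excludes OP_RETURN outputs since they never become spendable UTXOs):

```cpp
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct TxIn  { std::string prev_script; }; // scriptPubKey of the UTXO being spent
struct TxOut { std::string script; };      // scriptPubKey of the new output
struct Tx    { bool is_coinbase; std::vector<TxIn> vin; std::vector<TxOut> vout; };
struct Block { std::vector<Tx> vtx; };

// Collect the deduplicated filter elements for one block.
std::set<std::string> BasicFilterElements(const Block& block)
{
    std::set<std::string> elements;
    for (const Tx& tx : block.vtx) {
        // Spent UTXOs: the coinbase has no real prevouts, so its inputs are skipped.
        if (!tx.is_coinbase) {
            for (const TxIn& in : tx.vin) elements.insert(in.prev_script);
        }
        // Created UTXOs (the real rules also drop OP_RETURN scripts).
        for (const TxOut& out : tx.vout) {
            if (!out.script.empty()) elements.insert(out.script);
        }
    }
    return elements;
}

int main()
{
    Block b{{
        {true, {}, {{"coinbase_script"}}},
        {false, {{"spent_script"}}, {{"new_script"}}},
    }};
    for (const auto& e : BasicFilterElements(b)) std::cout << e << "\n";
}
```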
PR 12254 implemented compact
block filters in Bitcoin Core, and PR
14121 added a new index that
stores the compact block filters for blocks that have been validated.
When rescanning, the wallet wants to know whether any transactions in the
chain send outputs to addresses it owns, or spend those outputs.
Currently rescan is done by iterating through every transaction in every
block
(from the rescan start point).
Rescanning the entire block chain typically takes half an hour or more.
This PR changes the rescan logic to first compute a Golomb-Coded Set of all
the wallet's scripts, and then tests for matches against all blocks in
the block chain. Only blocks with a positive match need to be fetched and
scanned by the wallet.
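In outline, the new flow looks like the following sketch. This is a simplified stand-in, not the PR's actual code (which goes through CWallet::ScanForWalletTransactions), and the stub filter here matches exactly, whereas the real GCSFilter::MatchAny is probabilistic and can return true for a block with no actual match:

```cpp
#include <iostream>
#include <set>
#include <string>
#include <vector>

using ScriptPubKey = std::string;

// Stand-in for a block's Basic filter.
struct BlockFilterStub {
    std::set<ScriptPubKey> elements;
    bool MatchAny(const std::set<ScriptPubKey>& needles) const {
        for (const auto& n : needles) {
            if (elements.count(n)) return true;
        }
        return false;
    }
};

int main()
{
    // One filter per block, as stored by the -blockfilterindex index.
    std::vector<BlockFilterStub> filter_index{
        {{"scriptA", "scriptB"}}, {{"scriptC"}}, {{"scriptB", "scriptD"}}};

    // The wallet's scripts: what GetAllRelevantScriptPubKeys() provides.
    std::set<ScriptPubKey> needles{"scriptD"};

    for (size_t height = 0; height < filter_index.size(); ++height) {
        if (filter_index[height].MatchAny(needles)) {
            std::cout << "fetch and scan block " << height << "\n";
        } else {
            std::cout << "skip block " << height << "\n"; // never fetched
        }
    }
}
```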
This functionality can only be used if the node has built a compact block
filter index. Enable it by starting the node with the -blockfilterindex=1 argument.
How many scripts will a newly created wallet have? How many items will be
added to the GCS returned by GetAllRelevantScriptPubKeys()?
What is the false positive rate for Basic compact block filters? How many
false positive blocks would you expect a newly created wallet to retrieve if
scanning the entire block chain? (Note that newly created keys have birthdays,
which means the wallet doesn't actually have to scan blocks from before the key
was generated.)
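For the false-positive question, a back-of-the-envelope calculation under stated assumptions: BIP 158 Basic filters are built with M = 784931, giving roughly a 1/784931 false positive rate per queried element; a fresh wallet queries about 6000 scriptPubKeys; and the chain height is taken as roughly 575,000, its approximate value when this PR was under review:

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    const double M = 784931.0;        // BIP 158 Basic filter parameter
    const double n_scripts = 6000.0;  // fresh wallet: 2000 keys x 3 script types
    const double n_blocks = 575000.0; // assumed chain height at review time

    // Chance that at least one wallet script falsely matches a given block.
    const double p_block = 1.0 - std::pow(1.0 - 1.0 / M, n_scripts);
    std::printf("per-block false positive probability: %.4f\n", p_block); // ~0.0076
    std::printf("expected false positive blocks: %.0f\n", p_block * n_blocks); // ~4400
}
```

With a key birthday, only blocks after the wallet's creation need scanning, so the practical number of false-positive fetches is far smaller.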
What do you think of the new tests? Would you add any additional test cases?
<@jnewbery> First: jump right in! I'll prompt the conversation with the questions at https://bitcoincore.reviews/15845.html, but don't feel like you need to wait until after those questions to make your comment or ask your question.
<@jnewbery> We can have multiple threads going at the same time. If it gets too confusing I might step in and ask people to address one topic first, but in general just go for it.
<@jnewbery> Second: don't ask to ask questions. There's no need to say things like "I have a question that I think might be off topic" or "is it ok if I ask my question now?". Just jump right in and ask. If it's off-topic, then people will tell you!
<@jnewbery> Was anyone able to answer q3: How many scripts will a newly created wallet have? How many items will be added to the GCS returned by GetAllRelevantScriptPubKeys()?
<@jnewbery> there should be no user-facing difference running with -rescan before and after this change, except it should be _a lot_ faster after this PR
<@jnewbery> The code changes in this PR are quite small, but I think you need to understand keypools and how the wallet rescans to give it a good review
<provoostenator> My biggest worry would be that somehow we miss some weird transaction type during the rescan. But that would (?) imply a bug in the filter definitions.
<Talkless> "This PR changes the rescan logic to first compute a Golomb-Coded Set of all the wallet's scripts, and then tests for matches against all blocks in the block chain. Only blocks with a positive match need to be fetched and scanned by the wallet."
<@jnewbery> This part is important: "The keypool is used to implement a 'gap limit'. The keypool maintains a set of keys (by default 1000) ahead of the last used key and scans for the addresses of those keys."
<@jnewbery> pinheadmz: see earlier comment: "The keypool is used to implement a 'gap limit'. The keypool maintains a set of keys (by default 1000) ahead of the last used key and scans for the addresses of those keys."
<@jnewbery> ok, so let's recap: a freshly created wallet will have 1000 keys (which equates to 6000 scriptPubKeys). In master, on rescan, we'll iterate through the blocks in the block chain. If we find a transaction that matches one of those scriptPubKeys, we'll mark that keypool entry (and every keypool entry before it) as used...
<@jnewbery> the gap limit of 1000 means that if there are any gaps, or the keys are used out-of-order, then as long as the gap is less than 1000, we won't miss any transactions
<@jnewbery> slight correction - we have 2000 keys and 6000 scriptPubKeys (1000 external keys for handing out as addresses, 1000 internal keys for change)
<@jnewbery> I think there may be circumstances where you'd have gaps in your change keys, but it's less likely than with giving out addresses (where someone might just not pay you)
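A toy sketch of the gap-limit behaviour described above, with illustrative names and the default lookahead of 1000 keys (this is not Bitcoin Core code; the real wallet logic involves CWallet::TopUpKeyPool):

```cpp
#include <cstdio>

int main()
{
    const int lookahead = 1000; // default keypool size (per external/internal chain)
    int next_unused = 0;        // first key index not yet marked as used
    int generated = lookahead;  // keys derived so far (the lookahead window)

    const int seen_on_chain[] = {3, 250, 1100, 2500}; // key indices found in blocks
    for (int idx : seen_on_chain) {
        if (idx < generated) {
            // Mark this key and every key before it as used, then top up so
            // there are always `lookahead` unused keys ahead of the last used
            // one (what CWallet::TopUpKeyPool does in the real wallet).
            next_unused = idx + 1;
            generated = next_unused + lookahead;
            std::printf("match at key %d; window extended to key %d\n",
                        idx, generated - 1);
        } else {
            // A gap larger than the lookahead: the rescan would miss this key.
            std::printf("key %d is beyond the window; its transactions are missed\n",
                        idx);
        }
    }
}
```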
<jonatack> MarcoFalke: in commit faf7cc8 updating wallet_import_rescan.py::134, I was wondering, why is "rescan" changed to "expect_rescan" in ImportNode?
<molz> so currently core wallet has deterministic backup so if we make a backup at the start of the wallet, it will automatically backup all our keys by the TopUp call?
<pinheadmz> But the filter comes from each block. So it's not the filter that needs to be updated it's the element list of keys to match against right?
<jonatack> Chris Stewart, who has an open PR to add many property-based tests, was suggesting that BIP157/158 might be a fruitful area for rapidcheck testing.
<molz> jnewbery, yea i was around the bitcoin channels when this was changed, i think because many business owners lost funds thinking if they backup once (100 keypool then) then the rest of their money was backed up
<provoostenator> molz: a potential followup to make it snappier, is to have two scan passes. One with very small lookahead window, and then one with a bigger window just in case.
<provoostenator> I even thought about adding a wallet based cache to the REST address lookup API, which could do a just-in-time rescan. Then you wouldn't need electrum server :-)
<molz> jnewbery, thanks for doing this for us, I've been lurking until today, you've been doing a wonderful job, i really enjoy reading the convo and it helps me learn :D
<@jnewbery> I would really like it if I could convince more of you to host. I think having to host makes you really understand the PR and it's a great way to learn.