Fast rescan with BIP157 block filters (wallet)

Host: jnewbery  -  PR author: MarcoFalke


  • BIP 158 defines compact block filters: compact representations of data in a block.
  • Compact block filters use Golomb-Rice coding for compression and can give a probabilistic answer to the question “does the block contain x?” There are no false negatives (if x is in the block, the answer will never be ‘no’), but there are false positives (if x is not in the block, the answer may be ‘yes’). The false positive rate is a parameter of the filter construction.
  • BIP 158 also specifies one filter type called Basic block filters, which encode the scriptPubKeys of all the UTXOs spent in the block, and the scriptPubKeys of all the new UTXOs created in the block.
  • PR 12254 implemented compact block filters in Bitcoin Core, and PR 14121 added a new index, which stores the compact block filters for blocks that have been validated.
  • When rescanning, the wallet wants to know whether there are any transactions that send outputs to addresses it owns or spends those outputs. Currently rescan is done by iterating through every transaction in every block (from the rescan start point).
  • Rescanning the entire block chain can typically take half an hour or more.
  • This PR changes the rescan logic to first compute a Golomb-Coded Set of all the wallet’s scripts, and then tests for matches against all blocks in the block chain. Only blocks with a positive match need to be fetched and scanned by the wallet.
  • This functionality can only be used if a compact block filter index has been constructed by the node. Use the -blockfilterindex=1 argument.


  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. What steps did you take, beyond reading the code?

  3. How many scripts will a newly created wallet have? How many items will be added to the GCS returned by GetAllRelevantScriptPubKeys()?

  4. What is the false positive rate for Basic compact block filters? How many false positive blocks would you expect a newly created block to retrieve if scanning the entire block-chain? (note that newly created keys have birthdays which mean they don’t actually have to scan before the key was generated).

  5. What do you think of the new tests? Would you add any additional test cases?

Meeting Log

  112:00 <@jnewbery> #startmeeting
  212:00 <@jnewbery> hi!
  312:00 <jonatack> hi
  412:00 <amiti> hi
  512:00 <pinheadmz> Hi!
  612:00 <@jnewbery> Hi all. This week's PR is 15845: "Fast rescan with BIP157 block filters". Notes and questions at
  712:00 <@jnewbery> Before we start, I wanted to give a couple of tips for using IRC here.
  812:01 <@jnewbery> First: jump right in! I'll prompt the conversation with the questions at, but don't feel like you need to wait until after those questions to make your comment or ask your question.
  912:01 <lightlike> hi
 1012:01 <michaelfolkson> Hey
 1112:01 <jonatack> #topic irc tips :p
 1212:01 <@jnewbery> You're not being rude if I ask a question and you make an unrelated comment!
 1312:01 <@jnewbery> We can have multiple threads going at the same time. If it gets too confusing I might step in and ask people to address one topic first, but in general just go for it.
 1412:02 <@jnewbery> Second: don't ask to ask questions. There's no need to say things like "I have a question that I think might be off topic" or "is it ok if I ask my question now?". Just jump right in and ask. If it's off-topic, then people will tell you!
 1512:02 <@jnewbery> ok, that's it for now. Let's get started!
 1612:02 <@jnewbery> What did everyone think of the PR?
 1712:02 <provoostenator> (hi)
 1812:02 <@jnewbery> Concept ACK, approach ACK, tested ACK, or NACK?
 1912:02 <provoostenator> It's fast!
 2012:03 <provoostenator> It's increased my sanity :-)
 2112:03 <pinheadmz> Conceptually very cool use case and makes a lot of sense. Rescanning is terrible and this is a huge improvement
 2212:03 <jonatack> Tested ACK review underway.
 2312:03 <jkczyz_> hi
 2412:04 <@jnewbery> great! Everyone seems happy with the concept at least. What did you all do to review and test?
 2512:04 <pinheadmz> Also reveals how a full neutrino client would work
 2612:04 -!- Irssi: #bitcoin-core-pr-reviews: Total of 78 nicks [1 ops, 0 halfops, 0 voices, 77 normal]
 2712:04 <provoostenator> I mostly tested it by scanning old wallets.
 2812:04 <pinheadmz> Just read the code. Didn't have time to install before the meeting
 2912:04 <provoostenator> Also briefly read the code
 3012:05 <@jnewbery> did people have a chance to read BIP 158? It's not as scary as you might think
 3112:05 <jkczyz_> yep
 3212:05 <jonatack> Built, ran all tests, read code, now poking through the tests, would want to test against wallets too.
 3312:06 <pinheadmz> Yes and we merged bip158 filters into bcoin recently as well
 3412:07 <@jnewbery> pinheadmz: great. Are you serving compact block filters over p2p yet?
 3512:07 <pinheadmz> No :-)
 3612:07 <pinheadmz> Just indexing and rpc
 3712:07 <pinheadmz> P2P service is coming
 3812:07 <emilengler> Hi
 3912:07 <@jnewbery> Was anyone able to answer q3: How many scripts will a newly created wallet have? How many items will be added to the GCS returned by GetAllRelevantScriptPubKeys()?
 4012:07 <molz> I compiled PR 15845 for windows and i'm running it now but i don't see anything different, do I need to put something in bitcoin.conf?
 4112:08 <@jnewbery> molz: you'll need to use -blockfilterindex=1 to build the index
 4212:08 <provoostenator> FYI: p2p is here:
 4312:08 <molz> jnewbery, ah i see, thanks for that tip
 4412:08 <MarcoFalke> molz: There is no user facing change (other than less time spent rescanning)
 4512:09 <@jnewbery> Thanks provoostenator. I think we should cover that in a future PR review club. Anyone want to host that one? :)
 4612:09 <molz> im so used to running btcd where i don't need to put any flag for the filters in .conf, thanks
 4712:09 <provoostenator> Yeah, that's a fun one, and you can test it against btcd (which I did, and found some issues) and other clients.
 4812:09 <sebastianvstaa> jnewberry: how is -blockfilterindex=! different from -rescan?
 4912:10 <@jnewbery> blockfilterindex=1 builds the index in the background. -rescan=1 rescans the blockchain for all wallet transactions on startup
 5012:10 <sebastianvstaa> *blockfilterindex=1
 5112:10 <sebastianvstaa> ok
 5212:10 <provoostenator> You can also rescan using RPC, e.g. rescanblockchain FROMHEIGHT
 5312:11 <@jnewbery> there should be no user-facing difference running with -rescan before and after this change, except it should be _a lot_ faster after this PR
 5412:11 <molz> so if you didn't start the node with -blockfilterindex=1, do you have to delete 'blocks' and 'chainstate' and start over?
 5512:11 <sebastianvstaa> yes. on testnet it'S only 10 seconds or so now.
 5612:11 <provoostenator> molz: no
 5712:11 <molz> oh ok, provoostenator
 5812:11 <provoostenator> There's a new indexing system, thanks to Jimpo,
 5912:12 <@jnewbery> The code changes in this PR are quite small, but I think you need to understand keypools and how the wallet rescans to give it a good review
 6012:12 <provoostenator> Which lets you add and remove indexes without having to do a chainstate / block reindex.
 6112:12 <provoostenator> You can even temporarliy turn an index off.
 6212:13 <provoostenator> My biggest worry would be that somehow we miss some weird transaction type during the rescan. But that would (?) imply a bug in the filter definitions.
 6312:14 <MarcoFalke> provoostenator: Or a bug in GetAllRelevantScriptPubKeys
 6412:14 <Talkless> hi
 6512:14 <Talkless> "This PR changes the rescan logic to first compute a Golomb-Coded Set of all the wallet’s scripts, and then tests for matches against all blocks in the block chain. Only blocks with a positive match need to be fetched and scanned by the wallet."
 6612:14 <Talkless> does that mean bitcoin core wallet will use these filters by itself?
 6712:14 <@jnewbery> provoostenator: yes, the thing that reviewers should be asking themselves is "could rescan miss a wallet transaction with this new code"
 6812:14 <provoostenator> Talkless: what do you mean by "by itself"?
 6912:14 <Talkless> not via peer services by other wallets
 7012:15 <MarcoFalke> Talkless: No, the filters are handled by the node, which computes the result and hands it to the wallet
 7112:15 <provoostenator> Talkless: correct: the node generates the filters by scanning the chain.
 7212:15 <@jnewbery> Talkless: There are no p2p changes in this PR
 7312:15 <Talkless> jnewbery: I uderstand
 7412:16 <pinheadmz> Did bloom filters include OPRETURN data? I think bip158 does not right?
 7512:16 <MarcoFalke> Talkless: I think "node" might be a better word for network connections, not "wallet"
 7612:16 <Talkless> but the intention is that these filters will be used only by "other" external wallets via p2p, or by bitcoin core's wallet itself/
 7712:16 <@jnewbery> The wallet keypool is defined here:
 7812:16 <@jnewbery> (moved from wallet.h very recently)
 7912:16 <Talkless> "This PR changes the rescan logic to first compute a Golomb-Coded Set of all the wallet’s scripts"
 8012:16 <Talkless> for ALL WALLET'S SRIPTS?
 8112:16 <Talkless> for the wallet that resided in this current node?
 8212:17 <@jnewbery> That class has a very full comment
 8312:17 <MarcoFalke> Talkless: for the wallet that is loaded currently
 8412:17 <MarcoFalke> The scripts are different for each wallet (in general)
 8512:17 <Talkless> MarcoFalke: ok so the node will use these filters for active wallets in that node? That's what I wanted to make clear.
 8612:18 <@jnewbery> This part is important: "The keypool is used to implement a 'gap limit'. The keypool maintains a set of keys (by default 1000) ahead of the last used key and scans for the addresses of those keys."
 8712:18 <@jnewbery> so how many scripts does a newly created wallet have?
 8812:18 <pinheadmz> Sounds like 1000 ;-)
 8912:18 <pinheadmz> Oh unless we also add nested scripts
 9012:18 <molz> 6000? :D
 9112:19 <@jnewbery> molz: Yes!
 9212:19 <@jnewbery>
 9312:19 <provoostenator> Keypool is not the only thing we care about. There can also be watch-only things. But not OP_RETURN afaik.
 9412:19 <molz> i cheat i cheat, i read what sipa said lol
 9512:19 <@jnewbery> reading what sipa says is generally encouraged
 9612:20 <@jnewbery> ok, so when we rescan an empty wallet, we're initially looking out for 6000 different scripts
 9712:20 <@jnewbery> in the current code, what happens when we match one of those scripts?
 9812:22 <jkczyz_> We lookup the block to see if it's really there
 9912:22 <provoostenator> 6000 scripts leads to a lot of false positives.
10012:22 <@jnewbery> jkczyz_: I think you're referring to what happens after this PR
10112:22 <@jnewbery> I meant in the master (pre-PR) code
10212:23 <jkczyz_> haha, yeah I guess that's what you meant by currently :P
10312:23 <@jnewbery> sorry - my question was unclear
10412:23 <@jnewbery> heres the code in master:
10512:24 <@jnewbery> SyncTransaction() is where the magic happens:
10612:25 <@jnewbery> Does anyone want to guess? :)
10712:25 <@jnewbery> what do you think we want to do when we find a matching scriptPubKey for a key that is in our keypool in a block transaction?
10812:26 <jkczyz_> update our balance
10912:26 <pinheadmz> Yeah add the utxo to wallet db
11012:26 <@jnewbery> jkczyz_: good guess! We actually calculate balance dynamically whenever it's needed. We don't keep a running total
11112:26 <@jnewbery> pinheadmz: yes!
11212:27 <pinheadmz> (Or remove if we were spending!)
11312:27 <ajonas> AddToWalletIfInvolvingMe
11412:27 <@jnewbery> ajonas: \o/
11512:27 <@jnewbery> ok, so we see a transaction that we're interested in, and then add it to the wallet
11612:28 <@jnewbery> what else do we need to do? Think about the keypool
11712:28 <amiti> mark the keys as used
11812:28 <amiti> aka remove from keypool
11912:28 <pinheadmz> And generate one to replace?
12012:28 <@jnewbery> pinheadmz: we don't really 'remove' transactions from the wallet. If they send to us or spend from us, then we want to keep track of them
12112:29 <@jnewbery> amiti and pinheadmz: yes and yes :) :)
12212:29 <pinheadmz> So does this actually mean that the lookahead is 1000?
12312:29 <pinheadmz> In the context of bip44
12412:29 <@jnewbery> pinheadmz: correct
12512:29 <molz> the keypool would be smaller until we use up 1000 keys then the keypool would generate another 1000?
12612:29 <pinheadmz> Interesting. The bip only specifies 20 I think
12712:29 <provoostenator> 1000 receive + 1000 change I think
12812:30 <@jnewbery> pinheadmz: see earlier comment: "The keypool is used to implement a 'gap limit'. The keypool maintains a set of keys (by default 1000) ahead of the last used key and scans for the addresses of those keys."
12912:30 <provoostenator> BIP 44 specifies a GAP limit of 20, but we don't use BIP 44.
13012:30 <pinheadmz> I see
13112:30 <@jnewbery> molz: no, we top up as we go. So if we drop to 999, we'll generate another 1
13212:30 <@jnewbery> as long as the wallet is unlocked (not encrypted)
13312:30 <ajonas> why do we need so many?
13412:30 <provoostenator> With descriptor wallets it'll be 6000: 1000 for each address type, adn then receve+ change
13512:31 <provoostenator> ajonas: I think it's historical, before we had HD wallets it would pre-generate private keys
13612:31 <MarcoFalke> ajonas: I think it was bumped when we switched to hd wallets
13712:31 <jonatack> To avoid loss of funds iirc
13812:31 <provoostenator> I started with 100 I think:
13912:32 <@jnewbery> sorry, was just looking at the code, trying to find where we top up the key pool
14012:32 <@jnewbery> it's changed around recently
14112:32 <@jnewbery> here we are:
14212:33 <@jnewbery> in AddToWalletIfInvolvingMe(), we'll mark the keypool keys as used there ^
14312:33 <provoostenator> And bumbed to 1000 here: (indeed after HD)
14412:33 <@jnewbery> And MarkUnusedAddresses() calls TopUp():
14512:33 <@jnewbery> L294 in that function
14612:36 <@jnewbery> ok, so let's recap: a freshly created wallet will have 1000 keys (which equates to 6000 scriptPubKeys). In master, on rescan, we'll iterate through the blocks in the block chain. If we find a transaction that matches one of those scriptPubKeys, we'll mark that keypool entry (and every keypool entry before it) as used...
14712:36 <@jnewbery> ... and then topup the keypool before continuing to scan the blockchain
14812:36 <@jnewbery> so, if keys are used in order then we won't miss any transactions
14912:37 <@jnewbery> the gap limit of 1000 means that if there are any gaps, or the keys are used out-of-order, then as long as the gap is less than 1000, we won't miss any transactions
15012:37 <@jnewbery> slight correction - we have 2000 keys and 6000 scriptPubKeys (1000 external keys for handing out as addresses, 1000 internal keys for change)
15112:38 <@jnewbery> any questions about any of that ^ ?
15212:38 <pinheadmz> 👍
15312:39 <amiti> does the ordering / gap limit only apply to the keys handed out as addresses?
15412:39 <@jnewbery> amiti: good question! You're asking because you have complete control over change keys so you wouldn't leave gaps?
15512:39 <amiti> yeah
15612:40 <@jnewbery> I think there may be circumstances where you'd have gaps in your change keys, but it's less likely than with giving out addresses (where someone might just not pay you)
15712:40 <amiti> although I guess you could form txns at different fee rates that get mined at different times and are thus seen on the chain out of order?
15812:41 <@jnewbery> yes, true
15912:41 <amiti> yeah I was wondering that... what happens in that case?
16012:41 <@jnewbery> and if you RBF bump, then you use a new key for change I think
16112:41 <jonatack> MarcoFalke: in commit faf7cc8 updating, I was wondering, why is "rescan" changed to "expect_rescan" in ImportNode?
16212:42 <@jnewbery> the logic in the core wallet is the same for both. We have a gap limit of 1000. Perhaps it could be reduced for internal keys
16312:42 <jonatack> MarcoFalke: ImportNode = collections.namedtuple("ImportNode", "prune expect_rescan blockfilter")
16412:42 <@jnewbery> has anyone spotted the bug in this PR yet? :)
16512:42 <@jnewbery> (I've been giving you hints by talking about the keypool)
16612:44 <@jnewbery> hint: why is important that TopUp() is called in the rescan loop?
16712:45 <amiti> TopUpKeyPool()? or is there a different function TopUp() that I'm missing?
16812:46 <jkczyz>
16912:46 <amiti> ah thanks
17012:46 <@jnewbery>
17112:46 <@jnewbery> that's where it's called from AddToWalletIfInvolvingMe()
17212:47 <@jnewbery> what happens if you create a wallet, make a backup, then receive 1001 payments, and then try to restore from that backup?
17312:48 <molz> one of them isn't in the backup
17412:48 <pinheadmz> Are we not topping up every time a key is found in a block during rescan?
17512:48 <@jnewbery> molz: none of the transactions are in the backup. The initial 1000 keys are in the backup
17612:48 <@jnewbery> pinheadmz: you're getting close
17712:48 <molz> so currently core wallet has deterministic backup so if we make a backup at the start of the wallet, it will automatically backup all our keys by the TopUp call?
17812:49 <@jnewbery> molz: yes, the keys are deterministic, so when topping up the backup, you'll get the same new keypool keys
17912:49 <molz> ah
18012:49 <@jnewbery> pinheadmz: in the new PR15845 logic, what are we matching the blocks against?
18112:50 <pinheadmz> GetAllRelevantScriptPubKeys()
18212:50 <@jnewbery> which returns a GCS filter
18312:50 <@jnewbery> and where is that filter built?
18412:50 <jkczyz> So the filter needs to be updated?
18512:50 <@jnewbery> jkczyq: \o/
18612:50 <pinheadmz> \o/ haha
18712:51 <pinheadmz> I should've guessed that!
18812:51 <@jnewbery> GetAllRelevantScriptPubKeys() is only called _once_, outside the rescan loop
18912:51 <@jnewbery> that means as we advance through the blockchain, then it doesn't get updated
19012:51 <pinheadmz> In bcoin we use internal bloom filters this same way, and in some cases the filter is not updated fast enough between blocks
19112:52 <pinheadmz> But the filter comes from each block. So it's not the filter that needs to be updated it's the element list of keys to match against right?
19212:53 <jkczyz> GCSFilter::ElementSet
19312:53 <@jnewbery> pinheadmz: i'm not entirely sure what the right terminology is. The GCS is what the wallet creates to match against the block filter
19412:53 <amiti> ah looks like ryanofsky changed it from TopUpKeyPool() to TopUp() yesterday, and I hadn't pulled the recent changed
19512:54 <@jnewbery> From this comment, I think that an earlier version of the PR might have recalculated the GCS for each block:
19612:55 <pinheadmz> Hm but conceptually, the filter is derived once the block is verified and the filter for each block doesn't get changed or updated ?
19712:55 <@jnewbery> which fixes this bug but is slow
19812:55 <@jnewbery> pinheadmz: correct. The block filter doesn't ever change
19912:56 <@jnewbery> 5 minutes left. Did anyont take a look at false-positive rates?
20012:58 <pinheadmz> 😬
20112:58 <jonatack> Any links on this apart from jimpo's comment?
20212:58 <ajonas> supposed to be 1/M right? and M is a param here:
20312:58 <@jnewbery> ajonas: yes
20412:58 <michaelfolkson> What's the latest update of BIP 157/158 in Core? I found this useful resource from some random called <@jnewbery> on StackExchange :)
20512:59 <michaelfolkson> There's still contention on serving filters right?
20612:59 <@jnewbery> M is specified here:
20712:59 <provoostenator> jnewbery: nice catch!
20812:59 <@jnewbery> so a transaction that isn't in a block will return positive with probability 1/784931
20913:01 <@jnewbery> I think that means that a full keypool of addresses will return positive with a probability of (1 - (784930/784931)^6000)
21013:01 <@jnewbery> which I think is ~ 6000/784931
21113:02 <ajonas> for those smarter than me Sipa has a write up on the optimization:
21213:02 <jonatack> Interesting catch! We're missing tests that aren't seeing this bug, it would seem.
21313:02 <@jnewbery> jonatack: it'd require a wallet with lots of transactions
21413:02 <@jnewbery> the current import-rescan test doesn't have enough transactions to hit keypool topup issues
21513:03 <jonatack> Chris Stewart, who has an open PR to add many property-based tests, was suggesting that BIP157/158 might be a fruitful area for rapidcheck testing.
21613:03 <@jnewbery> We're over time, but I can go for a few more minutes if people have more questions
21713:03 <pinheadmz> jnewbery: thanks again as always!!!
21813:04 <jonatack> jnewbery: how did you catch this... code review?
21913:05 <@jnewbery> jonatack: I implemented this PR a couple of years ago: so I know what sharp corners to look out for :)
22013:06 <ysangkok> thanks! awesome when there is a known bug that people have to find! so dramatic
22113:06 <pinheadmz> Whoa was that with bip158 filters? A couple of years ago?
22213:06 <@jnewbery> pinheadmz: no, just regular good old fashioned rescans
22313:06 <pinheadmz> Ah
22413:07 <@jnewbery> ok, let's call time there
22513:07 <@jnewbery> Thanks everyone. Fun meeting today!
22613:07 <@jnewbery> #endmeeting
22713:07 <jonatack> Nice :) and it was not caught yet in the review until now. Really good session.
22813:07 <michaelfolkson> Thanks, fascinating
22913:07 <molz> so the bug is not related to the filters?
23013:07 <amiti> thanks !
23113:07 <jonatack> Thanks!
23213:08 <@jnewbery> I spotted it when I reviewed it last week but didn't want to spoil the fun for you all :)
23313:08 <provoostenator> jonatack jnewbery keypool size on regtest is tiny (10?), so should be easy to test this
23413:08 <jonatack> jnewbery: nice one
23513:08 <jonatack> provoostenator: oh interesting
23613:09 <molz> would it be better if the keypool is smaller ? like 500 keys?
23713:09 <@jnewbery> I'll try to get notes and questions up before the weekend. Please ping me if you want to host. I think lots of you are ready.
23813:09 <@jnewbery> I'm happy to help you prepare and help with notes and questions
23913:10 <@jnewbery> molz: I think 1000 is too large for most people. gmax disagrees:
24013:11 <molz> jnewbery, yea i was around the bitcoin channels when this was changed, i think because many business owners lost funds thinking if they backup once (100 keypool then) then the rest of their money was backed up
24113:11 <@jnewbery> oh. One more thing
24213:11 <@jnewbery> It turns out 17:00 UTC isn't ideal for me now that daylight savings is over
24313:11 <jonatack> Same
24413:12 <sebastianvstaa> same :)
24513:12 <@jnewbery> would changing it to 18:00 preclude anyone from joining us?
24613:12 <purposely> Not an issue for you, Maybe a poll would help
24713:13 <purposely> not an issue for me*
24813:13 <provoostenator> molz: a potential followup to make it snappier, is to have two scan passes. One with very small lookahead window, and then one with a bigger window just in case.
24913:13 <molz> provoostenator, how do you do this in core?
25013:13 <@jnewbery> let me know if 18:00 doesn't work for you. If I don't hear from anyone, I'll update the website to say 18:00 this weekend
25113:14 <provoostenator> I even thought about adding a wallet based cache to the REST address lookup API, which could do a just-in-time rescan. Then you wouldn't need electrum server :-)
25213:14 <jonatack> jnewbery: yes, i'd say change it, communicate it well, and see if anyone complains
25313:14 <provoostenator> (reckless idea, but still)
25413:14 <purposely> if time changes, let's by loud about changing the time
25513:14 <@jnewbery> I'm not going to do a poll. Just let me know if 18:00 doesn't work
25613:16 <molz> jnewbery, thanks for doing this for us, I've been lurking until today, you've been doing a wonderful job, i really enjoy reading the convo and it helps me learn :D
25713:17 <lightlike> 18:00 is much better for me as well.
25813:17 <@jnewbery> molz: thanks. Glad you're taking a more active part now!
25913:17 <@jnewbery> I enjoy doing these. I think it's a really good way for us all to learn from each other
26013:19 <@jnewbery> I would really like it if I could convince more of you to host. I think having to host makes you really understand the PR and it's a great way to learn.
26113:19 <@jnewbery> (but I understand most of you are doing this in your free time, and it's a big ask)
26213:19 <jonatack> +1 for a great way to learn
26313:19 <MarcoFalke> +1
26413:19 <MarcoFalke> This one was eye-opening
26513:19 <sebastianvstaa> +1