Fast rescan with BIP157 block filters (
wallet) Nov 6, 2019
defines compact block filters: compact representations of data in a block. Compact block filters use Golomb-Rice coding for compression and can give a
probabilistic answer to the question “does the block contain x?” There are no
false negatives (if x is in the block, the answer will never be ‘no’), but there are
false positives (if x is not in the block, the answer may be ‘yes’). The false
positive rate is a parameter of the filter construction.
BIP 158 also specifies one filter type called
Basic block filters, which
encode the scriptPubKeys of all the UTXOs spent in the block, and the
scriptPubKeys of all the new UTXOs created in the block.
PR 12254 implemented compact
block filters in Bitcoin Core, and PR
14121 added a new index, which
stores the compact block filters for blocks that have been validated. When rescanning, the wallet wants to know whether there are any transactions
that send outputs to addresses it owns or spends those outputs.
Currently rescan is done by
iterating through every transaction in every
(from the rescan start point). Rescanning the entire block chain can typically take half an hour or more.
This PR changes the rescan logic to first compute a Golomb-Coded Set of all
the wallet’s scripts, and then tests for matches against all blocks in
the block chain. Only blocks with a positive match need to be fetched and
scanned by the wallet.
This functionality can only be used if a compact block filter index has been
constructed by the node. Use the
Did you review the PR?
Concept ACK, approach ACK, tested ACK, or
(Don’t forget to put your PR review on GitHub.)
What steps did you take, beyond reading the code?
How many scripts will a newly created wallet have? How many items will be
added to the GCS returned by
What is the false positive rate for Basic compact block filters? How many
false positive blocks would you expect a newly created block to retrieve if
scanning the entire block-chain? (note that newly created keys have
which mean they don’t actually have to scan before the key was generated).
What do you think of the new tests? Would you add any additional test cases?
1 12:00 <@jnewbery> #startmeeting
7 12:00 <@jnewbery> Before we start, I wanted to give a couple of tips for using IRC here.
8 12:01 <@jnewbery> First: jump right in! I'll prompt the conversation with the questions at https://bitcoincore.reviews/15845.html, but don't feel like you need to wait until after those questions to make your comment or ask your question.
10 12:01 <michaelfolkson> Hey
11 12:01 <jonatack> #topic irc tips :p
12 12:01 <@jnewbery> You're not being rude if I ask a question and you make an unrelated comment!
13 12:01 <@jnewbery> We can have multiple threads going at the same time. If it gets too confusing I might step in and ask people to address one topic first, but in general just go for it.
14 12:02 <@jnewbery> Second: don't ask to ask questions. There's no need to say things like "I have a question that I think might be off topic" or "is it ok if I ask my question now?". Just jump right in and ask. If it's off-topic, then people will tell you!
15 12:02 <@jnewbery> ok, that's it for now. Let's get started!
16 12:02 <@jnewbery> What did everyone think of the PR?
17 12:02 <provoostenator> (hi)
18 12:02 <@jnewbery> Concept ACK, approach ACK, tested ACK, or NACK?
19 12:02 <provoostenator> It's fast!
20 12:03 <provoostenator> It's increased my sanity :-)
21 12:03 <pinheadmz> Conceptually very cool use case and makes a lot of sense. Rescanning is terrible and this is a huge improvement
22 12:03 <jonatack> Tested ACK review underway.
24 12:04 <@jnewbery> great! Everyone seems happy with the concept at least. What did you all do to review and test?
25 12:04 <pinheadmz> Also reveals how a full neutrino client would work
26 12:04 -!- Irssi: #bitcoin-core-pr-reviews: Total of 78 nicks [1 ops, 0 halfops, 0 voices, 77 normal]
27 12:04 <provoostenator> I mostly tested it by scanning old wallets.
28 12:04 <pinheadmz> Just read the code. Didn't have time to install before the meeting
29 12:04 <provoostenator> Also briefly read the code
30 12:05 <@jnewbery> did people have a chance to read BIP 158? It's not as scary as you might think
32 12:05 <jonatack> Built, ran all tests, read code, now poking through the tests, would want to test against wallets too.
33 12:06 <pinheadmz> Yes and we merged bip158 filters into bcoin recently as well
34 12:07 <@jnewbery> pinheadmz: great. Are you serving compact block filters over p2p yet?
35 12:07 <pinheadmz> No :-)
36 12:07 <pinheadmz> Just indexing and rpc
37 12:07 <pinheadmz> P2P service is coming
39 12:07 <@jnewbery> Was anyone able to answer q3: How many scripts will a newly created wallet have? How many items will be added to the GCS returned by GetAllRelevantScriptPubKeys()?
40 12:07 <molz> I compiled PR 15845 for windows and i'm running it now but i don't see anything different, do I need to put something in bitcoin.conf?
41 12:08 <@jnewbery> molz: you'll need to use -blockfilterindex=1 to build the index
43 12:08 <molz> jnewbery, ah i see, thanks for that tip
44 12:08 <MarcoFalke> molz: There is no user facing change (other than less time spent rescanning)
45 12:09 <@jnewbery> Thanks provoostenator. I think we should cover that in a future PR review club. Anyone want to host that one? :)
46 12:09 <molz> im so used to running btcd where i don't need to put any flag for the filters in .conf, thanks
47 12:09 <provoostenator> Yeah, that's a fun one, and you can test it against btcd (which I did, and found some issues) and other clients.
48 12:09 <sebastianvstaa> jnewberry: how is -blockfilterindex=! different from -rescan?
49 12:10 <@jnewbery> blockfilterindex=1 builds the index in the background. -rescan=1 rescans the blockchain for all wallet transactions on startup
50 12:10 <sebastianvstaa> *blockfilterindex=1
51 12:10 <sebastianvstaa> ok
52 12:10 <provoostenator> You can also rescan using RPC, e.g. rescanblockchain FROMHEIGHT
53 12:11 <@jnewbery> there should be no user-facing difference running with -rescan before and after this change, except it should be _a lot_ faster after this PR
54 12:11 <molz> so if you didn't start the node with -blockfilterindex=1, do you have to delete 'blocks' and 'chainstate' and start over?
55 12:11 <sebastianvstaa> yes. on testnet it'S only 10 seconds or so now.
56 12:11 <provoostenator> molz: no
57 12:11 <molz> oh ok, provoostenator
58 12:11 <provoostenator> There's a new indexing system, thanks to Jimpo,
59 12:12 <@jnewbery> The code changes in this PR are quite small, but I think you need to understand keypools and how the wallet rescans to give it a good review
60 12:12 <provoostenator> Which lets you add and remove indexes without having to do a chainstate / block reindex.
61 12:12 <provoostenator> You can even temporarliy turn an index off.
62 12:13 <provoostenator> My biggest worry would be that somehow we miss some weird transaction type during the rescan. But that would (?) imply a bug in the filter definitions.
63 12:14 <MarcoFalke> provoostenator: Or a bug in GetAllRelevantScriptPubKeys
65 12:14 <Talkless> "This PR changes the rescan logic to first compute a Golomb-Coded Set of all the wallet’s scripts, and then tests for matches against all blocks in the block chain. Only blocks with a positive match need to be fetched and scanned by the wallet."
66 12:14 <Talkless> does that mean bitcoin core wallet will use these filters by itself?
67 12:14 <@jnewbery> provoostenator: yes, the thing that reviewers should be asking themselves is "could rescan miss a wallet transaction with this new code"
68 12:14 <provoostenator> Talkless: what do you mean by "by itself"?
69 12:14 <Talkless> not via peer services by other wallets
70 12:15 <MarcoFalke> Talkless: No, the filters are handled by the node, which computes the result and hands it to the wallet
71 12:15 <provoostenator> Talkless: correct: the node generates the filters by scanning the chain.
72 12:15 <@jnewbery> Talkless: There are no p2p changes in this PR
73 12:15 <Talkless> jnewbery: I uderstand
74 12:16 <pinheadmz> Did bloom filters include OPRETURN data? I think bip158 does not right?
75 12:16 <MarcoFalke> Talkless: I think "node" might be a better word for network connections, not "wallet"
76 12:16 <Talkless> but the intention is that these filters will be used only by "other" external wallets via p2p, or by bitcoin core's wallet itself/
78 12:16 <@jnewbery> (moved from wallet.h very recently)
79 12:16 <Talkless> "This PR changes the rescan logic to first compute a Golomb-Coded Set of all the wallet’s scripts"
80 12:16 <Talkless> for ALL WALLET'S SRIPTS?
81 12:16 <Talkless> for the wallet that resided in this current node?
82 12:17 <@jnewbery> That class has a very full comment
83 12:17 <MarcoFalke> Talkless: for the wallet that is loaded currently
84 12:17 <MarcoFalke> The scripts are different for each wallet (in general)
85 12:17 <Talkless> MarcoFalke: ok so the node will use these filters for active wallets in that node? That's what I wanted to make clear.
86 12:18 <@jnewbery> This part is important: "The keypool is used to implement a 'gap limit'. The keypool maintains a set of keys (by default 1000) ahead of the last used key and scans for the addresses of those keys."
87 12:18 <@jnewbery> so how many scripts does a newly created wallet have?
88 12:18 <pinheadmz> Sounds like 1000 ;-)
89 12:18 <pinheadmz> Oh unless we also add nested scripts
91 12:19 <@jnewbery> molz: Yes!
93 12:19 <provoostenator> Keypool is not the only thing we care about. There can also be watch-only things. But not OP_RETURN afaik.
94 12:19 <molz> i cheat i cheat, i read what sipa said lol
95 12:19 <@jnewbery> reading what sipa says is generally encouraged
96 12:20 <@jnewbery> ok, so when we rescan an empty wallet, we're initially looking out for 6000 different scripts
97 12:20 <@jnewbery> in the current code, what happens when we match one of those scripts?
98 12:22 <jkczyz_> We lookup the block to see if it's really there
99 12:22 <provoostenator> 6000 scripts leads to a lot of false positives.
100 12:22 <@jnewbery> jkczyz_: I think you're referring to what happens after this PR
101 12:22 <@jnewbery> I meant in the master (pre-PR) code
102 12:23 <jkczyz_> haha, yeah I guess that's what you meant by currently :P
103 12:23 <@jnewbery> sorry - my question was unclear
106 12:25 <@jnewbery> Does anyone want to guess? :)
107 12:25 <@jnewbery> what do you think we want to do when we find a matching scriptPubKey for a key that is in our keypool in a block transaction?
108 12:26 <jkczyz_> update our balance
109 12:26 <pinheadmz> Yeah add the utxo to wallet db
110 12:26 <@jnewbery> jkczyz_: good guess! We actually calculate balance dynamically whenever it's needed. We don't keep a running total
111 12:26 <@jnewbery> pinheadmz: yes!
112 12:27 <pinheadmz> (Or remove if we were spending!)
113 12:27 <ajonas> AddToWalletIfInvolvingMe
114 12:27 <@jnewbery> ajonas: \o/
115 12:27 <@jnewbery> ok, so we see a transaction that we're interested in, and then add it to the wallet
116 12:28 <@jnewbery> what else do we need to do? Think about the keypool
117 12:28 <amiti> mark the keys as used
118 12:28 <amiti> aka remove from keypool
119 12:28 <pinheadmz> And generate one to replace?
120 12:28 <@jnewbery> pinheadmz: we don't really 'remove' transactions from the wallet. If they send to us or spend from us, then we want to keep track of them
121 12:29 <@jnewbery> amiti and pinheadmz: yes and yes :) :)
122 12:29 <pinheadmz> So does this actually mean that the lookahead is 1000?
123 12:29 <pinheadmz> In the context of bip44
124 12:29 <@jnewbery> pinheadmz: correct
125 12:29 <molz> the keypool would be smaller until we use up 1000 keys then the keypool would generate another 1000?
126 12:29 <pinheadmz> Interesting. The bip only specifies 20 I think
127 12:29 <provoostenator> 1000 receive + 1000 change I think
128 12:30 <@jnewbery> pinheadmz: see earlier comment: "The keypool is used to implement a 'gap limit'. The keypool maintains a set of keys (by default 1000) ahead of the last used key and scans for the addresses of those keys."
129 12:30 <provoostenator> BIP 44 specifies a GAP limit of 20, but we don't use BIP 44.
130 12:30 <pinheadmz> I see
131 12:30 <@jnewbery> molz: no, we top up as we go. So if we drop to 999, we'll generate another 1
132 12:30 <@jnewbery> as long as the wallet is unlocked (not encrypted)
133 12:30 <ajonas> why do we need so many?
134 12:30 <provoostenator> With descriptor wallets it'll be 6000: 1000 for each address type, adn then receve+ change
135 12:31 <provoostenator> ajonas: I think it's historical, before we had HD wallets it would pre-generate private keys
136 12:31 <MarcoFalke> ajonas: I think it was bumped when we switched to hd wallets
137 12:31 <jonatack> To avoid loss of funds iirc
139 12:32 <@jnewbery> sorry, was just looking at the code, trying to find where we top up the key pool
140 12:32 <@jnewbery> it's changed around recently
142 12:33 <@jnewbery> in AddToWalletIfInvolvingMe(), we'll mark the keypool keys as used there ^
145 12:33 <@jnewbery> L294 in that function
146 12:36 <@jnewbery> ok, so let's recap: a freshly created wallet will have 1000 keys (which equates to 6000 scriptPubKeys). In master, on rescan, we'll iterate through the blocks in the block chain. If we find a transaction that matches one of those scriptPubKeys, we'll mark that keypool entry (and every keypool entry before it) as used...
147 12:36 <@jnewbery> ... and then topup the keypool before continuing to scan the blockchain
148 12:36 <@jnewbery> so, if keys are used in order then we won't miss any transactions
149 12:37 <@jnewbery> the gap limit of 1000 means that if there are any gaps, or the keys are used out-of-order, then as long as the gap is less than 1000, we won't miss any transactions
150 12:37 <@jnewbery> slight correction - we have 2000 keys and 6000 scriptPubKeys (1000 external keys for handing out as addresses, 1000 internal keys for change)
151 12:38 <@jnewbery> any questions about any of that ^ ?
153 12:39 <amiti> does the ordering / gap limit only apply to the keys handed out as addresses?
154 12:39 <@jnewbery> amiti: good question! You're asking because you have complete control over change keys so you wouldn't leave gaps?
156 12:40 <@jnewbery> I think there may be circumstances where you'd have gaps in your change keys, but it's less likely than with giving out addresses (where someone might just not pay you)
157 12:40 <amiti> although I guess you could form txns at different fee rates that get mined at different times and are thus seen on the chain out of order?
158 12:41 <@jnewbery> yes, true
159 12:41 <amiti> yeah I was wondering that... what happens in that case?
160 12:41 <@jnewbery> and if you RBF bump, then you use a new key for change I think
161 12:41 <jonatack> MarcoFalke: in commit faf7cc8 updating wallet_import_rescan.py::134, I was wondering, why is "rescan" changed to "expect_rescan" in ImportNode?
162 12:42 <@jnewbery> the logic in the core wallet is the same for both. We have a gap limit of 1000. Perhaps it could be reduced for internal keys
163 12:42 <jonatack> MarcoFalke: ImportNode = collections.namedtuple("ImportNode", "prune expect_rescan blockfilter")
164 12:42 <@jnewbery> has anyone spotted the bug in this PR yet? :)
165 12:42 <@jnewbery> (I've been giving you hints by talking about the keypool)
166 12:44 <@jnewbery> hint: why is important that TopUp() is called in the rescan loop?
167 12:45 <amiti> TopUpKeyPool()? or is there a different function TopUp() that I'm missing?
169 12:46 <amiti> ah thanks
171 12:46 <@jnewbery> that's where it's called from AddToWalletIfInvolvingMe()
172 12:47 <@jnewbery> what happens if you create a wallet, make a backup, then receive 1001 payments, and then try to restore from that backup?
173 12:48 <molz> one of them isn't in the backup
174 12:48 <pinheadmz> Are we not topping up every time a key is found in a block during rescan?
175 12:48 <@jnewbery> molz: none of the transactions are in the backup. The initial 1000 keys are in the backup
176 12:48 <@jnewbery> pinheadmz: you're getting close
177 12:48 <molz> so currently core wallet has deterministic backup so if we make a backup at the start of the wallet, it will automatically backup all our keys by the TopUp call?
178 12:49 <@jnewbery> molz: yes, the keys are deterministic, so when topping up the backup, you'll get the same new keypool keys
180 12:49 <@jnewbery> pinheadmz: in the new PR15845 logic, what are we matching the blocks against?
181 12:50 <pinheadmz> GetAllRelevantScriptPubKeys()
182 12:50 <@jnewbery> which returns a GCS filter
183 12:50 <@jnewbery> and where is that filter built?
184 12:50 <jkczyz> So the filter needs to be updated?
185 12:50 <@jnewbery> jkczyq: \o/
186 12:50 <pinheadmz> \o/ haha
187 12:51 <pinheadmz> I should've guessed that!
188 12:51 <@jnewbery> GetAllRelevantScriptPubKeys() is only called _once_, outside the rescan loop
189 12:51 <@jnewbery> that means as we advance through the blockchain, then it doesn't get updated
190 12:51 <pinheadmz> In bcoin we use internal bloom filters this same way, and in some cases the filter is not updated fast enough between blocks
191 12:52 <pinheadmz> But the filter comes from each block. So it's not the filter that needs to be updated it's the element list of keys to match against right?
192 12:53 <jkczyz> GCSFilter::ElementSet
193 12:53 <@jnewbery> pinheadmz: i'm not entirely sure what the right terminology is. The GCS is what the wallet creates to match against the block filter
196 12:55 <pinheadmz> Hm but conceptually, the filter is derived once the block is verified and the filter for each block doesn't get changed or updated ?
197 12:55 <@jnewbery> which fixes this bug but is slow
198 12:55 <@jnewbery> pinheadmz: correct. The block filter doesn't ever change
199 12:56 <@jnewbery> 5 minutes left. Did anyont take a look at false-positive rates?
203 12:58 <@jnewbery> ajonas: yes
205 12:59 <michaelfolkson> There's still contention on serving filters right?
207 12:59 <provoostenator> jnewbery: nice catch!
208 12:59 <@jnewbery> so a transaction that isn't in a block will return positive with probability 1/784931
209 13:01 <@jnewbery> I think that means that a full keypool of addresses will return positive with a probability of (1 - (784930/784931)^6000)
210 13:01 <@jnewbery> which I think is ~ 6000/784931
212 13:02 <jonatack> Interesting catch! We're missing tests that aren't seeing this bug, it would seem.
213 13:02 <@jnewbery> jonatack: it'd require a wallet with lots of transactions
214 13:02 <@jnewbery> the current import-rescan test doesn't have enough transactions to hit keypool topup issues
215 13:03 <jonatack> Chris Stewart, who has an open PR to add many property-based tests, was suggesting that BIP157/158 might be a fruitful area for rapidcheck testing.
216 13:03 <@jnewbery> We're over time, but I can go for a few more minutes if people have more questions
217 13:03 <pinheadmz> jnewbery: thanks again as always!!!
218 13:04 <jonatack> jnewbery: how did you catch this... code review?
220 13:06 <ysangkok> thanks! awesome when there is a known bug that people have to find! so dramatic
221 13:06 <pinheadmz> Whoa was that with bip158 filters? A couple of years ago?
222 13:06 <@jnewbery> pinheadmz: no, just regular good old fashioned rescans
224 13:07 <@jnewbery> ok, let's call time there
225 13:07 <@jnewbery> Thanks everyone. Fun meeting today!
226 13:07 <@jnewbery> #endmeeting
227 13:07 <jonatack> Nice :) and it was not caught yet in the review until now. Really good session.
228 13:07 <michaelfolkson> Thanks, fascinating
229 13:07 <molz> so the bug is not related to the filters?
230 13:07 <amiti> thanks !
231 13:07 <jonatack> Thanks!
232 13:08 <@jnewbery> I spotted it when I reviewed it last week but didn't want to spoil the fun for you all :)
233 13:08 <provoostenator> jonatack jnewbery keypool size on regtest is tiny (10?), so should be easy to test this
234 13:08 <jonatack> jnewbery: nice one
235 13:08 <jonatack> provoostenator: oh interesting
236 13:09 <molz> would it be better if the keypool is smaller ? like 500 keys?
237 13:09 <@jnewbery> I'll try to get notes and questions up before the weekend. Please ping me if you want to host. I think lots of you are ready.
238 13:09 <@jnewbery> I'm happy to help you prepare and help with notes and questions
240 13:11 <molz> jnewbery, yea i was around the bitcoin channels when this was changed, i think because many business owners lost funds thinking if they backup once (100 keypool then) then the rest of their money was backed up
241 13:11 <@jnewbery> oh. One more thing
242 13:11 <@jnewbery> It turns out 17:00 UTC isn't ideal for me now that daylight savings is over
244 13:12 <sebastianvstaa> same :)
245 13:12 <@jnewbery> would changing it to 18:00 preclude anyone from joining us?
246 13:12 <purposely> Not an issue for you, Maybe a poll would help
247 13:13 <purposely> not an issue for me*
248 13:13 <provoostenator> molz: a potential followup to make it snappier, is to have two scan passes. One with very small lookahead window, and then one with a bigger window just in case.
249 13:13 <molz> provoostenator, how do you do this in core?
250 13:13 <@jnewbery> let me know if 18:00 doesn't work for you. If I don't hear from anyone, I'll update the website to say 18:00 this weekend
251 13:14 <provoostenator> I even thought about adding a wallet based cache to the REST address lookup API, which could do a just-in-time rescan. Then you wouldn't need electrum server :-)
252 13:14 <jonatack> jnewbery: yes, i'd say change it, communicate it well, and see if anyone complains
253 13:14 <provoostenator> (reckless idea, but still)
254 13:14 <purposely> if time changes, let's by loud about changing the time
255 13:14 <@jnewbery> I'm not going to do a poll. Just let me know if 18:00 doesn't work
256 13:16 <molz> jnewbery, thanks for doing this for us, I've been lurking until today, you've been doing a wonderful job, i really enjoy reading the convo and it helps me learn :D
257 13:17 <lightlike> 18:00 is much better for me as well.
258 13:17 <@jnewbery> molz: thanks. Glad you're taking a more active part now!
259 13:17 <@jnewbery> I enjoy doing these. I think it's a really good way for us all to learn from each other
260 13:19 <@jnewbery> I would really like it if I could convince more of you to host. I think having to host makes you really understand the PR and it's a great way to learn.
261 13:19 <@jnewbery> (but I understand most of you are doing this in your free time, and it's a big ask)
262 13:19 <jonatack> +1 for a great way to learn
264 13:19 <MarcoFalke> This one was eye-opening
265 13:19 <sebastianvstaa> +1