wallet: ensure the wallet is unlocked when needed for rescanning (wallet)

Jan 25, 2023

https://github.com/bitcoin/bitcoin/pull/26347

Host: ishaanam - PR author: ishaanam

The PR branch HEAD was b2e2829 at the time of this review club meeting.

Notes

A wallet rescans the blockchain in order to find transactions related to its keys. A wallet starts rescanning the blockchain upon the import of keys, descriptors, etc. A rescan can also be invoked by calling rescanblockchain.
Wallet rescans were previously discussed in review clubs #15845 and #25957.
When a wallet is rescanning, it needs to be able to top-up its keypool.
A wallet’s keypool is its reserve of unused keys of a denoted size (the default size is 1000 keys). When a wallet is using BIP 32 to generate deterministic keys, an unused key is considered to be any key whose BIP 32 index is greater than the index of the last key which was provided to the user by the wallet or which received funds.
WalletRescanReserver is used to manage rescans in a wallet.
Wallets can optionally be encrypted using a passphrase. Only the private key material of the wallet is encrypted using a user-provided passphrase.
An encrypted wallet is locked by default; it can be unlocked by calling walletpassphrase, which then stores the passphrase in memory for a specified amount of time before the wallet is locked again
Certain RPCs which make use of information relating to private keys, such as sendtoaddress and signrawtransactionwithwallet require the wallet to be unlocked in order to work correctly. If the wallet is not unlocked, the RPC typically throws an error.
The cs_wallet mutex guards access to the passphrase when stored in memory. For example, cs_wallet must be held before the passphrase can be deleted, so holding the cs_wallet mutex for the duration of the RPC ensures another thread cannot delete the passphrase while it is in use.
The rescanblockchain and importdescriptors RPCs don’t hold cs_wallet during the entire rescan, only for certain relatively fast operations, because a rescan can take over an hour, and the user might want to use other RPCs in the meantime which require the cs_wallet lock to be held.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
What bug is this PR fixing?
Why can wallets that use unhardened derivation top up the keypool without being unlocked? Would this problem occur when rescanning a descriptor wallet with descriptors generated by Bitcoin Core?
Does the test added to wallet_importdescriptors.py fail on master for you? Can you think of any ways to make it fail on master more reliably?
What is this PR’s approach to fixing this bug? Can you think of any alternatives?
Why was the m_scanning_with_passphrase variable added to CWallet?
Why is wallet.m_scanning_with_passphrase only set to true in some RPCs and not others like importpubkey and importprivkey?
What would happen if the first commit of this PR was removed and a user tried to run walletlock while performing a rescan that required the passphrase?
Why was another mutex added instead of just using cs_wallet?
Why is RecursiveMutex used here as opposed to Mutex?
Why does m_relock_mutex get locked twice in walletlock?

Meeting Log

17:00

<ishaanam[m]> #startmeeting

17:00

<ishaanam[m]> Hey everyone, welcome to PR Review club!

17:00

<brunoerg> hi

17:00

<d33r_gee> hello

17:01

<raj21> hi

17:01

<pablomartin> hello!

17:01

<b_101> hello all!

17:01

<hernanmarino> Hello

17:01

<ishaanam[m]> Today we will be reviewing #26347 "Ensure the wallet is unlocked when needed for rescanning"

17:01

<LarryRuane> hi

17:01

<ishaanam[m]> Notes are posted here: https://bitcoincore.reviews/26347

17:01

<grettke> Hello.

17:02

<ishaanam[m]> As always, feel free to jump in with any questions or comments!

17:02

<svav> Hi

17:02

<glozow> hi

17:02

<ishaanam[m]> Has everyone gotten the chance to take a look at the PR or the notes?

17:02

<d33r_gee> yes

17:02

<roze_paul> hi

17:02

<pablomartin> yup, both

17:02

<roze_paul> a bit

17:02

<b_101> y/y

17:03

<LarryRuane> some of both :)

17:03

<svav> yes

17:03

<hernanmarino> I did a review on the PR a couple of months ago, but didn t read the notes thoroughly today

17:03

<willcl_ark> Hi

17:03

<Murch1> Hey

17:04

<ishaanam[m]> Great! Can someone give a short summary of the bug this PR is fixing?

17:05

<d33r_gee> This PR makes sure that the wallet stays unlocked while rescanning and that the following rpc calls do not run while rescanning

17:05

<d33r_gee> - walletlock

17:05

<d33r_gee> - encryptwallet

17:05

<d33r_gee> - walletpassphrasechange

17:05

<grettke> Yes took a look.

17:05

<LarryRuane> while either of the 2 long-running RPCs is running, the wallet lock can timeout (wallet locks), which prevents key-topoff (those long-running RPCs create a need for topoff)

17:05

<b_101> fix risk of missing some keys when rescanning the blockchain due to an unlockwallet time-out, particularly if time set is not enough

17:06

<LarryRuane> one thing I'm unclear on is what happens (without the PR) if the key topoff wants to run but the wallet is locked? Does the long-running RPC immediately fail?

17:06

<ishaanam[m]> d33r_gee: LarryRuane: b_101: yes!

17:06

<LarryRuane> or do we just not do the topoff? if so, what's the effect of that?

17:07

<ishaanam[m]> LarryRuane: the key topoff fails silently, meaning that the scan continues, but with less keys in its keypool than it should have

17:07

<b_101> can someone define the meaning of topoff?

17:07

<LarryRuane> ishaanam[m]: thanks

17:08

<LarryRuane> for topoff, this is a good article https://blog.lopp.net/mind-the-bitcoin-address-gap/

17:08

<b_101> LarryRuane: thx!

17:09

<ishaanam[m]> This sort of lead to the next question: Why can wallets that use unhardened derivation top up the keypool without being unlocked?

17:10

<LarryRuane> when rescanning, if you detect a payment to one of your addresses, you want to make sure you're looking for payments to the next 1000 addresses ... may need to generate more addresses

17:10

<LarryRuane> (hope i got that right!)

17:11

<ishaanam[m]> Yes, that is what it means to top up the keypool.

17:11

<rot13maxi> Will other signing operations with the wallet fail/error if the wallet is in the middle of rescanning? In other words, if my wallet normally re-locks after 30 seconds, but is "held open" because it's rescanning, could an RPC to send a transaction come in while its rescanning, or would that RPC fail because there's a scan-in-progress?

17:12

<LarryRuane> ishaanam[m]: unhardened doesn't need private keys? ... but i'm fuzzy on this topic, is it kind of watch-only situation?

17:13

<roze_paul> +1 @rot13maxi

17:13

<ishaanam[m]> LarryRuane: yes, with unhardened derivation, the "keypool" is basically the xpub, because private information is not needed to derive additional keys.

17:13

<Murch1> ishaanam: Only the private keys are encrypted, so the unlocked wallet.dat is sufficient to derive unhardened keys

17:14

<LarryRuane> ishaanam[m]: +1 thanks

17:14

<ishaanam[m]> Murch: yes, thanks for pointing that out. It is important to note that the wallet only encrypts private keys

17:15

<Murch1> err

17:15

<Murch1> also, I meant the “locked wallet is sufficient”

17:15

<LarryRuane> ishaanam[m]: does a wallet use *either* unhardened or hardened derivation, but never both?

17:15

<Murch1> I was trying to respond to your question, not correct your answpr

17:15

<rot13maxi> if other signing RPCs don't currently fail/error when a rescan is in progress, then we might want to have the ` if (pwallet->IsScanningWithPassphrase())` guard on those other operations, right?

17:16

<ishaanam[m]> rot13maxi: Yeah, I don't think that other signing RPCs currently fail when a rescan is in progress

17:17

<ishaanam[m]> rot13maxi: yes, I wonder if that is a bug or if there is a deliberate reason for doing that. I think the main problem is that scanning RPCs don't hold cs_wallet, but signing RPCs do.

17:17

<ishaanam[m]> LarryRuane: If ␑importdescriptors␏ is used, then technically both hardened and hardened derivation can be used.

17:18

<ishaanam[m]> But in Bitcoin Core legacy wallets use hardened and descriptor wallets use unhardened

17:18

<LarryRuane> while rescanning, we're discovered UTXOs that we own (have keys for), but it should be okay to, for example, `sendmany` which would just use the UTXOs that we do know about? (rather than waiting for the rescan to complete?)

17:18

<LarryRuane> *discovering

17:21

<ishaanam[m]> LarryRuane: I think so, I would have to double check with the code but I can't see any immediate problems on master with spending while rescanning

17:21

<ishaanam[m]> Writing a test for this PR was one of the trickier parts. Does the test added to wallet_importdescriptors.py fail on master for you?

17:22

<ishaanam[m]> The test I am talking about: https://github.com/bitcoin-core-review-club/bitcoin/commit/b2e2829014c5c9a3dc02eab7fcd77dd907228c78#diff-c3ff95ab6716d726b68003d2d29f72d91e15e813c8b41e3e1c82d4f2a0feb26d

17:22

<hernanmarino> I didn't have a chance to test that, will do it later

17:23

<LarryRuane> no it didn't fail for me (tried 3 times)

17:23

<LarryRuane> i was wondering if using mocktime would help make the test more reliable? (i love mocktime!)

17:24

<pablomartin> i still have to perform the tests... but talking about it, is it worth to add more coverage regarding the signing RPCs during rescan and other situations/ combinations??

17:26

<ishaanam[m]> LarryRuane: Yeah, when writing this I had played around with using mocktime but it didn't seem to make a difference. However, if you can think of any ways to make this more reliable please let me know! Perhaps you can increase the "range" and "next_index" values in the descriptor object that the scan is being done for?

17:26

<LarryRuane> yes I would be glad to take a look

17:29

<LarryRuane> for anyone who may be newer here, you can see the PR changes many `LOCK` to `LOCK2` (to include the new `m_relocl_mutex`) -- the order of those `LOCK2` arguments is very important!

17:29

<ishaanam[m]> pablomartin: Currently it is somewhat difficult for the functional test suite to have a rescan going on in the backgroung while other RPCs are being run, but I think I remember there being a PR open for fixing that.

17:29

<ishaanam[m]> So now that we have gone over the bug, what is this PR's approach to fixing it?

17:31

<LarryRuane> i think during the long-running RPCs, hold the new `m_relock_mutex` the entire time,

17:31

<hernanmarino> disallowing the execution of certain commands while the rescan is running . Also, introducing a new mutex

17:31

<b_101> Add a flag to CWallet to avoid some RPCs to be run while rescanning and add a recursive mutex

17:32

<LarryRuane> so that if the timeout wants to occur, re-locking the wallet will be delayed by this mutex

17:32

<LarryRuane> https://github.com/bitcoin/bitcoin/pull/26347/files#diff-c4eeb9bd8e84ac276465d100d6ac4c1c9b9ddc145827a03dfe11932598b028b9R1660

17:33

<b_101> LarryRuane: +1

17:33

<pablomartin> ishaanam: thanks! 25957?

17:33

<LarryRuane> (does other stuff, but IIUC, that is the heart of it)

17:33

<ishaanam[m]> Right, so I think that the new recursive mutex is the most important part of this PR. Why is ␑m_relock_mutex␏ added instead of just using ␑cs_wallet␏?

17:34

<b_101> Is there any documentation for the Bitcoin Core LOCK macros?

17:34

<rot13maxi> what does `LOCK2` do over `LOCK`?

17:34

<LarryRuane> is it because `cs_wallet` is pretty "hot", used for lots of stuff that we don't want to block during the entire time of a rescan?

17:34

<glozow> rot13maxi: it takes 2 locks, one after the other

17:34

<rot13maxi> thanks

17:35

<rot13maxi> does it do the two locks atomically?

17:35

<LarryRuane> rot13maxi: `LOCK2(a, b)` first does `LOCK(a)` and *then* does `LOCK(b)`

17:35

<LarryRuane> rot13maxi: no, not atomically

17:35

<rot13maxi> got it. thanks

17:35

<ishaanam[m]> pablomartin: perhaps but I haven't taken a look at #25957. I was referring to the draft PR #26700

17:35

<LarryRuane> it may get the first lock, then block for a while before getting the second lock

17:38

<ishaanam[m]> Theoretically while ␑cs_wallet␏ could be used, ␑m_relock_mutex␏ is added because rescanning a wallet can take a long time, and we would still want other RPCs that lock ␑cs_wallet␏ to be able to run.

17:38

<pablomartin> ishaanam: insteresting (#26700), thanks again! I'll take a look!

17:39

<ishaanam[m]> Why is ␑m_relock_mutex␏ a RecursiveMutex instead of a Mutex? What are the differences?

100

17:40

<roze_paul> so m_relock_mutex functionally is very similar to cs_wallet, it just has a more focussed scope, (and is recursive) ?

101

17:40

<b_101> Because more than one selected RPCs can be run in parallel while rescan is in progress ?

102

17:41

<ishaanam[m]> roze_paul: yes, m_relock_mutex has a more focused scope, but cs_wallet is recursive as well

103

17:41

<LarryRuane> RecursiveMutex allows the same thread that already owns the lock to lock it again ... it's only unlocked on the last unlock (that is, there's a lock recursion counter)

104

17:41

<LarryRuane> if a thread locks a `Mutex` that it already owns, we'll either crash or just hang forever (i forget which)

105

17:43

<ishaanam[m]> LarryRuane: Yes, I think that Bitcoin Core would crash because its locking code detects things like that

106

17:43

<rot13maxi> ah. I think this lock on `pwallet->cs_wallet` in the `rescanblockchain()` will prevent concurrent spends from happening (https://github.com/bitcoin/bitcoin/blob/master/src/wallet/rpc/spend.cpp#L270) right?

107

17:43

<LarryRuane> but there's a general push to move away from RecursiveMutex because it encourages sloppy code ... I'm not sure why a `Mutex` wouldn't work here (i saw that there's a comment on the PR that you replied to)

108

17:44

<rot13maxi> dropped the link. This lock in rescanblockchain: https://github.com/bitcoin/bitcoin/pull/26347/files#diff-07e4c0f110004a013432ac478fc6e1386b124cb1ad59bbb2e5fee1cdd92a307fR879

109

17:44

<hernanmarino> b_101 : take a look at https://github.com/bitcoin/bitcoin/blob/ab98673f058853e00c310afad57925f54c1ecfae/src/sync.h#L258-L263 or perhaps even the whole file

110

17:45

<rot13maxi> thanks hernanmarino

111

17:45

<b_101> hernanmarino: Thx!

112

17:46

<glozow> from https://github.com/bitcoin/bitcoin/pull/26347#discussion_r1003661127, is the concern you might wait on m_relock_mutex and cs_wallet in different orders and deadlock?

113

17:46

<ishaanam[m]> LarryRuane: it is because sometimes ␑cs_wallet␏ is locked, and then ␑CWallet::Lock␏ is called, which locks ␑m_rescan_mutex␏ and then ␑cs_wallet␏, which is inconsistent from a few other places

114

17:48

<ishaanam[m]> glozow: yes, but if there is a strong preference towards Mutex then I can revisit the code and perhaps do it differently such that this is no longer a concern. This would require a few more changes then what is there currently from what I can tell.

115

17:50

<glozow> yes, from quick glance it seems `Lock()` could require `cs_wallet` to already be held and that could solve the problem. but i'm scope creeping

116

17:50

← rot13maxi left (~rot13maxi@pool-71-171-103-145.clppva.fios.verizon.net):

117

17:50

<ishaanam[m]> What would happen if the first commit of this PR was removed and a user tried to run walletlock while performing a rescan that required the passphrase?

118

17:51

<LarryRuane> https://github.com/bitcoin/bitcoin/pull/26347#discussion_r1008917871 (question about recusive versus non-recursive mutex)

119

17:54

<glozow> Just a guess - the user doesn't get a warning, but it still blocks, waiting for m_relock_mutex to become available?

120

17:55

<ishaanam[m]> Really what this question is asking is: if none of the encryption RPCs listed in the PR ran ␑IsScanningWithPassphrase␏, would happen when locking a wallet during a rescan?

121

17:55

<ishaanam[m]> glozow: Yes!

122

17:55

<hernanmarino> I have to go, see you all next time . Thanks ishaanam[m] for hosting

123

17:56

<glozow> That's good, though getting a warning immediately is definitely preferable

124

17:56

<LarryRuane> i think those can be really tough questions in general ... if a request (probably an RPC let's say) can't be satisfied immediately, is it better to block or to return an error?

125

17:56

<ishaanam[m]> That first commit was added so the user does not get confused if an RPC takes too long because of this.

126

17:57

<LarryRuane> i don't know if bitcoin core has a convention for that ... maybe it's just case-by-case ... depending on how long the delay would typically be?

127

17:57

<ishaanam[m]> LarryRuane: I think that would make sense. Here the delay has the potential to be pretty long.

128

17:58

<LarryRuane> ishaanam[m]: +1

129

17:59

<ishaanam[m]> Looks like we are coming up on an hour. Thanks everyone for participating!

130

17:59

<d33r_gee> thanks every1!