wallet: ensure the wallet is unlocked when needed for rescanning (wallet)

https://github.com/bitcoin/bitcoin/pull/26347

Host: ishaanam  -  PR author: ishaanam

The PR branch HEAD was b2e2829 at the time of this review club meeting.

Notes

  • A wallet rescans the blockchain in order to find transactions related to its keys. A wallet starts rescanning the blockchain upon the import of keys, descriptors, etc. A rescan can also be invoked by calling rescanblockchain.

  • Wallet rescans were previously discussed in review clubs #15845 and #25957.

  • When a wallet is rescanning, it needs to be able to top-up its keypool.

  • A wallet’s keypool is its reserve of unused keys of a denoted size (the default size is 1000 keys). When a wallet is using BIP 32 to generate deterministic keys, an unused key is considered to be any key whose BIP 32 index is greater than the index of the last key which was provided to the user by the wallet or which received funds.

  • WalletRescanReserver is used to manage rescans in a wallet.

  • Wallets can optionally be encrypted using a passphrase. Only the private key material of the wallet is encrypted using a user-provided passphrase.

  • An encrypted wallet is locked by default; it can be unlocked by calling walletpassphrase, which then stores the passphrase in memory for a specified amount of time before the wallet is locked again

  • Certain RPCs which make use of information relating to private keys, such as sendtoaddress and signrawtransactionwithwallet require the wallet to be unlocked in order to work correctly. If the wallet is not unlocked, the RPC typically throws an error.

  • The cs_wallet mutex guards access to the passphrase when stored in memory. For example, cs_wallet must be held before the passphrase can be deleted, so holding the cs_wallet mutex for the duration of the RPC ensures another thread cannot delete the passphrase while it is in use.

  • The rescanblockchain and importdescriptors RPCs don’t hold cs_wallet during the entire rescan, only for certain relatively fast operations, because a rescan can take over an hour, and the user might want to use other RPCs in the meantime which require the cs_wallet lock to be held.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What bug is this PR fixing?

  3. Why can wallets that use unhardened derivation top up the keypool without being unlocked? Would this problem occur when rescanning a descriptor wallet with descriptors generated by Bitcoin Core?

  4. Does the test added to wallet_importdescriptors.py fail on master for you? Can you think of any ways to make it fail on master more reliably?

  5. What is this PR’s approach to fixing this bug? Can you think of any alternatives?

  6. Why was the m_scanning_with_passphrase variable added to CWallet?

  7. Why is wallet.m_scanning_with_passphrase only set to true in some RPCs and not others like importpubkey and importprivkey?

  8. What would happen if the first commit of this PR was removed and a user tried to run walletlock while performing a rescan that required the passphrase?

  9. Why was another mutex added instead of just using cs_wallet?

  10. Why is RecursiveMutex used here as opposed to Mutex?

  11. Why does m_relock_mutex get locked twice in walletlock?

Meeting Log

  117:00 <ishaanam[m]> #startmeeting
  217:00 <ishaanam[m]> Hey everyone, welcome to PR Review club!
  317:00 <brunoerg> hi
  417:00 <d33r_gee> hello
  517:01 <raj21> hi
  617:01 <pablomartin> hello!
  717:01 <b_101> hello all!
  817:01 <hernanmarino> Hello
  917:01 <ishaanam[m]> Today we will be reviewing #26347 "Ensure the wallet is unlocked when needed for rescanning"
 1017:01 <LarryRuane> hi
 1117:01 <ishaanam[m]> Notes are posted here: https://bitcoincore.reviews/26347
 1217:01 <grettke> Hello.
 1317:02 <ishaanam[m]> As always, feel free to jump in with any questions or comments!
 1417:02 <svav> Hi
 1517:02 <glozow> hi
 1617:02 <ishaanam[m]> Has everyone gotten the chance to take a look at the PR or the notes?
 1717:02 <d33r_gee> yes
 1817:02 <roze_paul> hi
 1917:02 <pablomartin> yup, both
 2017:02 <roze_paul> a bit
 2117:02 <b_101> y/y
 2217:03 <LarryRuane> some of both :)
 2317:03 <svav> yes
 2417:03 <hernanmarino> I did a review on the PR a couple of months ago, but didn t read the notes thoroughly today
 2517:03 <willcl_ark> Hi
 2617:03 <Murch1> Hey
 2717:04 <ishaanam[m]> Great! Can someone give a short summary of the bug this PR is fixing?
 2817:05 <d33r_gee> This PR makes sure that the wallet stays unlocked while rescanning and that the following rpc calls do not run while rescanning
 2917:05 <d33r_gee> - walletlock
 3017:05 <d33r_gee> - encryptwallet
 3117:05 <d33r_gee> - walletpassphrasechange
 3217:05 <grettke> Yes took a look.
 3317:05 <LarryRuane> while either of the 2 long-running RPCs is running, the wallet lock can timeout (wallet locks), which prevents key-topoff (those long-running RPCs create a need for topoff)
 3417:05 <b_101> fix risk of missing some keys when rescanning the blockchain due to an unlockwallet time-out, particularly if time set is not enough
 3517:06 <LarryRuane> one thing I'm unclear on is what happens (without the PR) if the key topoff wants to run but the wallet is locked? Does the long-running RPC immediately fail?
 3617:06 <ishaanam[m]> d33r_gee: LarryRuane: b_101: yes!
 3717:06 <LarryRuane> or do we just not do the topoff? if so, what's the effect of that?
 3817:07 <ishaanam[m]> LarryRuane: the key topoff fails silently, meaning that the scan continues, but with less keys in its keypool than it should have
 3917:07 <b_101> can someone define the meaning of topoff?
 4017:07 <LarryRuane> ishaanam[m]: thanks
 4117:08 <LarryRuane> for topoff, this is a good article https://blog.lopp.net/mind-the-bitcoin-address-gap/
 4217:08 <b_101> LarryRuane: thx!
 4317:09 <ishaanam[m]> This sort of lead to the next question: Why can wallets that use unhardened derivation top up the keypool without being unlocked?
 4417:10 <LarryRuane> when rescanning, if you detect a payment to one of your addresses, you want to make sure you're looking for payments to the next 1000 addresses ... may need to generate more addresses
 4517:10 <LarryRuane> (hope i got that right!)
 4617:11 <ishaanam[m]> Yes, that is what it means to top up the keypool.
 4717:11 <rot13maxi> Will other signing operations with the wallet fail/error if the wallet is in the middle of rescanning? In other words, if my wallet normally re-locks after 30 seconds, but is "held open" because it's rescanning, could an RPC to send a transaction come in while its rescanning, or would that RPC fail because there's a scan-in-progress?
 4817:12 <LarryRuane> ishaanam[m]: unhardened doesn't need private keys? ... but i'm fuzzy on this topic, is it kind of watch-only situation?
 4917:13 <roze_paul> +1 @rot13maxi
 5017:13 <ishaanam[m]> LarryRuane: yes, with unhardened derivation, the "keypool" is basically the xpub, because private information is not needed to derive additional keys.
 5117:13 <Murch1> ishaanam: Only the private keys are encrypted, so the unlocked wallet.dat is sufficient to derive unhardened keys
 5217:14 <LarryRuane> ishaanam[m]: +1 thanks
 5317:14 <ishaanam[m]> Murch: yes, thanks for pointing that out. It is important to note that the wallet only encrypts private keys
 5417:15 <Murch1> err
 5517:15 <Murch1> also, I meant the “locked wallet is sufficient”
 5617:15 <LarryRuane> ishaanam[m]: does a wallet use *either* unhardened or hardened derivation, but never both?
 5717:15 <Murch1> I was trying to respond to your question, not correct your answpr
 5817:15 <rot13maxi> if other signing RPCs don't currently fail/error when a rescan is in progress, then we might want to have the ` if (pwallet->IsScanningWithPassphrase())` guard on those other operations, right?
 5917:16 <ishaanam[m]> rot13maxi: Yeah, I don't think that other signing RPCs currently fail when a rescan is in progress
 6017:17 <ishaanam[m]> rot13maxi: yes, I wonder if that is a bug or if there is a deliberate reason for doing that. I think the main problem is that scanning RPCs don't hold cs_wallet, but signing RPCs do.
 6117:17 <ishaanam[m]> LarryRuane: If ␑importdescriptors␏ is used, then technically both hardened and hardened derivation can be used.
 6217:18 <ishaanam[m]> But in Bitcoin Core legacy wallets use hardened and descriptor wallets use unhardened
 6317:18 <LarryRuane> while rescanning, we're discovered UTXOs that we own (have keys for), but it should be okay to, for example, `sendmany` which would just use the UTXOs that we do know about? (rather than waiting for the rescan to complete?)
 6417:18 <LarryRuane> *discovering
 6517:21 <ishaanam[m]> LarryRuane: I think so, I would have to double check with the code but I can't see any immediate problems on master with spending while rescanning
 6617:21 <ishaanam[m]> Writing a test for this PR was one of the trickier parts. Does the test added to wallet_importdescriptors.py fail on master for you?
 6717:22 <ishaanam[m]> The test I am talking about: https://github.com/bitcoin-core-review-club/bitcoin/commit/b2e2829014c5c9a3dc02eab7fcd77dd907228c78#diff-c3ff95ab6716d726b68003d2d29f72d91e15e813c8b41e3e1c82d4f2a0feb26d
 6817:22 <hernanmarino> I didn't have a chance to test that, will do it later
 6917:23 <LarryRuane> no it didn't fail for me (tried 3 times)
 7017:23 <LarryRuane> i was wondering if using mocktime would help make the test more reliable? (i love mocktime!)
 7117:24 <pablomartin> i still have to perform the tests... but talking about it, is it worth to add more coverage regarding the signing RPCs during rescan and other situations/ combinations??
 7217:26 <ishaanam[m]> LarryRuane: Yeah, when writing this I had played around with using mocktime but it didn't seem to make a difference. However, if you can think of any ways to make this more reliable please let me know! Perhaps you can increase the "range" and "next_index" values in the descriptor object that the scan is being done for?
 7317:26 <LarryRuane> yes I would be glad to take a look
 7417:29 <LarryRuane> for anyone who may be newer here, you can see the PR changes many `LOCK` to `LOCK2` (to include the new `m_relocl_mutex`) -- the order of those `LOCK2` arguments is very important!
 7517:29 <ishaanam[m]> pablomartin: Currently it is somewhat difficult for the functional test suite to have a rescan going on in the backgroung while other RPCs are being run, but I think I remember there being a PR open for fixing that.
 7617:29 <ishaanam[m]> So now that we have gone over the bug, what is this PR's approach to fixing it?
 7717:31 <LarryRuane> i think during the long-running RPCs, hold the new `m_relock_mutex` the entire time,
 7817:31 <hernanmarino> disallowing the execution of certain commands while the rescan is running . Also, introducing a new mutex
 7917:31 <b_101> Add a flag to CWallet to avoid some RPCs to be run while rescanning and add a recursive mutex
 8017:32 <LarryRuane> so that if the timeout wants to occur, re-locking the wallet will be delayed by this mutex
 8117:32 <LarryRuane> https://github.com/bitcoin/bitcoin/pull/26347/files#diff-c4eeb9bd8e84ac276465d100d6ac4c1c9b9ddc145827a03dfe11932598b028b9R1660
 8217:33 <b_101> LarryRuane: +1
 8317:33 <pablomartin> ishaanam: thanks! 25957?
 8417:33 <LarryRuane> (does other stuff, but IIUC, that is the heart of it)
 8517:33 <ishaanam[m]> Right, so I think that the new recursive mutex is the most important part of this PR. Why is ␑m_relock_mutex␏ added instead of just using ␑cs_wallet␏?
 8617:34 <b_101> Is there any documentation for the Bitcoin Core LOCK macros?
 8717:34 <rot13maxi> what does `LOCK2` do over `LOCK`?
 8817:34 <LarryRuane> is it because `cs_wallet` is pretty "hot", used for lots of stuff that we don't want to block during the entire time of a rescan?
 8917:34 <glozow> rot13maxi: it takes 2 locks, one after the other
 9017:34 <rot13maxi> thanks
 9117:35 <rot13maxi> does it do the two locks atomically?
 9217:35 <LarryRuane> rot13maxi: `LOCK2(a, b)` first does `LOCK(a)` and *then* does `LOCK(b)`
 9317:35 <LarryRuane> rot13maxi: no, not atomically
 9417:35 <rot13maxi> got it. thanks
 9517:35 <ishaanam[m]> pablomartin: perhaps but I haven't taken a look at #25957. I was referring to the draft PR #26700
 9617:35 <LarryRuane> it may get the first lock, then block for a while before getting the second lock
 9717:38 <ishaanam[m]> Theoretically while ␑cs_wallet␏ could be used, ␑m_relock_mutex␏ is added because rescanning a wallet can take a long time, and we would still want other RPCs that lock ␑cs_wallet␏ to be able to run.
 9817:38 <pablomartin> ishaanam: insteresting (#26700), thanks again! I'll take a look!
 9917:39 <ishaanam[m]> Why is ␑m_relock_mutex␏ a RecursiveMutex instead of a Mutex? What are the differences?
10017:40 <roze_paul> so m_relock_mutex functionally is very similar to cs_wallet, it just has a more focussed scope, (and is recursive) ?
10117:40 <b_101> Because more than one selected RPCs can be run in parallel while rescan is in progress ?
10217:41 <ishaanam[m]> roze_paul: yes, m_relock_mutex has a more focused scope, but cs_wallet is recursive as well
10317:41 <LarryRuane> RecursiveMutex allows the same thread that already owns the lock to lock it again ... it's only unlocked on the last unlock (that is, there's a lock recursion counter)
10417:41 <LarryRuane> if a thread locks a `Mutex` that it already owns, we'll either crash or just hang forever (i forget which)
10517:43 <ishaanam[m]> LarryRuane: Yes, I think that Bitcoin Core would crash because its locking code detects things like that
10617:43 <rot13maxi> ah. I think this lock on `pwallet->cs_wallet` in the `rescanblockchain()` will prevent concurrent spends from happening (https://github.com/bitcoin/bitcoin/blob/master/src/wallet/rpc/spend.cpp#L270) right?
10717:43 <LarryRuane> but there's a general push to move away from RecursiveMutex because it encourages sloppy code ... I'm not sure why a `Mutex` wouldn't work here (i saw that there's a comment on the PR that you replied to)
10817:44 <rot13maxi> dropped the link. This lock in rescanblockchain: https://github.com/bitcoin/bitcoin/pull/26347/files#diff-07e4c0f110004a013432ac478fc6e1386b124cb1ad59bbb2e5fee1cdd92a307fR879
10917:44 <hernanmarino> b_101 : take a look at https://github.com/bitcoin/bitcoin/blob/ab98673f058853e00c310afad57925f54c1ecfae/src/sync.h#L258-L263 or perhaps even the whole file
11017:45 <rot13maxi> thanks hernanmarino
11117:45 <b_101> hernanmarino: Thx!
11217:46 <glozow> from https://github.com/bitcoin/bitcoin/pull/26347#discussion_r1003661127, is the concern you might wait on m_relock_mutex and cs_wallet in different orders and deadlock?
11317:46 <ishaanam[m]> LarryRuane: it is because sometimes ␑cs_wallet␏ is locked, and then ␑CWallet::Lock␏ is called, which locks ␑m_rescan_mutex␏ and then ␑cs_wallet␏, which is inconsistent from a few other places
11417:48 <ishaanam[m]> glozow: yes, but if there is a strong preference towards Mutex then I can revisit the code and perhaps do it differently such that this is no longer a concern. This would require a few more changes then what is there currently from what I can tell.
11517:50 <glozow> yes, from quick glance it seems `Lock()` could require `cs_wallet` to already be held and that could solve the problem. but i'm scope creeping
11617:50 ← rot13maxi left (~rot13maxi@pool-71-171-103-145.clppva.fios.verizon.net):
11717:50 <ishaanam[m]> What would happen if the first commit of this PR was removed and a user tried to run walletlock while performing a rescan that required the passphrase?
11817:51 <LarryRuane> https://github.com/bitcoin/bitcoin/pull/26347#discussion_r1008917871 (question about recusive versus non-recursive mutex)
11917:54 <glozow> Just a guess - the user doesn't get a warning, but it still blocks, waiting for m_relock_mutex to become available?
12017:55 <ishaanam[m]> Really what this question is asking is: if none of the encryption RPCs listed in the PR ran ␑IsScanningWithPassphrase␏, would happen when locking a wallet during a rescan?
12117:55 <ishaanam[m]> glozow: Yes!
12217:55 <hernanmarino> I have to go, see you all next time . Thanks ishaanam[m] for hosting
12317:56 <glozow> That's good, though getting a warning immediately is definitely preferable
12417:56 <LarryRuane> i think those can be really tough questions in general ... if a request (probably an RPC let's say) can't be satisfied immediately, is it better to block or to return an error?
12517:56 <ishaanam[m]> That first commit was added so the user does not get confused if an RPC takes too long because of this.
12617:57 <LarryRuane> i don't know if bitcoin core has a convention for that ... maybe it's just case-by-case ... depending on how long the delay would typically be?
12717:57 <ishaanam[m]> LarryRuane: I think that would make sense. Here the delay has the potential to be pretty long.
12817:58 <LarryRuane> ishaanam[m]: +1
12917:59 <ishaanam[m]> Looks like we are coming up on an hour. Thanks everyone for participating!
13017:59 <d33r_gee> thanks every1!
13117:59 <glozow> Great meeting, thank you ishaanam!
13218:00 <ishaanam[m]> #endmeeting