Keep inactive seeds after sethdseed and derive keys from them as needed (wallet)

https://github.com/bitcoin/bitcoin/pull/17681

Host: ryanofsky  -  PR author: achow101

The PR branch HEAD was 218c4f640 at the time of this review club meeting.

Notes

  • The sethdseed RPC creates a new BIP 32 HD seed for the wallet. The previous keys or seeds are no longer used to generate receive or change addresses.

  • If more than 1000 receive addresses have been given out from the old HD seed, and then the wallet is restored from an old backup, any keys after the first 1000 will not be in the keypool look-ahead.

  • Prior to this PR, if a new HD seed is generated, and subsequently funds are sent to the first 1000 addresses given out from the old HD seed, the keypool for the old HD seed would not top up, and any funds sent to keys after the first 1000 would not be added to the wallet.

  • This PR keeps the HD seed in the wallet (as an inactivehdseed), and tops up the keypool as necessary when new funds are received. New receive addresses are derived from the new HD seed as before.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. How does it help users to top up keypools for inactive HD seeds? What is not good about the behavior before this PR?

  3. The original motivation for the PR was to remove the restriction that setting a new HD seed wasn’t allowed during Initial Block Download. What was the reason for this restriction?

  4. Why was the sethdseed RPC added? Are there uses for having multiple HD seeds in the same wallet? Does this change to sethdseed affect new descriptor wallets as well as existing wallets?

  5. The main data structures used in this PR are CHDChain, CKeyMetadata, and KeyOriginInfo. What are the purposes of the different data structures?

  6. After the wallet has seen an existing key being used, how does the wallet decide whether to “top up” and generate new keys? What factors determine whether the wallet does or does not top up? Are there differences here between the new TopUpInactiveHDChain method and existing TopUp method?

  7. When is AddInactiveHDChain called? Where do the inactive HD chain objects come from? Is AddInactiveHDChain called everywhere it needs to be called?

  8. How is the PR structured and divided up? Would it make sense to split up or combine commits?

  9. An earlier version of this PR tried (the implementation was buggy) to store m_inactive_hd_chains map entries as "inactivehdchain" rows in the wallet database. Was it good to stop doing this? What are the advantages of storing this information? What are the advantages of not storing it?

  10. A previous version of this PR had a subtle bug on this line. What was the bug and what were the effects?

  11. Do you think this PR has sufficient test coverage? Are there ways the test could be extended?

Meeting Log

  113:00 <jnewbery> #startmeeting
  213:00 <kanzure> hi
  313:00 <andrewtoth> hi
  413:01 <troygiorshev> hi
  513:01 <raj_149> hi
  613:01 <theStack> hi
  713:01 <vasild> hi
  813:01 <ryanofsky> hi
  913:01 <soup> :hello:
 1013:01 <fjahr> hi
 1113:01 <jnewbery> hi!
 1213:01 <jkczyz> hi
 1313:01 <lightlike> HI
 1413:01 <jnewbery> ryanofksy is hosting today. Notes in the usual place: https://bitcoincore.reviews/17681.html
 1513:01 <ryanofsky> Welcome to the meeting on the sethdseed key topup PR: https://github.com/bitcoin/bitcoin/pull/17681, https://bitcoincore.reviews/17681.html
 1613:01 <jonatack> hi
 1713:01 <ryanofsky> if any one has any questions about the pr, feel free to ask, or we can go through questions at https://bitcoincore.reviews/17681.html
 1813:02 <ryanofsky> first question there is who's reviewed the pr? (y/n)
 1913:02 <jnewbery> y
 2013:02 <andrewtoth> n
 2113:02 <fjahr> y
 2213:02 <troygiorshev> n
 2313:02 <vasild> n
 2413:02 <jonatack> y
 2513:02 <soup> n
 2613:02 <raj_149> y
 2713:02 <jkczyz> y
 2813:02 <nehan> hi
 2913:03 <theStack> n (only Concept ACK)
 3013:03 <lightlike> n
 3113:03 <ryanofsky> That's good, next question is the purpose of the PR clear? How does it help users to top up keypools for inactive HD seeds? What is not good about the behavior before this PR?
 3213:04 <raj_149> it helps to keep track of inactive chains, top it up properly if addresses are querried. Not sure about the bad behaviour part.
 3313:05 <jonatack> I think the src/wallet/scriptpubkeyman.h::L50 CKeyPool documentation is helpful on this
 3413:05 <fjahr> for whatever reason funds could still be sent to addresses of these inactive seeds. Those should not be missed by the wallet.
 3513:06 <jnewbery> raj_149: it's not about querying addresses. It's if addresses from the keypool are used up if a transaction output is sent to them
 3613:06 <andrewtoth> it's a bit tricky, but i think the bug is that if a highly used seed is replaced by sethdseed, but lots of new txs come into that seed, and then the entire wallet is restored from an old backup but using the new seed, then txs sent to new addresses in the old seed won't be added to the wallet
 3713:06 <raj_149> jnewbery: oh, i thought it tops up whenever adress is generated. Where it is checking if the address actually have incoming transaction?
 3813:07 <ryanofsky> andrewtoth, yeah that kind of scenario (old backups) seems more likely to cause a bug, than someone intentionally changing hd seed on their wallet and expecting to receive old funds from new keys generated externally
 3913:07 <andrewtoth> ahh i see, i didn't consider keys generated externally by that replaced seed. makes sense thanks
 4013:08 <jnewbery> raj_149: yes, you're right that the keypool tops up when an address is given out. It also tops up if we receive a transaction output to one of the addresses in the keypool. See the MarkUnusedAddresses() function.
 4113:09 <jnewbery> in this case, we're not giving out new addresses from the old HD seed
 4213:09 <vasild> So, if I create a wallet, generate 1100 addresses, send them to some people, receive funds on 500 of them, replace the seed with sethdseed, for sure the 500 already received will be accounted for, later the remaining 600 addresses also receive funds. Would none of those 600 be accounted or just the last 100 (over 1000)?
 4313:09 <raj_149> jnewbery: right, missed that part. thanks..
 4413:10 <ryanofsky> vasild, yes that should be right
 4513:10 <vasild> which one? none of the new 600 are accounted or only the last 100 are not accounted?
 4613:10 <raj_149> my guess is it would miss the last 100. is that correct?
 4713:11 <jnewbery> vasild: in that scenario, the keypool would already have up to index 2100 (when you give out address 1100). I think a crucial part is that this happens only when you restore from an old backup
 4813:11 <theStack> raj_149: that would be my guess too
 4913:11 <troygiorshev> I guess it won't miss any of those 1100
 5013:12 <ryanofsky> oh, misread, yeah, none of the 1100 would be lost, it's the keys that come after those 1100, which might exist because the wallet is old backup, or there's another active wallet out with the same seed
 5113:12 <jonatack> * In the unlikely case where none of the addresses in the `gap limit` are
 5213:12 <jonatack> * used on-chain, the look-ahead will not be incremented to keep
 5313:12 <jonatack> * a constant size and addresses beyond this range will not be detected by an
 5413:12 <jonatack> * old backup.
 5513:12 <vasild> I see
 5613:12 <ryanofsky> good, so that's the general motivation for this pr, but next question is about the specific motivation
 5713:13 <ryanofsky> 3. The original motivation for the PR was to remove the restriction that setting a new HD seed wasn’t allowed during Initial Block Download. What was the reason for this restriction?
 5813:13 <ryanofsky> link to the relevant code is https://github.com/bitcoin/bitcoin/commit/769b03a83c2aa2b97f344b58dc689be26c6e08e5
 5913:14 <achow101> hi everyone, ping me if you have questions
 6013:14 <raj_149> ryanofsky during IBD the old chain can get some new tx that wont be accounted for if it is replaced?
 6113:14 <ryanofsky> hi achow101
 6213:14 <jonatack> hi achow101
 6313:15 <jkczyz> It wasn't clear to me from the PR description
 6413:15 <andrewtoth> sethdseed requires a rescan right?
 6513:15 <ryanofsky> raj_149, right, it can happen outside of ibd, but it's more likely keypool will run out during ibd with all the historical blocks being processed
 6613:15 <jnewbery> I think the easiest scenario to think about where this could be a problem is something like:
 6713:15 <jnewbery> 1. new wallet
 6813:15 <jnewbery> 2. create backup
 6913:15 <jnewbery> 3. give out 1001 addresses. keyopol tops up to index 2001 (assuming it's not locked)
 7013:15 <jnewbery> 4. receive funds to 1001 addresses
 7113:15 <jnewbery> ...
 7213:15 <jnewbery> 5. restore old backup on node - only the first 1000 keypool keys are regenerated
 7313:15 <jnewbery> 6. set new hd seed before the wallet has sync'ed to tip
 7413:15 <jnewbery> 7. the first 1000 payments are received, but payment 1001 is not because we're not able to top-up the keypool for the old seed after setting a new seed.
 7513:15 <ryanofsky> andrewtoth, sethdseed for an existing seed can require a rescan, for a new seed shouldn't require
 7613:17 <ryanofsky> jkczyz, your question about this is answered?
 7713:17 <raj_149> ryanofsky: i couldn't follow the rescan part. where the rescan logic is triggered by sethdseed?
 7813:18 <ryanofsky> raj_149, normally when sethdseed is called you are just creating a brand new hdseed, so no rescan is required
 7913:19 <achow101> sethdseed doesn't rescan. you would rescan separately afterwards if you have/want to
 8013:19 <ryanofsky> if you are setting an pre-existing seed that may have transactions associated, you have to rescan manually with a separate rpc call
 8113:19 <raj_149> oh i see.
 8213:19 <ryanofsky> it might have made sense (or might make sense) to add a rescan option to sethdseed or key birthday option like we have for other imports
 8313:19 <ryanofsky> but that kind of leads to next question
 8413:20 <ryanofsky> 4. Why was the sethdseed RPC added? Are there uses for having multiple HD seeds in the same wallet? Does this change to sethdseed affect new descriptor wallets as well as existing wallets?
 8513:20 <troygiorshev> the fact that it can still happen after IDB is even more motivation for this PR. Part of the point of HD wallets was that you can have a "master" wallet that can spend funds and then put a receive-only wallet on a vulnerable machine. But if you change the hd seed on the master and the receive-only receives a lot of new transactions (and generates more than 1000 new addresses) then
 8613:20 <troygiorshev> the master will lose them
 8713:20 <troygiorshev> ah sorry a little late there, did I get that right?
 8813:21 <raj_149> troygiorshev: that seems right as the topop wont happen for old chains currently.
 8913:21 <ryanofsky> troygiorshev, that makes sense in principle, but in practice we don't support a receive only wallet part that can top up, because we use hardened keys
 9013:22 <troygiorshev> ryanofsky: ah ok
 9113:22 <raj_149> ryanofsky: recieve only wallet part can remian is some other system like a merchant website.
 9213:22 <ryanofsky> raj_149, oh good, point actually that does seem like a good motivation then
 9313:23 <jonatack> As troygiorshev's and jnewbery's examples show, it's not a great idea, as a user, to reduce the gap limit.
 9413:23 <ryanofsky> or wait, isn't that precluded because of the hardened keys?
 9513:23 <jnewbery> achow101: for descriptor wallets do we still only use hardened keys, or is it possible to use unhardened keys too?
 9613:24 <achow101> jnewbery: descriptor wallets will use unhardened derivation
 9713:24 <jkczyz> ryanofsky: yes, I believe so. My main qualm is the reason that "we no longer need to wait for IBD to finish before sethdseed can work" is not given
 9813:24 <ryanofsky> (question 4 is about future of sethdseed and descriptor wallets)
 9913:25 <jnewbery> if anyone is unfamiliar with hardened/unhardened key derivation, it's defined in BIP32 here: https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki#child-key-derivation-ckd-functions
10013:25 <jonatack> achow101: by default we only make descriptors that use unhardened derivation, yes?
10113:25 <ryanofsky> jkczyz, yeah it think that may be phrased as a binary statement when it was always a matter of precaution. but with this change there is no longer any benefit to disabling sethdseed during ibd
10213:26 <achow101> jonatack: yes. descriptor wallets use bip44/49/84
10313:26 <jnewbery> specifically of note is that for hardened keys it's not possible to derive a child public key from a parent public key: https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki#public-parent-key--public-child-key
10413:26 <ryanofsky> because after this change, no matter when sethdseed is call, during ibd or before, behavior for the previous hdseed will be the same
10513:26 <raj_149> ryanofsky: that seems correct. Corollary question, what one needs to do to have a recieve only wallet in a merchant site and still get it tracked by core?
10613:27 <achow101> raj_149: you have to export tons of addresses and constantly do that as addresses get used
10713:28 <jnewbery> achow101: why wouldn't you be able to use the xpub for an unhardened descriptor?
10813:29 <jonatack> raj_149: i think it will finally become possible to export an xpub to, say, btcpayserver from a bitcoin core wallet, with descriptor wallets
10913:29 <raj_149> jonatack: is there any specific reason why core never creates unhardened keys and only hardened ones?
11013:30 <achow101> jnewbery: exports aren't implemented for descriptors yet. but once that's done, yes, you can use the xpub
11113:30 <jonatack> raj_149: see https://bitcoin.stackexchange.com/questions/90135/how-do-i-export-an-xpub-from-bitcoin-core-for-use-in-btcpayserver
11213:30 <achow101> but also sethdseed is disabled for descriptor wallets
11313:31 <raj_149> jonatack: thanks, answers it.
11413:31 <ryanofsky> just to wrap up question 4, I couldn't think of real uses cases for sethdseed anymore, and didn't check but suspect it was just added before there was multiwallet support
11513:31 <jonatack> raj_149: for security reasons, bitcoin core legacy wallets don't support xpub key derivation because if an attacker knows the private key of any of the child keys and the xpub, he can compute the private key of all child keys
11613:31 <jonatack> right
11713:32 <achow101> ryanofsky: it's original purpose was for the upgrade from non-HD to HD, but that got cut at some point
11813:32 <ryanofsky> anybody want to summarize purpose of CHDChain, CKeyMetadata, and KeyOriginInfo for question 5?
11913:33 <ryanofsky> achow101, oh that makes sense. and of course it makes a lot of sense this is dropped for descriptor wallets
12013:34 <raj_149> CHDChain: contans data to construct the full chain. CKeyMetadata: Contains metadata for a particular key KeyOriginInfo: contains origin and path to get to this key.
12113:35 <ryanofsky> yeah important fields in CHDChain for this are nExternalChainCounter and nInternalChainCounter
12213:35 <jonatack> if helpful to anyone, achow101 wrote a really helpful doc lately about descriptor wallets: https://gist.github.com/achow101/94d889715afd49181f8efdca1f9faa25
12313:35 <jonatack> achow101: hope it wasn't a secret doc :p
12413:35 <raj_149> ryanofsky: my idea is they just contains the max used index for the chain. is that correct?
12513:36 <ryanofsky> and the key indices in KeyOriginInfo::path field are used to derive these on startup
12613:36 <raj_149> jonatack: cant help anymore if it was, already starred it. :P
12713:36 <ryanofsky> raj_149, right they're the max indices that summarize the state of the keypool
12813:37 <ryanofsky> question 6 is about differences between the new TopUpInactiveHDChain method and existing TopUp method, anything notable there?
12913:37 <ryanofsky> https://github.com/ryanofsky/bitcoin/blob/review.17681.5/src/wallet/scriptpubkeyman.cpp#L293 vs https://github.com/ryanofsky/bitcoin/blob/review.17681.5/src/wallet/scriptpubkeyman.cpp#L1194
13013:38 <ryanofsky> or opinions on whether it's good to have such similar methods or merge them?
13113:38 <raj_149> ryanofsky: the new one has the bool internal parameter, so tops up only a single chain.
13213:39 <jnewbery> ryanofsky: The call to AddKeypoolPubkeyWithDB() has been removed from TopUpInactiveHDChain() in the latest version
13313:40 <jonatack> jnewbery: that change thanks to recent review :D
13413:40 <ryanofsky> raj_149, that's interesting, yeah i guess it gets to a basic difference that TopUpInactive is told specifically what to top up, while TopUp just figures it out itself
13513:41 <ryanofsky> jnewbery, oh actually forgot I was pasting links to an old version
13613:41 <raj_149> ryanofsky: if TopUp can have some way to know the chain in context, cant it handle the inactive topup part also?
13713:42 <ryanofsky> raj_149, yes to merge the functions in an efficient way TopUp would have to be passed more information
13813:42 <jkczyz> Maybe pass kp_size and chain to a common helper? Haven't looked in detail if it is possible
13913:42 <achow101> iirc there was an iteration (before the pr was opened) where TopUp was being used for both, but it ended a bit too messy
14013:43 <ryanofsky> yeah, point of the question was just get people looking at these functions, and see how new code compares to existing
14113:43 <ryanofsky> if these are more or less clear we can go on to AddInactiveHDChain in question 7
14213:43 <ryanofsky> 7. When is AddInactiveHDChain called? Where do the inactive HD chain objects come from? Is AddInactiveHDChain called everywhere it needs to be called?
14313:44 <ryanofsky> there's a twist to this question in that I think there's a bug in the current version of the pr related to this
14413:45 <jonatack> Good question. Only called from LoadWallet atm.
14513:46 <jnewbery> ryanofsky: should this be called from sethdseed too?
14613:46 <ryanofsky> jnewbery, yes that's what i was thinking at least
14713:48 <jnewbery> I think you're right
14813:48 <ryanofsky> is it clear to people what AddInactiveHDChain does and when it should be called? anyone want to summarize?
14913:49 <raj_149> ryanofsky: it adds a chain to m_inactive_hd_chains. But if sethdchain requires also m_inactive_hd_chains to be updated and its not right now. How the test is passing?
15013:51 <fjahr> the test does us loadwallet to check persistence
15113:51 <ryanofsky> right the test isn't just calling sethdseed, it's also loading & unloading which could mask the bug
15213:52 <ryanofsky> maybe skip through some other questions, unless people have thoughts / opinions on them
15313:52 <jnewbery> right. The test doesn't do any operations or checking between sethdseed and unload/reload wallet
15413:53 <ryanofsky> one question that might be interesting is 10: A previous version of this PR had a subtle bug on this line. What was the bug and what were the effects?
15513:53 <ryanofsky> previous pr version with bug: https://github.com/ryanofsky/bitcoin/commits/review.17681.4
15613:53 <ryanofsky> previous line of code with the bug: https://github.com/ryanofsky/bitcoin/blob/review.17681.4/src/wallet/walletdb.cpp#L446
15713:54 <raj_149> ryanofsky: it was a bit wise and instead of or. Resulting into nothing happening. Is that correct?
15813:54 <fjahr> yes but I thing only internal keys would not be extended because of this bug
15913:55 <jonatack> raj_149: yes. a good catch by ryanofsky. i missed it during my first review.
16013:55 <ryanofsky> raj_149 & fjahr that's right, shows why it's good to look carefully at arithmetic operations, and new code cleans this up quite a bit
16113:55 <fjahr> test fails if the two lines with unload and load are commented out btw
16213:56 <fjahr> *the functional test I mean, going back to previous question
16313:56 <raj_149> can anyone give a one liner on what happens at loading and unloading of wallet?
16413:57 <jnewbery> raj_149: there's a lot going on during load/unload. I don't think it's possible to summarize in one line
16513:58 <ryanofsky> raj_149, i'd say the important thing happening is that m_inactive_hd_chains is populated on loading, by looping over all the keys and finding the max internal/external indices
16613:59 <ryanofsky> maybe people have thoughts in general on the code / pr
16713:59 <ryanofsky> my thoughts are that i'm glad this stuff is going away with descriptor wallets :)
16813:59 <jnewbery> ryanofsky: +1
16914:00 <raj_149> ryanofsky: i need to understand descriptor wallet better. But overall the motivation made sense. The wallets need to handle old chains sensibly.
17014:00 <jnewbery> ok, that's time. Thanks for hosting ryanofsky!
17114:00 <fjahr> Thanks ryanofsky, great job!
17214:00 <jnewbery> if anyone wants to host in the coming weeks, please message me
17314:00 <ryanofsky> Thanks everyone!
17414:00 <jonatack> thanks ryanofsky!
17514:00 <andrewtoth> thanks ryanofsky!
17614:01 <troygiorshev> thanks ryanofsky!
17714:01 <theStack> thanks ryanofsky
17814:01 <michaelfolkson> Thanks!
17914:01 <raj_149> thanks everybody. It was a great session. For the first time it felt like i understood enough to have meaningful conversation. thats motivating. :)