Keep inactive seeds after sethdseed and derive keys from them as needed (wallet)

May 13, 2020

https://github.com/bitcoin/bitcoin/pull/17681

Host: ryanofsky - PR author: achow101

The PR branch HEAD was 218c4f640 at the time of this review club meeting.

Notes

The sethdseed RPC creates a new BIP 32 HD seed for the wallet. The previous keys or seeds are no longer used to generate receive or change addresses.
If more than 1000 receive addresses have been given out from the old HD seed, and then the wallet is restored from an old backup, any keys after the first 1000 will not be in the keypool look-ahead.
Prior to this PR, if a new HD seed is generated, and subsequently funds are sent to the first 1000 addresses given out from the old HD seed, the keypool for the old HD seed would not top up, and any funds sent to keys after the first 1000 would not be added to the wallet.
This PR keeps the HD seed in the wallet (as an inactivehdseed), and tops up the keypool as necessary when new funds are received. New receive addresses are derived from the new HD seed as before.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
How does it help users to top up keypools for inactive HD seeds? What is not good about the behavior before this PR?
The original motivation for the PR was to remove the restriction that setting a new HD seed wasn’t allowed during Initial Block Download. What was the reason for this restriction?
Why was the sethdseed RPC added? Are there uses for having multiple HD seeds in the same wallet? Does this change to sethdseed affect new descriptor wallets as well as existing wallets?
The main data structures used in this PR are CHDChain, CKeyMetadata, and KeyOriginInfo. What are the purposes of the different data structures?
After the wallet has seen an existing key being used, how does the wallet decide whether to “top up” and generate new keys? What factors determine whether the wallet does or does not top up? Are there differences here between the new TopUpInactiveHDChain method and existing TopUp method?
When is AddInactiveHDChain called? Where do the inactive HD chain objects come from? Is AddInactiveHDChain called everywhere it needs to be called?
How is the PR structured and divided up? Would it make sense to split up or combine commits?
An earlier version of this PR tried (the implementation was buggy) to store m_inactive_hd_chains map entries as "inactivehdchain" rows in the wallet database. Was it good to stop doing this? What are the advantages of storing this information? What are the advantages of not storing it?
A previous version of this PR had a subtle bug on this line. What was the bug and what were the effects?
Do you think this PR has sufficient test coverage? Are there ways the test could be extended?

Meeting Log

13:00

<jnewbery> #startmeeting

13:00

<kanzure> hi

13:00

<andrewtoth> hi

13:01

<troygiorshev> hi

13:01

<raj_149> hi

13:01

<theStack> hi

13:01

<vasild> hi

13:01

<ryanofsky> hi

13:01

<soup> :hello:

13:01

<fjahr> hi

13:01

<jnewbery> hi!

13:01

<jkczyz> hi

13:01

<lightlike> HI

13:01

<jnewbery> ryanofksy is hosting today. Notes in the usual place: https://bitcoincore.reviews/17681.html

13:01

<ryanofsky> Welcome to the meeting on the sethdseed key topup PR: https://github.com/bitcoin/bitcoin/pull/17681, https://bitcoincore.reviews/17681.html

13:01

<jonatack> hi

13:01

<ryanofsky> if any one has any questions about the pr, feel free to ask, or we can go through questions at https://bitcoincore.reviews/17681.html

13:02

<ryanofsky> first question there is who's reviewed the pr? (y/n)

13:02

<jnewbery> y

13:02

<andrewtoth> n

13:02

<fjahr> y

13:02

<troygiorshev> n

13:02

<vasild> n

13:02

<jonatack> y

13:02

<soup> n

13:02

<raj_149> y

13:02

<jkczyz> y

13:02

<nehan> hi

13:03

<theStack> n (only Concept ACK)

13:03

<lightlike> n

13:03

<ryanofsky> That's good, next question is the purpose of the PR clear? How does it help users to top up keypools for inactive HD seeds? What is not good about the behavior before this PR?

13:04

<raj_149> it helps to keep track of inactive chains, top it up properly if addresses are querried. Not sure about the bad behaviour part.

13:05

<jonatack> I think the src/wallet/scriptpubkeyman.h::L50 CKeyPool documentation is helpful on this

13:05

<fjahr> for whatever reason funds could still be sent to addresses of these inactive seeds. Those should not be missed by the wallet.

13:06

<jnewbery> raj_149: it's not about querying addresses. It's if addresses from the keypool are used up if a transaction output is sent to them

13:06

<andrewtoth> it's a bit tricky, but i think the bug is that if a highly used seed is replaced by sethdseed, but lots of new txs come into that seed, and then the entire wallet is restored from an old backup but using the new seed, then txs sent to new addresses in the old seed won't be added to the wallet

13:06

<raj_149> jnewbery: oh, i thought it tops up whenever adress is generated. Where it is checking if the address actually have incoming transaction?

13:07

<ryanofsky> andrewtoth, yeah that kind of scenario (old backups) seems more likely to cause a bug, than someone intentionally changing hd seed on their wallet and expecting to receive old funds from new keys generated externally

13:07

<andrewtoth> ahh i see, i didn't consider keys generated externally by that replaced seed. makes sense thanks

13:08

<jnewbery> raj_149: yes, you're right that the keypool tops up when an address is given out. It also tops up if we receive a transaction output to one of the addresses in the keypool. See the MarkUnusedAddresses() function.

13:09

<jnewbery> in this case, we're not giving out new addresses from the old HD seed

13:09

<vasild> So, if I create a wallet, generate 1100 addresses, send them to some people, receive funds on 500 of them, replace the seed with sethdseed, for sure the 500 already received will be accounted for, later the remaining 600 addresses also receive funds. Would none of those 600 be accounted or just the last 100 (over 1000)?

13:09

<raj_149> jnewbery: right, missed that part. thanks..

13:10

<ryanofsky> vasild, yes that should be right

13:10

<vasild> which one? none of the new 600 are accounted or only the last 100 are not accounted?

13:10

<raj_149> my guess is it would miss the last 100. is that correct?

13:11

<jnewbery> vasild: in that scenario, the keypool would already have up to index 2100 (when you give out address 1100). I think a crucial part is that this happens only when you restore from an old backup

13:11

<theStack> raj_149: that would be my guess too

13:11

<troygiorshev> I guess it won't miss any of those 1100

13:12

<ryanofsky> oh, misread, yeah, none of the 1100 would be lost, it's the keys that come after those 1100, which might exist because the wallet is old backup, or there's another active wallet out with the same seed

13:12

<jonatack> * In the unlikely case where none of the addresses in the `gap limit` are

13:12

<jonatack> * used on-chain, the look-ahead will not be incremented to keep

13:12

<jonatack> * a constant size and addresses beyond this range will not be detected by an

13:12

<jonatack> * old backup.

13:12

<vasild> I see

13:12

<ryanofsky> good, so that's the general motivation for this pr, but next question is about the specific motivation

13:13

<ryanofsky> 3. The original motivation for the PR was to remove the restriction that setting a new HD seed wasn’t allowed during Initial Block Download. What was the reason for this restriction?

13:13

<ryanofsky> link to the relevant code is https://github.com/bitcoin/bitcoin/commit/769b03a83c2aa2b97f344b58dc689be26c6e08e5

13:14

<achow101> hi everyone, ping me if you have questions

13:14

<raj_149> ryanofsky during IBD the old chain can get some new tx that wont be accounted for if it is replaced?

13:14

<ryanofsky> hi achow101

13:14

<jonatack> hi achow101

13:15

<jkczyz> It wasn't clear to me from the PR description

13:15

<andrewtoth> sethdseed requires a rescan right?

13:15

<ryanofsky> raj_149, right, it can happen outside of ibd, but it's more likely keypool will run out during ibd with all the historical blocks being processed

13:15

<jnewbery> I think the easiest scenario to think about where this could be a problem is something like:

13:15

<jnewbery> 1. new wallet

13:15

<jnewbery> 2. create backup

13:15

<jnewbery> 3. give out 1001 addresses. keyopol tops up to index 2001 (assuming it's not locked)

13:15

<jnewbery> 4. receive funds to 1001 addresses

13:15

<jnewbery> ...

13:15

<jnewbery> 5. restore old backup on node - only the first 1000 keypool keys are regenerated

13:15

<jnewbery> 6. set new hd seed before the wallet has sync'ed to tip

13:15

<jnewbery> 7. the first 1000 payments are received, but payment 1001 is not because we're not able to top-up the keypool for the old seed after setting a new seed.

13:15

<ryanofsky> andrewtoth, sethdseed for an existing seed can require a rescan, for a new seed shouldn't require

13:17

<ryanofsky> jkczyz, your question about this is answered?

13:17

<raj_149> ryanofsky: i couldn't follow the rescan part. where the rescan logic is triggered by sethdseed?

13:18

<ryanofsky> raj_149, normally when sethdseed is called you are just creating a brand new hdseed, so no rescan is required

13:19

<achow101> sethdseed doesn't rescan. you would rescan separately afterwards if you have/want to

13:19

<ryanofsky> if you are setting an pre-existing seed that may have transactions associated, you have to rescan manually with a separate rpc call

13:19

<raj_149> oh i see.

13:19

<ryanofsky> it might have made sense (or might make sense) to add a rescan option to sethdseed or key birthday option like we have for other imports

13:19

<ryanofsky> but that kind of leads to next question

13:20

<ryanofsky> 4. Why was the sethdseed RPC added? Are there uses for having multiple HD seeds in the same wallet? Does this change to sethdseed affect new descriptor wallets as well as existing wallets?

13:20

<troygiorshev> the fact that it can still happen after IDB is even more motivation for this PR. Part of the point of HD wallets was that you can have a "master" wallet that can spend funds and then put a receive-only wallet on a vulnerable machine. But if you change the hd seed on the master and the receive-only receives a lot of new transactions (and generates more than 1000 new addresses) then

13:20

<troygiorshev> the master will lose them

13:20

<troygiorshev> ah sorry a little late there, did I get that right?

13:21

<raj_149> troygiorshev: that seems right as the topop wont happen for old chains currently.

13:21

<ryanofsky> troygiorshev, that makes sense in principle, but in practice we don't support a receive only wallet part that can top up, because we use hardened keys

13:22

<troygiorshev> ryanofsky: ah ok

13:22

<raj_149> ryanofsky: recieve only wallet part can remian is some other system like a merchant website.

13:22

<ryanofsky> raj_149, oh good, point actually that does seem like a good motivation then

13:23

<jonatack> As troygiorshev's and jnewbery's examples show, it's not a great idea, as a user, to reduce the gap limit.

13:23

<ryanofsky> or wait, isn't that precluded because of the hardened keys?

13:23

<jnewbery> achow101: for descriptor wallets do we still only use hardened keys, or is it possible to use unhardened keys too?

13:24

<achow101> jnewbery: descriptor wallets will use unhardened derivation

13:24

<jkczyz> ryanofsky: yes, I believe so. My main qualm is the reason that "we no longer need to wait for IBD to finish before sethdseed can work" is not given

13:24

<ryanofsky> (question 4 is about future of sethdseed and descriptor wallets)

13:25

<jnewbery> if anyone is unfamiliar with hardened/unhardened key derivation, it's defined in BIP32 here: https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki#child-key-derivation-ckd-functions

100

13:25

<jonatack> achow101: by default we only make descriptors that use unhardened derivation, yes?

101

13:25

<ryanofsky> jkczyz, yeah it think that may be phrased as a binary statement when it was always a matter of precaution. but with this change there is no longer any benefit to disabling sethdseed during ibd

102

13:26

<achow101> jonatack: yes. descriptor wallets use bip44/49/84

103

13:26

<jnewbery> specifically of note is that for hardened keys it's not possible to derive a child public key from a parent public key: https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki#public-parent-key--public-child-key

104

13:26

<ryanofsky> because after this change, no matter when sethdseed is call, during ibd or before, behavior for the previous hdseed will be the same

105

13:26

<raj_149> ryanofsky: that seems correct. Corollary question, what one needs to do to have a recieve only wallet in a merchant site and still get it tracked by core?

106

13:27

<achow101> raj_149: you have to export tons of addresses and constantly do that as addresses get used

107

13:28

<jnewbery> achow101: why wouldn't you be able to use the xpub for an unhardened descriptor?

108

13:29

<jonatack> raj_149: i think it will finally become possible to export an xpub to, say, btcpayserver from a bitcoin core wallet, with descriptor wallets

109

13:29

<raj_149> jonatack: is there any specific reason why core never creates unhardened keys and only hardened ones?

110

13:30

<achow101> jnewbery: exports aren't implemented for descriptors yet. but once that's done, yes, you can use the xpub

111

13:30

<jonatack> raj_149: see https://bitcoin.stackexchange.com/questions/90135/how-do-i-export-an-xpub-from-bitcoin-core-for-use-in-btcpayserver

112

13:30

<achow101> but also sethdseed is disabled for descriptor wallets

113

13:31

<raj_149> jonatack: thanks, answers it.

114

13:31

<ryanofsky> just to wrap up question 4, I couldn't think of real uses cases for sethdseed anymore, and didn't check but suspect it was just added before there was multiwallet support

115

13:31

<jonatack> raj_149: for security reasons, bitcoin core legacy wallets don't support xpub key derivation because if an attacker knows the private key of any of the child keys and the xpub, he can compute the private key of all child keys

116

13:31

<jonatack> right

117

13:32

<achow101> ryanofsky: it's original purpose was for the upgrade from non-HD to HD, but that got cut at some point

118

13:32

<ryanofsky> anybody want to summarize purpose of CHDChain, CKeyMetadata, and KeyOriginInfo for question 5?

119

13:33

<ryanofsky> achow101, oh that makes sense. and of course it makes a lot of sense this is dropped for descriptor wallets

120

13:34

<raj_149> CHDChain: contans data to construct the full chain. CKeyMetadata: Contains metadata for a particular key KeyOriginInfo: contains origin and path to get to this key.

121

13:35

<ryanofsky> yeah important fields in CHDChain for this are nExternalChainCounter and nInternalChainCounter

122

13:35

<jonatack> if helpful to anyone, achow101 wrote a really helpful doc lately about descriptor wallets: https://gist.github.com/achow101/94d889715afd49181f8efdca1f9faa25

123

13:35

<jonatack> achow101: hope it wasn't a secret doc :p

124

13:35

<raj_149> ryanofsky: my idea is they just contains the max used index for the chain. is that correct?

125

13:36

<ryanofsky> and the key indices in KeyOriginInfo::path field are used to derive these on startup

126

13:36

<raj_149> jonatack: cant help anymore if it was, already starred it. :P

127

13:36

<ryanofsky> raj_149, right they're the max indices that summarize the state of the keypool

128

13:37

<ryanofsky> question 6 is about differences between the new TopUpInactiveHDChain method and existing TopUp method, anything notable there?

129

13:37

<ryanofsky> https://github.com/ryanofsky/bitcoin/blob/review.17681.5/src/wallet/scriptpubkeyman.cpp#L293 vs https://github.com/ryanofsky/bitcoin/blob/review.17681.5/src/wallet/scriptpubkeyman.cpp#L1194

130

13:38

<ryanofsky> or opinions on whether it's good to have such similar methods or merge them?

131

13:38

<raj_149> ryanofsky: the new one has the bool internal parameter, so tops up only a single chain.

132

13:39

<jnewbery> ryanofsky: The call to AddKeypoolPubkeyWithDB() has been removed from TopUpInactiveHDChain() in the latest version

133

13:40

<jonatack> jnewbery: that change thanks to recent review :D

134

13:40

<ryanofsky> raj_149, that's interesting, yeah i guess it gets to a basic difference that TopUpInactive is told specifically what to top up, while TopUp just figures it out itself

135

13:41

<ryanofsky> jnewbery, oh actually forgot I was pasting links to an old version

136

13:41

<raj_149> ryanofsky: if TopUp can have some way to know the chain in context, cant it handle the inactive topup part also?

137

13:42

<ryanofsky> raj_149, yes to merge the functions in an efficient way TopUp would have to be passed more information

138

13:42

<jkczyz> Maybe pass kp_size and chain to a common helper? Haven't looked in detail if it is possible

139

13:42

<achow101> iirc there was an iteration (before the pr was opened) where TopUp was being used for both, but it ended a bit too messy

140

13:43

<ryanofsky> yeah, point of the question was just get people looking at these functions, and see how new code compares to existing

141

13:43

<ryanofsky> if these are more or less clear we can go on to AddInactiveHDChain in question 7

142

13:43

<ryanofsky> 7. When is AddInactiveHDChain called? Where do the inactive HD chain objects come from? Is AddInactiveHDChain called everywhere it needs to be called?

143

13:44

<ryanofsky> there's a twist to this question in that I think there's a bug in the current version of the pr related to this

144

13:45

<jonatack> Good question. Only called from LoadWallet atm.

145

13:46

<jnewbery> ryanofsky: should this be called from sethdseed too?

146

13:46

<ryanofsky> jnewbery, yes that's what i was thinking at least

147

13:48

<jnewbery> I think you're right

148

13:48

<ryanofsky> is it clear to people what AddInactiveHDChain does and when it should be called? anyone want to summarize?

149

13:49

<raj_149> ryanofsky: it adds a chain to m_inactive_hd_chains. But if sethdchain requires also m_inactive_hd_chains to be updated and its not right now. How the test is passing?

150

13:51

<fjahr> the test does us loadwallet to check persistence

151

13:51

<ryanofsky> right the test isn't just calling sethdseed, it's also loading & unloading which could mask the bug

152

13:52

<ryanofsky> maybe skip through some other questions, unless people have thoughts / opinions on them

153

13:52

<jnewbery> right. The test doesn't do any operations or checking between sethdseed and unload/reload wallet

154

13:53

<ryanofsky> one question that might be interesting is 10: A previous version of this PR had a subtle bug on this line. What was the bug and what were the effects?

155

13:53

<ryanofsky> previous pr version with bug: https://github.com/ryanofsky/bitcoin/commits/review.17681.4

156

13:53

<ryanofsky> previous line of code with the bug: https://github.com/ryanofsky/bitcoin/blob/review.17681.4/src/wallet/walletdb.cpp#L446

157

13:54

<raj_149> ryanofsky: it was a bit wise and instead of or. Resulting into nothing happening. Is that correct?

158

13:54

<fjahr> yes but I thing only internal keys would not be extended because of this bug

159

13:55

<jonatack> raj_149: yes. a good catch by ryanofsky. i missed it during my first review.

160

13:55

<ryanofsky> raj_149 & fjahr that's right, shows why it's good to look carefully at arithmetic operations, and new code cleans this up quite a bit

161

13:55

<fjahr> test fails if the two lines with unload and load are commented out btw

162

13:56

<fjahr> *the functional test I mean, going back to previous question

163

13:56

<raj_149> can anyone give a one liner on what happens at loading and unloading of wallet?

164

13:57

<jnewbery> raj_149: there's a lot going on during load/unload. I don't think it's possible to summarize in one line

165

13:58

<ryanofsky> raj_149, i'd say the important thing happening is that m_inactive_hd_chains is populated on loading, by looping over all the keys and finding the max internal/external indices

166

13:59

<ryanofsky> maybe people have thoughts in general on the code / pr

167

13:59

<ryanofsky> my thoughts are that i'm glad this stuff is going away with descriptor wallets :)

168

13:59

<jnewbery> ryanofsky: +1

169

14:00

<raj_149> ryanofsky: i need to understand descriptor wallet better. But overall the motivation made sense. The wallets need to handle old chains sensibly.

170

14:00

<jnewbery> ok, that's time. Thanks for hosting ryanofsky!

171

14:00

<fjahr> Thanks ryanofsky, great job!

172

14:00

<jnewbery> if anyone wants to host in the coming weeks, please message me

173

14:00

<ryanofsky> Thanks everyone!

174

14:00

<jonatack> thanks ryanofsky!

175

14:00

<andrewtoth> thanks ryanofsky!

176

14:01

<troygiorshev> thanks ryanofsky!

177

14:01

<theStack> thanks ryanofsky

178

14:01

<michaelfolkson> Thanks!

179

14:01

<raj_149> thanks everybody. It was a great session. For the first time it felt like i understood enough to have meaningful conversation. thats motivating. :)