Batch write imported stuff in importmulti (wallet)

May 29, 2019

https://github.com/bitcoin/bitcoin/pull/15741

Host: ariard - PR author: achow101

Meeting Log

10:58

<jnewbery> Reminder that this week's meeting starts in two hours. We'll be looking at #15741

10:58

<jnewbery> https://bitcoin-core-review-club.github.io/

10:59

<jnewbery> ariard is going to host the meeting this week

11:11

<hrofu> first PR review for me, should be fun

12:07

<jnewbery> hrofu: welcome!

12:33

<jonatack_> anyone want to share useful bash one-liners for reviewing?

12:33

<jonatack_> here's one i use for compiling, though it could be smarter:

12:34

<jonatack_> ./autogen.sh ; export BDB_PREFIX='<path-to-/db4>' ; ./configure BDB_LIBS="-L${BDB_PREFIX}/lib -ldb_cxx-4.8" BDB_CFLAGS="-I${BDB_PREFIX}/include" --enable-lcov --enable-gprof ; compiledb make -j"$(($(nproc)+1))"

12:35

<jonatack_> set as a bash alias

12:35

<jonatack_> for linux

12:35

<pinheadmz> in general, when reviewing a new PR, what are some of your strategies? read first? download first? I notice this PR doesnt have tests. I usually liek to start by reading tests because they are just simpler to understadn

12:37

<jonatack_> i git pull, open the pr branch, launch the compilation, and while that's running open it up and read it in local dev

12:43

<jonatack_> if anyone is looking for a recent PR to review that's not too long, there's https://github.com/bitcoin/bitcoin/pull/15996 that could use review

12:44

<jb55> looks like #15741 was merged...

12:45

<jonatack_> there has been progress on the last two PR review club PRs too...

12:45

<jonatack_> https://github.com/bitcoin/bitcoin/pull/15450 is active again

12:46

<jonatack_> and https://github.com/bitcoin/bitcoin/pull/15834

12:54

<jonatack_> pinheadmz: harding posted a good writeup on his review process during the first meeting https://gist.github.com/harding/168f82e7986a1befb0e785957fb600dd

12:55

<pinheadmz> jonatack_: tnx!

12:55

<jonatack_> pinheadmz: and my messy WIP beginner study notes are here: https://github.com/jonatack/bitcoin-development/blob/master/how-to-review-bitcoin-core-prs.txt

12:56

<jnewbery> let's discuss review/test techniques during the meeting, so people don't miss out

12:56

<pinheadmz> ah great resource too, thanks again

12:56

<jonatack_> jnewbery: sure. didn't want to derail.

12:56

<jnewbery> it's definitely on-topic!

12:57

<jnewbery> we can split the meeting today - spend some time on the PR and some on more general review techniques

12:57

<jonatack_> nice

12:57

<jnewbery> A couple of you have asked for more general advice

12:58

<ariard> yes, would be great to have more feedback on review techniques!

12:58

<sosthene> Hi all

13:00

<jnewbery> hi!

13:00

<kanzure> hi

13:00

<lightlike> hello

13:00

<ariard> hi!

13:00

<b10c> Hi!

13:01

<merehap> Hi!

13:01

<fanquake> hi

13:01

<amiti> hi

13:01

<jonatack_> hi!

13:01

<achow101> heyo

13:01

<ariard> jnewbery: starting?

13:01

<dmkathayat_> Hi!

13:01

* jb55 waves

13:01

<michaelfolkson> Hey everyone

13:02

<jnewbery> ok, we'll talk about a couple of items today. One is PR #15741, led by ariard. I see we have the PR author here. Then we'll cover general review tips and techniques

13:02

<handcart> https://github.com/bitcoin/bitcoin/pull/15741

13:02

<jnewbery> let's start with the PR. You've got the floor, ariard!

13:02

<ariard> ok so 15741 was about a performance improvement on importmulti in case of many items

13:03

<ariard> on the process, I tried first to identify which kind of PR it was

13:03

<ariard> doc, code style, buf fix, new feature, test addition

13:03

<handcart> https://bitcoincore.org/en/doc/0.16.0/rpc/wallet/importmulti/

13:04

<ariard> because IMO knowing this fact is going to guide how you read commits first time, how much time you will need for review and which kind of tests needed

13:04

<ariard> then I asked to myself if scope was well-defined (because sometimes it's a refactor but in fact cover a wallet change)

13:05

<ariard> after that I read each commit a first time (just to have a general idea of how changes related to each other)

13:05

<ariard> read again each commit, check if function match their comment description, if there is new data structure, how to reduce complexity of them, ...

13:06

<ariard> if it's user facing commit, have an idea and how to test it manually

13:06

<ariard> then on the PR content, the problem solved there is how to optimize disk access while importing item

13:07

<ariard> Basically, if I get it well, core wallet has in-memory berkeley database, and before to PR, it was dumping them on-disk when object reference was out-of-scope

13:08

<ariard> IMO, I think it wasn't that much a problem before introduction of ranged descriptors where each of them, once expanded is gonna be a new item

13:08

<ariard> Thanks to PR, on-disk writes are now batched every 1000 database write

13:09

<handcart> "importing 2x 1000 keys seems about 6 times faster."

13:09

<ariard> Wallet main class are WalletBatch (an abstract wrapper), BerkeleyBatch and BerkeleyDatabase on top of Berkekeley Database API

13:10

<jb55> my 10000 key import went from 8 mintues to 3 seconds xD

13:10

<jnewbery> description of the wallet database classes is here: https://github.com/bitcoin/bitcoin/blob/c7cfd20a77ce57d200b3b9e5e0dfb0d63818abdc/src/wallet/walletdb.h#L21

13:10

<achow101> it was a problem if you wanted to import several thousand individual items with importmulti. just that making that command is annoying so no one actually tried. with ranged descriptors, importing such a large number is very easy

13:10

<ariard> Main modifications where in method WriteIC of WalletBatch to Flush every 1000

13:11

<ariard> I agree, and that why I only tested with ranged descriptors, having a thousand individual items if there is that much users with this number of items

13:11

<ariard> (maybe large exchanges)

13:12

<lightlike> do you know a good resource to learn more about ranged descriptors? I didn't find anything good by a quick google search.

13:12

<ariard> what else ? some memory exhaustion where removed from UpgradeKeyMetadata as there is no more a risk to have too large data structure in memory

13:12

<jnewbery> In my experience, exchanges don't use the Bitcoin core wallet

13:12

<jb55> this is a big deal now because you can dump output descriptors with HWI to track your hw wallet key balances without third party vendors. before it was too annoying because importmulti was so slow

13:12

<jb55> https://github.com/bitcoin/bitcoin/blob/master/doc/descriptors.md

13:12

<jnewbery> lightlike: https://github.com/bitcoin/bitcoin/blob/c7cfd20a77ce57d200b3b9e5e0dfb0d63818abdc/doc/descriptors.md

13:12

<jnewbery> thanks jb55. Beat me to it!

13:12

<jb55> sorry :D

13:13

<lightlike> jb55, jnewbery: thanks!

13:13

<ariard> given that PR was already reviewed by a lot of people, I only manually test importmulti with ranged descriptors on my laptop to see performance diff between PR and master

13:13

<achow101> ariard: The UpgradeKeyMetadata change was to unify the usage of batching. there shouldn't be any memory exhaustion issues there

13:13

<ariard> and that kinda of all I ave to say!

13:13

<jnewbery> thanks for the summary ariard

13:14

<jnewbery> I had a question. How should we test this? Is there any way to add automated tests?

13:14

<ariard> achow101: ah and the comment was something like "avoid creating overlarge in-memory batches" what is the issue with having too large in-memory batches ?

13:15

<achow101> ariard: that was the 1000 item limit which was kept.

13:16

<achow101> jnewbery: since this is a performance change, I don't think there really is a way to add automated tests. there isn't necessarily a "correct" or "incorrect" to test for

13:16

<jb55> yeah I was about to type the same thing, more of a benchmark suite thing right?

13:16

<ariard> achow101: okay do you try with a different cap like 100 or 500 ? (thinking that's easy to test too I should have)

13:16

<ariard> *did

13:17

<michaelfolkson> Also some basic questions. Why would you import so many pubkeys? Can't you just import a HD seed?

13:17

<achow101> ariard: no, but I did try larger numbers. I think it hit the point of diminishing returns very quickly. I tried 10000 and there was basically no difference

13:18

<jnewbery> achow101: yes, I agree it's difficult to automate any testing for performance

13:18

<achow101> michaelfolkson: you can't import xpubs and BIP 39 seeds to Bitcoin Core. importing a seed is also not possible for watch only wallets

13:18

<achow101> this is effectively importing an xpub

13:19

<jnewbery> We do have benchmarks, but they're more microbenchmarks (eg how fast does this sha function run?)

13:19

<jnewbery> There are a few notes on profiling bitcoind in https://github.com/bitcoin/bitcoin/blob/c7cfd20a77ce57d200b3b9e5e0dfb0d63818abdc/test/README.md and https://github.com/bitcoin/bitcoin/blob/c7cfd20a77ce57d200b3b9e5e0dfb0d63818abdc/test/functional/README.md

13:20

<jnewbery> so if you want to test performance changes manually, you can use perf and get a profile of where cpu time is being spent

13:21

<achow101> the way I tested is listed at the top of the PR. I used the time command and just timed the same importmulti command on the master branch and on the pr branch

13:23

<jonatack_> achow101: wdyt, would testing that importmulti import writes are happening in batches rather than atomically be feasible and useful?

13:23

<jnewbery> That's a really good habit to get into when writing PRs: giving reviewers tips on how to review and test the changes

13:24

<jonatack_> yes, "how to review this" is super helpful

100

13:24

<sosthene> I'm still not sure about the use case for importmulti sorry, it is like importing an xpub to make a watch-only wallet? Are there other situations we need to import so many keys?

101

13:24

<lightlike> I was wondering whether there is a way to test that there was no functional impact of the PR, and it is just faster. Naively, like comparing wallets created by the import before and after.

102

13:24

<achow101> jonatack_: I haven't the faintest idea how you would even check for that

103

13:24

<michaelfolkson> Cool, thanks <achow101>. And before batching pubkeys was introduced, they were written to the database individually. That means each one was flushed to disk individually? So next to no memory requirements and the database being closed and reopened each for each write. Have I got it right?

104

13:25

<achow101> michaelfolkson: yeah, pretty mush. it's also a lot more than just pubkeys. Scripts, metadata, etc.

105

13:26

<achow101> lightlike: test/functional/wallet_importmulti.py are the tests for importmulti itself and they're pretty comprehensive.

106

13:27

<achow101> sosthene: yes, it is largely used for importing things for watch only wallets. for me, I'm using it with the HWI tool in order to use my hardware wallet with Bitcoin Core

107

13:27

<jb55> ^

108

13:28

<jnewbery> lightlike: the wallet files are bdb, so if you want to compare contents, you'll need a tool to open them

109

13:28

<jb55> so you can imagine now with a gui tool + HWI users could use trezor directly with their full node. last thing we need is hw wallet PSBT support for spending

110

13:29

<jnewbery> I have https://github.com/jnewbery/bitcointools which can crack open a wallet.dat file. It's probably not complete (PRs welcome!)

111

13:29

<jb55> something like: hey core I want to spend this watch-only output, give me a PSBT. I then give that PSBT to my hw wallet for signing.

112

13:29

<lightlike> jnewbery: yes i know, i tried the dumpwallet rpc but that didnt really work well.

113

13:29

<jnewbery> so you could run wallet_importmulti.py --nocleanup before the changes, copy the wallet.dat files, then do the same after the changes and compare the wallet.dat files

114

13:30

<achow101> my goto has been using the db_dump utility provided by bdb and just dumping and then diff'ing the output of the wallet files. probably not the best method though...

115

13:30

<jnewbery> dumpwallet might be easier - you could add a dumpwallet call to the end of the test

116

13:30

<jnewbery> achow101: is ordering static with db_dump?

117

13:30

<jnewbery> (I've never used it)

118

13:31

<achow101> dunno. the order might have been an artifact of using the same import command though

119

13:32

<michaelfolkson> It was commented (Sjors) that this moves complexity out of the RPC codebase. Why does this PR simplify the RPC codebase?

120

13:32

<amiti> I have a question about what happened with this conversation thread: https://github.com/bitcoin/bitcoin/pull/15741#discussion_r272755099

121

13:32

<michaelfolkson> Thanks for those Bitcoin tools <jnewbery> I wasn't aware of that.

122

13:32

<amiti> Was a specific reason to force the write identified? It seems like the merged code didn’t include it. What was the final thought process here?

123

13:33

<achow101> michaelfolkson: part of the pr was moving a bunch of the import stuff from importmulti's handler into the CWallet class. so imports are handled by the wallet and not the rpc. the batching is also handled by WalletBatch instead of by the caller (e.g. rpc) as was done before

124

13:35

<jnewbery> Generally speaking we want to the RPC code to be as thin a layer as possible. bitcoind has other interfaces (GUI, REST, ZMQ), so the less code is in the interface modules, the more can be reused

125

13:36

<jonatack_> achow101 wrote "As soon as a batch goes out of scope, every single update is committed to the file. If you make 1500 updates, the first 1000 will be written after you do the 1000th write operation, and then the last 500 will be written when the batch object is deleted."

126

13:36

<jonatack_> maybe something there could be tested idk

127

13:36

<jonatack_> (not to be critical, this is a great improvement)

128

13:37

<achow101> amiti: in the line after, SetWalletFlag is called which itself does a write to the database. however we only want to do that after we are sure that all of the keymetadata has been upgraded so we force a write to ensure that the updated metadata is written before setting that flag

129

13:39

<michaelfolkson> <jonatack_> You mean write a test to ensure that items are actually flushed to disk after 1000 writes?

130

13:41

<jonatack_> TBH i didn't review this PR properly so nvm if it's unhelpful

131

13:42

<michaelfolkson> I struggle to draw the line between writing a test where it is obvious from the code that it does what it should and deciding commented code is sufficient with no test

132

13:43

<jnewbery> achow101: > in the line after, SetWalletFlag is called...

133

13:44

<jnewbery> what line? Can you give a link? I'm struggling to follow (since the original thread was on a commit that doesn't exist anymore)

134

13:44

<achow101> jnewbery: https://github.com/bitcoin/bitcoin/blob/master/src/wallet/wallet.cpp#L398

135

13:44

<jnewbery> thanks!

136

13:45

<achow101> michaelfolkson: generally, if it is testable, then there should be a test. even if the code is obviously correct, that could change or something else could change which cauess functionality to break

137

13:45

<jnewbery> ok, I expect there are more questions, but to make sure we don't overrun, let's move on to more general tips for test/review.

138

13:45

<achow101> and comments go out of date which can be a problem

139

13:45

<jnewbery> Thanks for presenting, ariard, and thanks for being here to answer questions, achow101!

140

13:46

<jonatack_> * clapping *

141

13:46

<achow101> np

142

13:46

<michaelfolkson> Cool, thanks guys

143

13:46

<sosthene> thanks!

144

13:46

<jnewbery> If you have more questions about this or the wallet in general, feel free to ask in #bitcoin-core-dev

145

13:46

<jnewbery> *about this PR or the wallet in general

146

13:47

<jnewbery> ok, onto general tips/techniques. I can give an overview of my workflow, or we can go straight into questions

147

13:47

<jnewbery> whichever you all prefer

148

13:47

<jb55> sure let's hear your workflow

149

13:47

<michaelfolkson> I vote overview

150

13:47

<sosthene> agree

151

13:47

<jonatack_> yes

152

13:48

<ariard> yes overview

153

13:48

<jnewbery> ok, well first off, I'll always download the PR branch to my machine so I can build and review locally. I don't use the github webpage to review, just to leave comments

154

13:49

<jnewbery> I've got a short script that checks out the PR branch and queries the github API to add a comment to that branch locally

155

13:49

<jnewbery> that just makes it easier when I have a bunch of PRs checked out locally that I can run a `git branch` command and see what they are

156

13:50

<jnewbery> once I have the branch locally, I'll set off a build in a VM while I look through the changes

157

13:50

<jnewbery> first, I run something like `git log --oneline upstream/master..`

158

13:50

<jnewbery> that gives me a list of all the commits in the PR branch, one per line

159

13:51

<jnewbery> and then I use a one-liner:

160

13:51

<jnewbery> for commit in `git log master..HEAD --oneline | cut -d' ' -f1 | tac`; do git log -1 $commit; git difftool ${commit}{^,} --dir-diff; done

161

13:51

<jnewbery> which I have saved as git-review

162

13:52

<jnewbery> that steps through the commits one-by-one, printing the commit log to the console then opening my difftool program

163

13:52

<jnewbery> I'll look at the diff, and when I quit the difftool program, git-review will step forward to the next commit

164

13:52

<jnewbery> First run through, I'll just skim everything, reading the commit logs and looking at the overall changes, so I get an idea of what the PR is doing

165

13:53

<jnewbery> Then I'll go through again, but look at each commit in more detail, reviewing every line in detail

166

13:53

<jnewbery> hmm, what else?

167

13:54

<jnewbery> I make extensive use of the functional tests by adding `import pdb; pdb.set_trace()` break points, and then manually running RPC commands on the nodes under test

168

13:54

<jb55> jnewbery: do you pull the PR description from the api or just use the website?

169

13:54

<jnewbery> I pull the PR description from the github API and then use it to label the branch

170

13:54

<jb55> label in what sense?

171

13:54

<jnewbery> Here's what my `git branch` output looks like:

172

13:54

<jnewbery> → gb

173

13:55

<jnewbery> master fe47ae168 upstream/master --

174

13:55

<jnewbery> pr10102 3440513a4 -- [ryanofsky] [experimental] Multiprocess bitcoin - https://github.com/bitcoin/bitcoin/pull/10102

175

13:55

<jnewbery> pr10823 03fa5a1b4 -- [greenaddress] Allow all mempool txs to be replaced after a configurable timeout (default 6h) - https://github.com/bitcoin/bitcoin/pull/10823

176

13:55

<jnewbery> pr12360 ab740a047 -- [jnewbery] Bury bip9 deployments - https://github.com/bitcoin/bitcoin/pull/12360

177

13:55

<jnewbery> pr13756 66f3e9780 -- [kallewoof] wallet: "avoid_reuse" wallet flag for improved privacy - https://github.com/bitcoin/bitcoin/pull/13756

178

13:55

<jnewbery> ...

179

13:55

<jnewbery> There's an attribute of a git branch called `description` that you can update manually

180

13:55

<jb55> whoa

181

13:55

<jb55> I did not know that

182

13:56

<fanquake> jnewbery what OS VM are you building in? using depends?

183

13:56

<jb55> I personally use https://gist.github.com/piscisaureus/3342247 + git checkout -b pr${PR} refs/pull/origin/${PR}

184

13:56

<fanquake> Running make check and the functional (--extended?) test after compiling?

185

13:56

<jnewbery> my `gb` doesn't map onto `git branch` exactly. I do some formatting on the output so it includes the description

186

13:57

<jnewbery> fanquake: I'm using vagrant, just the standard ubuntu bionic image

187

13:57

<jnewbery> I have a vagrant file that installs all the dependencies. I can share that if people want it

188

13:58

<michaelfolkson> Yes please

189

13:58

<jnewbery> Yes, I'll run make check and the functional tests. Not usually extended because I'm too impatient

190

13:59

<michaelfolkson> You're running tests using Travis

191

13:59

<michaelfolkson> ?

192

13:59

<jnewbery> I, like most developers, am pretty embarrassed about my workflow. It's just a bunch of stuff that's built up over the years and I'm sure there's plenty of space for optimizations

193

13:59

<jnewbery> Yes, I have travis set up on jnewbery/bitcoin, so whenever I push a branch to my github repo it'll get built in travis

194

14:00

<jnewbery> Although there's not much reason to look at that for reviewing, since the branch is already built in bitcoin/bitcoin travis

195

14:00

<jnewbery> Here's my vagrant config: https://github.com/jnewbery/btc-dev YMMV

196

14:01

<jnewbery> That's time. Sorry for the monologue!

197

14:01

<jnewbery> We can do more general techniques next week

198

14:01

<michaelfolkson> That was great, thank you. I'll look over this and may have some questions next week

199

14:01

<jonatack__> Thank you!

200

14:01

<jnewbery> I'm still looking for suggestions for future PRs to cover and volunteers to lead discussion

201

14:02

<merehap> Thanks!

202

14:02

<jonatack__> ariard: great job

203

14:02

<jnewbery> Thanks everyone, and thanks again ariard and achow101!

204

14:02

<sosthene> Thanks, I need to digest all the links sent today!

205

14:02

<b10c> Thanks!

206

14:02

<lightlike> thanks!

207

14:02

<jonatack__> achow101: thank you for coming

208

14:02

<michaelfolkson> <sosthene> Me too lol

209

14:03

<ariard> you're welcome, thanks for workflow jnewbery, a assumeutxo PR for a coming week ?

210

14:08

<jnewbery> ariard: the next assumeutxo PR is https://github.com/bitcoin/bitcoin/pull/15976 . It's a refactor but it touches consensus/validation code, so it needs careful review

211

14:09

<jnewbery> There's another good resoure on profiling here: https://github.com/bitcoin/bitcoin/pull/12649/files#diff-d47c2f6882badae58f3881a6ce0a430cR89

212

14:59

<jnewbery> I've put my git-pr and git-br tools up at https://gist.github.com/jnewbery/1d45a6b5e14b3f3fefe5942d0cc2608d. YMMV, no representations or warranties, expressed or implied, etc, etc