Batch write imported stuff in importmulti (wallet)

Host: ariard  -  PR author: achow101

Meeting Log

  110:58 <jnewbery> Reminder that this week's meeting starts in two hours. We'll be looking at #15741
  210:58 <jnewbery>
  310:59 <jnewbery> ariard is going to host the meeting this week
  411:11 <hrofu> first PR review for me, should be fun
  512:07 <jnewbery> hrofu: welcome!
  612:33 <jonatack_> anyone want to share useful bash one-liners for reviewing?
  712:33 <jonatack_> here's one i use for compiling, though it could be smarter:
  812:34 <jonatack_> ./ ; export BDB_PREFIX='<path-to-/db4>' ; ./configure BDB_LIBS="-L${BDB_PREFIX}/lib -ldb_cxx-4.8" BDB_CFLAGS="-I${BDB_PREFIX}/include" --enable-lcov --enable-gprof ; compiledb make -j"$(($(nproc)+1))"
  912:35 <jonatack_> set as a bash alias
 1012:35 <jonatack_> for linux
 1112:35 <pinheadmz> in general, when reviewing a new PR, what are some of your strategies? read first? download first? I notice this PR doesnt have tests. I usually liek to start by reading tests because they are just simpler to understadn
 1212:37 <jonatack_> i git pull, open the pr branch, launch the compilation, and while that's running open it up and read it in local dev
 1312:43 <jonatack_> if anyone is looking for a recent PR to review that's not too long, there's that could use review
 1412:44 <jb55> looks like #15741 was merged...
 1512:45 <jonatack_> there has been progress on the last two PR review club PRs too...
 1612:45 <jonatack_> is active again
 1712:46 <jonatack_> and
 1812:54 <jonatack_> pinheadmz: harding posted a good writeup on his review process during the first meeting
 1912:55 <pinheadmz> jonatack_: tnx!
 2012:55 <jonatack_> pinheadmz: and my messy WIP beginner study notes are here:
 2112:56 <jnewbery> let's discuss review/test techniques during the meeting, so people don't miss out
 2212:56 <pinheadmz> ah great resource too, thanks again
 2312:56 <jonatack_> jnewbery: sure. didn't want to derail.
 2412:56 <jnewbery> it's definitely on-topic!
 2512:57 <jnewbery> we can split the meeting today - spend some time on the PR and some on more general review techniques
 2612:57 <jonatack_> nice
 2712:57 <jnewbery> A couple of you have asked for more general advice
 2812:58 <ariard> yes, would be great to have more feedback on review techniques!
 2912:58 <sosthene> Hi all
 3013:00 <jnewbery> hi!
 3113:00 <kanzure> hi
 3213:00 <lightlike> hello
 3313:00 <ariard> hi!
 3413:00 <b10c> Hi!
 3513:01 <merehap> Hi!
 3613:01 <fanquake> hi
 3713:01 <amiti> hi
 3813:01 <jonatack_> hi!
 3913:01 <achow101> heyo
 4013:01 <ariard> jnewbery: starting?
 4113:01 <dmkathayat_> Hi!
 4213:01 * jb55 waves
 4313:01 <michaelfolkson> Hey everyone
 4413:02 <jnewbery> ok, we'll talk about a couple of items today. One is PR #15741, led by ariard. I see we have the PR author here. Then we'll cover general review tips and techniques
 4513:02 <handcart>
 4613:02 <jnewbery> let's start with the PR. You've got the floor, ariard!
 4713:02 <ariard> ok so 15741 was about a performance improvement on importmulti in case of many items
 4813:03 <ariard> on the process, I tried first to identify which kind of PR it was
 4913:03 <ariard> doc, code style, buf fix, new feature, test addition
 5013:03 <handcart>
 5113:04 <ariard> because IMO knowing this fact is going to guide how you read commits first time, how much time you will need for review and which kind of tests needed
 5213:04 <ariard> then I asked to myself if scope was well-defined (because sometimes it's a refactor but in fact cover a wallet change)
 5313:05 <ariard> after that I read each commit a first time (just to have a general idea of how changes related to each other)
 5413:05 <ariard> read again each commit, check if function match their comment description, if there is new data structure, how to reduce complexity of them, ...
 5513:06 <ariard> if it's user facing commit, have an idea and how to test it manually
 5613:06 <ariard> then on the PR content, the problem solved there is how to optimize disk access while importing item
 5713:07 <ariard> Basically, if I get it well, core wallet has in-memory berkeley database, and before to PR, it was dumping them on-disk when object reference was out-of-scope
 5813:08 <ariard> IMO, I think it wasn't that much a problem before introduction of ranged descriptors where each of them, once expanded is gonna be a new item
 5913:08 <ariard> Thanks to PR, on-disk writes are now batched every 1000 database write
 6013:09 <handcart> "importing 2x 1000 keys seems about 6 times faster."
 6113:09 <ariard> Wallet main class are WalletBatch (an abstract wrapper), BerkeleyBatch and BerkeleyDatabase on top of Berkekeley Database API
 6213:10 <jb55> my 10000 key import went from 8 mintues to 3 seconds xD
 6313:10 <jnewbery> description of the wallet database classes is here:
 6413:10 <achow101> it was a problem if you wanted to import several thousand individual items with importmulti. just that making that command is annoying so no one actually tried. with ranged descriptors, importing such a large number is very easy
 6513:10 <ariard> Main modifications where in method WriteIC of WalletBatch to Flush every 1000
 6613:11 <ariard> I agree, and that why I only tested with ranged descriptors, having a thousand individual items if there is that much users with this number of items
 6713:11 <ariard> (maybe large exchanges)
 6813:12 <lightlike> do you know a good resource to learn more about ranged descriptors? I didn't find anything good by a quick google search.
 6913:12 <ariard> what else ? some memory exhaustion where removed from UpgradeKeyMetadata as there is no more a risk to have too large data structure in memory
 7013:12 <jnewbery> In my experience, exchanges don't use the Bitcoin core wallet
 7113:12 <jb55> this is a big deal now because you can dump output descriptors with HWI to track your hw wallet key balances without third party vendors. before it was too annoying because importmulti was so slow
 7213:12 <jb55>
 7313:12 <jnewbery> lightlike:
 7413:12 <jnewbery> thanks jb55. Beat me to it!
 7513:12 <jb55> sorry :D
 7613:13 <lightlike> jb55, jnewbery: thanks!
 7713:13 <ariard> given that PR was already reviewed by a lot of people, I only manually test importmulti with ranged descriptors on my laptop to see performance diff between PR and master
 7813:13 <achow101> ariard: The UpgradeKeyMetadata change was to unify the usage of batching. there shouldn't be any memory exhaustion issues there
 7913:13 <ariard> and that kinda of all I ave to say!
 8013:13 <jnewbery> thanks for the summary ariard
 8113:14 <jnewbery> I had a question. How should we test this? Is there any way to add automated tests?
 8213:14 <ariard> achow101: ah and the comment was something like "avoid creating overlarge in-memory batches" what is the issue with having too large in-memory batches ?
 8313:15 <achow101> ariard: that was the 1000 item limit which was kept.
 8413:16 <achow101> jnewbery: since this is a performance change, I don't think there really is a way to add automated tests. there isn't necessarily a "correct" or "incorrect" to test for
 8513:16 <jb55> yeah I was about to type the same thing, more of a benchmark suite thing right?
 8613:16 <ariard> achow101: okay do you try with a different cap like 100 or 500 ? (thinking that's easy to test too I should have)
 8713:16 <ariard> *did
 8813:17 <michaelfolkson> Also some basic questions. Why would you import so many pubkeys? Can't you just import a HD seed?
 8913:17 <achow101> ariard: no, but I did try larger numbers. I think it hit the point of diminishing returns very quickly. I tried 10000 and there was basically no difference
 9013:18 <jnewbery> achow101: yes, I agree it's difficult to automate any testing for performance
 9113:18 <achow101> michaelfolkson: you can't import xpubs and BIP 39 seeds to Bitcoin Core. importing a seed is also not possible for watch only wallets
 9213:18 <achow101> this is effectively importing an xpub
 9313:19 <jnewbery> We do have benchmarks, but they're more microbenchmarks (eg how fast does this sha function run?)
 9413:19 <jnewbery> There are a few notes on profiling bitcoind in and
 9513:20 <jnewbery> so if you want to test performance changes manually, you can use perf and get a profile of where cpu time is being spent
 9613:21 <achow101> the way I tested is listed at the top of the PR. I used the time command and just timed the same importmulti command on the master branch and on the pr branch
 9713:23 <jonatack_> achow101: wdyt, would testing that importmulti import writes are happening in batches rather than atomically be feasible and useful?
 9813:23 <jnewbery> That's a really good habit to get into when writing PRs: giving reviewers tips on how to review and test the changes
 9913:24 <jonatack_> yes, "how to review this" is super helpful
10013:24 <sosthene> I'm still not sure about the use case for importmulti sorry, it is like importing an xpub to make a watch-only wallet? Are there other situations we need to import so many keys?
10113:24 <lightlike> I was wondering whether there is a way to test that there was no functional impact of the PR, and it is just faster. Naively, like comparing wallets created by the import before and after.
10213:24 <achow101> jonatack_: I haven't the faintest idea how you would even check for that
10313:24 <michaelfolkson> Cool, thanks <achow101>. And before batching pubkeys was introduced, they were written to the database individually. That means each one was flushed to disk individually? So next to no memory requirements and the database being closed and reopened each for each write. Have I got it right?
10413:25 <achow101> michaelfolkson: yeah, pretty mush. it's also a lot more than just pubkeys. Scripts, metadata, etc.
10513:26 <achow101> lightlike: test/functional/ are the tests for importmulti itself and they're pretty comprehensive.
10613:27 <achow101> sosthene: yes, it is largely used for importing things for watch only wallets. for me, I'm using it with the HWI tool in order to use my hardware wallet with Bitcoin Core
10713:27 <jb55> ^
10813:28 <jnewbery> lightlike: the wallet files are bdb, so if you want to compare contents, you'll need a tool to open them
10913:28 <jb55> so you can imagine now with a gui tool + HWI users could use trezor directly with their full node. last thing we need is hw wallet PSBT support for spending
11013:29 <jnewbery> I have which can crack open a wallet.dat file. It's probably not complete (PRs welcome!)
11113:29 <jb55> something like: hey core I want to spend this watch-only output, give me a PSBT. I then give that PSBT to my hw wallet for signing.
11213:29 <lightlike> jnewbery: yes i know, i tried the dumpwallet rpc but that didnt really work well.
11313:29 <jnewbery> so you could run --nocleanup before the changes, copy the wallet.dat files, then do the same after the changes and compare the wallet.dat files
11413:30 <achow101> my goto has been using the db_dump utility provided by bdb and just dumping and then diff'ing the output of the wallet files. probably not the best method though...
11513:30 <jnewbery> dumpwallet might be easier - you could add a dumpwallet call to the end of the test
11613:30 <jnewbery> achow101: is ordering static with db_dump?
11713:30 <jnewbery> (I've never used it)
11813:31 <achow101> dunno. the order might have been an artifact of using the same import command though
11913:32 <michaelfolkson> It was commented (Sjors) that this moves complexity out of the RPC codebase. Why does this PR simplify the RPC codebase?
12013:32 <amiti> I have a question about what happened with this conversation thread:
12113:32 <michaelfolkson> Thanks for those Bitcoin tools <jnewbery> I wasn't aware of that.
12213:32 <amiti> Was a specific reason to force the write identified? It seems like the merged code didn’t include it. What was the final thought process here?
12313:33 <achow101> michaelfolkson: part of the pr was moving a bunch of the import stuff from importmulti's handler into the CWallet class. so imports are handled by the wallet and not the rpc. the batching is also handled by WalletBatch instead of by the caller (e.g. rpc) as was done before
12413:35 <jnewbery> Generally speaking we want to the RPC code to be as thin a layer as possible. bitcoind has other interfaces (GUI, REST, ZMQ), so the less code is in the interface modules, the more can be reused
12513:36 <jonatack_> achow101 wrote "As soon as a batch goes out of scope, every single update is committed to the file. If you make 1500 updates, the first 1000 will be written after you do the 1000th write operation, and then the last 500 will be written when the batch object is deleted."
12613:36 <jonatack_> maybe something there could be tested idk
12713:36 <jonatack_> (not to be critical, this is a great improvement)
12813:37 <achow101> amiti: in the line after, SetWalletFlag is called which itself does a write to the database. however we only want to do that after we are sure that all of the keymetadata has been upgraded so we force a write to ensure that the updated metadata is written before setting that flag
12913:39 <michaelfolkson> <jonatack_> You mean write a test to ensure that items are actually flushed to disk after 1000 writes?
13013:41 <jonatack_> TBH i didn't review this PR properly so nvm if it's unhelpful
13113:42 <michaelfolkson> I struggle to draw the line between writing a test where it is obvious from the code that it does what it should and deciding commented code is sufficient with no test
13213:43 <jnewbery> achow101: > in the line after, SetWalletFlag is called...
13313:44 <jnewbery> what line? Can you give a link? I'm struggling to follow (since the original thread was on a commit that doesn't exist anymore)
13413:44 <achow101> jnewbery:
13513:44 <jnewbery> thanks!
13613:45 <achow101> michaelfolkson: generally, if it is testable, then there should be a test. even if the code is obviously correct, that could change or something else could change which cauess functionality to break
13713:45 <jnewbery> ok, I expect there are more questions, but to make sure we don't overrun, let's move on to more general tips for test/review.
13813:45 <achow101> and comments go out of date which can be a problem
13913:45 <jnewbery> Thanks for presenting, ariard, and thanks for being here to answer questions, achow101!
14013:46 <jonatack_> * clapping *
14113:46 <achow101> np
14213:46 <michaelfolkson> Cool, thanks guys
14313:46 <sosthene> thanks!
14413:46 <jnewbery> If you have more questions about this or the wallet in general, feel free to ask in #bitcoin-core-dev
14513:46 <jnewbery> *about this PR or the wallet in general
14613:47 <jnewbery> ok, onto general tips/techniques. I can give an overview of my workflow, or we can go straight into questions
14713:47 <jnewbery> whichever you all prefer
14813:47 <jb55> sure let's hear your workflow
14913:47 <michaelfolkson> I vote overview
15013:47 <sosthene> agree
15113:47 <jonatack_> yes
15213:48 <ariard> yes overview
15313:48 <jnewbery> ok, well first off, I'll always download the PR branch to my machine so I can build and review locally. I don't use the github webpage to review, just to leave comments
15413:49 <jnewbery> I've got a short script that checks out the PR branch and queries the github API to add a comment to that branch locally
15513:49 <jnewbery> that just makes it easier when I have a bunch of PRs checked out locally that I can run a `git branch` command and see what they are
15613:50 <jnewbery> once I have the branch locally, I'll set off a build in a VM while I look through the changes
15713:50 <jnewbery> first, I run something like `git log --oneline upstream/master..`
15813:50 <jnewbery> that gives me a list of all the commits in the PR branch, one per line
15913:51 <jnewbery> and then I use a one-liner:
16013:51 <jnewbery> for commit in `git log master..HEAD --oneline | cut -d' ' -f1 | tac`; do git log -1 $commit; git difftool ${commit}{^,} --dir-diff; done
16113:51 <jnewbery> which I have saved as git-review
16213:52 <jnewbery> that steps through the commits one-by-one, printing the commit log to the console then opening my difftool program
16313:52 <jnewbery> I'll look at the diff, and when I quit the difftool program, git-review will step forward to the next commit
16413:52 <jnewbery> First run through, I'll just skim everything, reading the commit logs and looking at the overall changes, so I get an idea of what the PR is doing
16513:53 <jnewbery> Then I'll go through again, but look at each commit in more detail, reviewing every line in detail
16613:53 <jnewbery> hmm, what else?
16713:54 <jnewbery> I make extensive use of the functional tests by adding `import pdb; pdb.set_trace()` break points, and then manually running RPC commands on the nodes under test
16813:54 <jb55> jnewbery: do you pull the PR description from the api or just use the website?
16913:54 <jnewbery> I pull the PR description from the github API and then use it to label the branch
17013:54 <jb55> label in what sense?
17113:54 <jnewbery> Here's what my `git branch` output looks like:
17213:54 <jnewbery> → gb
17313:55 <jnewbery> master fe47ae168 upstream/master --
17413:55 <jnewbery> pr10102 3440513a4 -- [ryanofsky] [experimental] Multiprocess bitcoin -
17513:55 <jnewbery> pr10823 03fa5a1b4 -- [greenaddress] Allow all mempool txs to be replaced after a configurable timeout (default 6h) -
17613:55 <jnewbery> pr12360 ab740a047 -- [jnewbery] Bury bip9 deployments -
17713:55 <jnewbery> pr13756 66f3e9780 -- [kallewoof] wallet: "avoid_reuse" wallet flag for improved privacy -
17813:55 <jnewbery> ...
17913:55 <jnewbery> There's an attribute of a git branch called `description` that you can update manually
18013:55 <jb55> whoa
18113:55 <jb55> I did not know that
18213:56 <fanquake> jnewbery what OS VM are you building in? using depends?
18313:56 <jb55> I personally use + git checkout -b pr${PR} refs/pull/origin/${PR}
18413:56 <fanquake> Running make check and the functional (--extended?) test after compiling?
18513:56 <jnewbery> my `gb` doesn't map onto `git branch` exactly. I do some formatting on the output so it includes the description
18613:57 <jnewbery> fanquake: I'm using vagrant, just the standard ubuntu bionic image
18713:57 <jnewbery> I have a vagrant file that installs all the dependencies. I can share that if people want it
18813:58 <michaelfolkson> Yes please
18913:58 <jnewbery> Yes, I'll run make check and the functional tests. Not usually extended because I'm too impatient
19013:59 <michaelfolkson> You're running tests using Travis
19113:59 <michaelfolkson> ?
19213:59 <jnewbery> I, like most developers, am pretty embarrassed about my workflow. It's just a bunch of stuff that's built up over the years and I'm sure there's plenty of space for optimizations
19313:59 <jnewbery> Yes, I have travis set up on jnewbery/bitcoin, so whenever I push a branch to my github repo it'll get built in travis
19414:00 <jnewbery> Although there's not much reason to look at that for reviewing, since the branch is already built in bitcoin/bitcoin travis
19514:00 <jnewbery> Here's my vagrant config: YMMV
19614:01 <jnewbery> That's time. Sorry for the monologue!
19714:01 <jnewbery> We can do more general techniques next week
19814:01 <michaelfolkson> That was great, thank you. I'll look over this and may have some questions next week
19914:01 <jonatack__> Thank you!
20014:01 <jnewbery> I'm still looking for suggestions for future PRs to cover and volunteers to lead discussion
20114:02 <merehap> Thanks!
20214:02 <jonatack__> ariard: great job
20314:02 <jnewbery> Thanks everyone, and thanks again ariard and achow101!
20414:02 <sosthene> Thanks, I need to digest all the links sent today!
20514:02 <b10c> Thanks!
20614:02 <lightlike> thanks!
20714:02 <jonatack__> achow101: thank you for coming
20814:02 <michaelfolkson> <sosthene> Me too lol
20914:03 <ariard> you're welcome, thanks for workflow jnewbery, a assumeutxo PR for a coming week ?
21014:08 <jnewbery> ariard: the next assumeutxo PR is . It's a refactor but it touches consensus/validation code, so it needs careful review
21114:09 <jnewbery> There's another good resoure on profiling here:
21214:59 <jnewbery> I've put my git-pr and git-br tools up at YMMV, no representations or warranties, expressed or implied, etc, etc