<pinheadmz> in general, when reviewing a new PR, what are some of your strategies? Read first? Download first? I notice this PR doesn't have tests. I usually like to start by reading tests because they are just simpler to understand
<jnewbery> ok, we'll talk about a couple of items today. One is PR #15741, led by ariard. I see we have the PR author here. Then we'll cover general review tips and techniques
<ariard> because IMO knowing this fact is going to guide how you read the commits the first time, how much time you will need for review, and which kinds of tests are needed
<ariard> read each commit again, check whether functions match their comment descriptions, whether there are new data structures, how to reduce their complexity, ...
<ariard> Basically, if I understand it correctly, the core wallet has an in-memory Berkeley database, and before the PR it was dumping updates to disk when the object reference went out of scope
<achow101> it was a problem if you wanted to import several thousand individual items with importmulti. It's just that constructing that command is annoying, so no one actually tried. With ranged descriptors, importing such a large number is very easy
<ariard> I agree, and that's why I only tested with ranged descriptors; I doubt there are that many users with a thousand individual items
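(For context, a ranged-descriptor import via the functional test framework looks roughly like the sketch below. The xpub is a placeholder, `node` stands for a test node such as `self.nodes[0]`, and it assumes the framework's `descsum_create` helper for descriptor checksums.)

```python
from test_framework.descriptors import descsum_create  # appends the descriptor checksum

# Placeholder xpub; any ranged wpkh()/pkh() descriptor works the same way.
desc = descsum_create("wpkh(tpubD6NzVbkrYhZ4X.../0/*)")

result = node.importmulti([{
    "desc": desc,          # ranged descriptor
    "range": [0, 999],     # expands to 1000 scriptPubKeys in a single call
    "timestamp": "now",    # skip rescanning old blocks
    "watchonly": True,     # import as watch-only (no private keys)
}])
assert result[0]["success"]
```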
<ariard> What else? Some memory-exhaustion protections were removed from UpgradeKeyMetadata, as there is no longer a risk of having too large a data structure in memory
<jb55> this is a big deal now because you can dump output descriptors with HWI to track your hardware wallet key balances without third-party vendors. Before, it was too annoying because importmulti was so slow
<ariard> given that the PR was already reviewed by a lot of people, I only manually tested importmulti with ranged descriptors on my laptop to see the performance diff between the PR and master
<ariard> achow101: ah, and the comment was something like "avoid creating overlarge in-memory batches". What is the issue with having too large in-memory batches?
<achow101> jnewbery: since this is a performance change, I don't think there really is a way to add automated tests. there isn't necessarily a "correct" or "incorrect" to test for
<achow101> ariard: no, but I did try larger numbers. I think it hit the point of diminishing returns very quickly. I tried 10000 and there was basically no difference
<achow101> the way I tested is listed at the top of the PR. I used the time command and just timed the same importmulti command on the master branch and on the pr branch
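(A rough way to reproduce that kind of measurement from Python rather than with the `time` command. This is only a sketch: the RPC URL, credentials, and descriptor are placeholders, and you would run it once against a node built from master and once against the PR branch.)

```python
import time
from test_framework.authproxy import AuthServiceProxy  # RPC client shipped with the test framework

# Placeholder regtest RPC endpoint and wallet; adjust to your setup.
node = AuthServiceProxy("http://user:pass@127.0.0.1:18443/wallet/test")

req = [{"desc": "<ranged descriptor with checksum>",  # placeholder
        "range": [0, 999],
        "timestamp": "now",
        "watchonly": True}]

start = time.time()
node.importmulti(req)
print("importmulti took %.2f seconds" % (time.time() - start))
```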
<sosthene> I'm still not sure about the use case for importmulti, sorry. Is it like importing an xpub to make a watch-only wallet? Are there other situations where we need to import so many keys?
<lightlike> I was wondering whether there is a way to test that there was no functional impact of the PR, and it is just faster. Naively, like comparing wallets created by the import before and after.
<michaelfolkson> Cool, thanks <achow101>. And before batching pubkeys was introduced, they were written to the database individually. That means each one was flushed to disk individually? So next to no memory requirements, and the database being closed and reopened for each write. Have I got it right?
<achow101> sosthene: yes, it is largely used for importing things for watch only wallets. for me, I'm using it with the HWI tool in order to use my hardware wallet with Bitcoin Core
<jb55> so you can imagine now with a gui tool + HWI users could use trezor directly with their full node. last thing we need is hw wallet PSBT support for spending
<jnewbery> so you could run wallet_importmulti.py --nocleanup before the changes, copy the wallet.dat files, then do the same after the changes and compare the wallet.dat files
<achow101> my goto has been using the db_dump utility provided by bdb and just dumping and then diff'ing the output of the wallet files. probably not the best method though...
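(A minimal sketch of that comparison, assuming Berkeley DB's `db_dump` utility is on the PATH and that the two wallet.dat files were copied out of `--nocleanup` test runs on master and on the PR branch; the filenames are placeholders.)

```python
import difflib
import subprocess

def dump(wallet_path):
    # db_dump -p prints keys and values in a printable form
    out = subprocess.run(["db_dump", "-p", wallet_path],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

before = dump("wallet_master.dat")  # wallet.dat copied from a run on master
after = dump("wallet_pr.dat")       # wallet.dat copied from a run on the PR branch

for line in difflib.unified_diff(before, after, "master", "pr", lineterm=""):
    print(line)
```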
<amiti> Was a specific reason to force the write identified? It seems like the merged code didn’t include it. What was the final thought process here?
<achow101> michaelfolkson: part of the pr was moving a bunch of the import stuff from importmulti's handler into the CWallet class. so imports are handled by the wallet and not the rpc. the batching is also handled by WalletBatch instead of by the caller (e.g. rpc) as was done before
<jnewbery> Generally speaking, we want the RPC code to be as thin a layer as possible. bitcoind has other interfaces (GUI, REST, ZMQ), so the less code is in the interface modules, the more can be reused
<jonatack_> achow101 wrote "As soon as a batch goes out of scope, every single update is committed to the file. If you make 1500 updates, the first 1000 will be written after you do the 1000th write operation, and then the last 500 will be written when the batch object is deleted."
<achow101> amiti: in the line after, SetWalletFlag is called, which itself does a write to the database. However, we only want to do that after we are sure that all of the key metadata has been upgraded, so we force a write to ensure that the updated metadata is written before setting that flag
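(To illustrate the behaviour being described, here is a purely conceptual Python sketch of batched writes, not the actual C++ WalletBatch code: updates accumulate in memory, are committed every 1000 writes, the remainder is committed when the batch is closed, and a flush can be forced before a dependent write such as setting the wallet flag.)

```python
class ConceptualBatch:
    """Conceptual model of the batching described above (not the real WalletBatch):
    commit every 1000 writes, commit the remainder when the batch is closed."""
    BATCH_SIZE = 1000

    def __init__(self, db):
        self.db = db
        self.pending = []

    def write(self, key, value):
        self.pending.append((key, value))
        if len(self.pending) >= self.BATCH_SIZE:
            self.flush()

    def flush(self):
        # Commit all pending updates to disk in one go.
        if self.pending:
            self.db.commit(self.pending)
            self.pending = []

    def close(self):
        # e.g. with 1500 writes, the first 1000 were committed on the 1000th
        # write() and the remaining 500 are committed here.
        self.flush()

# Forcing flush() before writing the "metadata upgraded" flag guarantees the
# upgraded metadata reaches disk before the flag that depends on it.
```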
<michaelfolkson> I struggle to draw the line between writing a test when it is obvious from the code that it does what it should, and deciding that commented code is sufficient with no test
<achow101> michaelfolkson: generally, if it is testable, then there should be a test. Even if the code is obviously correct, that could change, or something else could change which causes functionality to break
<jnewbery> ok, well first off, I'll always download the PR branch to my machine so I can build and review locally. I don't use the github webpage to review, just to leave comments
<jnewbery> First run through, I'll just skim everything, reading the commit logs and looking at the overall changes, so I get an idea of what the PR is doing
<jnewbery> I make extensive use of the functional tests by adding `import pdb; pdb.set_trace()` break points, and then manually running RPC commands on the nodes under test
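(For example, dropping a breakpoint into a functional test and then driving the nodes by hand. The test file and the RPC calls shown at the prompt are just examples.)

```python
# Somewhere inside a functional test's run_test(), e.g. in wallet_importmulti.py:
import pdb; pdb.set_trace()

# At the (Pdb) prompt you can run RPCs against the nodes under test directly:
#   (Pdb) self.nodes[0].getwalletinfo()
#   (Pdb) self.nodes[0].importmulti([...])
# and inspect intermediate state before letting the test continue.
```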
<jnewbery> I, like most developers, am pretty embarrassed about my workflow. It's just a bunch of stuff that's built up over the years and I'm sure there's plenty of space for optimizations