Remove calls to Chain::Lock methods (wallet)

https://github.com/bitcoin/bitcoin/pull/17954

Host: ryanofsky  -  PR author: ryanofsky

The PR branch HEAD was 1125fd92b at the time of this review club meeting.

Notes

This PR is one change in a sequence of PRs making the bitcoin wallet code more asynchronous and independent of node code, so the node can be more robust and performant, and not get locked up while wallet code is executing. The changes are also supposed to be architecture improvements, making the wallet/node interface more high-level and less tied to implementation details of the node and wallet. They also get wallet code closer to being able to work offline or in a different process without constantly having to query the node for information. Notable PRs making wallet code more asynchronous are:

  • #10286 by TheBlueMatt – changing the wallet to process BlockConnected, BlockDisconnected, and other notifications from the node asynchronously, adding CWallet::m_last_block_processed field

  • #15931 by ariard – changing the wallet to cache block heights, adding CWallet::m_last_block_processed_height and CWalletTx::m_confirm.block_height fields, and using cached heights to avoid needing to call Chain::Lock::getBlockDepth() to determine transaction status.

  • #17443 by ariard, pending review – using cached block heights to avoid needing to call Chain::checkFinalTx to determine transaction status

  • #17954 by ryanofsky, this pr – using cached block heights and CWallet::m_last_block_processed in various wallet functions to avoid the need to call Chain::Lock, and to make the wallet return internally-consistent information when m_last_block_processed lags behind the current node tip

  • #16426 by ariard, pending review – removing the interfaces::Chain::Lock class altogether to remove a way wallets could lock up the node unintentionally or longer than needed

All interactions between the wallet and node code happen though the interfaces::Chain and interfaces::Chain::Lock interfaces. Generally the less the wallet needs to call these interface methods, the more efficient both the node and the wallet can be. Also the simpler wallet code can be because it can avoid the headache of having to reconcile its cached state with latest information returned from the node, in cases where CWallet::m_last_block_processed has fallen behind the current node tip.

Questions

  1. Did you review the PRs? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. What happens in the interfaces::Chain::Lock chain constructor and destructor? What is the effect of writing auto locked_chain = pwallet->chain().lock();?

  3. What’s the difference between calling interfaces::Chain::getHeight() and CWallet::GetLastBlockHeight()? When will the values be the same, and when will they be different?

  4. In commit “wallet: Avoid use of Chain::Lock in listsinceblock” the call to locked_chain->getHeight() is replaced with pwallet->GetLastBlockHeight(). If listsinceblock is called with a non-null block hash, how does this change prevent transactions that are older than the specified block from being mistakenly printed?

  5. Trickier: In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash(). How does this change result in a more accurate lastblock value that won’t result in missing transactions when passed to a subsequent listsinceblock call?

  6. In the same commit, what is the coinbaseKey.MakeNewKey(true) line doing, and why does the test fail if it is removed?

  7. In commit “wallet: Avoid use of Chain::Lock in importmulti”, what was the removed fRescan = false statement intended to do? Why is it safe to drop?

Meeting Log

  112:59 <luke-jr> hi
  213:00 <emilengler> hi
  313:00 <SirRichard> hi
  413:00 <jnewbery> hi
  513:00 <emzy> hi
  613:00 <pbleam> hi
  713:00 <nehan_> hi
  813:00 <ariard_> hi
  913:00 <ajonas> hi
 1013:00 <ryanofsky> #startmeeting
 1113:00 <ryanofsky> hi
 1213:01 <rockzombie2> hi
 1313:01 <ryanofsky> Welcome to the review meeting!
 1413:01 <felixweis> Hi
 1513:01 <ryanofsky> Link to notes and PR is https://bitcoincore.reviews/17954.html
 1613:01 <ryanofsky> Everyone can say hi (who didn't already) to let people know you're here
 1713:02 <ryanofsky> Reminder that everyone is here to learn, and all types of questions are encouraged. Anything that seems confusing is probably confusing to other people, and good to spend time figuring out together
 1813:02 <emilengler> Yeah then I have a question!
 1913:02 <ryanofsky> I have some prompt questions at the link above, but it is fine not to get through them if there are other things people want to talk about
 2013:02 <ryanofsky> No need to ask to ask, just ask!
 2113:02 <jonatack> hi
 2213:03 <andrewtoth> hi
 2313:03 <emilengler> I'm still a bit confused and not really that much into thread safe programming therefore reviewing this PR is hard
 2413:03 <emilengler> Can someone give a brief introduction to this?
 2513:03 <ryanofsky> Anyone want to jump in with a short summary?
 2613:05 <ryanofsky> I can say that the motivation for this is to be able to get rid of the Chain::Lock interface. Because that interface is low level and lets wallets slow down and potentially lock the node
 2713:06 <andrewtoth> It seems to be taking a piece of #16426 which we reviewed here previously, and splitting it out into its own PR no?
 2813:06 <luke-jr> I guess the important thing for reviewers to check, is that the code doesn't touch anything it actually needs the lock for
 2913:06 <rockzombie2> My limited understanding (not specifically related to bitcoin): Sharing resources among threads can cause confilcts in state. One solution is to "lock" a resource. There are many ways to be "thread-safe". Some more topics for further research that might help in understanding: Dining Philosophers problem, Spin-locks, Software Transactional Memory.
 3013:06 <ryanofsky> absolutely yes, yes, and yes
 3113:07 <luke-jr> I suggest looking at each commit individually
 3213:07 <luke-jr> (there's 11)
 3313:07 <nehan_> is it an invariant that wallet tip is always behind or at chain tip?
 3413:07 <ariard_> nehan_: no it can be also forked of chain tip
 3513:08 <luke-jr> or even ahead, if reindexing or such
 3613:08 <ariard_> that where you may have trouble and the initial reason of Russ working on #17954
 3713:08 <ryanofsky> yes, this gets to the question: What’s the difference between calling interfaces::Chain::getHeight() and CWallet::GetLastBlockHeight()? When will the values be the same, and when will they be different?
 3813:09 <ryanofsky> the next question looks into a specific example: In commit “wallet: Avoid use of Chain::Lock in listsinceblock” the call to locked_chain->getHeight() is replaced with pwallet->GetLastBlockHeight(). If listsinceblock is called with a non-null block hash, how does this change prevent transactions that are older than the specified block from being mistakenly printed?
 3913:09 <ryanofsky> commit: https://github.com/bitcoin-core-review-club/bitcoin/commit/e276b68
 4013:11 <ryanofsky> the changes to listsinceblock are small, but the existing code is dense, people may have questions just about that
 4113:14 <ryanofsky> the key to which transactions are printed is the abs(tx.GetDepthInMainChain()) < depth check on line 1616
 4213:16 <ryanofsky> the depth value is computed differently now with wallet tip: pwallet->GetLastBlockHeight(), instead of node tip: locked_chain->getHeight()
 4313:18 <ryanofsky> if a new block is connected, and the wallet didn't catch up yet. the height variable will be bigger or smaller? and the depth value?
 4413:19 <ariard_> ryanofsky: there is also a behavior change I think by returning lastblock=findAncestorByHeight?
 4513:20 <ryanofsky> yes, there are two behavior changes in the function
 4613:20 <ryanofsky> the lastblock behavior change is more complicated to think about since it involves multiple listsinceblock calls
 4713:21 <nehan_> if the wallet is behind chain tip, depth will be smaller?
 4813:21 <ryanofsky> the transaction listing height/depth change is simpler, because it can be understood looking at a single listsinceblock call
 4913:22 <ryanofsky> nehan_: yes depth is smaller, so older transactions that weren't requested won't be printed
 5013:23 <ryanofsky> This is a corner case race condition. Goal of PR is not to change behavior, but to clean up the API and fix some of these little races
 5113:23 <rockzombie2> Sorry, I'm still very new to this review process, but I'm hoping I can ramp up quickly by attending these reviews because I want to be helpful. As for testing your behavior changes, what are the cases I should try to test (and how exactly can I cover that test case in a multi-threaded environment?)
 5213:24 <nehan_> ryanofsky: what do you mean by "older transactions that weren't requested won't be printed"?
 5313:25 <nehan_> you mean in the new behavior, it will print less?
 5413:25 <nehan_> but i'm not sure how those "weren't requested"
 5513:25 <ryanofsky> rockzombie2: great question, anybody have any suggestions for testing? i will say bitcoin core doesn't currently have great for testing infrastrucuture to force code to run in order and simulate races
 5613:25 <ryanofsky> but we are always adding new testing features
 5713:26 <ryanofsky> nehan_: listsinceblock is typically called with a specific block hash, so you only want to return transactions after that block
 5813:26 <luke-jr> or if that block isn't in the main chain, transactions reorg'd out
 5913:27 <nehan_> ryanofsky: yes. and in the new behavior, in the case where the wallet is behind chain, it will print fewer transactions. but it's not clear to me why those "weren't requested"
 6013:27 <ryanofsky> nehan_: yes I actually left something out
 6113:28 <ryanofsky> in order for there to be a race condition, the "auto locked_chain = pwallet->chain().lock();" line above has to be removed, which is removed in a later PR
 6213:28 <ryanofsky> the problem is in the comparison tx.GetDepthInMainChain() < depth
 6313:29 <ryanofsky> you don't want to compute the depth on the left side and the depth on the right side with two different tips
 6413:30 <nehan_> ryanofsky: ah ok, thanks. but isn't this fixed in the new code because both of things will be under cs_wallet?
 6513:30 <nehan_> ryanofsky: are you saying there *was* a race condition?
 6613:31 <ryanofsky> i should check on what the later PRs are doing, but it is saying there would be a race condition
 6713:32 <ryanofsky> this might be a bad example, because it is talking about a theoretical race condition
 6813:32 <ryanofsky> but hopefully it at least suggests what the problem is, because this example is simpler than other cases like the one ariard was talking about that involves separate RPC calls
 6913:33 <jonatack> rockzombie2: WRT what cases to try to test, one thing you can try is look at the cases ryanofsky describes in the commit message, then look at the relevant unit and functional tests to see if the case is tested, and how (and if not, try adding it)
 7013:34 <ariard_> you shouldn't have a race condition even after chain lock removal because now you're relying on same tip for both computation
 7113:34 <ariard_> like finding the common ancestor and the comparison tx.GetDepthInMainChain() < depth
 7213:35 <ryanofsky> ariard_: yes you'd have to remove the wallet lock too, which i can't rememeber if we do
 7313:35 <nehan_> ariard_: this was my interpretation as well
 7413:35 <jonatack> for instance, in commit e276b682, look at the tests that are added in that commit, and at the those in test/functional/wallet_listsinceblock.py
 7513:36 <ariard_> ryanofsky: both are under cs_wallet lock so you can't get a block connected/disconnected in between
 7613:36 <rockzombie2> Thanks jonatack, I'll look into that
 7713:37 <ryanofsky> yes, this race condition will only happen we stop holding cs_wallet throughout the call
 7813:37 <jonatack> rockzombie2: fwiw, wallet_listsinceblock.py has some *really* interesting tests imo
 7913:37 <nehan_> how important is it to preserve all the little behavioral quirks in the wallet as you make these changes to pull out cs_main? do you know who relies on listsinceblock?
 8013:38 <ryanofsky> probably not too important, there was an interesting issue opened about listsinceblock and how it's intended to be used, trying to find it
 8113:39 <ryanofsky> the changes within listsinceblock are just meant to be minimal, if it would be simpler to change behavior in some way, that's be reasonable to do
 8213:40 <jonatack> fwiw i tried to add test coverage for block heights to listsinceblock.py https://github.com/bitcoin/bitcoin/pull/17535
 8313:40 <jonatack> but the PR went nowhere. it seems difficult nowadays to add functional test coverage.
 8413:40 <ryanofsky> i think this was the interesting listsinceblock issue I remembered: https://github.com/bitcoin/bitcoin/pull/9622
 8513:42 <ryanofsky> maybe someone wants to reason through the next question about listsinceblock: In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash(). How does this change result in a more accurate lastblock value that won’t result in missing transactions when passed to a subsequent listsinceblock call?
 8613:43 <ariard_> rockzombie2: you should test this on the cli, by doing reorg and calling RPC in a while loop, at least I remember doing this while working on some refactoring
 8713:45 <ryanofsky> i guess this question is also assuming that cs_main isn't locked throughout the call as is the case now
 8813:46 <ariard_> if wallet tip has been reorg out that avoid hitting the assert
 8913:46 <ryanofsky> or actually, no even with cs_main held, the node tip could be ahead of the wallet last block processed at the moment lock was acquired
 9013:48 <ariard_> but int that case you would find the last_height inside ChainActive so not a issue?
 9113:48 <ryanofsky> ariard yes, that is one case, but in the case of a reorg locked_chain->getBlockHash(last_height) could return a completely unrelated block
 9213:50 <ryanofsky> ariard_, yes i think you're right, there can only be missing transactions if there's a reorg. the other case would result in hitting an assert
 9313:51 <ryanofsky> IMO, code is easier to reason about if you only have to think about the wallet last block processed, and return information relative to that
 9413:51 <ryanofsky> not caring about the node tip or active chain
 9513:51 <ariard_> yes I agree on this, caring about both node tip and wallet tip is headache guarantee
 9613:53 <ryanofsky> ~10 minutes left, any questions about wallet/node in general, or maybe other parts of the PR?
 9713:55 <jonatack> ryanofsky: when you wrote "In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash()"...
 9813:55 <rockzombie2> I have a general question about the github Bitcoin group. Who exactly are all of those contributors? Like, how did they qualify and become the main body that accepts PRs to bitcoin core?
 9913:56 <jonatack> are you referring to the code in UniValue listsinceblock in rpcwallet.cpp, or?
10013:57 <ryanofsky> jonatack: just the commit with that title https://github.com/bitcoin-core-review-club/bitcoin/commit/e276b68
10113:57 <luke-jr> rockzombie2: in theory, it's just a matter of anyone doing the reviews needed
10213:57 <ryanofsky> line 1641: ... pwallet->chain().findAncestorByHeight(pwallet->GetLastBlockHash() ...
10313:58 <ryanofsky> rockzombie2: yes, there's no qualification process, decisions to merge are based on merits of the individual PR
10413:59 <rockzombie2> I see
10513:59 <jonatack> ryanofsky: ok, i thought you were describing a diff, and i wasn't finding it
10613:59 <jonatack> rockzombie2: see the Peer Review section of CONTRIBUTING.md
10714:00 <ryanofsky> jonattack: ok hopefully clearer now, but let me know if it's not. there is one new line replacing two deleted lines there, and I was asking about that change
10814:01 <ryanofsky> Looks like time is up, thanks everybody for participating!
10914:01 <ryanofsky> #endmeeting
11014:01 <nehan_> thanks
11114:01 <rockzombie2> Thank you!
11214:01 <jonatack> ryanofsky: there are two instances in that commit of one line replacing two others :p
11314:01 <ryanofsky> Thanks for finding mistakes in my questions, too. Those were definitely on purpose
11414:02 <jonatack> i suppose you mean the first instance
11514:03 <emzy> thanks ryanofsky
11614:03 <jonatack> thanks for hosting ryanofsky, will try to finish review of that PR
11714:03 <jnewbery> Thanks for hosting, ryanofksy! I'll publish the meeting log to the website soon
11814:03 <ryanofsky> jonattack: just the place where "locked_chain->getBlockHash(last_height)..." is replaced
11914:04 <jonatack> ryanofsky: thanks, was convinced we were discussing the other one :)
12014:05 <jonatack> somehow misread and misunderstood, all good
12114:06 <ryanofsky> Ah well, soon enough github will have an integrated IRC and we will avoid these problem
12214:06 <jonatack> more reliance on github? goodness, i certainly hope not
12314:07 <frenchNoob> Thanks