Remove calls to Chain::Lock methods (wallet)

https://github.com/bitcoin/bitcoin/17954

Host: ryanofsky

The PR branch HEAD was 1125fd92b at the time of this review club meeting.

Notes

This PR is one change in a sequence of PRs making the bitcoin wallet code more asynchronous and independent of node code, so the node can be more robust and performant, and not get locked up while wallet code is executing. The changes are also supposed to be architecture improvements, making the wallet/node interface more high-level and less tied to implementation details of the node and wallet. They also get wallet code closer to being able to work offline or in a different process without constantly having to query the node for information. Notable PRs making wallet code more asynchronous are:

  • #10286 by TheBlueMatt – changing the wallet to process BlockConnected, BlockDisconnected, and other notifications from the node asynchronously, adding CWallet::m_last_block_processed field

  • #15931 by ariard – changing the wallet to cache block heights, adding CWallet::m_last_block_processed_height and CWalletTx::m_confirm.block_height fields, and using cached heights to avoid needing to call Chain::Lock::getBlockDepth() to determine transaction status.

  • #17443 by ariard, pending review – using cached block heights to avoid needing to call Chain::checkFinalTx to determine transaction status

  • #17954 by ryanofsky, this pr – using cached block heights and CWallet::m_last_block_processed in various wallet functions to avoid the need to call Chain::Lock, and to make the wallet return internally-consistent information when m_last_block_processed lags behind the current node tip

  • #16426 by ariard, pending review – removing the interfaces::Chain::Lock class altogether to remove a way wallets could lock up the node unintentionally or longer than needed

All interactions between the wallet and node code happen though the interfaces::Chain and interfaces::Chain::Lock interfaces. Generally the less the wallet needs to call these interface methods, the more efficient both the node and the wallet can be. Also the simpler wallet code can be because it can avoid the headache of having to reconcile its cached state with latest information returned from the node, in cases where CWallet::m_last_block_processed has fallen behind the current node tip.

Questions

  1. Did you review the PRs? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. What happens in the interfaces::Chain::Lock chain constructor and destructor? What is the effect of writing auto locked_chain = pwallet->chain().lock();?

  3. What’s the difference between calling interfaces::Chain::getHeight() and CWallet::GetLastBlockHeight()? When will the values be the same, and when will they be different?

  4. In commit “wallet: Avoid use of Chain::Lock in listsinceblock” the call to locked_chain->getHeight() is replaced with pwallet->GetLastBlockHeight(). If listsinceblock is called with a non-null block hash, how does this change prevent transactions that are older than the specified block from being mistakenly printed?

  5. Trickier: In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash(). How does this change result in a more accurate lastblock value that won’t result in missing transactions when passed to a subsequent listsinceblock call?

  6. In the same commit, what is the coinbaseKey.MakeNewKey(true) line doing, and why does the test fail if it is removed?

  7. In commit “wallet: Avoid use of Chain::Lock in importmulti”, what was the removed fRescan = false statement intended to do? Why is it safe to drop?

Meeting Log

12:59 < luke-jr> hi
13:00 < emilengler> hi
13:00 < SirRichard> hi
13:00 < jnewbery> hi
13:00 < emzy> hi
13:00 < pbleam> hi
13:00 < nehan_> hi
13:00 < ariard_> hi
13:00 < ajonas> hi
13:00 < ryanofsky> #startmeeting
13:00 < ryanofsky> hi
13:01 < rockzombie2> hi
13:01 < ryanofsky> Welcome to the review meeting!
13:01 < felixweis> Hi
13:01 < ryanofsky> Link to notes and PR is https://bitcoincore.reviews/17954.html
13:01 < ryanofsky> Everyone can say hi (who didn't already) to let people know you're here
13:02 < ryanofsky> Reminder that everyone is here to learn, and all types of questions are encouraged. Anything that seems confusing is probably confusing to other people, and good to spend time figuring out together
13:02 < emilengler> Yeah then I have a question!
13:02 < ryanofsky> I have some prompt questions at the link above, but it is fine not to get through them if there are other things people want to talk about
13:02 < ryanofsky> No need to ask to ask, just ask!
13:02 < jonatack> hi
13:03 < andrewtoth> hi
13:03 < emilengler> I'm still a bit confused and not really that much into thread safe programming therefore reviewing this PR is hard
13:03 < emilengler> Can someone give a brief introduction to this?
13:03 < ryanofsky> Anyone want to jump in with a short summary?
13:05 < ryanofsky> I can say that the motivation for this is to be able to get rid of the Chain::Lock interface. Because that interface is low level and lets wallets slow down and potentially lock the node
13:06 < andrewtoth> It seems to be taking a piece of #16426 which we reviewed here previously, and splitting it out into its own PR no?
13:06 < luke-jr> I guess the important thing for reviewers to check, is that the code doesn't touch anything it actually needs the lock for
13:06 < rockzombie2> My limited understanding (not specifically related to bitcoin): Sharing resources among threads can cause confilcts in state. One solution is to "lock" a resource. There are many ways to be "thread-safe". Some more topics for further research that might help in understanding: Dining Philosophers problem, Spin-locks, Software Transactional Memory.
13:06 < ryanofsky> absolutely yes, yes, and yes
13:07 < luke-jr> I suggest looking at each commit individually
13:07 < luke-jr> (there's 11)
13:07 < nehan_> is it an invariant that wallet tip is always behind or at chain tip?
13:07 < ariard_> nehan_: no it can be also forked of chain tip
13:08 < luke-jr> or even ahead, if reindexing or such
13:08 < ariard_> that where you may have trouble and the initial reason of Russ working on #17954
13:08 < ryanofsky> yes, this gets to the question: What’s the difference between calling interfaces::Chain::getHeight() and CWallet::GetLastBlockHeight()? When will the values be the same, and when will they be different?
13:09 < ryanofsky> the next question looks into a specific example: In commit “wallet: Avoid use of Chain::Lock in listsinceblock” the call to locked_chain->getHeight() is replaced with pwallet->GetLastBlockHeight(). If listsinceblock is called with a non-null block hash, how does this change prevent transactions that are older than the specified block from being mistakenly printed?
13:09 < ryanofsky> commit: https://github.com/bitcoin/bitcoin/pull/17954/commits/e276b6821430ec2c18aba55137daf98bae770054
13:11 < ryanofsky> the changes to listsinceblock are small, but the existing code is dense, people may have questions just about that
13:14 < ryanofsky> the key to which transactions are printed is the abs(tx.GetDepthInMainChain()) < depth check on line 1616
13:16 < ryanofsky> the depth value is computed differently now with wallet tip: pwallet->GetLastBlockHeight(), instead of node tip: locked_chain->getHeight()
13:18 < ryanofsky> if a new block is connected, and the wallet didn't catch up yet. the height variable will be bigger or smaller? and the depth value?
13:19 < ariard_> ryanofsky: there is also a behavior change I think by returning lastblock=findAncestorByHeight?
13:20 < ryanofsky> yes, there are two behavior changes in the function
13:20 < ryanofsky> the lastblock behavior change is more complicated to think about since it involves multiple listsinceblock calls
13:21 < nehan_> if the wallet is behind chain tip, depth will be smaller?
13:21 < ryanofsky> the transaction listing height/depth change is simpler, because it can be understood looking at a single listsinceblock call
13:22 < ryanofsky> nehan_: yes depth is smaller, so older transactions that weren't requested won't be printed
13:23 < ryanofsky> This is a corner case race condition. Goal of PR is not to change behavior, but to clean up the API and fix some of these little races
13:23 < rockzombie2> Sorry, I'm still very new to this review process, but I'm hoping I can ramp up quickly by attending these reviews because I want to be helpful. As for testing your behavior changes, what are the cases I should try to test (and how exactly can I cover that test case in a multi-threaded environment?)
13:24 < nehan_> ryanofsky: what do you mean by "older transactions that weren't requested won't be printed"?
13:25 < nehan_> you mean in the new behavior, it will print less?
13:25 < nehan_> but i'm not sure how those "weren't requested"
13:25 < ryanofsky> rockzombie2: great question, anybody have any suggestions for testing? i will say bitcoin core doesn't currently have great for testing infrastrucuture to force code to run in order and simulate races
13:25 < ryanofsky> but we are always adding new testing features
13:26 < ryanofsky> nehan_: listsinceblock is typically called with a specific block hash, so you only want to return transactions after that block
13:26 < luke-jr> or if that block isn't in the main chain, transactions reorg'd out
13:27 < nehan_> ryanofsky: yes. and in the new behavior, in the case where the wallet is behind chain, it will print fewer transactions. but it's not clear to me why those "weren't requested"
13:27 < ryanofsky> nehan_: yes I actually left something out
13:28 < ryanofsky> in order for there to be a race condition, the "auto locked_chain = pwallet->chain().lock();" line above has to be removed, which is removed in a later PR
13:28 < ryanofsky> the problem is in the comparison tx.GetDepthInMainChain() < depth
13:29 < ryanofsky> you don't want to compute the depth on the left side and the depth on the right side with two different tips
13:30 < nehan_> ryanofsky: ah ok, thanks. but isn't this fixed in the new code because both of things will be under cs_wallet?
13:30 < nehan_> ryanofsky: are you saying there *was* a race condition?
13:31 < ryanofsky> i should check on what the later PRs are doing, but it is saying there would be a race condition
13:32 < ryanofsky> this might be a bad example, because it is talking about a theoretical race condition
13:32 < ryanofsky> but hopefully it at least suggests what the problem is, because this example is simpler than other cases like the one ariard was talking about that involves separate RPC calls
13:33 < jonatack> rockzombie2: WRT what cases to try to test, one thing you can try is look at the cases ryanofsky describes in the commit message, then look at the relevant unit and functional tests to see if the case is tested, and how (and if not, try adding it)
13:34 < ariard_> you shouldn't have a race condition even after chain lock removal because now you're relying on same tip for both computation
13:34 < ariard_> like finding the common ancestor and the comparison tx.GetDepthInMainChain() < depth
13:35 < ryanofsky> ariard_: yes you'd have to remove the wallet lock too, which i can't rememeber if we do
13:35 < nehan_> ariard_: this was my interpretation as well
13:35 < jonatack> for instance, in commit e276b682, look at the tests that are added in that commit, and at the those in test/functional/wallet_listsinceblock.py
13:36 < ariard_> ryanofsky: both are under cs_wallet lock so you can't get a block connected/disconnected in between
13:36 < rockzombie2> Thanks jonatack, I'll look into that
13:37 < ryanofsky> yes, this race condition will only happen we stop holding cs_wallet throughout the call
13:37 < jonatack> rockzombie2: fwiw, wallet_listsinceblock.py has some *really* interesting tests imo
13:37 < nehan_> how important is it to preserve all the little behavioral quirks in the wallet as you make these changes to pull out cs_main? do you know who relies on listsinceblock?
13:38 < ryanofsky> probably not too important, there was an interesting issue opened about listsinceblock and how it's intended to be used, trying to find it
13:39 < ryanofsky> the changes within listsinceblock are just meant to be minimal, if it would be simpler to change behavior in some way, that's be reasonable to do
13:40 < jonatack> fwiw i tried to add test coverage for block heights to listsinceblock.py https://github.com/bitcoin/bitcoin/pull/17535
13:40 < jonatack> but the PR went nowhere. it seems difficult nowadays to add functional test coverage.
13:40 < ryanofsky> i think this was the interesting listsinceblock issue I remembered: https://github.com/bitcoin/bitcoin/pull/9622
13:42 < ryanofsky> maybe someone wants to reason through the next question about listsinceblock: In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash(). How does this change result in a more accurate lastblock value that won’t result in missing transactions when passed to a subsequent listsinceblock call?
13:43 < ariard_> rockzombie2: you should test this on the cli, by doing reorg and calling RPC in a while loop, at least I remember doing this while working on some refactoring
13:45 < ryanofsky> i guess this question is also assuming that cs_main isn't locked throughout the call as is the case now
13:46 < ariard_> if wallet tip has been reorg out that avoid hitting the assert
13:46 < ryanofsky> or actually, no even with cs_main held, the node tip could be ahead of the wallet last block processed at the moment lock was acquired
13:48 < ariard_> but int that case you would find the last_height inside ChainActive so not a issue?
13:48 < ryanofsky> ariard yes, that is one case, but in the case of a reorg locked_chain->getBlockHash(last_height) could return a completely unrelated block
13:50 < ryanofsky> ariard_, yes i think you're right, there can only be missing transactions if there's a reorg. the other case would result in hitting an assert
13:51 < ryanofsky> IMO, code is easier to reason about if you only have to think about the wallet last block processed, and return information relative to that
13:51 < ryanofsky> not caring about the node tip or active chain
13:51 < ariard_> yes I agree on this, caring about both node tip and wallet tip is headache guarantee
13:53 < ryanofsky> ~10 minutes left, any questions about wallet/node in general, or maybe other parts of the PR?
13:55 < jonatack> ryanofsky: when you wrote "In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash()"...
13:55 < rockzombie2> I have a general question about the github Bitcoin group. Who exactly are all of those contributors? Like, how did they qualify and become the main body that accepts PRs to bitcoin core?
13:56 < jonatack> are you referring to the code in UniValue listsinceblock in rpcwallet.cpp, or?
13:57 < ryanofsky> jonatack: just the commit with that title https://github.com/bitcoin/bitcoin/pull/17954/commits/e276b6821430ec2c18aba55137daf98bae770054
13:57 < luke-jr> rockzombie2: in theory, it's just a matter of anyone doing the reviews needed
13:57 < ryanofsky> line 1641: ... pwallet->chain().findAncestorByHeight(pwallet->GetLastBlockHash() ...
13:58 < ryanofsky> rockzombie2: yes, there's no qualification process, decisions to merge are based on merits of the individual PR
13:59 < rockzombie2> I see
13:59 < jonatack> ryanofsky: ok, i thought you were describing a diff, and i wasn't finding it
13:59 < jonatack> rockzombie2: see the Peer Review section of CONTRIBUTING.md
14:00 < ryanofsky> jonattack: ok hopefully clearer now, but let me know if it's not. there is one new line replacing two deleted lines there, and I was asking about that change
14:01 < ryanofsky> Looks like time is up, thanks everybody for participating!
14:01 < ryanofsky> #endmeeting
14:01 < nehan_> thanks
14:01 < rockzombie2> Thank you!
14:01 < jonatack> ryanofsky: there are two instances in that commit of one line replacing two others :p
14:01 < ryanofsky> Thanks for finding mistakes in my questions, too. Those were definitely on purpose
14:02 < jonatack> i suppose you mean the first instance
14:03 < emzy> thanks ryanofsky
14:03 < jonatack> thanks for hosting ryanofsky, will try to finish review of that PR
14:03 < jnewbery> Thanks for hosting, ryanofksy! I'll publish the meeting log to the website soon
14:03 < ryanofsky> jonattack: just the place where "locked_chain->getBlockHash(last_height)..." is replaced
14:04 < jonatack> ryanofsky: thanks, was convinced we were discussing the other one :)
14:05 < jonatack> somehow misread and misunderstood, all good
14:06 < ryanofsky> Ah well, soon enough github will have an integrated IRC and we will avoid these problem
14:06 < jonatack> more reliance on github? goodness, i certainly hope not
14:07 < frenchNoob> Thanks