Remove calls to Chain::Lock methods (wallet)
The PR branch HEAD was 1125fd92b at the time of this review club meeting.
This PR is one change in a sequence of PRs making the bitcoin wallet code more asynchronous and independent of node code, so the node can be more robust and performant, and not get locked up while wallet code is executing. The changes are also supposed to be architecture improvements, making the wallet/node interface more high-level and less tied to implementation details of the node and wallet. They also get wallet code closer to being able to work offline or in a different process without constantly having to query the node for information. Notable PRs making wallet code more asynchronous are:
#10286 by TheBlueMatt – changing the wallet to process
BlockDisconnected, and other notifications from the node asynchronously, adding
#15931 by ariard – changing the wallet to cache block heights, adding
CWalletTx::m_confirm.block_heightfields, and using cached heights to avoid needing to call
Chain::Lock::getBlockDepth()to determine transaction status.
#17443 by ariard, pending review – using cached block heights to avoid needing to call Chain::checkFinalTx to determine transaction status
#17954 by ryanofsky, this pr – using cached block heights and
CWallet::m_last_block_processedin various wallet functions to avoid the need to call
Chain::Lock, and to make the wallet return internally-consistent information when
m_last_block_processedlags behind the current node tip
#16426 by ariard, pending review – removing the
interfaces::Chain::Lockclass altogether to remove a way wallets could lock up the node unintentionally or longer than needed
All interactions between the wallet and node code happen though the
interfaces. Generally the less the wallet needs to call these interface methods,
the more efficient both the node and the wallet can be. Also the simpler wallet
code can be because it can avoid the headache of having to reconcile its cached
state with latest information returned from the node, in cases where
CWallet::m_last_block_processed has fallen behind the current node tip.
Did you review the PRs? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
What happens in the
interfaces::Chain::Lockchain constructor and destructor? What is the effect of writing
auto locked_chain = pwallet->chain().lock();?
What’s the difference between calling
CWallet::GetLastBlockHeight()? When will the values be the same, and when will they be different?
In commit “wallet: Avoid use of Chain::Lock in listsinceblock” the call to
locked_chain->getHeight()is replaced with
listsinceblockis called with a non-null block hash, how does this change prevent transactions that are older than the specified block from being mistakenly printed?
Trickier: In the same commit, the call to
locked_chain->getBlockHash(last_height)is replaced with
pwallet->GetLastBlockHash(). How does this change result in a more accurate
lastblockvalue that won’t result in missing transactions when passed to a subsequent
In the same commit, what is the
coinbaseKey.MakeNewKey(true)line doing, and why does the test fail if it is removed?
In commit “wallet: Avoid use of Chain::Lock in importmulti”, what was the removed
fRescan = falsestatement intended to do? Why is it safe to drop?
12:59 < luke-jr> hi 13:00 < emilengler> hi 13:00 < SirRichard> hi 13:00 < jnewbery> hi 13:00 < emzy> hi 13:00 < pbleam> hi 13:00 < nehan_> hi 13:00 < ariard_> hi 13:00 < ajonas> hi 13:00 < ryanofsky> #startmeeting 13:00 < ryanofsky> hi 13:01 < rockzombie2> hi 13:01 < ryanofsky> Welcome to the review meeting! 13:01 < felixweis> Hi 13:01 < ryanofsky> Link to notes and PR is https://bitcoincore.reviews/17954.html 13:01 < ryanofsky> Everyone can say hi (who didn't already) to let people know you're here 13:02 < ryanofsky> Reminder that everyone is here to learn, and all types of questions are encouraged. Anything that seems confusing is probably confusing to other people, and good to spend time figuring out together 13:02 < emilengler> Yeah then I have a question! 13:02 < ryanofsky> I have some prompt questions at the link above, but it is fine not to get through them if there are other things people want to talk about 13:02 < ryanofsky> No need to ask to ask, just ask! 13:02 < jonatack> hi 13:03 < andrewtoth> hi 13:03 < emilengler> I'm still a bit confused and not really that much into thread safe programming therefore reviewing this PR is hard 13:03 < emilengler> Can someone give a brief introduction to this? 13:03 < ryanofsky> Anyone want to jump in with a short summary? 13:05 < ryanofsky> I can say that the motivation for this is to be able to get rid of the Chain::Lock interface. Because that interface is low level and lets wallets slow down and potentially lock the node 13:06 < andrewtoth> It seems to be taking a piece of #16426 which we reviewed here previously, and splitting it out into its own PR no? 13:06 < luke-jr> I guess the important thing for reviewers to check, is that the code doesn't touch anything it actually needs the lock for 13:06 < rockzombie2> My limited understanding (not specifically related to bitcoin): Sharing resources among threads can cause confilcts in state. One solution is to "lock" a resource. There are many ways to be "thread-safe". Some more topics for further research that might help in understanding: Dining Philosophers problem, Spin-locks, Software Transactional Memory. 13:06 < ryanofsky> absolutely yes, yes, and yes 13:07 < luke-jr> I suggest looking at each commit individually 13:07 < luke-jr> (there's 11) 13:07 < nehan_> is it an invariant that wallet tip is always behind or at chain tip? 13:07 < ariard_> nehan_: no it can be also forked of chain tip 13:08 < luke-jr> or even ahead, if reindexing or such 13:08 < ariard_> that where you may have trouble and the initial reason of Russ working on #17954 13:08 < ryanofsky> yes, this gets to the question: What’s the difference between calling interfaces::Chain::getHeight() and CWallet::GetLastBlockHeight()? When will the values be the same, and when will they be different? 13:09 < ryanofsky> the next question looks into a specific example: In commit “wallet: Avoid use of Chain::Lock in listsinceblock” the call to locked_chain->getHeight() is replaced with pwallet->GetLastBlockHeight(). If listsinceblock is called with a non-null block hash, how does this change prevent transactions that are older than the specified block from being mistakenly printed? 13:09 < ryanofsky> commit: https://github.com/bitcoin/bitcoin/pull/17954/commits/e276b6821430ec2c18aba55137daf98bae770054 13:11 < ryanofsky> the changes to listsinceblock are small, but the existing code is dense, people may have questions just about that 13:14 < ryanofsky> the key to which transactions are printed is the abs(tx.GetDepthInMainChain()) < depth check on line 1616 13:16 < ryanofsky> the depth value is computed differently now with wallet tip: pwallet->GetLastBlockHeight(), instead of node tip: locked_chain->getHeight() 13:18 < ryanofsky> if a new block is connected, and the wallet didn't catch up yet. the height variable will be bigger or smaller? and the depth value? 13:19 < ariard_> ryanofsky: there is also a behavior change I think by returning lastblock=findAncestorByHeight? 13:20 < ryanofsky> yes, there are two behavior changes in the function 13:20 < ryanofsky> the lastblock behavior change is more complicated to think about since it involves multiple listsinceblock calls 13:21 < nehan_> if the wallet is behind chain tip, depth will be smaller? 13:21 < ryanofsky> the transaction listing height/depth change is simpler, because it can be understood looking at a single listsinceblock call 13:22 < ryanofsky> nehan_: yes depth is smaller, so older transactions that weren't requested won't be printed 13:23 < ryanofsky> This is a corner case race condition. Goal of PR is not to change behavior, but to clean up the API and fix some of these little races 13:23 < rockzombie2> Sorry, I'm still very new to this review process, but I'm hoping I can ramp up quickly by attending these reviews because I want to be helpful. As for testing your behavior changes, what are the cases I should try to test (and how exactly can I cover that test case in a multi-threaded environment?) 13:24 < nehan_> ryanofsky: what do you mean by "older transactions that weren't requested won't be printed"? 13:25 < nehan_> you mean in the new behavior, it will print less? 13:25 < nehan_> but i'm not sure how those "weren't requested" 13:25 < ryanofsky> rockzombie2: great question, anybody have any suggestions for testing? i will say bitcoin core doesn't currently have great for testing infrastrucuture to force code to run in order and simulate races 13:25 < ryanofsky> but we are always adding new testing features 13:26 < ryanofsky> nehan_: listsinceblock is typically called with a specific block hash, so you only want to return transactions after that block 13:26 < luke-jr> or if that block isn't in the main chain, transactions reorg'd out 13:27 < nehan_> ryanofsky: yes. and in the new behavior, in the case where the wallet is behind chain, it will print fewer transactions. but it's not clear to me why those "weren't requested" 13:27 < ryanofsky> nehan_: yes I actually left something out 13:28 < ryanofsky> in order for there to be a race condition, the "auto locked_chain = pwallet->chain().lock();" line above has to be removed, which is removed in a later PR 13:28 < ryanofsky> the problem is in the comparison tx.GetDepthInMainChain() < depth 13:29 < ryanofsky> you don't want to compute the depth on the left side and the depth on the right side with two different tips 13:30 < nehan_> ryanofsky: ah ok, thanks. but isn't this fixed in the new code because both of things will be under cs_wallet? 13:30 < nehan_> ryanofsky: are you saying there *was* a race condition? 13:31 < ryanofsky> i should check on what the later PRs are doing, but it is saying there would be a race condition 13:32 < ryanofsky> this might be a bad example, because it is talking about a theoretical race condition 13:32 < ryanofsky> but hopefully it at least suggests what the problem is, because this example is simpler than other cases like the one ariard was talking about that involves separate RPC calls 13:33 < jonatack> rockzombie2: WRT what cases to try to test, one thing you can try is look at the cases ryanofsky describes in the commit message, then look at the relevant unit and functional tests to see if the case is tested, and how (and if not, try adding it) 13:34 < ariard_> you shouldn't have a race condition even after chain lock removal because now you're relying on same tip for both computation 13:34 < ariard_> like finding the common ancestor and the comparison tx.GetDepthInMainChain() < depth 13:35 < ryanofsky> ariard_: yes you'd have to remove the wallet lock too, which i can't rememeber if we do 13:35 < nehan_> ariard_: this was my interpretation as well 13:35 < jonatack> for instance, in commit e276b682, look at the tests that are added in that commit, and at the those in test/functional/wallet_listsinceblock.py 13:36 < ariard_> ryanofsky: both are under cs_wallet lock so you can't get a block connected/disconnected in between 13:36 < rockzombie2> Thanks jonatack, I'll look into that 13:37 < ryanofsky> yes, this race condition will only happen we stop holding cs_wallet throughout the call 13:37 < jonatack> rockzombie2: fwiw, wallet_listsinceblock.py has some *really* interesting tests imo 13:37 < nehan_> how important is it to preserve all the little behavioral quirks in the wallet as you make these changes to pull out cs_main? do you know who relies on listsinceblock? 13:38 < ryanofsky> probably not too important, there was an interesting issue opened about listsinceblock and how it's intended to be used, trying to find it 13:39 < ryanofsky> the changes within listsinceblock are just meant to be minimal, if it would be simpler to change behavior in some way, that's be reasonable to do 13:40 < jonatack> fwiw i tried to add test coverage for block heights to listsinceblock.py https://github.com/bitcoin/bitcoin/pull/17535 13:40 < jonatack> but the PR went nowhere. it seems difficult nowadays to add functional test coverage. 13:40 < ryanofsky> i think this was the interesting listsinceblock issue I remembered: https://github.com/bitcoin/bitcoin/pull/9622 13:42 < ryanofsky> maybe someone wants to reason through the next question about listsinceblock: In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash(). How does this change result in a more accurate lastblock value that won’t result in missing transactions when passed to a subsequent listsinceblock call? 13:43 < ariard_> rockzombie2: you should test this on the cli, by doing reorg and calling RPC in a while loop, at least I remember doing this while working on some refactoring 13:45 < ryanofsky> i guess this question is also assuming that cs_main isn't locked throughout the call as is the case now 13:46 < ariard_> if wallet tip has been reorg out that avoid hitting the assert 13:46 < ryanofsky> or actually, no even with cs_main held, the node tip could be ahead of the wallet last block processed at the moment lock was acquired 13:48 < ariard_> but int that case you would find the last_height inside ChainActive so not a issue? 13:48 < ryanofsky> ariard yes, that is one case, but in the case of a reorg locked_chain->getBlockHash(last_height) could return a completely unrelated block 13:50 < ryanofsky> ariard_, yes i think you're right, there can only be missing transactions if there's a reorg. the other case would result in hitting an assert 13:51 < ryanofsky> IMO, code is easier to reason about if you only have to think about the wallet last block processed, and return information relative to that 13:51 < ryanofsky> not caring about the node tip or active chain 13:51 < ariard_> yes I agree on this, caring about both node tip and wallet tip is headache guarantee 13:53 < ryanofsky> ~10 minutes left, any questions about wallet/node in general, or maybe other parts of the PR? 13:55 < jonatack> ryanofsky: when you wrote "In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash()"... 13:55 < rockzombie2> I have a general question about the github Bitcoin group. Who exactly are all of those contributors? Like, how did they qualify and become the main body that accepts PRs to bitcoin core? 13:56 < jonatack> are you referring to the code in UniValue listsinceblock in rpcwallet.cpp, or? 13:57 < ryanofsky> jonatack: just the commit with that title https://github.com/bitcoin/bitcoin/pull/17954/commits/e276b6821430ec2c18aba55137daf98bae770054 13:57 < luke-jr> rockzombie2: in theory, it's just a matter of anyone doing the reviews needed 13:57 < ryanofsky> line 1641: ... pwallet->chain().findAncestorByHeight(pwallet->GetLastBlockHash() ... 13:58 < ryanofsky> rockzombie2: yes, there's no qualification process, decisions to merge are based on merits of the individual PR 13:59 < rockzombie2> I see 13:59 < jonatack> ryanofsky: ok, i thought you were describing a diff, and i wasn't finding it 13:59 < jonatack> rockzombie2: see the Peer Review section of CONTRIBUTING.md 14:00 < ryanofsky> jonattack: ok hopefully clearer now, but let me know if it's not. there is one new line replacing two deleted lines there, and I was asking about that change 14:01 < ryanofsky> Looks like time is up, thanks everybody for participating! 14:01 < ryanofsky> #endmeeting 14:01 < nehan_> thanks 14:01 < rockzombie2> Thank you! 14:01 < jonatack> ryanofsky: there are two instances in that commit of one line replacing two others :p 14:01 < ryanofsky> Thanks for finding mistakes in my questions, too. Those were definitely on purpose 14:02 < jonatack> i suppose you mean the first instance 14:03 < emzy> thanks ryanofsky 14:03 < jonatack> thanks for hosting ryanofsky, will try to finish review of that PR 14:03 < jnewbery> Thanks for hosting, ryanofksy! I'll publish the meeting log to the website soon 14:03 < ryanofsky> jonattack: just the place where "locked_chain->getBlockHash(last_height)..." is replaced 14:04 < jonatack> ryanofsky: thanks, was convinced we were discussing the other one :) 14:05 < jonatack> somehow misread and misunderstood, all good 14:06 < ryanofsky> Ah well, soon enough github will have an integrated IRC and we will avoid these problem 14:06 < jonatack> more reliance on github? goodness, i certainly hope not 14:07 < frenchNoob> Thanks