Bitcoin Core PR Review Club

Remove calls to Chain::Lock methods (wallet)

Feb 19, 2020

https://github.com/bitcoin/bitcoin/pull/17954

Host: ryanofsky - PR author: ryanofsky

The PR branch HEAD was 1125fd92b at the time of this review club meeting.

Notes

This PR is one change in a sequence of PRs making the bitcoin wallet code more asynchronous and independent of node code, so the node can be more robust and performant, and not get locked up while wallet code is executing. The changes are also supposed to be architecture improvements, making the wallet/node interface more high-level and less tied to implementation details of the node and wallet. They also get wallet code closer to being able to work offline or in a different process without constantly having to query the node for information. Notable PRs making wallet code more asynchronous are:

#10286 by TheBlueMatt – changing the wallet to process BlockConnected, BlockDisconnected, and other notifications from the node asynchronously, adding CWallet::m_last_block_processed field
#15931 by ariard – changing the wallet to cache block heights, adding CWallet::m_last_block_processed_height and CWalletTx::m_confirm.block_height fields, and using cached heights to avoid needing to call Chain::Lock::getBlockDepth() to determine transaction status.
#17443 by ariard, pending review – using cached block heights to avoid needing to call Chain::checkFinalTx to determine transaction status
#17954 by ryanofsky, this pr – using cached block heights and CWallet::m_last_block_processed in various wallet functions to avoid the need to call Chain::Lock, and to make the wallet return internally-consistent information when m_last_block_processed lags behind the current node tip
#16426 by ariard, pending review – removing the interfaces::Chain::Lock class altogether to remove a way wallets could lock up the node unintentionally or longer than needed

All interactions between the wallet and node code happen though the interfaces::Chain and interfaces::Chain::Lock interfaces. Generally the less the wallet needs to call these interface methods, the more efficient both the node and the wallet can be. Also the simpler wallet code can be because it can avoid the headache of having to reconcile its cached state with latest information returned from the node, in cases where CWallet::m_last_block_processed has fallen behind the current node tip.

Questions

Did you review the PRs? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
What happens in the interfaces::Chain::Lock chain constructor and destructor? What is the effect of writing auto locked_chain = pwallet->chain().lock();?
What’s the difference between calling interfaces::Chain::getHeight() and CWallet::GetLastBlockHeight()? When will the values be the same, and when will they be different?
In commit “wallet: Avoid use of Chain::Lock in listsinceblock” the call to locked_chain->getHeight() is replaced with pwallet->GetLastBlockHeight(). If listsinceblock is called with a non-null block hash, how does this change prevent transactions that are older than the specified block from being mistakenly printed?
Trickier: In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash(). How does this change result in a more accurate lastblock value that won’t result in missing transactions when passed to a subsequent listsinceblock call?
In the same commit, what is the coinbaseKey.MakeNewKey(true) line doing, and why does the test fail if it is removed?
In commit “wallet: Avoid use of Chain::Lock in importmulti”, what was the removed fRescan = false statement intended to do? Why is it safe to drop?

Meeting Log

12:59

<luke-jr> hi

13:00

<emilengler> hi

13:00

<SirRichard> hi

13:00

<jnewbery> hi

13:00

<emzy> hi

13:00

<pbleam> hi

13:00

<nehan_> hi

13:00

<ariard_> hi

13:00

<ajonas> hi

13:00

<ryanofsky> #startmeeting

13:00

<ryanofsky> hi

13:01

<rockzombie2> hi

13:01

<ryanofsky> Welcome to the review meeting!

13:01

<felixweis> Hi

13:01

<ryanofsky> Link to notes and PR is https://bitcoincore.reviews/17954.html

13:01

<ryanofsky> Everyone can say hi (who didn't already) to let people know you're here

13:02

<ryanofsky> Reminder that everyone is here to learn, and all types of questions are encouraged. Anything that seems confusing is probably confusing to other people, and good to spend time figuring out together

13:02

<emilengler> Yeah then I have a question!

13:02

<ryanofsky> I have some prompt questions at the link above, but it is fine not to get through them if there are other things people want to talk about

13:02

<ryanofsky> No need to ask to ask, just ask!

13:02

<jonatack> hi

13:03

<andrewtoth> hi

13:03

<emilengler> I'm still a bit confused and not really that much into thread safe programming therefore reviewing this PR is hard

13:03

<emilengler> Can someone give a brief introduction to this?

13:03

<ryanofsky> Anyone want to jump in with a short summary?

13:05

<ryanofsky> I can say that the motivation for this is to be able to get rid of the Chain::Lock interface. Because that interface is low level and lets wallets slow down and potentially lock the node

13:06

<andrewtoth> It seems to be taking a piece of #16426 which we reviewed here previously, and splitting it out into its own PR no?

13:06

<luke-jr> I guess the important thing for reviewers to check, is that the code doesn't touch anything it actually needs the lock for

13:06

<rockzombie2> My limited understanding (not specifically related to bitcoin): Sharing resources among threads can cause confilcts in state. One solution is to "lock" a resource. There are many ways to be "thread-safe". Some more topics for further research that might help in understanding: Dining Philosophers problem, Spin-locks, Software Transactional Memory.

13:06

<ryanofsky> absolutely yes, yes, and yes

13:07

<luke-jr> I suggest looking at each commit individually

13:07

<luke-jr> (there's 11)

13:07

<nehan_> is it an invariant that wallet tip is always behind or at chain tip?

13:07

<ariard_> nehan_: no it can be also forked of chain tip

13:08

<luke-jr> or even ahead, if reindexing or such

13:08

<ariard_> that where you may have trouble and the initial reason of Russ working on #17954

13:08

<ryanofsky> yes, this gets to the question: What’s the difference between calling interfaces::Chain::getHeight() and CWallet::GetLastBlockHeight()? When will the values be the same, and when will they be different?

13:09

<ryanofsky> the next question looks into a specific example: In commit “wallet: Avoid use of Chain::Lock in listsinceblock” the call to locked_chain->getHeight() is replaced with pwallet->GetLastBlockHeight(). If listsinceblock is called with a non-null block hash, how does this change prevent transactions that are older than the specified block from being mistakenly printed?

13:09

<ryanofsky> commit: https://github.com/bitcoin-core-review-club/bitcoin/commit/e276b68

13:11

<ryanofsky> the changes to listsinceblock are small, but the existing code is dense, people may have questions just about that

13:14

<ryanofsky> the key to which transactions are printed is the abs(tx.GetDepthInMainChain()) < depth check on line 1616

13:16

<ryanofsky> the depth value is computed differently now with wallet tip: pwallet->GetLastBlockHeight(), instead of node tip: locked_chain->getHeight()

13:18

<ryanofsky> if a new block is connected, and the wallet didn't catch up yet. the height variable will be bigger or smaller? and the depth value?

13:19

<ariard_> ryanofsky: there is also a behavior change I think by returning lastblock=findAncestorByHeight?

13:20

<ryanofsky> yes, there are two behavior changes in the function

13:20

<ryanofsky> the lastblock behavior change is more complicated to think about since it involves multiple listsinceblock calls

13:21

<nehan_> if the wallet is behind chain tip, depth will be smaller?

13:21

<ryanofsky> the transaction listing height/depth change is simpler, because it can be understood looking at a single listsinceblock call

13:22

<ryanofsky> nehan_: yes depth is smaller, so older transactions that weren't requested won't be printed

13:23

<ryanofsky> This is a corner case race condition. Goal of PR is not to change behavior, but to clean up the API and fix some of these little races

13:23

<rockzombie2> Sorry, I'm still very new to this review process, but I'm hoping I can ramp up quickly by attending these reviews because I want to be helpful. As for testing your behavior changes, what are the cases I should try to test (and how exactly can I cover that test case in a multi-threaded environment?)

13:24

<nehan_> ryanofsky: what do you mean by "older transactions that weren't requested won't be printed"?

13:25

<nehan_> you mean in the new behavior, it will print less?

13:25

<nehan_> but i'm not sure how those "weren't requested"

13:25

<ryanofsky> rockzombie2: great question, anybody have any suggestions for testing? i will say bitcoin core doesn't currently have great for testing infrastrucuture to force code to run in order and simulate races

13:25

<ryanofsky> but we are always adding new testing features

13:26

<ryanofsky> nehan_: listsinceblock is typically called with a specific block hash, so you only want to return transactions after that block

13:26

<luke-jr> or if that block isn't in the main chain, transactions reorg'd out

13:27

<nehan_> ryanofsky: yes. and in the new behavior, in the case where the wallet is behind chain, it will print fewer transactions. but it's not clear to me why those "weren't requested"

13:27

<ryanofsky> nehan_: yes I actually left something out

13:28

<ryanofsky> in order for there to be a race condition, the "auto locked_chain = pwallet->chain().lock();" line above has to be removed, which is removed in a later PR

13:28

<ryanofsky> the problem is in the comparison tx.GetDepthInMainChain() < depth

13:29

<ryanofsky> you don't want to compute the depth on the left side and the depth on the right side with two different tips

13:30

<nehan_> ryanofsky: ah ok, thanks. but isn't this fixed in the new code because both of things will be under cs_wallet?

13:30

<nehan_> ryanofsky: are you saying there *was* a race condition?

13:31

<ryanofsky> i should check on what the later PRs are doing, but it is saying there would be a race condition

13:32

<ryanofsky> this might be a bad example, because it is talking about a theoretical race condition

13:32

<ryanofsky> but hopefully it at least suggests what the problem is, because this example is simpler than other cases like the one ariard was talking about that involves separate RPC calls

13:33

<jonatack> rockzombie2: WRT what cases to try to test, one thing you can try is look at the cases ryanofsky describes in the commit message, then look at the relevant unit and functional tests to see if the case is tested, and how (and if not, try adding it)

13:34

<ariard_> you shouldn't have a race condition even after chain lock removal because now you're relying on same tip for both computation

13:34

<ariard_> like finding the common ancestor and the comparison tx.GetDepthInMainChain() < depth

13:35

<ryanofsky> ariard_: yes you'd have to remove the wallet lock too, which i can't rememeber if we do

13:35

<nehan_> ariard_: this was my interpretation as well

13:35

<jonatack> for instance, in commit e276b682, look at the tests that are added in that commit, and at the those in test/functional/wallet_listsinceblock.py

13:36

<ariard_> ryanofsky: both are under cs_wallet lock so you can't get a block connected/disconnected in between

13:36

<rockzombie2> Thanks jonatack, I'll look into that

13:37

<ryanofsky> yes, this race condition will only happen we stop holding cs_wallet throughout the call

13:37

<jonatack> rockzombie2: fwiw, wallet_listsinceblock.py has some *really* interesting tests imo

13:37

<nehan_> how important is it to preserve all the little behavioral quirks in the wallet as you make these changes to pull out cs_main? do you know who relies on listsinceblock?

13:38

<ryanofsky> probably not too important, there was an interesting issue opened about listsinceblock and how it's intended to be used, trying to find it

13:39

<ryanofsky> the changes within listsinceblock are just meant to be minimal, if it would be simpler to change behavior in some way, that's be reasonable to do

13:40

<jonatack> fwiw i tried to add test coverage for block heights to listsinceblock.py https://github.com/bitcoin/bitcoin/pull/17535

13:40

<jonatack> but the PR went nowhere. it seems difficult nowadays to add functional test coverage.

13:40

<ryanofsky> i think this was the interesting listsinceblock issue I remembered: https://github.com/bitcoin/bitcoin/pull/9622

13:42

<ryanofsky> maybe someone wants to reason through the next question about listsinceblock: In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash(). How does this change result in a more accurate lastblock value that won’t result in missing transactions when passed to a subsequent listsinceblock call?

13:43

<ariard_> rockzombie2: you should test this on the cli, by doing reorg and calling RPC in a while loop, at least I remember doing this while working on some refactoring

13:45

<ryanofsky> i guess this question is also assuming that cs_main isn't locked throughout the call as is the case now

13:46

<ariard_> if wallet tip has been reorg out that avoid hitting the assert

13:46

<ryanofsky> or actually, no even with cs_main held, the node tip could be ahead of the wallet last block processed at the moment lock was acquired

13:48

<ariard_> but int that case you would find the last_height inside ChainActive so not a issue?

13:48

<ryanofsky> ariard yes, that is one case, but in the case of a reorg locked_chain->getBlockHash(last_height) could return a completely unrelated block

13:50

<ryanofsky> ariard_, yes i think you're right, there can only be missing transactions if there's a reorg. the other case would result in hitting an assert

13:51

<ryanofsky> IMO, code is easier to reason about if you only have to think about the wallet last block processed, and return information relative to that

13:51

<ryanofsky> not caring about the node tip or active chain

13:51

<ariard_> yes I agree on this, caring about both node tip and wallet tip is headache guarantee

13:53

<ryanofsky> ~10 minutes left, any questions about wallet/node in general, or maybe other parts of the PR?

13:55

<jonatack> ryanofsky: when you wrote "In the same commit, the call to locked_chain->getBlockHash(last_height) is replaced with pwallet->GetLastBlockHash()"...

13:55

<rockzombie2> I have a general question about the github Bitcoin group. Who exactly are all of those contributors? Like, how did they qualify and become the main body that accepts PRs to bitcoin core?

13:56

<jonatack> are you referring to the code in UniValue listsinceblock in rpcwallet.cpp, or?

13:57

<ryanofsky> jonatack: just the commit with that title https://github.com/bitcoin-core-review-club/bitcoin/commit/e276b68

13:57

<luke-jr> rockzombie2: in theory, it's just a matter of anyone doing the reviews needed

13:57

<ryanofsky> line 1641: ... pwallet->chain().findAncestorByHeight(pwallet->GetLastBlockHash() ...

13:58

<ryanofsky> rockzombie2: yes, there's no qualification process, decisions to merge are based on merits of the individual PR

13:59

<rockzombie2> I see

13:59

<jonatack> ryanofsky: ok, i thought you were describing a diff, and i wasn't finding it

13:59

<jonatack> rockzombie2: see the Peer Review section of CONTRIBUTING.md

14:00

<ryanofsky> jonattack: ok hopefully clearer now, but let me know if it's not. there is one new line replacing two deleted lines there, and I was asking about that change

14:01

<ryanofsky> Looks like time is up, thanks everybody for participating!

14:01

<ryanofsky> #endmeeting

14:01

<nehan_> thanks

14:01

<rockzombie2> Thank you!

14:01

<jonatack> ryanofsky: there are two instances in that commit of one line replacing two others :p

14:01

<ryanofsky> Thanks for finding mistakes in my questions, too. Those were definitely on purpose

14:02

<jonatack> i suppose you mean the first instance

14:03

<emzy> thanks ryanofsky

14:03

<jonatack> thanks for hosting ryanofsky, will try to finish review of that PR

14:03

<jnewbery> Thanks for hosting, ryanofksy! I'll publish the meeting log to the website soon

14:03

<ryanofsky> jonattack: just the place where "locked_chain->getBlockHash(last_height)..." is replaced

14:04

<jonatack> ryanofsky: thanks, was convinced we were discussing the other one :)

14:05

<jonatack> somehow misread and misunderstood, all good

14:06

<ryanofsky> Ah well, soon enough github will have an integrated IRC and we will avoid these problem

14:06

<jonatack> more reliance on github? goodness, i certainly hope not

14:07

<frenchNoob> Thanks