Reverse cs_main, cs_wallet lock order and reduce cs_main locking (
wallet) Dec 4, 2019
The wallet is an integral part of the bitcoind full node for historic
reasons. See the
where the wallet lock
cs_wallet is taken after
The wallet was split from the node in
The new interface
interfaces::Chain lets the wallet access chain and
mempool state for its own operations. Interface methods allow the wallet to
do things like access the block height to know a transaction’s confirmation
depth or estimate the feerate.
interfaces::Chain methods impose synchronisation between
wallet and node because
Chain::Lock needs to be taken by the wallet
before its own lock.
Switching the lock order from
cs_main would avoid needlessly stalling the node when there is no need to
access chain data.
PR 15931 removed
GetDepthInMainChain’s requirement to lock the chain and simplified the
reasoning on locks and events ordering.
A long-term goal is to separate major components of the Bitcoin Core node (the
wallet, the P2P layer, the GUI) to let them evolve independently and only
communicate asynchronously through well-defined interfaces: See
ryanofsky’s github issue or
eklitzke’s blog post for
To learn more about lock management in Bitcoin Core:
Did you review the PR?
Concept ACK, approach ACK, ACK, or
(Don’t forget to put your PR review on GitHub.)
Did you take any steps beyond reading the code? Did you run the tests on
your local machine with
While reading the code, which commit(s) attracted your attention? What
should be carefully examined? What could go wrong?
What could happen if a deadlock was introduced in the code?
Does this PR change the wallet behavior?
Why does the lock order need to be inverted all at once?
Do you expect this PR to have any impact on performance?
Can you think of any other way to remove the
cs_main lock from the wallet
What further changes could be build on top of this PR?
1 13:00 <ariard> #startmeeting
2 13:00 <emilengler> hello
10 13:01 <ariard> who had a chance to review the PR ?
12 13:01 <emilengler> negative
13 13:01 <ariard> jonatack: cool what was your review process?
14 13:02 <fjahr> started but not finished
15 13:02 <ariard> others did you read PR top message ? what the main problem this PR is trying to tackle ?
16 13:03 <jonatack> ariard: pulled the PR branch, opened diff locally with gitk, built with --enable-debug, ran tests, ran bitcoind, looked for warnings, read the commits one by on gitk, git grepped varois sites
18 13:03 <_andrewtoth_> it's removing a requirement of locking cs_main before taking cs_wallet lock
19 13:04 <ariard> _andrewtoth: correct, does locking cs_main still occurs afterwards?
20 13:04 <_andrewtoth_> there's a lot of backstory to this PR, as several of your refactor PRs are linked
21 13:04 <ariard> jonatack: "git grepped varois sites"?
22 13:04 <jonatack> ariard: after jkczyz commented on the build warning, i then wanted to reproduce, so switched from gcc to clang, which showed the warning for me (reassuring)
23 13:04 <_andrewtoth_> cs_main still needs to be taken yes
24 13:04 <jonatack> ariard: note to self: build with clang more often
25 13:05 <ariard> jonatack: yeah having a Mac env is great to enable all clang options
26 13:05 <ariard> jonatack: that's a bit cumbersome when you're working with locks on core, we have different lock safety mechanisms following the platforms
27 13:05 <jonatack> ariard: then after you rebased, i verified the warning was gone by re-pulling your latest rebase and re-building with clang
28 13:06 <ariard> _andrewtoth_: right where is cs_main taken now ?
29 13:06 <ariard> jonatack: you run built with --enable-debug what the flag is doing ?
30 13:06 <jonatack> ariard: it seems to me that passing `--enable-debug` with clang on debian works too... it wanted to check that!
31 13:06 <jonatack> s/it/i/
32 13:07 <_andrewtoth_> ariard: inside the Chain interface, as opposed to wallet code, right?
33 13:07 <jonatack> ariard: git grepped various call sites to see that you covered them all
34 13:08 <ariard> _andrewtoth_: correct did you dig a bit on Chain interface ?
35 13:08 <ariard> and why it's a performance improvement to not hold the locks in the wallet code ?
36 13:08 <_andrewtoth_> hmm no that part is not clear to me yet
37 13:08 <_andrewtoth_> if lock is taken inside chain anyways, why is it a benefit?
39 13:08 <ariard> jonatack: --enable-debug is a different check than clang, it's going to enable DEBUG_LOCKORDER
40 13:08 <jonatack> ariard: running with --enable-debug adds the -DDEBUG -DDEBUG_LOCKORDER flags afaik
41 13:09 <ariard> jonatack: exactly
43 13:09 <Talkless> jonatack: do you ever use --enable-werror ?
44 13:09 <Talkless> or you just save build output & grep warnings..?
45 13:09 <jonatack> Talkless: no, i need to try that. the latter one, correct :)
47 13:10 <ariard> jonatack: did you dig into src/sync.cpp to undersand effet of DEBUG_LOCKORDER?
49 13:11 <ariard> _andrewtoth_: it's a benefit because right now we hold the cs_main at beginning of every RPC call
50 13:11 <jonatack> since i reviewed the PR with the uninitialized bug that caused the patch release 0.19.0.1 and would like to catch those in the future
51 13:11 <ariard> _andrewtoth_ : and it's hold until the RPC finished
52 13:12 <ariard> with this patchset: cs_main is just hold to fetch chain data and no more
53 13:12 <_andrewtoth_> i see thanks
54 13:12 <ariard> we lock cs_main just-in-time if you prefer
55 13:13 <ariard> jonatack: oh that's an interesting list thanks
56 13:14 <ariard> jonatack: yes I think generally we have a good deal of improvements of using sanitizers/automated tests/fuzzing
57 13:14 <jonatack> ariard: sync.cpp::L31-215 took a cursory look at the effects of DEBUG_LOCKORDER, will spend more time in that file
58 13:14 <jonatack> ariard: yes!
59 13:14 <ariard> While where reading the code which commit(s) attracted your attention ?
60 13:15 <_andrewtoth_> there are comments mentioning this could potentially improve IBD. How would it? Wouldn't it only affect performance if wallet rpcs are called during IBD?
61 13:15 <ariard> What should be carefully examined ? What could go wrong ?
62 13:16 <ariard> _andrewtoth_ : have a look on validationinterface, right now wallet lock cs_main in BlockConnected/BlockDisconnected
63 13:16 <jonatack> ariard: the last commit that removes the locks seems potentially the most dangerous
64 13:16 <ariard> I mean wallet is an indirect consumer of validationinterface through the Chain interface
65 13:17 <ariard> jonatack: yes DEBUG_LOCKORDER enable a process struct to pair lock order and check at any new tacking if it doesn't invert
66 13:18 <ariard> and process struct have pointers back to a per-thread struct which trace lock position in file
67 13:18 <ariard> jonatack: yes last commit is the most painful one to review, forgetting a lock somewhere could cause deadlock
68 13:21 <jonatack> ariard: did you experience any deadlocks while working on this PR?
69 13:21 <ariard> okay next question : Did this PR changes the wallet behavior?
70 13:21 <jonatack> (if yes, how did they show themselves?)
71 13:22 <ariard> jonatack: ahah ofc but it was in the wallet test framework because we don't take locks through the Chain interface there
72 13:22 <ariard> but the wild way of LOCK(cs_main)
73 13:23 <ariard> jonatack: they showed up thanks to --enable-debug
74 13:23 <_andrewtoth_> ariard: I think it only changes it if cs_main is locked by something else, then it has to wait. Otherwise, no it does not
75 13:23 <_andrewtoth_> *doesn't have to wait
76 13:24 <_andrewtoth_> up until it needs to get something from the chain
77 13:25 <jonatack> ariard: What gave you the initial idea to begin working on this series of PRs? Something you read, that someone said, or the idea came to you on its own?
78 13:25 <ariard> _andrewtoth_: yes that should change thread concurrency which should be better, I was more thinking about wallet behavior if the sense of different return values or breaking some internal features
79 13:26 <_andrewtoth_> then the answer is no
80 13:27 <jonatack> Given that so few tests needed to be changed, I surmised no wallet behavior change other than perhaps performance. Did you bench it?
81 13:27 <ariard> jonatack: I wish a more modular core project with clean, well-functional interfaces, and this serie of PR was a good beginning
82 13:28 <jonatack> ariard: "a good beginning"... hehe, yes, i'd say so! +1
83 13:28 <ariard> _andrewtoth_: well actually it may have change if wallet do something like A=getheight(), B=getheight(), if A == B do something, if A != B do something else
84 13:29 <_andrewtoth_> ahh yes
85 13:29 <ariard> _andrewtoth_: I've verified all the callsites which are making assumptions on a particular tip and don't find anything wrong
86 13:29 <ariard> most of them are related to the rescan code, which currently don't lock the cs_main and verify every previous value returned from the Chain interface
87 13:30 <jonatack> ariard: did you use gdb to verify, or?
88 13:30 <ariard> you should grep all the "getBlockHeight" "getBlockTime"
89 13:30 <jonatack> or just grepping
90 13:31 <ariard> and try to think about what have changed in the sense of wallet view of the blockchain has changed or not
91 13:31 <ariard> jonatack: yeah just grep "get*" in wallet code
92 13:31 <ariard> jonatack: I don't see whagt gdb would discover there
93 13:32 <ariard> next question : Why does the lock order need to be inverted all at once?
94 13:32 <ariard> (easy one)
95 13:33 <jkczyz> otherwise there could be a deadlock
96 13:33 <jonatack> ariard: to check values, as you mentioned verifying values returned. It's interesting to see how others debug bitcoin core. For example, i've learned a lot from watching achow101's live coding sessions on twitch tv
97 13:34 <ariard> jkczyz: correct
98 13:34 <jonatack> as soon as he gets stuck, he often jumps right into gdb.
99 13:34 <ariard> jonatack: what were your biggest learnings ?
100 13:35 <ariard> jonatack: honestly for debugging it's more adding more printers and take pen and papers and think what I got wrong
101 13:35 <ariard> like drawing control flow, see where there is write/read on the modified data structures
102 13:36 <ariard> depend what kind of bugs you're tracking but IMO gdb is overkill for a project like bitcoin core
103 13:36 <jonatack> ariard: from watching achow101 (so far): compiler-driven debugging, gdb use, inspecting wallet files using Berkeley DB special commands (located in db4/bin), using flame graphs.
104 13:36 <ariard> jonatack: yes we should all keep an eye on flame graph when we modify part of the code which touch performance
105 13:37 <jonatack> ariard: yes, i tend to begin with prints and asserts but have not found myself deep in the weeds yet in my PRs since they have been simple ones
106 13:37 <jonatack> so more for reviewing others' PRs
107 13:37 <ariard> but I think how you're debugging is also super related to what part of codebase you're working on, P2P != wallet
108 13:38 <ariard> jonatack: when reviewing others' PRs I like to go through modified changes and tweak them to see if there is a better way of doing it
109 13:38 <_andrewtoth_> jonatack: what is compiler-driven debugging?
110 13:38 <ariard> that force you to understand the code
111 13:38 <jonatack> ariard: yes, i noticed that you have always been assidious and good wrt drawing control flow
112 13:39 <jonatack> assiduous*
113 13:39 <jonatack> tweaking others' code is a great way to go +1
114 13:40 <jonatack> compiler-driven debugging: relying on compiler errors to see what's not working and fix it
115 13:40 <ariard> next question: Do you expect this PR to have any impact on performace? But we already talk a bit about this, so let's readjust the question how could we make wallet code better to service concurrently multiple RPC calls ?
116 13:41 <jkczyz> moving the code to a more modular state will also ease unit testing
117 13:42 <jkczyz> ariard: perhaps using a per-wallet lock would help
118 13:42 <ariard> jkczyz: ofc I think we were considering with Marco to disable block writing just to fuzz p2p/validation stack without burning your disks
119 13:42 <jonatack> yes. could enable shifting more testing from functional to unit (faster), or better coverage
120 13:43 <ariard> jkczyz: it's already the case
121 13:43 <jkczyz> ah, didn't realize that
122 13:44 <jonatack> and modularity could help improve fuzzing and PBT testing as well
123 13:44 <ariard> jkczyz: I think the way of thinking what are the main data structures of the wallet, like coins, keys, address, what can be accessed concurrently and for what we need a synchronized view
124 13:44 <ariard> like you don't want both RPC building a transaction spending the same coin
125 13:45 <ariard> and you don't want to return same address twice
126 13:45 <ariard> but all read rpc like displaying coins received could be concurrent
127 13:45 <ariard> jonatack: PBT testing ?
128 13:46 <jonatack> ariard: property-based testing
129 13:46 <ariard> a bit off-topic but having more fine-grained lock in wallet is an interesting topic to dig in
130 13:46 <ariard> if people are looking for cool PRs to work on (I don't plan to do this)
131 13:47 <ariard> next question: Can you think of any other way to remove the cs_main lock from the wallet code?
132 13:47 <jkczyz> I see what you are saying. I was thinking of different wallets. For the same wallet, should concurrent RPCs that "write" be disallowed then? Or at least while holding a higer-level lock?
133 13:47 <jonatack> (chris stewart has a large PR to add a bunch of PBT that needs review and we ought to add PBT coverage for instance to the BIP157-158 changes, there may be low-hanging fruit there)
134 13:48 <jkczyz> (or rather should require holding a higher-level lock)
135 13:49 <ariard> jkczyz: "should concurrent RPcs that "write" be disallowed then" I think you should have a cs_coins lock to hold to avoid concurrent write
136 13:49 <ariard> so yes should require holding a higher-level lock but not necessary the actual cs_wallet one
137 13:49 <ariard> jonatack: what the PR already, would try to review it
139 13:51 <jonatack> "Add more property based tests for basic bitcoin data structures"
140 13:51 <jonatack> it's on my review list too
141 13:52 <jonatack> ariard: "having more fine-grained lock in wallet"... interesting
142 13:53 <ariard> jonatack: yes you need #16426, because without that would be useless
143 13:53 <jonatack> ariard: Do you envision other building other changes on this PR in addition to those you describe in your PR body?
144 13:53 <ariard> next question: What further changes could be build on top of this PR?
145 13:54 <jonatack> snap! :D
146 13:55 <ariard> jonatack: okay I have some changes building on top of it on the rescan logic
147 13:55 <ariard> well it's not exactly on top of it but it's going to be easier to review without having to care about cs_main
148 13:56 <ariard> and more generally drying-up the Chain interface until it makes sense to have other types of client using it
149 13:56 <ariard> like lightning daemons ;)
150 13:57 <jonatack> afaict #16426 should get in soon if no bugs are found during review, better to merge it not too late in this release cycle to have time to catch anything unexpected
151 13:57 <ariard> jonatack: it's really biased but yes that's a blocker to more broader improvements both on the wallet-side on interface-sides
152 13:57 <ariard> and many of them can be worked in parallel after it
153 13:59 <ariard> and the Chain interface itself maybe should be splitted in different ones, like at least one for fee estimation
154 14:00 <ariard> #endmeeting
155 14:00 <jonatack> ariard: interesting
156 14:01 <_andrewtoth_> ariard: thanks!
157 14:01 <ariard> okay thanks all for participation, will hang here a bit more if you have questions, and once #16426 merged I will open issues to keep track what can be do as follow-ups
158 14:01 <jonatack> ariard: about lightning, fwiw i'll ping you soon to talk about the anchor outputs RFC PR 688
159 14:01 <jonatack> thanks ariard and everyone!
160 14:02 <mattcruzz> ariard: Thanks for hosting, there's some interesting topics I need dig into now
161 14:02 <ariard> yes to everyone don't censor yourself for reviewing, everyone is doing mistake and you're not going to break anything by doing a review
162 14:02 <ariard> quite the contrary :)
164 14:04 <jonatack> "Process separation"
165 14:05 <jonatack> and very good point about not censoring yourself and reviewing!
166 14:06 <fanquake> ariard if you want open a follow ups issue in advance of that being merged, feel free. Better to capture all the issues etc now, rather than digging through comments post-merge.
167 14:06 <ariard> fanquake: one issue per-idea or big issues and listing everything ?
168 14:08 <fanquake> 1 issue per big idea/refactor is probably fine. If there's nits/small follows up, probably best to combine them all into a single issue.
169 14:09 <fanquake> Main point is just not to lose PR commentary/things that should be done as follow ups post merge.