Reduce cs_main scope, guard block index 'nFile' under a local mutex (refactoring, resource usage)

https://github.com/bitcoin/bitcoin/pull/27006

Host: stickies-v  -  PR author: furszy

The PR branch HEAD was acddd4204654812a0e741e04a758be0f362c5ccb at the time of this review club meeting.

Notes

  • Once a block is fully validated, it is saved to disk in one of the blk<nFile>.dat files in your datadir.

  • Blocks are received, validated and stored in an unpredictable order (not sequentially by block height), so we need to keep track of which file each block is stored in so that it can be accessed quickly. This is tracked in CBlockIndex by its members nFile, nDataPos and nUndoPos (a simplified sketch of these fields follows these notes). In master, all of these members are guarded by the ::cs_main mutex. We have discussed how blocks are downloaded and stored in previous meetings #24858 and #25880.

  • ::cs_main is a recursive mutex which is used to ensure that validation is carried out in an atomic way. Although in recent years a lot of effort has been made to reduce usage of ::cs_main, it is still heavily used across the codebase.

  • Having a single (global) mutex can reduce code complexity and simplify reasoning about race conditions. However, it often also leads to (sometimes significant) performance issues when multiple threads are waiting for the same mutex even though they don't need synchronization and are not accessing any of the same variables.
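
A minimal sketch of the idea behind the change (standard C++ only; the struct, variable and function names are illustrative and not taken from the PR): the block-location fields get their own reader/writer mutex, so many threads can read them concurrently while a writer still gets exclusive access, instead of everything contending on one global exclusive mutex.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the fields discussed above; not the real CBlockIndex.
struct BlockLocation {
    int nFile{-1};             // which blk<nFile>.dat / rev<nFile>.dat pair holds the block
    unsigned int nDataPos{0};  // byte offset of the serialized block in blk<nFile>.dat
    unsigned int nUndoPos{0};  // byte offset of the undo data in rev<nFile>.dat
};

std::shared_mutex g_location_mutex;  // plays the role the PR gives to g_cs_blockindex_data
BlockLocation g_location;            // guarded by g_location_mutex

// Any number of threads can read the location at the same time...
BlockLocation ReadLocation()
{
    std::shared_lock lock{g_location_mutex};
    return g_location;
}

// ...but a writer waits until no other lock (shared or exclusive) is held.
void SetLocation(const BlockLocation& loc)
{
    std::unique_lock lock{g_location_mutex};
    g_location = loc;
}
```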

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. SharedLock is added as a new lock type to complement the UniqueLock we already have. Why does a UniqueLock not suffice here? How are the implementations of UniqueLock and SharedLock different?

  3. Do you expect this PR to have any visible impact on performance? If so, for which process(es) (in a very general sense) and by how much (order of magnitude)? Were you able to verify/benchmark this in practice?

  4. This PR changes CBlockIndex::nFile to default to -1 instead of 0. How can/did you verify that this change is safe?

  5. nFile, nDataPos and nUndoPos change from being guarded by ::cs_main to being guarded by g_cs_blockindex_data. Why is it that we lock exactly these 3 variables with g_cs_blockindex_data? What would be the concerns and benefits of using a different mutex for each of those 3 variables?

  6. Are there any other ways to ensure the data integrity of nFile, nDataPos and nUndoPos?

  7. With this PR, does the number of times that a mutex is acquired increase, stay constant, or decrease - or does it depend on the program flow?

Meeting Log

  17:00 <stickies-v> #startmeeting
  17:00 <LarryRuane_> hi
  17:00 <DaveBeer> hi
  17:00 <effexzi> Hello every1
  17:00 <abubakarsadiq> hello
  17:00 <stickies-v> welcome everyone! Today we're looking at #27006, authored by furszy. The notes and questions are available on https://bitcoincore.reviews/27006
  17:00 <pakaro> hi
  17:01 <stickies-v> anyone joining us for the first time today? even if you're just lurking, feel free to say hi!
  17:02 <jbes> hello
  17:02 <pablomartin> hi!... ill be just lurking, didnt have time to check the pr for today, sorry...
  17:02 <stickies-v> no problem! always welcome to just lurk around
  17:02 <stickies-v> who got the chance to review the PR or read the notes? (y/n)
  17:03 <pakaro> y
  17:03 <DaveBeer> y, read notes & questions, went through code once
  17:03 <pakaro> tACK - compiled, ran bitcoind, unit tests, functional tests without bdb nor usdt-trace. All passed. I got a setup error when I tried to run sharedlock_tests on its own, i didn't spend much/anytime figuring that part out though.
  17:03 <abubakarsadiq> I read the notes and briefly looked at the code
  17:04 <LarryRuane_> y for most part, but wow this is complex stuff with all the layers of code, derived classes, macros, etc! (sync.h mainly)
  17:04 <LarryRuane_> (and also the use of templates)
  17:04 <stickies-v> for those who looked at the PR: would you give it a Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
  17:05 <stickies-v> pakaro: interesting, but the sharedlock_tests passed when running it with test_runner.py?
  17:05 <LarryRuane_> definitely concept ACK, close to code ACK but I must study more
  17:05 <pakaro> stickies-v yes, let me recheck that now
  17:06 <LarryRuane_> stickies-v: would you say it's useful for reviewers to build using clang? (so the annotations are checked)?
  17:06 <stickies-v> LarryRuane_: yes definitely quite a few layers of complexity. It looks very verbose at first but I suppose it does make using the mutexes quite a bit more straightforward
  17:06 <LarryRuane_> yes the RAII aspect of the locking (that we do in practice) is very cool
  17:07 <stickies-v> mmm good question. definitely never hurts to do that! but CI should also catch those problems (normally)
  17:07 <stickies-v> alright let's start with building some understanding. a lot of abstract concepts here, so I think it won't hurt to dive a bit deeper
  17:08 <DaveBeer> concept ACK, for code ACK I would be concerned with global variables usage, but I'm not familiar with bitcoin core codebase much as of yet and maybe global variables are reasonable here (I understand the PR doesn't change this approach and is as good as prior solution)
  17:08 <stickies-v> SharedLock is added as a new lock type to complement the UniqueLock we already have. Why does a UniqueLock not suffice here?
  17:09 <DaveBeer> we want to allow multiple threads access in read only mode, while maintaining exclusive access for writers
  17:09 <stickies-v> DaveBeer: if I understand the PR comments historically, the initial implementation actually didn't use a global mutex variable, but because there are so many CBlockIndex objects this was the new approach chosen: https://github.com/bitcoin/bitcoin/pull/27006#discussion_r1092197196
  17:09 <LarryRuane_> for example one of the things i'm confused by is that `UniqueLock` is templated https://github.com/bitcoin/bitcoin/blob/8c4958bd4c06026dc108bc7f5f063d1f389d279b/src/sync.h#L151 but i don't see the angle brackets where it is used, why aren't those needed?
  17:09 <LarryRuane_> (and of course the new `SharedLock` is the same)
  17:10 <DaveBeer> stickies-v: thanks for the background, I'll have a look at it
  17:11 <LarryRuane_> i.e. it's used here: https://github.com/bitcoin/bitcoin/blob/8c4958bd4c06026dc108bc7f5f063d1f389d279b/src/sync.h#L258 but there is no template argument given
  17:11 <LarryRuane_> (sorry if i'm sidetracking too much, feel free to ignore!)
  17:12 <pakaro> stickies-v nvm, problem was PEBKAC [problem exists between keyboard and computer]
  17:12 <stickies-v> LarryRuane_: great question: the MutexType is deduced from when the lock is constructed: a lock is always constructed ON a mutex: https://github.com/bitcoin/bitcoin/blob/8c4958bd4c06026dc108bc7f5f063d1f389d279b/src/sync.h#L258
  17:12 <stickies-v> perhaps it's good to first clarify what the difference is (in cpp terms) between a lock and a mutex?
  17:12 <stickies-v> who can give that an ELI5?
  17:12 <LarryRuane_> oh i see, thanks, those deduced types are kinda tricky!
  17:14 <DaveBeer> mutex is the actual object being locked, while lock is RAII wrapper manipulating the underlying mutex
  17:14 <LarryRuane_> a mutex is the lowest-level synchronization primitive, just prevents two threads from running the protected ranges of code at the same time ... locks are higher-level constructs that _use_ a mutex
  17:15 <stickies-v> DaveBeer LarryRuane_: a pretty smart 5 year old, but that sounds about right
  17:16 <stickies-v> in simpler terms: a mutex is an object that we use to help control access to one or multiple resources. a mutex can have one or multiple locks, and whoever has the lock(s) can access the underlying resources
  17:16 <stickies-v> (hope i didn't simplify it too much)
  17:16 <LarryRuane_> haha i know, that's the best i could do! ... so like you can see there's no templating or anything here https://en.cppreference.com/w/cpp/thread/mutex or here https://en.cppreference.com/w/cpp/thread/shared_mutex (these are the low-level primitives)
  17:17 <LarryRuane_> so there's only those 2 types of mutexes, right?
  17:17 <LarryRuane_> (regular (non-shared) and shared)
  17:18 <stickies-v> well there's also the RecursiveMutex (which `::cs_main` is an instance of)
  17:18 <DaveBeer> couple more, see https://en.cppreference.com/w/cpp/header/mutex
  17:18 <stickies-v> ahh nice one DaveBeer!
  17:18 <LarryRuane_> stickies-v: can a particular mutex have more than one lock associated with it at the same time? i think so, right?
  17:19 <stickies-v> that's kinda the heart of the question here - who's got an idea?
  17:20 <DaveBeer> mutex can have any number of locks associated with it (associated with meaning ready to work with, not being locked at the same time)
  17:20 <LarryRuane_> oh i see, there are a few types listed here https://en.cppreference.com/w/cpp/thread (scroll down to Mutual exclusion)
  17:21 <LarryRuane_> (oops @DaveBeer already found a better link)
  17:22 <DaveBeer> LarryRuane_: actually your link also includes the shared_mutex, which is in its own header (I linked <mutex> header which does not include shared_mutex)
  17:22 <LarryRuane_> DaveBeer: "mutex can have any number of locks associated with it" -- I think you're right, multiple threads can be inside a `LOCK(::cs_main)` for example, and all of those are separate locks, but all refer to a single mutex
  17:23 <stickies-v> "multiple threads can be inside a `LOCK(::cs_main)`" isn't the whole point of LOCK that there can't be multiple threads accessing it at the same time?
  17:24 <LarryRuane_> i may be confused here, but yes, you're correct, but multiple threads can be waiting at the same time (and each wait is a separate lock (?))
  17:24 <DaveBeer> they are all inside LOCK(...) trying to access it, but only one succeeds, I think that's how LarryRuane_ meant it
  17:25 <stickies-v> okay, makes sense - sorry. I think the term "lock" is quite overloaded so that doesn't make it easier
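
For reference, a minimal standard-library illustration of the mutex/lock distinction discussed above. Bitcoin Core's Mutex/LOCK wrappers build on the same primitives and add thread-safety annotations and lock-order debugging; the names below are only illustrative.

```cpp
#include <mutex>

std::mutex m;     // the mutex: one shared synchronization object
int counter = 0;  // the data it protects

void Increment()
{
    std::unique_lock<std::mutex> lock{m};  // the lock: an RAII object that acquires m here...
    ++counter;
}                                          // ...and releases m when it goes out of scope
```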
  17:26 <stickies-v> so this PR adds the `SharedMutex` definition (based on std::shared_mutex)
  17:26 <LarryRuane_> stickies-v: i was surprised this didn't already exist, tbh
  17:26 <stickies-v> the interesting thing about a shared mutex is that it can have both shared and exclusive locks, whereas an exclusive mutex can have just exclusive locks
  17:26 <stickies-v> (same!)
  17:27 <pakaro> +1 LarryRuane_ RW locks have until now, not been a part of Core at all??
  17:27 <stickies-v> so then the next conceptual question is: if the whole point of a mutex is to prevent multiple threads from accessing the same data and messing everything up for everyone... what's the point of allowing a mutex to have shared locks?!
  17:28 <DaveBeer> Read vs Write usecases, reads (and only reads) can be shared, write must be exclusive
  17:28 <pakaro> To allow read access?
  17:28 <jbes> only read access?
  17:29 <DaveBeer> it is safe to read memory location from multiple threads as long as it is not being written at the same time
  17:29 <stickies-v> exactly - it's not necessarily the only use case, but probably the most common one?
  17:30 <LarryRuane_> yes i think in some other projects it's actually called read-write locks (rather than shared-exclusive)
  17:30 <pakaro> If a read-lock take hold of a resource, it can allow other read-locks to also hold the resource, but it will deny a write-lock, is that right?
  17:30 <DaveBeer> +1 pakaro
  17:30 <stickies-v> pakaro: yes, indeed. an exclusive lock can only be acquired if there are no other locks (shared or exclusive) acquired
  17:30 <LarryRuane_> i think the "shared, exclusive" terminology is preferred because it's more general
  17:30 <abubakar> +1 pakaro
  17:30 <stickies-v> and a shared lock can only be acquired if there are no exclusive locks acquired
  17:31 <LarryRuane_> one thing other projects have is the ability to "upgrade" a lock from shared to exclusive, but i don't think we have that in bitcoin core
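
A small self-contained example of the shared/exclusive behaviour described above, using only the standard library rather than the PR's SharedLock: the readers may hold the mutex simultaneously, while the writer has to wait until no lock of either kind is held.

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

std::shared_mutex mtx;
int value = 0;

void Reader(int id)
{
    std::shared_lock lock{mtx};  // several readers may hold this at the same time
    std::cout << "reader " << id << " sees " << value << '\n';
    std::this_thread::sleep_for(std::chrono::milliseconds{50});
}

void Writer()
{
    std::unique_lock lock{mtx};  // acquired only once no shared or exclusive lock is held
    ++value;
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(Reader, i);
    threads.emplace_back(Writer);
    for (auto& t : threads) t.join();
}
```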
  17:31 <stickies-v> and then the last sub-question on the conceptual part: why is this PR now introducing the shared mutex and lock into bitcoin core?
  17:32 <LarryRuane_> you might wonder, why not just drop the shared lock and acquire the exclusive lock ... you can but things could change during that interval, so anything you've cached would be invalid
  17:32 <jbes> what's the difference between shared lock and exclusive?
  17:33 <LarryRuane_> stickies-v: for performance? multiple threads may want to read the block (and undo) files concurrently, and there's no reason to prevent this
  17:33 <abubakar> to increase performance, reducing starvation so that there will be less use of cs_main.
  17:33 <stickies-v> jbes: I just answered that in the 10 lines above - lmk if anything's not clear there?
  17:33 <LarryRuane_> abubakar: +1, yes, also reduce contention on cs_main
  17:34 <jbes> thanks
  17:35 <stickies-v> LarryRuane_: yeah accessing block data concurrently is what I was looking for. i/o operations typically benefit most from concurrency, so this seems like a pretty nice win at first sight
  17:35 <LarryRuane_> and i guess currently (without this PR), `cs_main` is held during the disk read? That could take quite a long time!
  17:35 <stickies-v> (we could've reduced contention on cs_main using good ol' UniqueLock too, I think?)
  17:36 <stickies-v> so it seems to me like the SharedLock is kinda orthogonal to the cs_main discussion?
  17:36 <LarryRuane_> stickies-v: yes i was thinking that also, this PR really combines two things that could have been done separately: having a separate lock (from cs_main), and making it a shared lock ... but it's good to do both
  17:37 <DaveBeer> yes stickies-v, that's my understanding as well, first improvement is reducing ::cs_main contention, second (after splitting this usecase to new dedicated mutex) is to also utilize RW locks
  17:38 <stickies-v> +1 (those are also good comments to leave on the PR when you review it - helps the author to craft a more helpful PR description)
  17:38 <stickies-v> i'll launch the next question already, but as always - we can keep talking about previous questions too
  17:39 <stickies-v> `nFile`, `nDataPos` and `nUndoPos` change from being guarded by `::cs_main` to being guarded by `g_cs_blockindex_data`. Why is it that we lock exactly these 3 variables with `g_cs_blockindex_data`? What would be the concerns and benefits of using a different mutex for each of those 3 variables?
  17:39 <stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/compare/657e3086ad8171f799a7eb4226c6d1c2dd562a39...acddd4204654812a0e741e04a758be0f362c5ccb#diff-05137bf4d07f31a6cc237b1dd772e0b38bc2a60610a7ca86827e98fc126e8407L166-R175)
  17:40 <DaveBeer> all 3 variables contribute to single state which must be guarded as whole
  17:40 <DaveBeer> they are not independent
  17:40 <LarryRuane_> DaveBeer: +1, if they were separately locked, one might be updated from one thread while another updated from another thread, into an inconsistent state
  17:41 <jbes> beginner q, what Pos stands for in the variables names?
  17:41 <stickies-v> yeah, we could use 3 mutexes and acquire locks on all of them but... that would not be very efficient or readable
  17:41 <stickies-v> jbes: position!
  17:41 <LarryRuane_> jbes: that's the byte offset within the file (stands for "position", not really a very helpful name)
  17:42 <stickies-v> they show us where in the file the block data can be found
  17:43 <stickies-v> Do you expect this PR to have any visible impact on performance? If so, for which process(es) (in a very general sense) and by how much (order of magnitude)? Were you able to verify/benchmark this in practice?
  17:43 <LarryRuane_> and in case anyone's not aware, the `nFile` integer relates to the block file name, so if `nFile` is 19, that corresponds to `datadir/blocks/blk00019.dat`
  17:44 <DaveBeer> since we are already going through the variables meaning, what is nUndoPos? I have no clue.
  17:45 <pakaro> is the byte-offset for nDataPos and UndoPos [which i'm assuming associate with blockxxxx.dat and revxxxx.dat respectively] the same?
  17:45 <jbes> performance will be improved if you have a cpu that can utilize multi threading I guess..
  17:45 <stickies-v> DaveBeer: the blk<nnnnn>.dat files store the serialized blocks, but we also store the undo data to "reverse" the impact of a block onto the UTXO set in case of a reorg etc
  17:45 <stickies-v> that data is stored in the rev<nnnnn>.dat files, and is what nUndoPos refers to
  17:45 <LarryRuane_> there are certain RPCs that read the block files (if the node isn't pruned), like if `txindex` is enabled, and the RPC server is multi-threaded, so those could proceed in parallel (better performance)?
  17:46 <stickies-v> jbes: does this PR introduce any new threads?
  17:46 <LarryRuane_> yes so `nUndoPos` is the byte offset of the undo data for a particular block within a `revxxxx.dat` file
  17:46 <stickies-v> pakaro: no, it is not - if you inspect the blk and rev files you'll see they are different sizes, so that wouldn't quite work (and also the reason we have separate variables for them)
  17:47 <DaveBeer> thanks stickies-v and LarryRuane_
  17:48 <LarryRuane_> pakaro: but one thing to note is that if a particular block is in `blk00123.dat`, then its undo data is in `rev00123.dat` (for example) .. and I think the blocks are in the same order across those 2 files (but the block and undo data are different sizes, blocks are much bigger usually)
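
To make the naming concrete: a tiny standalone sketch of the blk/rev file names a given nFile maps to. Bitcoin Core zero-pads the file number to five digits; the snippet below only illustrates that mapping.

```cpp
#include <cstdio>

int main()
{
    const int nFile = 19;
    // Block data and its undo data share the same file number but live in
    // different files: blocks/blk00019.dat and blocks/rev00019.dat.
    std::printf("block data: blocks/blk%05d.dat\n", nFile);
    std::printf("undo data:  blocks/rev%05d.dat\n", nFile);
}
```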
  17:48 <jbes> stickies-v I thought that was (one of) the point? accessing the data via multiple threads that have one state (that's why mutex?)
  17:48 <stickies-v> LarryRuane_: yeah RPC (well, and REST) is the first thing that came to my mind too because the HTTP server uses multiple workers. for example, I think the `getblock` method could now really benefit from batch RPC requests since *most* (not all) of that logic is now no longer exclusively locked to cs_main
  17:50 <stickies-v> jbes: this PR does not introduce any new threads, I think! but it does allow existing multithreaded processes that were previously contended with cs_main to become more efficient. you weren't wrong, just wanted to challenge that a bit :-)
  17:51 <jbes> thanks for clarifying
  17:51 <LarryRuane_> there are no new threads, other than in the unit test :)
  17:52 <stickies-v> (the challenge being that using a shared mutex does not magically introduce more multithreading - it just can make existing multithreading more efficient)
  17:52 <LarryRuane_> the unit test, `sharedlock_tests.cpp` is a great way to see how these locks actually work
  17:52 <LarryRuane_> it's very easy to understand
  17:53 <stickies-v> +1, some tests are easier to understand than others but i found these to be particularly helpful, thanks for pointing that out LarryRuane_
  17:53 <stickies-v> re performance, there's also this PR (also by PR author) that aims to introduce multiple threads to construct blockfilter index: https://github.com/bitcoin/bitcoin/pull/26966
  17:53 <stickies-v> and that could of course also greatly benefit from this improvement
  17:55 <LarryRuane_> stickies-v: i did have a question if you don't mind... I noticed that there are "getters" for these `CBlockIndex` variables you mentioned, but they are still public instead of private.. just wondering why the PR doesn't make them private?
  17:55 <LarryRuane_> i actually tried that, but got compile errors because some code, such as in `txdb.cpp`, doesn't use the getters, maybe that could be a follow-up PR?
  17:57 <stickies-v> yeah I think he didn't update all references to those members just to make the PR easier to get merged?
  17:57 <LarryRuane_> stickies-v: +1 sounds likely
  17:58 <stickies-v> offering the new API now allows us to gradually phase out direct access to those members whenever we need to touch that code
  17:58 <stickies-v> haven't looked at the frequency but doesn't seem like an unreasonable approach, also from a merge conflict perspective
  17:58 <stickies-v> last quick-fire question:
  17:58 <stickies-v> With this PR, does the number of times that a mutex is acquired increase, stay constant, or decrease - or does it depend on the program flow?
  17:59 <DaveBeer> stays constant imo
  17:59 <LarryRuane_> my impression is, stays the same
  18:00 <pakaro> if the question were the maximum number of simultaneous mutexes held, that would increase, but the number of times total should remain constant
  18:01 <stickies-v> I didn't count, but my impression is that it goes up? previously we could (and often did) just acquire a single cs_main lock for a bunch of operations, whereas now for example in `MakeBlockInfo()` we acquire a lock twice, first by calling `index->GetFileNum();` and then again for the data position directly after
  18:01 <stickies-v> https://github.com/bitcoin/bitcoin/pull/27006/files#diff-31e9d8f2b9c86cc0bdae5ea810e11ba7109ef6763e0572e80714f626f97e5f39R18-R20
  18:02 <abubakar> but without write access, so that helps increase performance, I think.
  18:02 <stickies-v> of course, even though there's a cost associated with acquiring a lock on a mutex, it can pale very very quickly in comparison to having multiple threads run concurrently, so this need not necessarily be a huge concern
  18:03 <stickies-v> yes abubakar - absolutely!
  18:03 <LarryRuane_> ouch, doesn't that open a timing window? what we were talking about earlier, those should be fetched atomically, right?
  18:03 <DaveBeer> ah yes true, you are right stickies-v
  18:03 <stickies-v> LarryRuane_: but they're read-only operations?
  18:03 <DaveBeer> actually +10 LarryRuane_, I think that is a possible problem
  18:04 <stickies-v> OH. i see what you mean now
  18:04 <LarryRuane_> i don't think that matters, couldn't nFile and pos both change just after we sample nFile but before we sample the pos?
  18:04 <stickies-v> mmm. good point, that seems like a problem indeed
  18:04 <stickies-v> nice - an actionable review club!
  18:05 <stickies-v> alright, we're a bit over time already, so i'll wrap it up here
  18:05 <LarryRuane_> you know, just thinking, if those variables really are all related, maybe there should be a getter that returns all three atomically (fetched under one lock operation)
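
Purely as a sketch of LarryRuane_'s suggestion, not code from the PR: a getter that returns all three values under a single lock acquisition keeps them mutually consistent. For simplicity this hypothetical example uses a per-object std::shared_mutex, whereas the PR guards the fields with the global g_cs_blockindex_data; all names below are illustrative.

```cpp
#include <mutex>
#include <shared_mutex>

struct BlockDataPos {
    int file{-1};
    unsigned int data_pos{0};
    unsigned int undo_pos{0};
};

class BlockIndexLike {
public:
    BlockDataPos GetDiskPositions() const
    {
        std::shared_lock lock{m_mutex};           // one shared lock covers all three reads,
        return {m_file, m_data_pos, m_undo_pos};  // so the values cannot change in between
    }

private:
    mutable std::shared_mutex m_mutex;
    int m_file{-1};
    unsigned int m_data_pos{0};
    unsigned int m_undo_pos{0};
};
```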
  18:05 <LarryRuane_> stickies-v: thanks! very fun and helpful review club!
  18:05 <stickies-v> thank you everyone for attending and for the discussion - hope our collective understanding about locks, mutexes, concurrency and cs_main has improved a bit - and hope that furszy gets some helpful feedback on his PR!
  18:05 <pablomartin> thanks stickies-v! thanks all! session was very well presented, quite knowledgeable and interesting!
  18:05 <stickies-v> (LarryRuane_ and we already have a setter for those three, so...)
  18:05 <DaveBeer> thanks stickies-v and furszy
  18:06 <pakaro> thanks everyone, thanks stickies-v
  18:06 <stickies-v> #endmeeting