Reduce cs_main scope, guard block index 'nFile' under a local mutex (refactoring, resource usage)

https://github.com/bitcoin/bitcoin/pull/27006

Host: stickies-v  -  PR author: furszy

The PR branch HEAD was acddd4204654812a0e741e04a758be0f362c5ccb at the time of this review club meeting.

Notes

  • Once a block is fully validated, it is saved to disk in one of the blk<nFile>.dat files in your datadir.

  • Blocks are received, validated and stored in an unpredictable order (not sequentially by block height), so we need to keep track of which file each block is stored in so that it can be accessed quickly. This is tracked in CBlockIndex by its members nFile, nDataPos and nUndoPos (a simplified sketch of these fields follows these notes). In master, all of these members are guarded by the ::cs_main mutex. We have discussed how blocks are downloaded and stored in previous meetings #24858 and #25880.

  • ::cs_main is a recursive mutex which is used to ensure that validation is carried out in an atomic way. Although in recent years a lot of effort has been made to reduce usage of ::cs_main, it is still heavily used across the codebase.

  • Having a single (global) mutex can reduce code complexity and simplify reasoning about race conditions. However, it often also leads to (sometimes significant) performance issues when multiple threads are waiting for the same mutex even though they don't need synchronization and are not accessing any of the same variables.
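
A minimal sketch of the idea behind the change (standard C++ only; the struct, variable and function names are illustrative and not taken from the PR): the block-location fields get their own reader/writer mutex, so many threads can read them concurrently while a writer still gets exclusive access, instead of everything contending on one global exclusive mutex.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the fields discussed above; not the real CBlockIndex.
struct BlockLocation {
    int nFile{-1};             // which blk<nFile>.dat / rev<nFile>.dat pair holds the block
    unsigned int nDataPos{0};  // byte offset of the serialized block in blk<nFile>.dat
    unsigned int nUndoPos{0};  // byte offset of the undo data in rev<nFile>.dat
};

std::shared_mutex g_location_mutex;  // plays the role the PR gives to g_cs_blockindex_data
BlockLocation g_location;            // guarded by g_location_mutex

// Any number of threads can read the location at the same time...
BlockLocation ReadLocation()
{
    std::shared_lock lock{g_location_mutex};
    return g_location;
}

// ...but a writer waits until no other lock (shared or exclusive) is held.
void SetLocation(const BlockLocation& loc)
{
    std::unique_lock lock{g_location_mutex};
    g_location = loc;
}
```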

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. SharedLock is added as a new lock type to complement the UniqueLock we already have. Why does a UniqueLock not suffice here? How are the implementations of UniqueLock and SharedLock different?

  3. Do you expect this PR to have any visible impact on performance? If so, for which process(es) (in a very general sense) and by how much (order of magnitude)? Were you able to verify/benchmark this in practice?

  4. This PR changes CBlockIndex::nFile to default to -1 instead of 0. How can/did you verify that this change is safe?

  5. nFile, nDataPos and nUndoPos change from being guarded by ::cs_main to being guarded by g_cs_blockindex_data. Why is it that we lock exactly these 3 variables with g_cs_blockindex_data? What would be the concerns and benefits of using a different mutex for each of those 3 variables?

  6. Are there any other ways to ensure the data integrity of nFile, nDataPos and nUndoPos?

  7. With this PR, does the number of times that a mutex is acquired increase, stay constant, or decrease - or does it depend on the program flow?

Meeting Log

  17:00 <stickies-v> #startmeeting
  17:00 <LarryRuane_> hi
  17:00 <DaveBeer> hi
  17:00 <effexzi> Hello every1
  17:00 <abubakarsadiq> hello
  17:00 <stickies-v> welcome everyone! Today we're looking at #27006, authored by furszy. The notes and questions are available on https://bitcoincore.reviews/27006
  17:00 <pakaro> hi
  17:01 <stickies-v> anyone joining us for the first time today? even if you're just lurking, feel free to say hi!
  17:02 <jbes> hello
  17:02 <pablomartin> hi!... ill be just lurking, didnt have time to check the pr for today, sorry...
  17:02 <stickies-v> no problem! always welcome to just lurk around
  17:02 <stickies-v> who got the chance to review the PR or read the notes? (y/n)
  17:03 <pakaro> y
  17:03 <DaveBeer> y, read notes & questions, went through code once
  17:03 <pakaro> tACK - compiled, ran bitcoind, unit tests, functional tests without bdb nor usdt-trace. All passed. I got a setup error when I tried to run sharedlock_tests on its own, i didn't spend much/anytime figuring that part out though.
  17:03 <abubakarsadiq> I read the notes and briefly looked at the code
  17:04 <LarryRuane_> y for most part, but wow this is complex stuff with all the layers of code, derived classes, macros, etc! (sync.h mainly)
  17:04 <LarryRuane_> (and also the use of templates)
  17:04 <stickies-v> for those who looked at the PR: would you give it a Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
  17:05 <stickies-v> pakaro: interesting, but the sharedlock_tests passed when running it with test_runner.py?
  17:05 <LarryRuane_> definitely concept ACK, close to code ACK but I must study more
  17:05 <pakaro> stickies-v yes, let me recheck that now
  17:06 <LarryRuane_> stickies-v: would you say it's useful for reviewers to build using clang? (so the annotations are checked)?
  17:06 <stickies-v> LarryRuane_: yes definitely quite a few layers of complexity. It looks very verbose at first but I suppose it does make using the mutexes quite a bit more straightforward
  17:06 <LarryRuane_> yes the RAII aspect of the locking (that we do in practice) is very cool
  17:07 <stickies-v> mmm good question. definitely never hurts to do that! but CI should also catch those problems (normally)
  17:07 <stickies-v> alright let's start with building some understanding. a lot of abstract concepts here, so I think it won't hurt to dive a bit deeper
  17:08 <DaveBeer> concept ACK, for code ACK I would be concerned with global variables usage, but I'm not familiar with bitcoin core codebase much as of yet and maybe global variables are reasonable here (I understand the PR doesn't change this approach and is as good as prior solution)
  17:08 <stickies-v> SharedLock is added as a new lock type to complement the UniqueLock we already have. Why does a UniqueLock not suffice here?
  17:09 <DaveBeer> we want to allow multiple threads access in read only mode, while maintaining exclusive access for writers
  17:09 <stickies-v> DaveBeer: if I understand the PR comments historically, the initial implementation actually didn't use a global mutex variable, but because there are so many CBlockIndex objects this was the new approach chosen: https://github.com/bitcoin/bitcoin/pull/27006#discussion_r1092197196
  17:09 <LarryRuane_> for example one of the things i'm confused by is that `UniqueLock` is templated https://github.com/bitcoin/bitcoin/blob/8c4958bd4c06026dc108bc7f5f063d1f389d279b/src/sync.h#L151 but i don't see the angle brackets where it is used, why aren't those needed?
  17:09 <LarryRuane_> (and of course the new `SharedLock` is the same)
  17:10 <DaveBeer> stickies-v: thanks for the background, I'll have a look at it
  17:11 <LarryRuane_> i.e. it's used here: https://github.com/bitcoin/bitcoin/blob/8c4958bd4c06026dc108bc7f5f063d1f389d279b/src/sync.h#L258 but there is no template argument given
  17:11 <LarryRuane_> (sorry if i'm sidetracking too much, feel free to ignore!)
  17:12 <pakaro> stickies-v nvm, problem was PEBKAC [problem exists between keyboard and computer]
  17:12 <stickies-v> LarryRuane_: great question: the MutexType is deduced from when the lock is constructed: a lock is always constructed ON a mutex: https://github.com/bitcoin/bitcoin/blob/8c4958bd4c06026dc108bc7f5f063d1f389d279b/src/sync.h#L258
  17:12 <stickies-v> perhaps it's good to first clarify what the difference is (in cpp terms) between a lock and a mutex?
  17:12 <stickies-v> who can give that an ELI5?
  17:12 <LarryRuane_> oh i see, thanks, those deduced types are kinda tricky!
  17:14 <DaveBeer> mutex is the actual object being locked, while lock is RAII wrapper manipulating the underlying mutex
  17:14 <LarryRuane_> a mutex is the lowest-level synchronization primitive, just prevents two threads from running the protected ranges of code at the same time ... locks are higher-level constructs that _use_ a mutex
  17:15 <stickies-v> DaveBeer LarryRuane_: a pretty smart 5 year old, but that sounds about right
  17:16 <stickies-v> in simpler terms: a mutex is an object that we use to help control access to one or multiple resources. a mutex can have one or multiple locks, and whoever has the lock(s) can access the underlying resources
  17:16 <stickies-v> (hope i didn't simplify it too much)
  17:16 <LarryRuane_> haha i know, that's the best i could do! ... so like you can see there's no templating or anything here https://en.cppreference.com/w/cpp/thread/mutex or here https://en.cppreference.com/w/cpp/thread/shared_mutex (these are the low-level primitives)
  17:17 <LarryRuane_> so there's only those 2 types of mutexes, right?
  17:17 <LarryRuane_> (regular (non-shared) and shared)
  17:18 <stickies-v> well there's also the RecursiveMutex (which `::cs_main` is an instance of)
  17:18 <DaveBeer> couple more, see https://en.cppreference.com/w/cpp/header/mutex
  17:18 <stickies-v> ahh nice one DaveBeer!
  17:18 <LarryRuane_> stickies-v: can a particular mutex have more than one lock associated with it at the same time? i think so, right?
  17:19 <stickies-v> that's kinda the heart of the question here - who's got an idea?
  17:20 <DaveBeer> mutex can have any number of locks associated with it (associated with meaning ready to work with, not being locked at the same time)
  17:20 <LarryRuane_> oh i see, there are a few types listed here https://en.cppreference.com/w/cpp/thread (scroll down to Mutual exclusion)
  17:21 <LarryRuane_> (oops @DaveBeer already found a better link)
  17:22 <DaveBeer> LarryRuane_: actually your link also includes the shared_mutex, which is in its own header (I linked <mutex> header which does not include shared_mutex)
  17:22 <LarryRuane_> DaveBeer: "mutex can have any number of locks associated with it" -- I think you're right, multiple threads can be inside a `LOCK(::cs_main)` for example, and all of those are separate locks, but all refer to a single mutex
  17:23 <stickies-v> "multiple threads can be inside a `LOCK(::cs_main)`" isn't the whole point of LOCK that there can't be multiple threads accessing it at the same time?
  17:24 <LarryRuane_> i may be confused here, but yes, you're correct, but multiple threads can be waiting at the same time (and each wait is a separate lock (?))
  17:24 <DaveBeer> they are all inside LOCK(...) trying to access it, but only one succeeds, I think that's how LarryRuane_ meant it
  17:25 <stickies-v> okay, makes sense - sorry. I think the term "lock" is quite overloaded so that doesn't make it easier
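
For reference, a minimal standard-library illustration of the mutex/lock distinction discussed above. Bitcoin Core's Mutex/LOCK wrappers build on the same primitives and add thread-safety annotations and lock-order debugging; the names below are only illustrative.

```cpp
#include <mutex>

std::mutex m;     // the mutex: one shared synchronization object
int counter = 0;  // the data it protects

void Increment()
{
    std::unique_lock<std::mutex> lock{m};  // the lock: an RAII object that acquires m here...
    ++counter;
}                                          // ...and releases m when it goes out of scope
```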
  17:26 <stickies-v> so this PR adds the `SharedMutex` definition (based on std::shared_mutex)
  17:26 <LarryRuane_> stickies-v: i was surprised this didn't already exist, tbh
  17:26 <stickies-v> the interesting thing about a shared mutex is that it can have both shared and exclusive locks, whereas an exclusive mutex can have just exclusive locks
  17:26 <stickies-v> (same!)
  17:27 <pakaro> +1 LarryRuane_ RW locks have until now, not been a part of Core at all??
  17:27 <stickies-v> so then the next conceptual question is: if the whole point of a mutex is to prevent multiple threads from accessing the same data and messing everything up for everyone... what's the point of allowing a mutex to have shared locks?!
  17:28 <DaveBeer> Read vs Write usecases, reads (and only reads) can be shared, write must be exclusive
  17:28 <pakaro> To allow read access?
  17:28 <jbes> only read access?
  17:29 <DaveBeer> it is safe to read memory location from multiple threads as long as it is not being written at the same time
  17:29 <stickies-v> exactly - it's not necessarily the only use case, but probably the most common one?
  17:30 <LarryRuane_> yes i think in some other projects it's actually called read-write locks (rather than shared-exclusive)
  17:30 <pakaro> If a read-lock take hold of a resource, it can allow other read-locks to also hold the resource, but it will deny a write-lock, is that right?
  17:30 <DaveBeer> +1 pakaro
  17:30 <stickies-v> pakaro: yes, indeed. an exclusive lock can only be acquired if there are no other locks (shared or exclusive) acquired
  17:30 <LarryRuane_> i think the "shared, exclusive" terminology is preferred because it's more general
  17:30 <abubakar> +1 pakaro
  17:30 <stickies-v> and a shared lock can only be acquired if there are no exclusive locks acquired
  17:31 <LarryRuane_> one thing other projects have is the ability to "upgrade" a lock from shared to exclusive, but i don't think we have that in bitcoin core
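
A small self-contained example of the shared/exclusive behaviour described above, using only the standard library rather than the PR's SharedLock: the readers may hold the mutex simultaneously, while the writer has to wait until no lock of either kind is held.

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

std::shared_mutex mtx;
int value = 0;

void Reader(int id)
{
    std::shared_lock lock{mtx};  // several readers may hold this at the same time
    std::cout << "reader " << id << " sees " << value << '\n';
    std::this_thread::sleep_for(std::chrono::milliseconds{50});
}

void Writer()
{
    std::unique_lock lock{mtx};  // acquired only once no shared or exclusive lock is held
    ++value;
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(Reader, i);
    threads.emplace_back(Writer);
    for (auto& t : threads) t.join();
}
```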
  17:31 <stickies-v> and then the last sub-question on the conceptual part: why is this PR now introducing the shared mutex and lock into bitcoin core?
  17:32 <LarryRuane_> you might wonder, why not just drop the shared lock and acquire the exclusive lock ... you can but things could change during that interval, so anything you've cached would be invalid
  17:32 <jbes> what's the difference between shared lock and exclusive?
  17:33 <LarryRuane_> stickies-v: for performance? multiple threads may want to read the block (and undo) files concurrently, and there's no reason to prevent this
  17:33 <abubakar> to increase performance, reducing starvation so that there will be less use of cs_main.
  17:33 <stickies-v> jbes: I just answered that in the 10 lines above - lmk if anything's not clear there?
  17:33 <LarryRuane_> abubakar: +1, yes, also reduce contention on cs_main
  17:34 <jbes> thanks
  17:35 <stickies-v> LarryRuane_: yeah accessing block data concurrently is what I was looking for. i/o operations typically benefit most from concurrency, so this seems like a pretty nice win at first sight
  17:35 <LarryRuane_> and i guess currently (without this PR), `cs_main` is held during the disk read? That could take quite a long time!
  17:35 <stickies-v> (we could've reduced contention on cs_main using good ol' UniqueLock too, I think?)
  17:36 <stickies-v> so it seems to me like the SharedLock is kinda orthogonal to the cs_main discussion?
  17:36 <LarryRuane_> stickies-v: yes i was thinking that also, this PR really combines two things that could have been done separately: having a separate lock (from cs_main), and making it a shared lock ... but it's good to do both
  17:37 <DaveBeer> yes stickies-v, that's my understanding as well, first improvement is reducing ::cs_main contention, second (after splitting this usecase to new dedicated mutex) is to also utilize RW locks
  17:38 <stickies-v> +1 (those are also good comments to leave on the PR when you review it - helps the author to craft a more helpful PR description)
  17:38 <stickies-v> i'll launch the next question already, but as always - we can keep talking about previous questions too
  17:39 <stickies-v> `nFile`, `nDataPos` and `nUndoPos` change from being guarded by `::cs_main` to being guarded by `g_cs_blockindex_data`. Why is it that we lock exactly these 3 variables with `g_cs_blockindex_data`? What would be the concerns and benefits of using a different mutex for each of those 3 variables?
  17:39 <stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/compare/657e3086ad8171f799a7eb4226c6d1c2dd562a39...acddd4204654812a0e741e04a758be0f362c5ccb#diff-05137bf4d07f31a6cc237b1dd772e0b38bc2a60610a7ca86827e98fc126e8407L166-R175)
  17:40 <DaveBeer> all 3 variables contribute to single state which must be guarded as whole
  17:40 <DaveBeer> they are not independent
  17:40 <LarryRuane_> DaveBeer: +1, if they were separately locked, one might be updated from one thread while another updated from another thread, into an inconsistent state
  17:41 <jbes> beginner q, what Pos stands for in the variables names?
  17:41 <stickies-v> yeah, we could use 3 mutexes and acquire locks on all of them but... that would not be very efficient or readable
  17:41 <stickies-v> jbes: position!
  17:41 <LarryRuane_> jbes: that's the byte offset within the file (stands for "position", not really a very helpful name)
  17:42 <stickies-v> they show us where in the file the block data can be found
  17:43 <stickies-v> Do you expect this PR to have any visible impact on performance? If so, for which process(es) (in a very general sense) and by how much (order of magnitude)? Were you able to verify/benchmark this in practice?
  17:43 <LarryRuane_> and in case anyone's not aware, the `nFile` integer relates to the block file name, so if `nFile` is 19, that corresponds to `datadir/blocks/blk00019.dat`
  17:44 <DaveBeer> since we are already going through the variables meaning, what is nUndoPos? I have no clue.
  17:45 <pakaro> is the byte-offset for nDataPos and UndoPos [which i'm assuming associate with blockxxxx.dat and revxxxx.dat respectively] the same?
  17:45 <jbes> performance will be improved if you have a cpu that can utilize multi threading I guess..
  17:45 <stickies-v> DaveBeer: the blk<nnnnn>.dat files store the serialized blocks, but we also store the undo data to "reverse" the impact of a block onto the UTXO set in case of a reorg etc
  17:45 <stickies-v> that data is stored in the rev<nnnnn>.dat files, and is what nUndoPos refers to
  17:45 <LarryRuane_> there are certain RPCs that read the block files (if the node isn't pruned), like if `txindex` is enabled, and the RPC server is multi-threaded, so those could proceed in parallel (better performance)?
  17:46 <stickies-v> jbes: does this PR introduce any new threads?
  17:46 <LarryRuane_> yes so `nUndoPos` is the byte offset of the undo data for a particular block within a `revxxxx.dat` file
  17:46 <stickies-v> pakaro: no, it is not - if you inspect the blk and rev files you'll see they are different sizes, so that wouldn't quite work (and also the reason we have separate variables for them)
  17:47 <DaveBeer> thanks stickies-v and LarryRuane_
  17:48 <LarryRuane_> pakaro: but one thing to note is that if a particular block is in `blk00123.dat`, then its undo data is in `rev00123.dat` (for example) .. and I think the blocks are in the same order across those 2 files (but the block and undo data are different sizes, blocks are much bigger usually)
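
To make the naming concrete: a tiny standalone sketch of the blk/rev file names a given nFile maps to. Bitcoin Core zero-pads the file number to five digits; the snippet below only illustrates that mapping.

```cpp
#include <cstdio>

int main()
{
    const int nFile = 19;
    // Block data and its undo data share the same file number but live in
    // different files: blocks/blk00019.dat and blocks/rev00019.dat.
    std::printf("block data: blocks/blk%05d.dat\n", nFile);
    std::printf("undo data:  blocks/rev%05d.dat\n", nFile);
}
```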
  17:48 <jbes> stickies-v I thought that was (one of) the point? accessing the data via multiple threads that have one state (that's why mutex?)
  17:48 <stickies-v> LarryRuane_: yeah RPC (well, and REST) is the first thing that came to my mind too because the HTTP server uses multiple workers. for example, I think the `getblock` method could now really benefit from batch RPC requests since *most* (not all) of that logic is now no longer exclusively locked to cs_main
  17:50 <stickies-v> jbes: this PR does not introduce any new threads, I think! but it does allow existing multithreaded processes that were previously contended with cs_main to become more efficient. you weren't wrong, just wanted to challenge that a bit :-)
  17:51 <jbes> thanks for clarifying
  17:51 <LarryRuane_> there are no new threads, other than in the unit test :)
  17:52 <stickies-v> (the challenge being that using a shared mutex does not magically introduce more multithreading - it just can make existing multithreading more efficient)
  17:52 <LarryRuane_> the unit test, `sharedlock_tests.cpp` is a great way to see how these locks actually work
  17:52 <LarryRuane_> it's very easy to understand
  17:53 <stickies-v> +1, some tests are easier to understand than others but i found these to be particularly helpful, thanks for pointing that out LarryRuane_
  17:53 <stickies-v> re performance, there's also this PR (also by PR author) that aims to introduce multiple threads to construct blockfilter index: https://github.com/bitcoin/bitcoin/pull/26966
  17:53 <stickies-v> and that could of course also greatly benefit from this improvement
  17:55 <LarryRuane_> stickies-v: i did have a question if you don't mind... I noticed that there are "getters" for these `CBlockIndex` variables you mentioned, but they are still public instead of private.. just wondering why the PR doesn't make them private?
  17:55 <LarryRuane_> i actually tried that, but got compile errors because some code, such as in `txdb.cpp`, doesn't use the getters, maybe that could be a follow-up PR?
  17:57 <stickies-v> yeah I think he didn't update all references to those members just to make the PR easier to get merged?
  17:57 <LarryRuane_> stickies-v: +1 sounds likely
  17:58 <stickies-v> offering the new API now allows us to gradually phase out direct access to those members whenever we need to touch that code
  17:58 <stickies-v> haven't looked at the frequency but doesn't seem like an unreasonable approach, also from a merge conflict perspective
  17:58 <stickies-v> last quick-fire question:
  17:58 <stickies-v> With this PR, does the number of times that a mutex is acquired increase, stay constant, or decrease - or does it depend on the program flow?
  17:59 <DaveBeer> stays constant imo
  17:59 <LarryRuane_> my impression is, stays the same
  18:00 <pakaro> if the question were the maximum number of simultaneous mutexes held, that would increase, but the number of times total should remain constant
  18:01 <stickies-v> I didn't count, but my impression is that it goes up? previously we could (and often did) just acquire a single cs_main lock for a bunch of operations, whereas now for example in `MakeBlockInfo()` we acquire a lock twice, first by calling `index->GetFileNum();` and then again for the data position directly after
  18:01 <stickies-v> https://github.com/bitcoin/bitcoin/pull/27006/files#diff-31e9d8f2b9c86cc0bdae5ea810e11ba7109ef6763e0572e80714f626f97e5f39R18-R20
  18:02 <abubakar> but without write access, so that helps increase performance, I think.
  18:02 <stickies-v> of course, even though there's a cost associated with acquiring a lock on a mutex, it can pale very very quickly in comparison to having multiple threads run concurrently, so this need not necessarily be a huge concern
  18:03 <stickies-v> yes abubakar - absolutely!
  18:03 <LarryRuane_> ouch, doesn't that open a timing window? what we were talking about earlier, those should be fetched atomically, right?
  18:03 <DaveBeer> ah yes true, you are right stickies-v
  18:03 <stickies-v> LarryRuane_: but they're read-only operations?
  18:03 <DaveBeer> actually +10 LarryRuane_, I think that is a possible problem
  18:04 <stickies-v> OH. i see what you mean now
  18:04 <LarryRuane_> i don't think that matters, couldn't nFile and pos both change just after we sample nFile but before we sample the pos?
  18:04 <stickies-v> mmm. good point, that seems like a problem indeed
  18:04 <stickies-v> nice - an actionable review club!
  18:05 <stickies-v> alright, we're a bit over time already, so i'll wrap it up here
  18:05 <LarryRuane_> you know, just thinking, if those variables really are all related, maybe there should be a getter that returns all three atomically (fetched under one lock operation)
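
Purely as a sketch of LarryRuane_'s suggestion, not code from the PR: a getter that returns all three values under a single lock acquisition keeps them mutually consistent. For simplicity this hypothetical example uses a per-object std::shared_mutex, whereas the PR guards the fields with the global g_cs_blockindex_data; all names below are illustrative.

```cpp
#include <mutex>
#include <shared_mutex>

struct BlockDataPos {
    int file{-1};
    unsigned int data_pos{0};
    unsigned int undo_pos{0};
};

class BlockIndexLike {
public:
    BlockDataPos GetDiskPositions() const
    {
        std::shared_lock lock{m_mutex};           // one shared lock covers all three reads,
        return {m_file, m_data_pos, m_undo_pos};  // so the values cannot change in between
    }

private:
    mutable std::shared_mutex m_mutex;
    int m_file{-1};
    unsigned int m_data_pos{0};
    unsigned int m_undo_pos{0};
};
```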
  18:05 <LarryRuane_> stickies-v: thanks! very fun and helpful review club!
  18:05 <stickies-v> thank you everyone for attending and for the discussion - hope our collective understanding about locks, mutexes, concurrency and cs_main has improved a bit - and hope that furszy gets some helpful feedback on his PR!
  18:05 <pablomartin> thanks stickies-v! thanks all! session was very well presented, quite knowledgeable and interesting!
  18:05 <stickies-v> (LarryRuane_ and we already have a setter for those three, so...)
  18:05 <DaveBeer> thanks stickies-v and furszy
  18:06 <pakaro> thanks everyone, thanks stickies-v
  18:06 <stickies-v> #endmeeting