util: generalize accounting of system-allocated memory in pool resource (utxo db and indexes)

Jun 21, 2023

https://github.com/bitcoin/bitcoin/pull/27748

Host: larryruane - PR author: LarryRuane

The PR branch HEAD was d25b54346fed931830cf3f538b96c5c346165487 at the time of this review club meeting.

Notes

This PR is a follow-on to PR 25325, which we reviewed March 8 of this year. Please review at least the notes for that review club.
The -dbcache configuration option determines the amount of memory (default 450 MiB) used for the coins cache as well as other “database” uses of memory; see CalculateCacheSizes().
Using less memory than allowed decreases the coins cache hit ratio (the fraction of lookups that find the UTXO in the cache); using more memory than specified risks crashing bitcoind on memory-restricted systems.
For this reason, it’s important to keep an accurate accounting of the amount of memory used by the cache. It doesn’t have to be perfect but should be fairly close.
When a program requests X bytes of dynamic memory from the C++ runtime library, the library internally allocates slightly more than X bytes, because the memory allocator needs room for its own metadata (overhead). In other words, the logical size of an allocation is not the same as its physical size.
This memory allocator metadata is somewhat complex and depends on several factors, such as the machine architecture and the memory model.
When sizing the cache, we want to calculate physical memory usage, that is, account for this extra allocation metadata. Unfortunately, there’s no library function that maps logical memory size to physical size.
To deal with that, Bitcoin Core includes a function, MallocUsage(), that approximates this conversion. Its argument is an allocation size, and it returns the corresponding physical size.
The source file that defines it, memusage.h, also includes many overloads of the function DynamicUsage() for the various data types that we might be allocating somewhere in the system. They all make use of MallocUsage().
The pool memory resource (added by PR #25325) adds a new DynamicUsage() overload (version) that computes the overall coins cache size. This allows us to stay within the configured cache size.
This PR modifies how this calculation is done.
This DynamicUsage() overload (for the pool memory resource) is called only from CCoinsViewCache::DynamicMemoryUsage().
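The logical-to-physical conversion described in these notes can be sketched as follows. This is a simplified illustration modeled on the kind of rounding MallocUsage() performs, not a copy of the Bitcoin Core source: the name `ApproxMallocUsage` and the explicit `ptr_size` parameter are assumptions made for this example, and the constants assume a glibc-like allocator (a small per-allocation header plus 16-byte alignment on 64-bit systems, 8-byte alignment on 32-bit systems).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only: estimate the physical size of a
// heap allocation of `alloc` logical bytes. The rounding assumes a glibc-like
// allocator; other allocators may behave differently.
inline std::size_t ApproxMallocUsage(std::size_t alloc, std::size_t ptr_size = sizeof(void*))
{
    if (alloc == 0) return 0;
    if (ptr_size == 8) return ((alloc + 31) >> 4) << 4; // 64-bit: round up to 16 after adding overhead
    if (ptr_size == 4) return ((alloc + 15) >> 3) << 3; // 32-bit: round up to 8 after adding overhead
    assert(false && "unsupported pointer size");
    return alloc;
}
```

Under these assumptions, a 1-byte request on a 64-bit build is charged 32 bytes and a 24-byte request is charged 48 bytes, which is why many small allocations cost noticeably more than their logical sizes suggest.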

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
In the master branch (without this PR), why does the DynamicUsage() overload have so many templated arguments? (Hint: compare it to the overload immediately above it, on line 170.)
How did this DynamicUsage() overload work on master? What are the various values being added together in this PR?
Specifically, why is m.bucket_count() part of the DynamicUsage() calculation? Why isn’t the memory for the bucket allocation already accounted for in the resource “chunks”?
In this PR, where is the DynamicUsage() calculation moved to, and why is m.bucket_count() no longer needed? What is the advantage of not referencing m.bucket_count()?
Extra credit: What is cachedCoinsUsage and why does CCoinsViewCache::DynamicMemoryUsage() add it to memusage::DynamicUsage(cacheCoins)?

Meeting Log

17:00

<LarryRuane> #startmeeting

17:00

<stickies-v> hi

17:00

<kevkevin> hi

17:00

<LarryRuane> welcome everyone! Today we're looking at https://bitcoincore.reviews/27748

17:00

<emzy> hi

17:00

<Pins> hi

17:01

<drusilla_> Hi

17:01

<LarryRuane> I see some familiar names, is anyone here for the first time? feel free to say hi even if you'd like to just watch for now

17:02

<yashraj> hi

17:02

<pablomartin> hello

17:02

<LarryRuane> IRC isn't the most interactive medium for these discussions, so feel free to bring up earlier topics or continue earlier conversations even if we've moved on!

17:02

<drusilla_> This is my first time here , I can't hear anything on my end

17:03

<LarryRuane> welcome @drusilla_ ! There is no sound, this is OLD SCHOOL text chat only!

17:03

<kevkevin> welcome! drusilla_

17:03

<stickies-v> glad you found your way here, drusilla_ !

17:03

<efrageek> Hi!

17:04

<LarryRuane> (I think IRC was created around the 1970s haha)

17:04

<drusilla_> Thank you

17:04

<Pins> haha

17:04

<efrageek> How are you guys? New around here

17:05

<kevkevin> welcome! efrageek

17:05

<abubakarsadiq> hi lurking today

17:05

<LarryRuane> So, after writing today's notes and questions, it occurred to me that some here might not be familiar with lower-level structures like `unordered_map` in c++, especially if you've come from higher-level languages like Ruby or Python

17:05

<LarryRuane> glad you're here, @efrageek! Feel free to just lurk or ask any questions, there are no bad questions

17:06

<efrageek> Really appreciate LarryRuane!

17:07

<efrageek> The problem that you mentioned is something that I found when I started to study c++

17:07

<LarryRuane> So in bitcoin core, a super important data structure is the "coins cache" ... this is a map (we use the standard library `unordered_map`) that lets us determine if a transaction's input refers to a valid, unspent output

17:07

<stickies-v> LarryRuane: isn't an unordered_map in C++ very similar to a python dict?

17:08

<LarryRuane> Yes, exactly right, and it's sometimes called a hash table, https://en.wikipedia.org/wiki/Hash_table

17:08

<LarryRuane> So, especially during initial block download, performance is critical, we're validating tons of transactions,

17:09

<LarryRuane> and for each one, we look at all its inputs, and want to see if the output it references (the reference is called a `COutpoint`) exists, and is unspent

17:10

<LarryRuane> I think of that name, COutpoint, as meaning it's a "pointer" to an earlier transaction's output ... but not in the usual sense of a memory pointer,

17:10

<kevkevin> can we only have one COutpoint for each input or can we have multiple COutpoints?

17:10

<LarryRuane> but the reference is by txid and index into the tx's output array

17:11

<LarryRuane> kevkevin: good question, each input refers to exactly one COutpoint

17:12

<kevkevin> ok thanks!

17:12

<LarryRuane> If the transaction is valid, we ADD all of its outputs (txid, output index) to this map, so they are available to be referenced by later transactions that want to spend those outputs

17:12

<LarryRuane> and see, it's very common that the "lifetime" of an output (how long it is unspent) is pretty short

17:13

<LarryRuane> many outputs are spent within a few blocks of when they were created! more often than you'd guess, even spent within the same block!

17:14

<LarryRuane> so, even though all the unspent outputs (UTXOs) are saved on disk (in the `chainstate` directory within your data directory, in LevelDB format), the most recent ones are saved in memory -- in this very unordered_map we're talking about today

17:14

<Pins> that's interesting

17:14

<LarryRuane> and it's a big savings if a UTXO gets spent quickly and never even has to be written to disk!

17:15

<LarryRuane> we'd like this to happen as often as possible, because it makes IBD much faster than it would otherwise be, because reading from disk is like 2 orders of magnitude, or more, slower than just reading from memory

17:16

<LarryRuane> so this is all background, sorry if most of you know this, but is that clear? did I confuse anyone?

17:16

<kevkevin> ahh ok so the unordered map is just the latest set of UTXO's, is there a max size to this unordered list and if so is that max size configurable?

17:16

<efrageek> Very clear, even for the new guy

17:16

<Pins> very clear

17:17

<LarryRuane> kevkevin: great question, the `-dbcache` setting in your configuration, default 450 MiB (450*1024*1024) determines the max mem size for this map, plus some other memory caches,

17:17

<kevkevin> ahh ok thanks!

17:17

<LarryRuane> but during IBD, really the only cache that's in use and active, is this UTXO cache (aka coins cache)

17:17

<kevkevin> makes sense to me

17:18

<kevkevin> would it make sense to increase that cache during IBD and then lower it to the -dbcache size after IBD is finished?

17:18

<LarryRuane> and also (I had to discover all this myself recently), the mempool cache is used for this map during IBD, and it has a separate config setting, `-maxmempool` i think, and its default is 300 MiB

17:19

<LarryRuane> kevkevin: yes, so that's exactly what happens,

17:19

<kevkevin> ahh ok cool!

17:19

<LarryRuane> this coins map uses 750 MiB or so during IBD, because there's no mempool yet!

17:19

<LarryRuane> and of course you can override all this

17:20

<stickies-v> kevkevin: yes, IBD is an excellent time to manually increase dbcache to a higher value, makes a big difference

17:20

<Pins> kevkevin when you have enough resources to do that

17:20

<LarryRuane> this 750 is quite small for many machines, so if you're about to do IBD, and you have a fairly high end system, increase either of those settings, and it will be MUCH faster

17:20

<LarryRuane> @stickies-v beat me to it, yes, exactly right

17:21

<yashraj> can confirm

17:21

<LarryRuane> so like i have a Raspi (myNode), and it has only 4 GB total memory, so probably best not to change its settings (their default config actually does change it slightly)

17:22

<LarryRuane> ok so let me just describe this important `std::unordered_map` a little bit (but again, feel free to continue where we were before)

17:23

<LarryRuane> it has this one large array, or actually `std::vector`, which is called the bucket array or bucket list (haha),

17:23

<LarryRuane> and each entry in this bucket array points to an *entry* in the map

17:25

<LarryRuane> and each entry contains (in this case of the coins map) the map key (a `COutpoint` like we described) and a `Coin` which is the amount of this output and its `scriptPubKey` .. did I miss anything, @stickies-v ?
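(Editor's sketch of the bucket array just described. It uses only the standard bucket interface of `std::unordered_map`; the helper names `CountViaBuckets` and `BucketArrayBytesEstimate` are made up for this example, and the "one pointer per bucket" estimate is an assumption about how common standard library implementations lay out the array.)

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Walk every bucket and add up how many entries each one holds.
// The total always equals m.size().
template <typename Map>
std::size_t CountViaBuckets(const Map& m)
{
    std::size_t total = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        total += m.bucket_size(b);
    }
    return total;
}

// Rough logical size of the bucket array, assuming one pointer per bucket
// (an assumption about the implementation, not something the standard promises).
template <typename Map>
std::size_t BucketArrayBytesEstimate(const Map& m)
{
    return m.bucket_count() * sizeof(void*);
}
```

The point of the second helper is that the bucket array is a single allocation whose size grows with the number of buckets, separate from the per-entry allocations.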

17:26

<LarryRuane> so again, we're validating a tx... we loop across its inputs ... each one contains a `COutpoint` ... we look in the map using this `COutpoint` (which remember is a txid and output index) ...

17:27

<LarryRuane> if it's not there, what do we need to do? anyone?

17:27

<Pins> Mempool?

17:28

<kevkevin> well would we read the disk?

17:28

<Pins> Yes sure ... my bad

17:28

<LarryRuane> Pins: actually the mempool has entries in this map too

17:29

<LarryRuane> kevkevin: yes, we read the disk! if this output was created a long time ago, it won't be in memory but may be on disk, so we have to read the disk (slow)
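(Editor's sketch of the lookup path discussed here: check the in-memory map first, and fall back to the slow store only on a miss. Everything in it is hypothetical and simplified: `OutPoint`, `Coin`, `OutPointHasher` and `CoinsCacheSketch` are stand-ins invented for this example, and a second in-memory map plays the role of the on-disk chainstate database.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for the real key and value types.
struct OutPoint {
    std::string txid;
    uint32_t index;
    bool operator==(const OutPoint& o) const { return txid == o.txid && index == o.index; }
};
struct OutPointHasher {
    std::size_t operator()(const OutPoint& o) const
    {
        return std::hash<std::string>{}(o.txid) ^ (std::hash<uint32_t>{}(o.index) << 1);
    }
};
struct Coin {
    int64_t amount;
};
using CoinMap = std::unordered_map<OutPoint, Coin, OutPointHasher>;

class CoinsCacheSketch
{
public:
    explicit CoinsCacheSketch(const CoinMap& disk) : m_disk(disk) {}

    // Look in memory first; on a miss, consult the "disk" and cache the result.
    std::optional<Coin> Get(const OutPoint& out)
    {
        if (auto it = m_memory.find(out); it != m_memory.end()) return it->second;
        ++disk_reads; // count the slow path
        if (auto it = m_disk.find(out); it != m_disk.end()) {
            m_memory.emplace(out, it->second);
            return it->second;
        }
        return std::nullopt; // unknown or already spent
    }

    int disk_reads = 0;

private:
    CoinMap m_memory;      // fast, size-limited cache
    const CoinMap& m_disk; // stand-in for the on-disk database
};
```

Looking up the same outpoint twice costs one slow read and one fast hit, which is the effect a larger cache is meant to maximize.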

17:29

<kevkevin> Would that only happen during IBD? reading from the mempool

17:29

<LarryRuane> I'm glossing over some stuff, but there are actually LAYERS of memory caches! each with an unordered map

17:29

<LarryRuane> kevkevin: during IBD, our mempool is empty .. mempool contains only very recent transactions

17:30

<LarryRuane> we might be processing blocks from 2014

17:30

<LarryRuane> trying to decide if I should talk about the layers ... that might take a little too long here

17:30

<LarryRuane> well, very quickly,

17:30

<kevkevin> oh I thought we used the mempool cache during IBD

17:31

<instagibbs> kevkevin just the memory we would have allocated to the mempool potentially

17:31

<LarryRuane> when we're validating a block, its transactions create a bunch of new UTXOs, right?

17:31

<kevkevin> ahh ok just the space is being used thanks!

17:31

<yashraj> yes

17:31

<LarryRuane> oh hi @instagibbs! We have a real expert here, much more than me!

17:31

<LarryRuane> now I'm nervous ... haha

17:31

<instagibbs> I've literally done nothing with utxo set in core haha

17:32

<LarryRuane> so when we're validating a block, we're creating these UTXO entries in the memory map ... but let's say that the LAST transaction in the block is invalid!

17:32

<LarryRuane> so what do we do now?

17:32

<LarryRuane> one way is we could go back and REMOVE all those UTXOs from this block, because the entire block is invalid

17:32

<LarryRuane> but that would be complicated and slow

17:33

<LarryRuane> so what we do instead is make a TEMPORARY map for JUST THIS ONE BLOCK

17:33

<LarryRuane> and if the block turns out to be invalid, we simply DISCARD this temporary coins (UTXO) memory cache

17:33

<LarryRuane> (we never flush these entries to disk during this one block's validation)

17:33

<instagibbs> isn't the cache flushed only once a day, or when it gets too big, in general?

17:34

<instagibbs> or is that a generalization of what you're saying

17:34

<LarryRuane> if the block IS valid, we MERGE this map down to the "main" memory map (which is much larger, this is the one limited by `-dbcache`)
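(Editor's sketch of the temporary per-block map described above: new outputs go into a small map layered on top of the main one, and are either merged down or thrown away. The names `SimpleCoinMap` and `BlockViewSketch` are hypothetical, keys are plain "txid:index" strings for brevity, and spending of existing coins is left out entirely.)

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical simplification: key is "txid:index", value is the amount.
using SimpleCoinMap = std::unordered_map<std::string, int64_t>;

class BlockViewSketch
{
public:
    explicit BlockViewSketch(SimpleCoinMap& base) : m_base(base) {}

    // Outputs created while validating the block go only into the temporary map.
    void Add(const std::string& outpoint, int64_t amount) { m_added[outpoint] = amount; }

    // Lookups see the temporary entries first, then the main map.
    std::optional<int64_t> Get(const std::string& outpoint) const
    {
        if (auto it = m_added.find(outpoint); it != m_added.end()) return it->second;
        if (auto it = m_base.find(outpoint); it != m_base.end()) return it->second;
        return std::nullopt;
    }

    // Block turned out valid: merge everything down into the main map.
    void Flush()
    {
        for (const auto& entry : m_added) m_base[entry.first] = entry.second;
        m_added.clear();
    }

    // Block turned out invalid: drop the temporary entries; the main map is untouched.
    void Discard() { m_added.clear(); }

private:
    SimpleCoinMap& m_base;
    SimpleCoinMap m_added;
};
```

Discarding is cheap because nothing was ever written into the main map, so there is nothing to undo.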

17:34

<instagibbs> ah :) nevermind

17:34

<LarryRuane> instagibbs: yes, that's right, that's the main cache... and also yes, when it reaches the size limit, we flush all of it

17:35

<kevkevin> LarryRuane: is the new memory map when the block is invalid one of the layers you mentioned?

17:35

<LarryRuane> so there's this idea of a "view" which is a (more temporary) cache built on top of (layered on top of) a bigger cache

17:36

<LarryRuane> kevkevin: yes

17:36

<instagibbs> ah, so one kind of view is for mempool (continuously running) and another view on top would be block validation

17:36

<LarryRuane> it's pretty cool how these layers work but it took me a while to figure it out, and I'm still not very knowledgeable on it, see src/coins.cpp

17:36

<instagibbs> two different smaller maps on top of the main cache

17:36

<instagibbs> is this right?

17:36

<LarryRuane> instagibbs: mempool! yes! that's just another layer!

17:37

<LarryRuane> so now we see at least THREE in-memory caching layers!

17:37

<LarryRuane> and then of course there's the LevelDB layer, which is at the bottom (i guess depending on how you visualize it), on disk

17:39

<LarryRuane> I think this relates to what some of you may know much better than me, functional programming ... where data structures are much more often immutable, and you apply changes atomically once everything is validated or no errors, else discard

17:39

<LarryRuane> anyway, I'm not an expert on functional programming but I think there's a connection to that style of programming

17:40

<LarryRuane> Okay so back to unordered_map ... it's faster than an (ordered) `std::map` ... the tradeoff being that a `std::map` allows you to iterate the entries in key-order, but at the expense of time and (maybe) storage
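(Editor's sketch of the tradeoff mentioned here. `std::map` guarantees iteration in key order, while `std::unordered_map` makes no promise about order. The helper names `KeysInIterationOrder` and `IteratesInKeyOrder` are invented for this example.)

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <unordered_map>
#include <vector>

// Collect the keys of any map-like container in the order iteration yields them.
template <typename Map>
std::vector<int> KeysInIterationOrder(const Map& m)
{
    std::vector<int> keys;
    for (const auto& entry : m) keys.push_back(entry.first);
    return keys;
}

// True if iterating the container happens to visit keys in ascending order.
// Always true for std::map<int, T>; not guaranteed for std::unordered_map.
template <typename Map>
bool IteratesInKeyOrder(const Map& m)
{
    const std::vector<int> keys = KeysInIterationOrder(m);
    return std::is_sorted(keys.begin(), keys.end());
}
```

Code that only ever looks entries up by key, as the coins cache does, gives up nothing by choosing the unordered container.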

17:41

<LarryRuane> I think there are some `std::map`s in bitcoin core that could be `std::unordered_map` instead, but were coded before `std::unordered_map` existed! It was added only more recently

17:43

<Pins> In this specific case the entries are not ordered, right? So no reason to use std::map

17:43

<LarryRuane> So anyway, an unordered_map has this bucket array, and that has to be accounted for when we try to figure out how much actual memory this map is consuming, and this (finally!) relates to this PR

17:43

<instagibbs> utxos have no inherent (key) ordering

17:43

<LarryRuane> Pins: yes exactly! and this data structure is so big and important, we want to take every advantage we can

17:43

<Pins> perfect

17:44

<LarryRuane> So let's see... sorry this is all kind of disorganized, any more questions about this coins map (maps) or the entries they contain? did I miss anything?

17:45

<LarryRuane> Oh so, there was a recent PR merged, https://github.com/bitcoin/bitcoin/pull/25325 that made this unordered map use significantly less memory, super nice improvement!

17:46

<LarryRuane> it did that by having this particular unordered_map (not all unordered_maps in the system) use a custom memory allocator

17:46

<LarryRuane> now of course, any custom allocator must, at the bottom, use the same system memory allocator that everything else uses!

17:48

<LarryRuane> but this layer in between (you can think of it as) makes the unordered_map's allocations more efficient in terms of both speed and (physical) memory usage, even though the unordered_map (which we don't control) is doing exactly the same memory allocations

17:48

<instagibbs> "this particular" which one

17:48

<instagibbs> too many layers :)

17:49

<LarryRuane> but I think mainly what it did is allowed MORE coins entries to be cached in memory before we hit that physical memory limit, so there is a higher hit ratio (the percentage of times we look for a UTXO in memory and it's present, don't have to read disk)

17:49

<LarryRuane> instagibbs: oh right haha, the "main" memory cache, which is the base for both the mempool and the block layers

17:50

<instagibbs> ok, the largest in memory one, makes sense

17:50

<LarryRuane> so actually, those mempool and block maps also use this new memory allocator, but there's probably only a minor benefit, because they're small compared to the main one

17:50

<instagibbs> 👍

17:51

<LarryRuane> okay, whew, sorry this is so messy, but finally, what this PR does is to improve the way we calculate physical memory usage for this (well, any) coins utxo map

17:52

<LarryRuane> let me stop and take a breath, can anyone explain what this PR does basically?

17:52

<Pins> cache more coins with the same memory usage

17:52

<LarryRuane> (and be aware, this really isn't an important PR, I'm not sure it will even be merged! but I saw it as a good excuse for discussing this important memory data structure!)

17:53

<Pins> LarryRuane (y)

17:53

<LarryRuane> Pins: that's what 25325 did, but this PR we're reviewing (one of mine) is a small tweak on that (already merged) PR

17:53

<Pins> hummm

17:54

<LarryRuane> gosh we're almost out of time, let me just throw it open to anyone, does anyone have answers to any of the questions on https://bitcoincore.reviews/27748?

17:56

<LarryRuane> let me just quickly summarize.. when I saw this in PR 25325 (now in master): https://github.com/bitcoin/bitcoin/blob/681ecac5c2d462920cd32636eec15599a9bcf424/src/memusage.h#L186

17:56

<LarryRuane> I wondered, why is `m.bucket_count()` part of this calculation?

17:58

<LarryRuane> Seems like the `std::unordered_map` implementation should use the custom memory allocator to allocate the bucket array... so then its physical memory usage would be accounted for in the "chunks" (see the line just above, line 185)

17:58

<LarryRuane> well, it turns out that this bucket array is TOO BIG to be allocated by the custom allocator!

17:59

<LarryRuane> in that case, beyond a certain rather small allocation size, the custom allocator just does a normal system allocation (i.e. `new` in c++)
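(Editor's sketch showing why the bucket array stands out among the map's allocations. A small counting allocator is plugged into `std::unordered_map` and records the largest single request it sees. `AllocStats`, `CountingAllocator`, `CountedMap`, `Measurement` and `MeasureInsertions` are names invented for this example; this is not Bitcoin Core's pool allocator, and the claim that the largest request is the bucket array reflects common standard library implementations rather than a guarantee.)

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

struct AllocStats {
    std::size_t count = 0;   // number of allocate() calls
    std::size_t largest = 0; // largest single request, in bytes
};

// Minimal allocator that forwards to operator new and records request sizes.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    AllocStats* stats;

    explicit CountingAllocator(AllocStats* s) noexcept : stats(s) {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>& other) noexcept : stats(other.stats) {}

    T* allocate(std::size_t n)
    {
        const std::size_t bytes = n * sizeof(T);
        ++stats->count;
        stats->largest = std::max(stats->largest, bytes);
        return static_cast<T*>(::operator new(bytes));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept { return a.stats == b.stats; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept { return !(a == b); }

using CountedMap = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                      CountingAllocator<std::pair<const int, int>>>;

struct Measurement {
    AllocStats stats;
    std::size_t bucket_count;
};

// Insert n entries and report what the allocator observed.
inline Measurement MeasureInsertions(int n)
{
    AllocStats stats;
    std::size_t buckets = 0;
    {
        CountingAllocator<std::pair<const int, int>> alloc(&stats);
        CountedMap m(alloc);
        for (int i = 0; i < n; ++i) m.emplace(i, i);
        buckets = m.bucket_count();
    } // map destroyed here, while stats is still alive
    return Measurement{stats, buckets};
}
```

With many entries, the per-entry requests stay small and uniform while the bucket array request keeps growing, which is the kind of allocation a pool designed for small blocks would pass straight through to the system allocator.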

17:59

<LarryRuane> so it's correct (in master) but kind of a kludge and maybe lucky

18:00

<LarryRuane> in this PR (that we're reviewing) we just keep track of any system allocations that WE (the custom allocator) do, and account for the bucket array that way, without ever having to call `m.bucket_count()`
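(Editor's sketch of the accounting idea described here, under stated assumptions. It is not the code from the PR: `TrackingPoolSketch` and `ApproxUsage` are hypothetical names, the pool is a bare bump allocator with no free lists, and the usage formula assumes a 64-bit glibc-like allocator. What it illustrates is that a resource which records its own pass-through system allocations can report total usage without anyone asking the map for `bucket_count()`.)

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Assumed physical cost of a system allocation (64-bit, glibc-like rounding).
inline std::size_t ApproxUsage(std::size_t bytes)
{
    return bytes == 0 ? 0 : ((bytes + 31) >> 4) << 4;
}

class TrackingPoolSketch
{
public:
    TrackingPoolSketch(std::size_t max_pooled_bytes, std::size_t chunk_bytes)
        : m_max_pooled_bytes(max_pooled_bytes), m_chunk_bytes(chunk_bytes)
    {
        assert(RoundUp(max_pooled_bytes) <= chunk_bytes);
    }
    TrackingPoolSketch(const TrackingPoolSketch&) = delete;
    TrackingPoolSketch& operator=(const TrackingPoolSketch&) = delete;
    ~TrackingPoolSketch()
    {
        for (char* chunk : m_chunks) ::operator delete(chunk);
    }

    void* Allocate(std::size_t bytes)
    {
        if (bytes > m_max_pooled_bytes) {
            // Too big for the pool: use the system allocator and remember the cost.
            m_system_usage += ApproxUsage(bytes);
            return ::operator new(bytes);
        }
        const std::size_t rounded = RoundUp(bytes);
        if (m_chunks.empty() || m_chunk_bytes - m_used_in_last_chunk < rounded) {
            m_chunks.push_back(static_cast<char*>(::operator new(m_chunk_bytes)));
            m_used_in_last_chunk = 0;
        }
        char* p = m_chunks.back() + m_used_in_last_chunk;
        m_used_in_last_chunk += rounded;
        return p;
    }

    void Deallocate(void* p, std::size_t bytes)
    {
        if (bytes > m_max_pooled_bytes) {
            m_system_usage -= ApproxUsage(bytes);
            ::operator delete(p);
        }
        // Pooled blocks stay inside their chunk; a real pool would recycle them.
    }

    // Chunks plus tracked system allocations; no knowledge of the container needed.
    std::size_t DynamicUsage() const
    {
        return m_chunks.size() * ApproxUsage(m_chunk_bytes) + m_system_usage;
    }

private:
    static std::size_t RoundUp(std::size_t bytes)
    {
        constexpr std::size_t a = alignof(std::max_align_t);
        return (bytes + a - 1) / a * a;
    }

    const std::size_t m_max_pooled_bytes;
    const std::size_t m_chunk_bytes;
    std::vector<char*> m_chunks;
    std::size_t m_used_in_last_chunk = 0;
    std::size_t m_system_usage = 0;
};
```

In this sketch a large request such as a bucket array shows up in `DynamicUsage()` automatically when it is made and disappears when it is freed, so the caller no longer has to know which of the container's allocations bypass the pool.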

18:01

<LarryRuane> Well, we're out of time, thanks everyone, especially the newcomers and @instagibbs and @stickies-v for your expertise!

18:01

<LarryRuane> #endmeeting