util: generalize accounting of system-allocated memory in pool resource (utxo db and indexes)

https://github.com/bitcoin/bitcoin/pull/27748

Host: larryruane  -  PR author: LarryRuane

The PR branch HEAD was d25b54346fed931830cf3f538b96c5c346165487 at the time of this review club meeting.

Notes

  • This PR is a follow-on to PR 25325, which we reviewed March 8 of this year. Please review at least the notes for that review club.

  • The -dbcache configuration option determines the amount of memory (default 450 MiB) used for the coins cache as well as other “database” uses of memory; see CalculateCacheSizes().

  • Using less memory than allowed decreases the coins cache hit ratio (the fraction of lookups that find the UTXO in the cache); using more memory than specified risks crashing bitcoind on memory-restricted systems.

  • For this reason, it’s important to keep an accurate accounting of the amount of memory used by the cache. It doesn’t have to be perfect but should be fairly close.

  • When a program requests X bytes of dynamic memory from the C++ runtime library, internally it allocates slightly more for the memory allocator’s metadata (overhead). In other words, logical memory is not the same as physical memory.

  • This memory allocator metadata is somewhat complex and depends on several factors, such as the machine architecture and the memory model.

  • When sizing the cache, we want to calculate physical memory usage, that is, account for this extra allocation metadata. Unfortunately, there’s no library function that maps logical memory size to physical size.

  • To deal with that, Bitcoin Core includes a function, MallocUsage(), that approximates this conversion. Its argument is an allocation size, and it returns the corresponding physical size.

  • That source file, memusage.h, includes many overloads of the function DynamicUsage() across the various data types that we might be allocating somewhere in the system. They all make use of MallocUsage().

  • The pool memory resource (added by PR #25325) adds a new DynamicUsage() overload (version) that computes the overall coins cache size. This allows us to stay within the configured cache size.

  • This PR modifies how this calculation is done.

  • This DynamicUsage() overload (for the pool memory resource) is called only from CCoinsViewCache::DynamicMemoryUsage().

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. In the master branch (without this PR), why does the DynamicUsage() overload have so many templated arguments? (Hint: compare it to the overload immediately above it, on line 170.)

  3. How did this DynamicUsage() overload work on master? What are the various values being added together in this PR?

  4. Specifically, why is m.bucket_count() part of the DynamicUsage() calculation? Why isn’t the memory for the bucket allocation already accounted for in the resource “chunks”?

  5. In this PR, where is the DynamicUsage() calculation moved to, and why is m.bucket_count() no longer needed? What is the advantage of not referencing m.bucket_count()?

  6. Extra credit: What is cachedCoinsUsage and why does CCoinsViewCache::DynamicMemoryUsage() add it to memusage::DynamicUsage(cacheCoins()?

Meeting Log

  117:00 <LarryRuane> #startmeeting
  217:00 <stickies-v> hi
  317:00 <kevkevin> hi
  417:00 <LarryRuane> welcome everyone! Today we're looking at https://bitcoincore.reviews/27748
  517:00 <emzy> hi
  617:00 <Pins> hi
  717:01 <drusilla_> Hi
  817:01 <LarryRuane> I see some familiar names, is anyone here for the first time? feel free to say hi even if you'd like to just watch for now
  917:02 <yashraj> hi
 1017:02 <pablomartin> hello
 1117:02 <LarryRuane> IRC isn't the most interactive medium for these discussions, so feel free to bring up earlier topics or continue earlier conversations even if we've moved on!
 1217:02 <drusilla_> This is my first time here , I can't hear anything on my end
 1317:03 <LarryRuane> welcome @drusilla_ ! There is no sound, this is OLD SCHOOL text chat only!
 1417:03 <kevkevin> welcome! drusilla_
 1517:03 <stickies-v> glad you found your way here, drusilla_ !
 1617:03 <efrageek> Hi!
 1717:04 <LarryRuane> (I think IRC was created around the 1970s haha)
 1817:04 <drusilla_> Thank you
 1917:04 <Pins> haha
 2017:04 <efrageek> How are you guys? New around here
 2117:05 <kevkevin> welcome! efrageek
 2217:05 <abubakarsadiq> hi lurking tday
 2317:05 <LarryRuane> So, after writing today's notes and questions, it occurred to me that some here might not be familiar with lower-level structures like `unordered_map` in c++, especially if you've come from higher-level languages like Ruby or Python
 2417:05 <LarryRuane> glad you're here, @efrageek! Feel free to just lurk or ask any questions, there are no bad questions
 2517:06 <efrageek> Really appreciate LarryRuane!
 2617:07 <efrageek> Tha problemthat you mentiond is something that I found when I started to study c++
 2717:07 <LarryRuane> So in bitcoin core, a super important data structure is the "coins cache" ... this is a map (we use the standard library `unordered_map`) that lets us determine if a transaction's input refers to a valid, unspent output
 2817:07 <stickies-v> LarryRuane: isn't an unordered_map in C++ very similar to a python dict?
 2917:08 <LarryRuane> Yes, exactly right, and it's sometimes called a hash table, https://en.wikipedia.org/wiki/Hash_table
 3017:08 <LarryRuane> So, especially during initial block download, performance is critical, we're validating tons of transactions,
 3117:09 <LarryRuane> and for each one, we look at all its inputs, and want to see if the output it references (the reference is called a `COutpoint`) is exists, and is unspent
 3217:10 <LarryRuane> I think of that name, COutpoint, as meaning it's a "pointer" to an earlier transaction's output ... but not in the usual sense of a memory pointer,
 3317:10 <kevkevin> can we only have one COutpoint for each input or can we have multiple COutpoints?
 3417:10 <LarryRuane> but the reference is by txid and index into the tx's output array
 3517:11 <LarryRuane> kevkevin: good question, each input refers to exactly one COutpoint
 3617:12 <kevkevin> ok thanks!
 3717:12 <LarryRuane> If the transaction is valid, we ADD all of its outputs (txid, output index) to this map, so they are available to be referenced by later transactions that want to spend those outputs
 3817:12 <LarryRuane> and see, it's very common that the "lifetime" of an output (how long it is unspent) is pretty short
 3917:13 <LarryRuane> many outputs are spent within a few blocks of when they were created! more often than you'd guess, even spent within the same block!
 4017:14 <LarryRuane> so, even though all the unspent outputs (UTXOs) are saved on disk (in the `chainstate` directory within your data directory, in LevelDB format), the most recent ones are saved in memory -- in this very unordered_map we're talking about today
 4117:14 <Pins> that's interesting
 4217:14 <LarryRuane> and it's a big savings if a UTXO gets spent quickly and never even has to be written to disk!
 4317:15 <LarryRuane> we'd like this to happen as often as possible, because it makes IBD much faster than it would otherwise be, because reading from disk is like 2 orders of magnitude, or more, slower than just reading from memory
 4417:16 <LarryRuane> so this is all background, sorry if most of you know this, but is that clear? did I confuse anyone?
 4517:16 <kevkevin> ahh ok so the unordered map is just the latest set of UTXO's, is there a max size to this unordered list and if so is that max size configurable?
 4617:16 <efrageek> Very clear, even for the new guy
 4717:16 <Pins> very clear
 4817:17 <LarryRuane> kevkevin: great question, the `-dbcache` setting in your configuration, default 450 MiB (450*1024*1024) determines the max mem size for this map, plus some other memory caches,
 4917:17 <kevkevin> ahh ok thanks!
 5017:17 <LarryRuane> but during IBD, really the only cache that's in use and active, is this UTXO cache (aka coins cache)
 5117:17 <kevkevin> makes sense to me
 5217:18 <kevkevin> would it make sense to increase that cache during IBD and then lower it to the -dbcache size after IBD is finished?
 5317:18 <LarryRuane> and also (I had to discover all this myself recently), the mempool cache is used for this map during IBD, and it has a separate config setting, `-maxmempool` i think, and its default is 300 MiB
 5417:19 <LarryRuane> kevkevin: yes, so that's exactly what happens,
 5517:19 <kevkevin> ahh ok cool!
 5617:19 <LarryRuane> this coins map uses 750 MiB or so during IBD, because there's no mempool yet!
 5717:19 <LarryRuane> and of course you can override all this
 5817:20 <stickies-v> kevkevin: yes, IBD is an excellent time to manually increase dbcache to a higher value, makes a big difference
 5917:20 <Pins> kevkevin when you have enought resource to do that
 6017:20 <LarryRuane> this 750 is quite small for many machines, so if you're about to do IBD, and you have a fairly high end system, increase either of those settings, and it will be MUCH faster
 6117:20 <LarryRuane> @stickies-v beat me to it, yes, exactly right
 6217:21 <yashraj> can confirm
 6317:21 <LarryRuane> so like i have a Raspi (myNode), and it has only 4 GB total memory, so probably best not to change its settings (their default config actually does change it slightly)
 6417:22 <LarryRuane> ok so let me just describe this important `std::unordered_map` a little bit (but again, feel free to continue where we were before)
 6517:23 <LarryRuane> it has this one large array, or actually `std::vector`, which is called the bucket array or bucket list (haha),
 6617:23 <LarryRuane> and each entry in this bucket array points to an *entry* in the map
 6717:25 <LarryRuane> and each entry contains (in this case of the coins map) the map key (a `COutpoint` like we described) and a `Coin` which is the amount of this output and its `scriptPubKey` .. did I miss anything, @stickies-v ?
 6817:26 <LarryRuane> so again, we're validating a tx... we loop across its inputs ... each one contains a `COutpoint` ... we look in the map using this `COutpoint` (which remember is a txid and output index) ...
 6917:27 <LarryRuane> if it's not there, what do we need to do? anyone?
 7017:27 <Pins> Mempool?
 7117:28 <kevkevin> well would we read the disk?
 7217:28 <Pins> Yes sure ... my bad
 7317:28 <LarryRuane> Pins: actually the mempool has entries in this map too
 7417:29 <LarryRuane> kevkevin: yes, we read the disk! if this output was created a long time ago, it won't be in memory but may be on disk, so we have to read the disk (slow)
 7517:29 <kevkevin> Would that only happen during IBD? reading from the mempool
 7617:29 <LarryRuane> I'm glossing over some stuff, but there are actually LAYERS of memory caches! each with an unordered map
 7717:29 <LarryRuane> kevkevin: during IBD, our mempool is empty .. mempool contains only very recent transactions
 7817:30 <LarryRuane> we might be processing blocks from 2014
 7917:30 <LarryRuane> trying to decide if I should talk about the layers ... that might take a little too long here
 8017:30 <LarryRuane> well, very quickly,
 8117:30 <kevkevin> oh I thought we used the mempool cache during IBD
 8217:31 <instagibbs> kevkevin just the memory we would have allocated to the mempool potentially
 8317:31 <LarryRuane> when we're validating a block, its transactions create a bunch of new UTXOs, right?
 8417:31 <kevkevin> ahh ok just the space is being used thanks!
 8517:31 <yashraj> yes
 8617:31 <LarryRuane> oh hi @instagibbs! We have a real expert here, much more than me!
 8717:31 <LarryRuane> now I'm nervous ... haha
 8817:31 <instagibbs> I've literally done nothing with utxo set in core haha
 8917:32 <LarryRuane> so when we're validating a block, we're creating these UTXO entries in the memory map ... but let's say that the LAST transaction in the block is invalid!
 9017:32 <LarryRuane> so what do we do now?
 9117:32 <LarryRuane> one way is we could go back and REMOVE all those UTXOs from this block, because the entire block is invalid
 9217:32 <LarryRuane> but that would be complicated and slow
 9317:33 <LarryRuane> so what we do instead is make a TEMPORARY map for JUST THIS ONE BLOCK
 9417:33 <LarryRuane> and if the block turns out to be invalid, we simply DISCARD this temporary coins (UTXO) memory cache
 9517:33 <LarryRuane> (we never flush these entries to disk during this one block's validation)
 9617:33 <instagibbs> isn't the cache flushed only once a day, or when it gets too big, in general?
 9717:34 <instagibbs> or is that a generalization of what you're saying
 9817:34 <LarryRuane> if the block IS valid, we MERGE this map down to the "main" memory map (which is much larger, this is the one limited by `-dbcache`
 9917:34 <instagibbs> ah :) nevermind
10017:34 <LarryRuane> instagibbs: yes, that's right, that's the main cache... and also yes, when it reaches the size limit, we flush all of it
10117:35 <kevkevin> LarryRuane: is the new memory map when the block is invalid one of the layers you mentioned?
10217:35 <LarryRuane> so there's this idea of a "view" which is a (more temporary) cache built on top of (layered on top of) a bigger cache
10317:36 <LarryRuane> kevkevin: yes
10417:36 <instagibbs> ah, so one kind of view is for mempool(continuously running) and another view on top would be block vlaidation
10517:36 <LarryRuane> it's pretty cool how these layers work but it took me a while to figure it out, and I'm still not very knowledgable on it, see src/coins.cpp
10617:36 <instagibbs> two different smaller maps on top of the main cache
10717:36 <instagibbs> is this right?
10817:36 <LarryRuane> instagibbs: mempool! yes! that's just another layer!
10917:37 <LarryRuane> so now we see at least THREE in-memory caching layers!
11017:37 <LarryRuane> and then of course there's the LevelDB layer, which is at the bottom (i guess depending on how you visualize it), on disk
11117:39 <LarryRuane> I think this relates to what some of you may know much better than me, functional programming ... where data structures are much more often immutable, and you apply changes atomically once everything is validated or no errors, else discard
11217:39 <LarryRuane> anyway, I'm not an expert on functional programming but I think there's a connection to that style or programming
11317:40 <LarryRuane> Okay so back to unordered_map ... it's faster than an (ordered) `std::map` ... the tradeoff being that a `std::map` allows you to iterate the entries in key-order, but at the expense of time and (maybe) storage
11417:41 <LarryRuane> I think there are some `std::map`s in bitcoin core that could be `std::unordered_map` instead, but were coded before `std::unordered_map` existed! It was added only more recently
11517:43 <Pins> In this specific case the entries are no ordered, right? So no reason to use std::map
11617:43 <LarryRuane> So anyway, an unordered_map has this bucket array, and that has to be accounted for when we try to figure out how much actual memory this map is consuming, and this (finally!) relates to this PR
11717:43 <instagibbs> utxos have no inherent (key) ordering
11817:43 <LarryRuane> Pins: yes exactly! and this data structure is so big and important, we want to take every advantage we can
11917:43 <Pins> perfect
12017:44 <LarryRuane> So let's see... sorry this is all kind of disorganized, any more questions about this coins map (maps) or the entries they contain? did I miss anything?
12117:45 <LarryRuane> Oh so, there was a recent PR merged, https://github.com/bitcoin/bitcoin/pull/25325 that made this unordered map use significantly less memory, super nice improvement!
12217:46 <LarryRuane> it did that by having this particular unordered_map (not all unordered_maps in the system) use a custom memory allocator
12317:46 <LarryRuane> now of course, any custom allocator must, at the bottom, use the same system memory allocator that everything else uses!
12417:48 <LarryRuane> but this layer in between (you can think of it as) makes the unordered_map's allocations more efficient in terms of both speed and (pyysical) memory usage, even though the unordered_map (which we don't control) is doing exactly the same memory allocations
12517:48 <instagibbs> "this particular" which one
12617:48 <instagibbs> too many layers :)
12717:49 <LarryRuane> but I think mainly what it did is allowed MORE coins entries to be cached in memory before we hit that physical memory limit, so there is a higher hit ratio (the percentage of times we look for a UTXO in memory and it's present, don't have to read disk)
12817:49 <LarryRuane> instagibbs: oh right haha, the "main" memory cache, which is the base for both the mempool and the block layers
12917:50 <instagibbs> ok, the largest in memory one, makes sense
13017:50 <LarryRuane> so actually, those mempool and block maps also use this new memory allocator, but there's probably only a minor benefit, because there's small compared to the main one
13117:50 <instagibbs> đź‘Ť
13217:51 <LarryRuane> okay, whew, sorry this is so messy, but finally, what this PR does is to improve the way we calculate physical memory usage for this (well, any) coins utxo map
13317:52 <LarryRuane> let me stop and take a breath, can anyone explain what this PR does basically?
13417:52 <Pins> cache more coins with the same memory usage
13517:52 <LarryRuane> (and be aware, this really isn't an important PR, I'm not sure it will even be merged! but I saw it as a good excuse for discussing this important memory data structure!)
13617:53 <Pins> LarryRuane (y)
13717:53 <LarryRuane> Pins: that's what 25325 did, but this PR we're reviewing (one of mine) is a small tweak on that (already merged) PR
13817:53 <Pins> hummm
13917:54 <LarryRuane> gosh we're almost out of time, let me just throw it open to anyone, does anyone have answers to any of the questions on https://bitcoincore.reviews/27748?
14017:56 <LarryRuane> let me just quickly summarize.. when I saw this in PR 25325 (now in master): https://github.com/bitcoin/bitcoin/blob/681ecac5c2d462920cd32636eec15599a9bcf424/src/memusage.h#L186
14117:56 <LarryRuane> I wondered, why is `m.bucket_count()` part of this calculation?
14217:58 <LarryRuane> Seems like the `std::unordered_map` implementation should use the custom memory allocator to allocate the bucket array... so then its physical memory usage would be accounted for in the "chunks" (see the line just above, like 185)
14317:58 <LarryRuane> well, it turns out that this bucket array is TOO BIG to be allocated by the custom allocator!
14417:59 <LarryRuane> in that case, beyond a certain rather small allocation size, the custom allocator just does a normal system allocation (i.e. `new` in c++)
14517:59 <LarryRuane> so it's correct (in master) but kind of a kludge and maybe lucky
14618:00 <LarryRuane> in this PR (that we're reviewing) we just keep track of any system allocations that WE (the custom allocator) do, and account for the bucket array that way, without ever having to call `m.bucket_count()`
14718:01 <LarryRuane> Well, we're out of time, thanks everyone, especially the newcomers and @instagibbs and @stickies-v for your expertise!
14818:01 <LarryRuane> #endmeeting