Coins: allow Flush() without cache drop (utxo db and indexes)

Host: jamesob  -  PR author: jamesob

The PR branch HEAD was eebaca76 at the time of this review club meeting.


The UTXO cache is a critical data structure in Bitcoin for both correctness and performance. It’s responsible for maintaining a view of the spendable coins based upon the transactions in blocks. It is often a major bottleneck during block validation, and incorrect behavior is almost certain to lead to consensus failure.

Because the UTXO set is accessed so frequently, we would like its contents to be available as quickly as possible. This amounts to having as much of the set in memory as possible. The issue is that (at the moment) the in-memory representation of the UTXO set is more than 8GB, and obviously not all hosts running Bitcoin have that much memory.

For that reason, the UTXO cache is stratified across several layers: some on-disk, and some in-memory. The -dbcache parameter controls how much memory we allocate to the in-memory portion. As we validate blocks, we pull unspent coins that we look up from disk into memory until we run out of allocated memory (as indicated by this logic).

At that point, we completely empty the UTXO cache by writing it to disk by calling CCoinsViewCache::Flush(). In master, this is the only way of reconciling the state of the cache with the leveldb store on disk, even though sometimes we Flush() not because we have exceeded our dbcache, but to ensure durability. For example, we periodically flush the coins cache to avoid having to replay blocks if we shut down improperly.

Once we flush the cache, we are forced to read from and write to disk for all UTXO operations, which can be notably slower depending on the underlying disk. For this reason, separating the emptying of the cache from the writing to disk might allow us to ensure durability without losing the performance benefits of maintaining the cache.

A year ago, andrewtoth proposed in PR #15218 that we flush the UTXO set after completion of initial block download to avoid having to reconstruct the entire set if an unclean shutdown happened before a periodic flush, which basically amounts to a -reindex-chainstate. Other reviewers criticized this idea because of the performance implications of emptying the cache.

Another case that requires writing to disk without necessarily emptying the cache can be found in the assumeutxo project. When loading a UTXO set from a serialized snapshot, it’s preferable to write out the newly constructed chainstate immediately after load to avoid having to reload the snapshot once again after a bad shutdown. Benchmarks have shown that some platforms benefit significantly from maintaining the contents of the cache after writing them to disk.


  • Did you review the PR? Concept ACK, approach ACK, ACK <commit>, or NACK?  Don’t forget to put your PR review on GitHub or ask questions.

  • What is the “shape” of the UTXO set? What is it keyed and valued by? What are the operations it supports? Hint: try looking at coins.h.

  • What are the different layers of the UTXO cache?
    • How does CCoinsView relate to CCoinsViewCache and CCoinsViewDB?
    • How does CCoinsViewDB relate to DBWrapper?
  • What do the flags associated with CCoinsCacheEntry objects mean?
    • What does DIRTY mean?
    • What does FRESH mean?
    • Why do we go to the trouble of maintaining these flags?
  • What happens when CCoinsViewCache::Flush() is called? How do the coins in memory make their way to the disk?

  • When an unspent coin is flushed to disk and then spent in the next block, we will have done (at least) two writes and a read for that coin. If a coin is created and spent without a Flush() in between, no disk reads or writes are done. Why is this?

  • Can you describe the consensus bug referenced here?

  • Are you able to describe how ccoins_cache_simulation_test works?
    • This test has a lot of randomness. How do we ensure full coverage?
    • How does this test compare to the one I’ve added?
  • Do you understand the benchmarks that are given as justification for this PR?
    • Given the sensitivity and complexity of the UTXO cache, how should we think about the trade-offs of supporting this kind of operation?

Meeting Log

  118:58 <jamesob> Okay folks, in a few minutes we're going to get started on the meeting for
  219:00 <jamesob> #startmeeting
  319:00 <fjahr> hi
  419:00 <jamesob> hi
  519:00 <jonatack> hi
  619:00 <ajonas> hi
  719:00 <jamesob> "We usually start Bitcoin Core IRC meetings with a 'hi' so it's clear who's at keyboard. Feel free to say hi, even if you arrive in the middle of the meeting!" in the words of jonatak
  819:00 <emzy> hi
  919:00 <andrewtoth> hi
 1019:00 <jonatack> jamesob: that would be jnewbery :)
 1119:01 <jnewbery> hi
 1219:01 <jamesob> quick reminder about meeting conventions:
 1319:01 <jamesob> You don't have to ask to ask a question (e.g. "I have a question about x but I don't know if it's on-topic?"). Just go ahead and ask. If it's off-topic we'll tell you.
 1419:01 <jamesob> I'm here to help moderate, not to lead. You don't need to wait for me to ask a specific question — just jump in at any point.
 1519:01 <nehan_> hi
 1619:01 <jamesob> (also in the words of one of the other Js)
 1719:01 <kanzure> hi
 1819:02 <jamesob> okay, so today we're going to be covering
 1919:02 <jamesob> Who had a chance to read the PR? (y/n)
 2019:02 <fjahr> y
 2119:02 <andrewtoth> y
 2219:02 <jonatack> y
 2319:02 <nehan_> .5*y
 2419:02 <jamesob> okay we can do floats too
 2519:02 <lightlike> hi
 2619:03 <jnewbery> ~.5
 2719:03 <emzy> y
 2819:03 <jkczyz> .5 hi
 2919:03 <jamesob> In summary, the PR allows writing the contents of the UTXO set down to disk without wiping out the in-memory cache, which has some performance benefits.
 3019:04 <jamesob> The parameter that allows this isn't actually used in the PR.
 3119:04 <jamesob> Who had a chance to review the PR? (y/n)
 3219:05 <fjahr> wip
 3319:05 <emzy> n
 3419:05 <jonatack> first commit
 3519:05 <andrewtoth> 0.5 y
 3619:05 <jamesob> (well, the parameter isn't actually used aside from accompanying tests)
 3719:05 <jonatack> + concept
 3819:05 <lightlike> also only 1st commit
 3919:06 <jamesob> that's great. Before we start talking about things like cache flushing, though, and in-memory vs. on disk we should probably get some things straight conceptually
 4019:06 <jamesob> Can anyone tell me what the "shape" of the UTXO set is? I.e. what is it keyed and valued by?
 4119:06 <fjahr> key: COutPoint vlaue: Coin
 4219:07 <pelt> hi
 4319:07 <jamesob> correct. what's an outpoint?
 4419:07 <andrewtoth> key: txid:vout value: scriptpubkey,amount,height in block
 4519:08 <fjahr> txid and output pos
 4619:08 <jonatack> the data structure used to refer to a transaction output
 4719:08 <michaelfolkson> "The data structure used to refer to a particular transaction output, consisting of a 32-byte TXID and a 4-byte output index number (vout)."
 4819:08 <jamesob> andrewtoth: that's mostly correct, but you're missing a few important things in terms of the UTXO set data structure
 4919:09 <jonatack> i cribbed that from michaelfolkson
 5019:09 <jamesob> instead of being valued by a Coin, which has the constituent data you've listed, it's actually valued by a CCoinsCacheEntry, which has some extra stuff
 5119:09 <jamesob> anyone know what that extra stuff is?
 5219:10 <jnewbery> flags
 5319:10 <andrewtoth> it appears to just be flags for DIRTY and FRESH and then the Coin
 5419:10 <nehan_> flags
 5519:10 <jamesob> jnewbery andrewtoth nehan_: bingo. we'll come back to those in a bit
 5619:10 <jamesob> this wasn't *always* the structure of the UTXO set. can anyone tell me how it looked previously?
 5719:11 <jamesob> long story short, it used to be keyed by txid and valued by an array of coins. more info in this PR:
 5819:12 <jamesob> (but that's an aside)
 5919:12 <jamesob> okay, so now let's talk about the different layers of the UTXO set's implementation
 6019:12 <jnewbery> hint: it changed in 0.15
 6119:13 <jamesob> there are a few different classes to be aware of here. Who can tell me what the CCoinsView class is?
 6219:13 <andrewtoth> ahh yes i remember, there was a dos vector there where it would have to pull in lots of coins into memory doing it that way
 6319:13 <jamesob> andrewtoth right, and I think it was in general less efficient for access
 6419:14 <jamesob> hint:
 6519:14 <lightlike> did every node that upgraded to 0.15 have to reindex-chainstate back then?
 6619:14 <fjahr> It basically represents the whole UTXO set
 6719:14 <jamesob> I believe so - there was a migration function written somewhere
 6819:15 <jamesob> fjahr: in a sense. CCoinsView defines the interface of the UTXO set, independent of any given storage medium
 6919:15 <nehan_> CCoinsView is the common interface for all the different layers
 7019:15 <jamesob> nehan_: right!
 7119:15 <jonatack> A txout dataset abstraction
 7219:15 <jnewbery> upgrade code here:
 7319:15 <jamesob> so there isn't an actual instantiation of a CCoisnView anywhere, but there are a few concrete subclasses
 7419:15 <nehan_> I don't really understand exactly *why* there have to be so many layers, but maybe we will get to that
 7519:16 <jamesob> nehan_: yep, we'll get there pretty shortly
 7619:16 <jamesob> can anyone tell us what CCoinsViewDB does?
 7719:16 <jamesob> (which is a subclass of CCoinsView, unsurprisingly)
 7819:17 <jamesob> (
 7919:17 <andrewtoth> it's an implementation of the interface to represent the state of the utxo set in leveldb
 8019:17 <jamesob> andrewtoth: right!
 8119:17 <nehan_> manages the leveldb
 8219:18 <jamesob> you can think of this in very coarse terms as being the conduit to storing and reading UTXOs to and from disk
 8319:19 <jamesob> so there's another class called CCoinsViewBacked, can anyone tell us what that does? it's a little abstract
 8419:19 <jamesob> (
 8519:20 <nehan_> I guess it's purpose is so that you can change base?
 8619:21 <nehan_> it's just a 1:1 wrapper around whatever is backing it, right?
 8719:21 <fjahr> backed by something else than the db I guess
 8819:21 <jamesob> nehan_: yep, basically
 8919:21 <jamesob> it's a CCoinsView implementation that says "I'm one layer of a cache, but there's another coins view that sits behind me, and I'll consult that view if the user of me asks for a coin I don't have."
 9019:22 <jamesob> so there's a subclass of CCoinsViewBacked called CCoinsViewCache: surely someone can tell us what this does
 9119:22 <jonatack> It seems to be used in case of shutdown
 9219:22 <jonatack> in CCoinsViewErrorCatcher
 9319:23 <jamesob> jonatack: yep, that's an interesting but sort of tangential use of CCoinsViewBacked
 9419:23 <jonatack> src/coins.h#L338-359
 9519:23 <nehan_> jamesob: Couldn't all the sub classes have just overriden CCoinsView directly? Why have an intermediate CCoinsViewBacked?
 9619:23 <jamesob> CCoinsViewErrorCatcher exists basically so that we can propagate leveldb errors to sensible error messages in the GUI
 9719:23 <andrewtoth> it maintains an in memory cache of the utxo set, *backed* by the leveldb representation of the utxo set
 9819:24 <lightlike> CCoinsViewCache seems to be the actual in-memory cache of limited size that is backed by the db
 9919:24 <jamesob> nehan_: good question. It turns out (I think) that is because CCoinsViewDB doesn't have a cache behind it, and so it subclasses CCoinsView directly
10019:24 <jonatack> oh right, src/coins.h#L209
10119:24 <jamesob> it's the only view that does so
10219:24 <jamesob> andrewtoth lightlike: yep, exactly right
10319:25 <jamesob> so! to get around to nehan_'s earlier question, we arrange different instances of these caches to relate to one another
10419:25 <jamesob> can someone tell us what this "stack" of coins views looks like?
10519:26 <nehan_> this was very helpful:
10619:26 <jnewbery> nehan_: +1
10719:26 <jonatack> agreed
10819:27 <jamesob> thanks for linking :)
10919:28 <jamesob> so basically, the arrangement looks like this:
11019:28 <jamesob> CCoinsViewDB (disk; i.e. canonical store) <- CCoinsViewCatcher (leveldb errors to GUI) <- CCoinsViewCache (in-memory cache)
11119:28 <emzy> nehan_: +1
11219:29 <jamesob> so on the top of the cache is the in-memory store. If we try to fetch a coin from that view and it fails, it'll try the fetch from CCoinsViewCatcher. CCoinsViewCatcher doesn't store *anything*, so it will always fall back to CCoinsViewDB, which ultimately uses a class called DBWrapper to consult leveldb (which is the on-disk local database we use)
11319:30 <jamesob> so as you can see this is pretty complicated.
11419:30 <jamesob> why don't we just do everything in memory?
11519:30 <lightlike> will said coin be automatically added to the in-memory cache in this case?
11619:31 <jamesob> lightlike that's a good question and we're jumping ahead a little bit, but yes: CCoinsViewCache pulls any fetched coin into its cache after it fetches it from its parent cache
11719:31 <nehan_> jamesob: "on the top of the cache is the in-memory store" --> do you mean on top of CCCoinsViewCatcher? Is there another in-memory store on top of CCoinsViewCache?
11819:32 <jonatack> s/CCoinsViewCatcher/CCoinsViewErrorCatcher/ ??
11919:32 <jamesob> sorry that was poorly phrased on my part - what I mean is that the CCoinsViewCache is the top layer of the cache
12019:32 <jamesob> jonatack: right, thanks
12119:32 <jamesob> nehan_: your second question is a great one
12219:32 <nehan_> jamesob: ah, got it thanks
12319:32 <jamesob> yes, sometimes we stack CCoinsViewCaches on top of one another
12419:33 <lightlike> not automatically added because of memory limitations - the utxo set is too large for many computers.
12519:33 <jonatack> jamesob: thanks, makes more sense to me now :D
12619:33 <jamesob> Sometimes in-memory caches (CCoinsViewCache instances) sit on top of one another, e.g. when connecting a block ( to enforce atomicity.
12719:33 <michaelfolkson> jamesob: Memory restrictions will cause crashes
12819:33 <jamesob> lightlike michaelfolkson: correctamoondo
12919:34 <jamesob> and beyond that, there are durability considerations: if we crash while everything is in memory, the UTXO set gets wipe out and we'd have to reindex on startup
13019:35 <jamesob> so then why don't we just use leveldb for everything, skip the memory stuff?
13119:35 <michaelfolkson> Performance, longer sync time?
13219:35 <emzy> just a thought, it is to slow.
13319:35 <nehan_> jamesob: this seems like a really weird way to use what should be just an in-memory cache. it's sort of being used as a way of atomically applying updates?
13419:36 <jamesob> yup. Memory is way slower than disk as many of you probably know (
13519:36 <jamesob> oops, vice versa
13619:36 <jamesob> disk is way slower than memory
13719:36 <jonatack> whew
13819:36 <nehan_> i'm surprised leveldb doesn't do its own caching
13919:37 <jamesob> nehan_: good point. it does to a certain extent, but it isn't sophisticated and we set it to be pretty limited
14019:37 <jamesob> that'd be worth looking into
14119:38 <jamesob> nehan_: yeah, in the case I link to, we use the temporary CCoinsViewCache as a sort of "transaction" (in the database sense) to be able to easily "roll back" if any of the spends don't validate when connecting a block
14219:38 <jnewbery> jamesob: in that example you used of a cache on top of another, this final call to flush(): is what causes the UTXO set to be updated in the case that the block is successfully connected? If block connection fails then we can just throw away `view` and the underlying UTXO set doesn't
14319:38 <jnewbery> get updated?
14419:39 <jamesob> exactly - the changes don't propagate to our "actual" UTXO set until we call `Flush()` there
14519:40 <jamesob> so when to Flush() actually turns out to be pretty important because as you can probably guess, it not only amounts to writing to disk but (currently) it removes everything from the in-memory portion of the view structure
14619:40 <jamesob> this PR, of course, makes the latter part of that optional
14719:40 <jamesob> but we're still not quite done talking about the CoinsView internals. let's return to those flags we were talking about on CCoinsCacheEntry
14819:41 <jamesob> in order to make efficient use of the in-memory cache, we want to be able to avoid writing to and reading from disk if, say, a coin is created and spent without running low on memory
14919:42 <jamesob> (when we run low on memory, we `Flush()`)
15019:42 <jamesob> those flags are key in helping us do that
15119:42 <jamesob> so what does the FRESH flag indicate when it is true for a cache entry?
15219:42 <jonatack> src/coins.h#L120 FRESH is a performance optimization with which we can erase coins that are fully spent if we know we do not need to flush the changes to the parent cache. It is always safe to not mark FRESH if that condition is not guaranteed.
15319:43 <nehan_> I found that comments confusing. what is the "condition" exactly?
15419:43 <nehan_> s/comments/comment
15519:43 <jamesob> jonatack: right. FRESH basically says "the cache that sits behind me has never seen this entry - I haven't flushed it yet. So if the entry gets removed, I can just remove it from my cache - no need to tell my parent."
15619:43 <jamesob> nehan_: I think the "condition" is "my parent hasn't seen this entry"
15719:43 <andrewtoth> this is basically an optimization to avoid writing a transient utxo. if the db hasn't seen it and it is spent, it never needs to be written
15819:44 <jamesob> so there are certain cases where you may not know if leveldb has seen the entry or not, i.e. when we are recovering from a crash that happened during a Flush()
15919:44 <pelt> If the system crashes while flush() is running, it would require the node to reindex. Is that correct?
16019:44 <jonatack> i read it as generally, when in doubt, don't mark as FRESH but it's nice if we can
16119:44 <jamesob> pelt: luckily that isn't the case - there is a mechanism built into CCoinsViewDB that allows us to recover from such a crash without reindex
16219:45 <nehan_> in my experience one usually uses the term "dirty" to indicate that an entry is different in the cache than in its underlying store
16319:45 <jnewbery> nehan_: that's what we're using here
16419:45 <jnewbery>
16519:45 <nehan_> ok. fresh means was dirty but not anymore?
16619:46 <pelt> jamesob soming to the effect of, since we can tell node when we're done even if we crash on flush() disregard the delta on CCoinsViewDB?
16719:46 <andrewtoth> nehan_ fresh means dirty and not seen in db
16819:46 <jnewbery> Imagine we receive a block that has transaction A and transaction B, which spends one of A's outputs. We apply that block atomically and never need to apply that output to our UTXO set when we flush
16919:46 <jamesob> nehan_: fresh implies dirty, I think
17019:46 <nehan_> o_O
17119:47 <jnewbery> because we've marked A's output as FRESH, when we spend it, we can just remove it from our cache instead of flushing it at the end of block connection
17219:47 <jamesob> jnewbery: right!
17319:47 <fjahr> hm, can something be dirty if there is no underlying store yet? never thought of that...
17419:47 <nehan_> jnewbery: thanks
17519:47 <jnewbery> I think it's worth pointing out that the comment is a bit out of date (the words 'pruned' and 'fully spent' only made sense when the UTXO was tracking transactions instead of outputs
17619:48 <jamesob> DIRTY basically means "this entry somehow differs from the entry my parent would have"
17719:48 <lightlike> i understand the use case for fresh, but not yet the one for DIRTY.
17819:48 <jnewbery> just ignore the words 'pruned' and 'fully'
17919:49 <jamesob> jnewbery: I initially thought that, but "pruned" does still have a relevant meaning: because parent caches can return null values for coins which have been spent in their caches, those null values are sometimes referred to as "pruned"
18019:49 <jamesob> anyway, suffice to say: this stuff can get complex!
18119:49 <jamesob> the way that fresh and dirty flags work, when they are set, and why is potentially the hardest thing to understand about the innerworkings of the UTXO view structure
18219:50 <andrewtoth> if the utxo is brand new, it is DIRTY because the db doesn't have it, so it's different. It's also FRESH, since the db doesn't have it. If it gets spent before a write, it can just be removed
18319:50 <jnewbery> huh, I'll need to look into that. I thought 'pruned' meant 'all the outputs of this tx have been spent, we can remove it from our (pre-0.15 per-transaction) UTXO set. No need to discuss here though!
18419:50 <jamesob> but they allow us to fully leverage the in-memory cache by pruning coins which are created and spent in between disk flushes without ever touching the disk
18519:51 <jamesob> andrewtoth: right
18619:51 <nehan_> andrewtoth: thank you! that's a good explanation.
18719:51 <jamesob> but a coin that has already been written to the parent cache but is then spent should be marked DIRTY
18819:51 <nehan_> so the writer needs to be confident that this coin is absolutely not in the parent cache.
18919:51 <nehan_> when marking things fresh
19019:52 <jamesob> yep, right
19119:52 <jamesob> so - the bulk of this pull request is basically this chunk of code:
19219:52 <jamesob> can someone describe it in high level terms?
19319:53 <jamesob> (we're going to more or less stop in 5 minutes in case you all have any burning questions)
19419:55 <jnewbery> It's a flush, but without emptying the cache. You still need to clear the flags because they're invalidated by writing to the layer below
19519:55 <jonatack> jamesob: fwiw i wondered if this first commit wouldn't be better separated into 2 commits, one that adds Sync() and one the adds bool erase
19619:55 <jonatack> that*
19719:55 <jamesob> jnewbery: right! thanks
19819:56 <jamesob> jonatack: yeah could be - would need to look at it again, but it may be a small enough change that that'd be a stylistic call
19919:56 <jamesob> anyway, early in the PR there was a consensus bug based upon not resetting those flags when flushing without erase
20019:56 <jamesob> I'm curious if anyone can tell us how that bug works
20119:57 <jamesob> (and/or chime in with whatever questions you'd like because we're almost out of time)
20219:57 <nehan_> i think it's leaving a coin marked as "fresh" even though it has been sync'd to the parent
20319:58 <emzy> Any practical numbers of the speedup?
20419:58 <nehan_> so you might never sync the spend of the coin
20519:58 <jonatack> the logical danger would be setting FRESH when it shouldn't be
20619:58 <nehan_> and i suppose you could spend a utxo again
20719:58 <lightlike> could it be interesting idea for the future to try partial flushs (according to some heuristic) so that we free up some space but hopefully keep those coins in memory that we may need soon?
20819:58 <jamesob> nehan_: right!
20919:58 <nehan_> good catch by sjors
21019:58 <jamesob> emzy: see and subsequent bench results
21119:59 <jamesob> yeah, all due props to sjors
21219:59 <michaelfolkson> Any good resources for better understanding Core's use of LevelDB and databases? This looks promising
21319:59 <jamesob> lightlike: yep, I have a branch that tried this but surprisingly I didn't get good results
21420:00 <jonatack> yes, bien joué monsieur sjors * claps
21520:00 <jamesob> michaelfolkson: there aren't a ton of resources on that surprisingly, but I can recommend a very broad talk I did back in 2018:
21620:00 <jamesob> there's a video for that somewhere too
21720:00 <jnewbery> michaelfolkson: I hadn't seen that before, but from a very quick skim it looks decent
21820:01 <jamesob> but in general, best thing to do is review the database/coins code in coins.{h,cpp}, txdb.{h,cpp}, and FlushStateToDisk in validation.cpp
21920:01 <michaelfolkson> jamesob: Cool thanks, I will watch that.
22020:01 <jamesob> okay folks, that's all the time we have
22120:01 <jamesob> thanks for showing up!
22220:01 <jamesob> #endmeeting
22320:01 <fjahr> thanks jamesob
22420:01 <nehan_> thanks!
22520:01 <andrewtoth> thanks jamesob! and thanks to all organizers
22620:01 <jnewbery> That was great. Thanks James!
22720:01 <lightlike> thanks! learned a lot today!
22820:01 <emzy> tnx jamesob
22920:02 <billygarrison> Thanks James and participants! I learned a lot as a fly on the wall.
23020:02 <jonatack> michaelfolkson: interesting. that link was by grokchain, who was present at review club lately
23120:02 <michaelfolkson> Thanks for your patience jamesob! I'll get there one day haha
23220:02 <ajonas> Talk that nehan_ and James referenced is here for those interested:
23320:02 <jamesob> michaelfolkson: keep at it!
23420:02 <jnewbery> You're welcome to come back and do another assumeUTXO PR whenever you want!
23520:02 <jonatack> jamesob: awesome meeting and notes. learned tons.
23620:03 <jonatack> thanks!
23720:03 <billygarrison> jamesob excellent intro written on the review page btw - gave great context for a newbie like me!
23820:03 <jamesob> jnewbery: careful what you wish for!
23920:03 <pelt> thanks all
24020:03 <jamesob> billygarrison: glad to hear it
24120:03 <jamesob> jonatack jnewbery: thanks for putting this together. great idea!
24220:04 <jonatack> +1 assume-utxo follow-up
24320:04 <pelt> was very helpful to better understand CCoinsView* layering
24420:05 <jamesob> pelt: glad to hear. it's pretty complicated
24520:07 <jnewbery> here's a write-up of the UTXO changes in v0.15:
24620:07 <jnewbery> (I didn't want to derail the conversation during the meeting)
24720:08 <michaelfolkson> Thanks