The PR branch HEAD was eebaca76 at the time of this review club meeting.
The UTXO cache is a critical data structure in Bitcoin for both correctness and
performance. It’s responsible for maintaining a view of the spendable coins
based upon the transactions in blocks. It is often a major bottleneck during
block validation, and incorrect behavior is almost certain to lead to consensus
failure.
Because the UTXO set is accessed so frequently, we would like its contents to be
available as quickly as possible. This amounts to having as much of the set
in memory as possible. The issue is that (at the moment) the in-memory
representation of the UTXO set is more than 8GB, and obviously not all hosts
running Bitcoin have that much memory.
For that reason, the UTXO cache is stratified across several
layers: some on-disk, and some in-memory. The
-dbcache parameter controls how much memory we allocate to the in-memory
portion. As we validate blocks, we pull unspent coins that we look up from disk
into memory until we run out of allocated memory.
At that point, we call CCoinsViewCache::Flush(), which writes the UTXO cache to
disk and completely empties it.
In master, this is the only way of reconciling the state of the cache
with the leveldb store on disk, even though sometimes we Flush() not because we
have exceeded our dbcache, but to ensure durability. For example, we periodically flush
the coins cache to avoid having to replay blocks if we shut down improperly.
Once we flush the cache, we are forced to read from and write to disk
for all UTXO operations, which can be notably slower depending on the
underlying disk. For this reason, separating the emptying of the cache
from the writing to disk might allow us to ensure durability without losing the
performance benefits of maintaining the cache.
A year ago, andrewtoth proposed in
PR #15218 that we flush the
UTXO set after completion of initial block download to avoid having to
reconstruct the entire set if an unclean shutdown happened before a periodic
flush, which basically amounts to a -reindex-chainstate. Other reviewers
criticized this idea because of the performance implications of emptying the
cache.
Another case that requires writing to disk without necessarily emptying the
cache can be found in assumeutxo: after
loading a UTXO set from a serialized snapshot, it’s preferable to write out the
newly constructed chainstate immediately after load to avoid having to reload
the snapshot once again after a bad shutdown. Benchmarks have shown
that some platforms benefit significantly from maintaining the contents of the
cache after writing them to disk.
What is the “shape” of the UTXO set? What is it keyed and valued by? What are
the operations it supports? Hint: try looking at
What are the different layers of the UTXO cache?
How does CCoinsView relate to CCoinsViewCache and CCoinsViewDB?
How does CCoinsViewDB relate to DBWrapper?
What do the flags associated with CCoinsCacheEntry objects mean?
What does DIRTY mean?
What does FRESH mean?
Why do we go to the trouble of maintaining these flags?
What happens when CCoinsViewCache::Flush() is called? How do the coins
in memory make their way to the disk?
When an unspent coin is flushed to disk and then spent in the next block, we
will have done (at least) two writes and a read for that coin. If a coin is
created and spent without a Flush() in between, no disk reads or writes are
done. Why is this?
Can you describe the consensus bug referenced
<jamesob> it's a CCoinsView implementation that says "I'm one layer of a cache, but there's another coins view that sits behind me, and I'll consult that view if the user of me asks for a coin I don't have."
<jamesob> so on the top of the cache is the in-memory store. If we try to fetch a coin from that view and it fails, it'll try the fetch from CCoinsViewCatcher. CCoinsViewCatcher doesn't store *anything*, so it will always fall back to CCoinsViewDB, which ultimately uses a class called DBWrapper to consult leveldb (which is the on-disk local database we use)
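
The fallback chain described above can be modelled in a few lines. This is a rough Python sketch under stated assumptions, not the actual C++ classes: `DiskView`, `PassThroughView` and `CacheView` are illustrative names for the on-disk layer, a layer that stores nothing, and the in-memory layer.

```python
class DiskView:
    """Bottom layer: a dict standing in for the on-disk database."""

    def __init__(self, coins):
        self.coins = dict(coins)
        self.reads = 0

    def get_coin(self, outpoint):
        self.reads += 1
        return self.coins.get(outpoint)


class PassThroughView:
    """Middle layer that stores nothing and forwards every lookup."""

    def __init__(self, base):
        self.base = base

    def get_coin(self, outpoint):
        return self.base.get_coin(outpoint)


class CacheView:
    """Top layer: answers from memory, otherwise asks the view behind it."""

    def __init__(self, base):
        self.base = base
        self.cache = {}

    def get_coin(self, outpoint):
        if outpoint in self.cache:
            return self.cache[outpoint]
        coin = self.base.get_coin(outpoint)
        if coin is not None:
            self.cache[outpoint] = coin   # remember it for next time
        return coin


disk = DiskView({("aa", 0): 5000})
top = CacheView(PassThroughView(disk))

first = top.get_coin(("aa", 0))     # miss in memory, falls through to disk
second = top.get_coin(("aa", 0))    # served from memory
reads_after_hits = disk.reads
missing = top.get_coin(("bb", 1))   # unknown coin, no layer has it
```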
<jamesob> nehan_: yeah, in the case I link to, we use the temporary CCoinsViewCache as a sort of "transaction" (in the database sense) to be able to easily "roll back" if any of the spends don't validate when connecting a block
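
The "transaction" pattern mentioned here can be sketched as follows. This is a simplified Python model, not Bitcoin Core code: the `View` class and `connect_block` helper are hypothetical, and validation is reduced to checking that every spent coin exists.

```python
class View:
    """One cache layer; None as a value means 'spent in this layer'."""

    def __init__(self, base=None):
        self.base = base
        self.entries = {}

    def get_coin(self, outpoint):
        if outpoint in self.entries:
            return self.entries[outpoint]
        if self.base is not None:
            return self.base.get_coin(outpoint)
        return None

    def add_coin(self, outpoint, value):
        self.entries[outpoint] = value

    def spend_coin(self, outpoint):
        if self.get_coin(outpoint) is None:
            return False                  # missing or already spent
        self.entries[outpoint] = None
        return True

    def flush(self):
        """Push this layer's changes into the layer behind it."""
        self.base.entries.update(self.entries)
        self.entries.clear()


def connect_block(parent, spends):
    """Apply all spends in a scratch layer; commit only if every spend is valid."""
    scratch = View(base=parent)
    for outpoint in spends:
        if not scratch.spend_coin(outpoint):
            return False                  # scratch is discarded, parent untouched
    scratch.flush()
    return True


parent = View()
parent.add_coin(("aa", 0), 10)
parent.add_coin(("bb", 0), 20)

bad = connect_block(parent, [("aa", 0), ("zz", 9)])   # second spend is invalid
aa_after_bad = parent.get_coin(("aa", 0))             # rolled back for free

good = connect_block(parent, [("aa", 0)])
aa_after_good = parent.get_coin(("aa", 0))
```

Rolling back costs nothing because the failed block's changes only ever lived in the scratch layer.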
<jamesob> so when to Flush() actually turns out to be pretty important because as you can probably guess, it not only amounts to writing to disk but (currently) it removes everything from the in-memory portion of the view structure
<jonatack> src/coins.h#L120 FRESH is a performance optimization with which we can erase coins that are fully spent if we know we do not need to flush the changes to the parent cache. It is always safe to not mark FRESH if that condition is not guaranteed.
<jamesob> jonatack: right. FRESH basically says "the cache that sits behind me has never seen this entry - I haven't flushed it yet. So if the entry gets removed, I can just remove it from my cache - no need to tell my parent."
<jnewbery> Imagine we receive a block that has transaction A and transaction B, which spends one of A's outputs. We apply that block atomically and never need to apply that output to our UTXO set when we flush
<jamesob> jnewbery: I initially thought that, but "pruned" does still have a relevant meaning: because parent caches can return null values for coins which have been spent in their caches, those null values are sometimes referred to as "pruned"
<andrewtoth> if the utxo is brand new, it is DIRTY because the db doesn't have it, so it's different. It's also FRESH, since the db doesn't have it. If it gets spent before a write, it can just be removed
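
To make the DIRTY and FRESH discussion concrete, here is a small Python model of the two flags. It is a sketch under simplifying assumptions, not the logic in src/coins.cpp: `Parent`, `FlaggedCache` and the read/write/erase counters are invented for illustration, and `add_coin` assumes the parent holds no unspent copy of a coin that is absent from the cache.

```python
DIRTY = 1   # entry differs from what the parent holds
FRESH = 2   # parent has never seen an unspent version of this coin


class Parent:
    """Stand-in for the layer behind the cache, counting its operations."""

    def __init__(self):
        self.coins = {}
        self.reads = self.writes = self.erases = 0


class FlaggedCache:
    def __init__(self, parent):
        self.parent = parent
        self.entries = {}   # outpoint -> [value or None if spent, flags]

    def add_coin(self, outpoint, value):
        existing = self.entries.get(outpoint)
        # Not FRESH if we hold a DIRTY entry: the parent may still have an old version.
        fresh = existing is None or not (existing[1] & DIRTY)
        self.entries[outpoint] = [value, DIRTY | (FRESH if fresh else 0)]

    def fetch(self, outpoint):
        if outpoint not in self.entries:
            self.parent.reads += 1
            if outpoint in self.parent.coins:
                self.entries[outpoint] = [self.parent.coins[outpoint], 0]
        return self.entries.get(outpoint)

    def spend_coin(self, outpoint):
        entry = self.fetch(outpoint)
        if entry is None or entry[0] is None:
            return False
        if entry[1] & FRESH:
            del self.entries[outpoint]    # parent never saw it: just forget it
        else:
            entry[0] = None               # parent must be told about the spend
            entry[1] |= DIRTY
        return True

    def flush(self):
        for outpoint, (value, flags) in self.entries.items():
            if not flags & DIRTY:
                continue                  # unchanged entries need no write
            if value is None:
                self.parent.coins.pop(outpoint, None)
                self.parent.erases += 1
            else:
                self.parent.coins[outpoint] = value
                self.parent.writes += 1
        self.entries.clear()


parent = Parent()
cache = FlaggedCache(parent)

# Created and spent with no flush in between: the parent is never touched.
cache.add_coin(("aa", 0), 10)
cache.spend_coin(("aa", 0))
cache.flush()
ops_created_and_spent = (parent.reads, parent.writes, parent.erases)

# Created, flushed, then spent: one write, one read back, one erase.
cache.add_coin(("bb", 0), 20)
cache.flush()
cache.spend_coin(("bb", 0))
cache.flush()
ops_flushed_then_spent = (parent.reads, parent.writes, parent.erases)
```

The second scenario matches the "two writes and a read" case from the questions: the erase is the second write to the parent.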
<jnewbery> huh, I'll need to look into that. I thought 'pruned' meant 'all the outputs of this tx have been spent, we can remove it from our (pre-0.15 per-transaction) UTXO set. No need to discuss here though!
<lightlike> could it be interesting idea for the future to try partial flushs (according to some heuristic) so that we free up some space but hopefully keep those coins in memory that we may need soon?