The PR branch HEAD was 9288d747 at the time of this review club meeting.
Notes
The coins database is the structure in Bitcoin Core which keeps track of the
UTXO set. We looked at the coins database in a previous PR Review Club
(#17487). The notes for that Review Club meeting include an
excellent description of the caching for the coins database.
In v0.15 the coins database was changed to be keyed by transaction output
(PR 10195). For more
information on that change see this blog
post.
The ‘prune’ in ultraprune came from the fact that once all of the outputs
from a transaction had been spent, that entry could be removed or ‘pruned’
from the coins database (note that this is a different concept from
a pruned node, which deletes old block files that it no longer needs). The
‘pruned’ terminology for the coins database no longer makes sense - either a
coin is unspent (and exists in the database) or is spent (and removed from the
database).
When the coins database was changed from per-transaction to per-txout,
some of the ‘pruned’ code comments and variable names were left in. This PR
was motivated by improving the clarity of those comments. Subsequently some
additional small changes were added.
This PR touches consensus-critical code. Does it meet the high bar
for justifying a change to such critical code? Is improving
code clarity justification enough?
What does it mean for a coin to be FRESH? What about DIRTY?
Coins in a coins cache can be:
spent or unspent,
FRESH or not FRESH, and
DIRTY or not DIRTY
How many total states are there? How many of those are valid? Does this PR
change that?
Could this change have a performance impact during block validation
or Initial Block Download?
Appendix
(notes added after meeting)
A summary of the validity of possible states, with explanation:
For a description of FRESH & DIRTY, see comments
before &
after
this PR.
<jnewbery> This week's PR is rather short, but I think it's a good opportunity to dig into the coins structures a bit more. Notes are in the normal place: https://bitcoincore.reviews/18113.html
<jnewbery> and we might as well do question 2 at the same time: This PR touches consensus-critical code. Does it meet the high bar for justifying a change to such critical code? Is improving code clarity justification enough?
<fjahr> If it’s a significant improvement I think it is justified because it reduces the risk of bugs being introduced in the future and makes “real” improvements and new feature development easier
<willcl_ark> tACK from me. But I have to confess I was a bit less clear on the code-side this week. It was a new area for me to be looking at (i didn't attent the previous related meeting in Jan)
<andrewtoth> Concept ACK, but it seems like it's doing too much for how critical the code is. Could the renaming of PRUNED to SPENT and comment updates be a separate change to the consensus code change?
<jnewbery> willcl_ark: the notes and meeting log from https://bitcoincore.reviews/17487 are really good, so I'd recommend reading through those if this has piqued your interest
<jnewbery> andrewtoth: yes, this could definitely be split in two. One that changes the commends and variable naming, and then a second that makes the logical changes
<willcl_ark> One thing I couldn't tell, is there any guarantee that this could not create a chain split? At a high level, if the change is modifying _both_ the code and the associated test, it seems like that guarantee is harder to prove...
<jkczyz> regarding splitting into PRs, if the changes don't bleed across commits then I think one PR is fine. Unless the logical is contentious. But in that case the comments would need to be updated if the comment PR goes in first
<fjahr> consensus failure: a block would be rejected because different nodes have a different utxo set, one of them would miss a coin that is spent in the block
<lightlike> why can't a coin be "both FRESH and spent" then as one of the commit messages says? Couldn't it be created in the child cache, spent there and still be FRESH because it never existed in the parent?
<jonatack> jkczyz: i agree, istm the main reason to split would be if over time there was a lack of review that could be alleviated by splitting, or to get in worthwhile standalone changes if others were contentious.
<fjahr> michaelfolkson: I am not 100% clear on this in which cases this occurs, I am referring to the comment in coins.h: "DIRTY: This cache entry is potentially different from the version in the parent cache"
<jnewbery> michaelfolkson: I'd need to look back at the code in validation.cpp, but I think there could be a situation where a coin gets deleted in a child cache and then re-created. It would then be the same as the parent cache but would be marked as DIRTY
<jnewbery> I think that might happen in a re-org - the block that spent the coin is disconnected, which adds the coin to the cache, and then the competing block also spends the coin and it gets removed
<fjahr> michaelfolkson: To be sure you can not mark it as DIRTY you would probably have to check the parent cache all the time and that would be too much of a performance penalty I guess
<Prayank> I have a noob question. Sorry have not read much details but it looked interesting and this is my first meeting in core or review club. Does this FRESH and DIRTY thing in any way affect privacy when a full node interacts with other full nodes.
<michaelfolkson> Prayank: I would say not with these definitions of fresh and dirty. There is another context where coins from a coinbase transaction are referred to as fresh and that's where there's a potential privacy benefit
<fjahr> Let me try to answer Prayank: We are discussing internal management of data, this not information that is shared between nodes so it can't impact privacy
<jnewbery> there's also another meaning of 'dirty' in the wallet for addresses that have already been used, but that is unrelated to what we're talking about here
<willcl_ark> So will nodes from before/after this patch be marking coins in their caches differently to each other, and therefore flushing them differently (risking chainsplit)? That's what I can't quite tell..
<jnewbery> willcl_ark: the logical change in this PR is to do with the node's internal caching strategy. The underlying coins database shouldn't be affected
<amiti> oh right. ah got confused and thought thats what this PR is introducing, but actually the change here is that we don't need to fetch spent coins
<jnewbery> (you could potentially remove the cache entirely and just use the disk's coins database and arrive at the same view of consensus, but performance would be horrendous)
<earpiece> fjahr: the pr is not related to P2P layer but so not information intentionally shared but the privacy impact to be answered is, are the changes creating a side channel etc
<earpiece> could a probing peering observe additional information by using the fresh/dirty/spent/unspent to exfiltrate info about wallet etc, I don't think it could but not certain
<amiti> jkczyz: re (2) spent coins should not have entries -> if a coin is retrieved from the parent cache then spent in the child cache, it should have a DIRTY entry there until its flushed, right?
<jnewbery> amiti: jkczyz: yes, spent and DIRTY is a valid state. It means the child cache has spent the coin and that spentness needs to be propogated to the parent when the cache is flushed
<nehan_> jkczyz amiti: re(2) I think only coins that are created and spent in the child view should not have entries (with this change. instead of returning an entry with FRESH)
<amiti> I'm seeing 4 valid states with the PR. 1) spent, not fresh, dirty 2) unspent, fresh, dirty 3) unspent, not fresh, dirty 4) unspent, not fresh, not dirty.
<jnewbery> well they say there are only two hard things in computer science: naming things and cache invalidation, and we've covered both of them today
<jnewbery> amiti: this PR doesn't change that. The behavior is already 'if it's spent and FRESH, drop it'. The change in this PR is 'don't fetch a spent coin from a parent cache'
<nehan_> ugh but which 4? i think i had the wrong ones too: 1) not dirty & not fresh & not spent (or does this mean it wouldn't be in the child cash?) 2) dirty & not fresh & spent 3) dirty & fresh & spent 4) dirty & not fresh & spent
<jnewbery> I think that naming should always be as intuitive as we can make it. Any confusion stemming from other associations with the name can eventually exhibit themselves as bugs