Coins: Don't allow a coin to be spent and FRESH. Improve commenting. (
utxo db and indexes) Mar 18, 2020
The PR branch HEAD was 9288d747 at the time of this review club meeting.
The coins database is the structure in Bitcoin Core which keeps track of the
UTXO set. We looked at the coins database in a
previous PR Review Club
(#17487). The notes for that Review Club meeting include an
excellent description of the caching for the coins database.
The coins database is a key-value store. It was introduced in v0.8 (
1677 - Ultraprune) where it was keyed by
transaction. Pieter Wuille describes this change in a recent Chaincode Podcast
In v0.15 the coins database was changed to be keyed by transaction
( PR 10195). For more
information on that change see this blog
The ‘prune’ in ultraprune came from the fact that once all of the outputs
from a transaction had been spent, that entry could be removed or ‘pruned’
from the coins database (note that this is a different concept from
a pruned node, which deletes old block files that it no longer needs). The
‘pruned’ terminology for the coins database no longer makes sense - either a
coin is unspent (and exists in the database) or is spent (and removed from the
When the coins database was changed from per-transaction to per-txout,
some of the ‘pruned’ code comments and variable names were left in. This PR
was motivated by improving the clarity of those comments. Subsequently some
additional small changes were added.
Did you review the PR?
Concept ACK, approach ACK, tested ACK, or
(Don’t forget to put your PR review on GitHub.)
This PR touches consensus-critical code. Does it meet the high bar
for justifying a change to such critical code? Is improving
code clarity justification enough?
What does it mean for a coin to be
FRESH? What about
Coins in a coins cache can be:
spent or unspent,
FRESH or not
DIRTY or not
How many total states are there? How many of those are valid? Does this PR
Could this change have a performance impact during block validation
or Initial Block Download?
(notes added after meeting)
A summary of the validity of possible states, with explanation:
For a description of FRESH & DIRTY, see comments
this PR. For more details on possible cases, see
here. Meeting Log
1 13:00 <jnewbery> #startmeeting
7 13:00 <michaelfolkson> hi
10 13:01 <jnewbery> Hi folks. Welcome to PR Review Club! Feel free to say hi to let everyone know you're here.
11 13:01 <jnewbery> As always, feel free to jump in at any point to ask questions. No need to ask to ask, just ask!
12 13:01 <Prayank> Hello Everyone 😀
13 13:02 <jnewbery> wow. Emojis. That's novel :)
14 13:02 <jnewbery> This week's PR is rather short, but I think it's a good opportunity to dig into the coins structures a bit more. Notes are in the normal place: https://bitcoincore.reviews/18113.html
19 13:02 <jnewbery> who had a chance to review the changes? y/n
28 13:03 <jnewbery> great. Solitary confinement seems to be doing wonders for people's time for code review
29 13:03 <jnewbery> question 1: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
30 13:04 <jnewbery> and we might as well do question 2 at the same time: This PR touches consensus-critical code. Does it meet the high bar for justifying a change to such critical code? Is improving code clarity justification enough?
31 13:04 <fjahr> Concept ACK, the code looks good but I don't feel comfortable enough with everything especially considering reorgs
32 13:05 <fjahr> If it’s a significant improvement I think it is justified because it reduces the risk of bugs being introduced in the future and makes “real” improvements and new feature development easier
33 13:05 <willcl_ark> tACK from me. But I have to confess I was a bit less clear on the code-side this week. It was a new area for me to be looking at (i didn't attent the previous related meeting in Jan)
34 13:05 <emzy> Conxept ACK, seens the code is more clear for future changes.
35 13:05 <jnewbery> (it's my code but NACK is an acceptable answer)
36 13:05 <jkczyz> Concept ACK. IMHO, consensus-critical code *should* be clear
37 13:06 <andrewtoth> Concept ACK, but it seems like it's doing too much for how critical the code is. Could the renaming of PRUNED to SPENT and comment updates be a separate change to the consensus code change?
38 13:06 <jonatack> scripted diffs and improving commenting, even in consensus code, seem to still have a worthwhile benefit/risk ratio
39 13:06 <jnewbery> willcl_ark: the notes and meeting log from https://bitcoincore.reviews/17487 are really good, so I'd recommend reading through those if this has piqued your interest
40 13:07 <michaelfolkson> Yeah we can't leave consensus code untouched in fear of breaking something if clarity can be improved. Needs lots of review though
41 13:07 <jnewbery> andrewtoth: yes, this could definitely be split in two. One that changes the commends and variable naming, and then a second that makes the logical changes
42 13:08 <andrewtoth> It just seems like review will not be as focused with such a large diff and essentially two different things being changed
43 13:08 <jnewbery> that's fair feedback
44 13:08 <pinheadmz> jnewbery: the organization of the different cache layers is whats missing from my overall view of this PR
45 13:08 <nehan_> i think it would be better to split this PR into two: one for comments/no consensus changes and one with the changes
47 13:09 <willcl_ark> One thing I couldn't tell, is there any guarantee that this could not create a chain split? At a high level, if the change is modifying _both_ the code and the associated test, it seems like that guarantee is harder to prove...
48 13:10 <jnewbery> willcl_ark: if the logical change is bad, then yes, this could cause a consensus failure. Does anyone want to expand on how?
49 13:11 <jkczyz> regarding splitting into PRs, if the changes don't bleed across commits then I think one PR is fine. Unless the logical is contentious. But in that case the comments would need to be updated if the comment PR goes in first
50 13:11 <fjahr> consensus failure: a block would be rejected because different nodes have a different utxo set, one of them would miss a coin that is spent in the block
51 13:11 <Prayank> Interesting
52 13:12 <jnewbery> fjahr: yeah, that's what we should be scared about with any change like this
53 13:13 <jnewbery> I'm going to move onto the next question, but feel free to ask questions/give input on previous questions as we go
54 13:13 <jnewbery> What does it mean for a coin to be FRESH? What about DIRTY?
55 13:14 <amiti> DIRTY means it exists in the parent cache, but has been changed in the child cache
56 13:14 <amiti> FRESH means it doesn't exist in the parent cache
57 13:14 <jnewbery> amiti: exactly right
58 13:15 <fjahr> in the comment it says *potentially* different from the parent cache
59 13:15 <jnewbery> a Coin object can also be spent or unspent. How many different potential states are there?
61 13:15 <jnewbery> fjahr: ah yes, thanks for clarifying
62 13:15 <jnewbery> jkczyz: yes, and which of those are valid?
63 13:16 <jkczyz> good question. Haven't gone through all them yet :)
64 13:16 <amiti> if a coin is FRESH, is it *always* DIRTY, or is it *never* DIRTY?
65 13:16 <amiti> I think it has to be one of those two, right?
66 13:16 <michaelfolkson> fjahr: Why only potentially? It has been changed?
67 13:17 <lightlike> why can't a coin be "both FRESH and spent" then as one of the commit messages says? Couldn't it be created in the child cache, spent there and still be FRESH because it never existed in the parent?
68 13:17 <amiti> lightlike: my understanding is because in that case you could delete it from the child cache
69 13:18 <jonatack> jkczyz: i agree, istm the main reason to split would be if over time there was a lack of review that could be alleviated by splitting, or to get in worthwhile standalone changes if others were contentious.
70 13:18 <jnewbery> lightlike: important to distinguish between before this PR and after this PR
71 13:18 <fjahr> michaelfolkson: I am not 100% clear on this in which cases this occurs, I am referring to the comment in coins.h: "DIRTY: This cache entry is potentially different from the version in the parent cache"
72 13:18 <jnewbery> before this PR, a Coin could be spent and FRESH
73 13:19 <willcl_ark> Semantically that reads badly, but according to the two definitions above... it appears to be an acceptable state.
74 13:19 <lightlike> oh ok, got it.
75 13:20 <jnewbery> michaelfolkson: I'd need to look back at the code in validation.cpp, but I think there could be a situation where a coin gets deleted in a child cache and then re-created. It would then be the same as the parent cache but would be marked as DIRTY
76 13:20 <michaelfolkson> Ah ok
77 13:21 <michaelfolkson> Should it be marked as DIRTY in this case?
78 13:21 <jnewbery> I think that might happen in a re-org - the block that spent the coin is disconnected, which adds the coin to the cache, and then the competing block also spends the coin and it gets removed
79 13:22 <jnewbery> or something like that. The point is that if we change a coin in a cache, it MUST be marked as DIRTY
80 13:22 <michaelfolkson> Ok
81 13:22 <jnewbery> if not, changes might not get saved when the cache is flushed
82 13:23 <fjahr> michaelfolkson: To be sure you can not mark it as DIRTY you would probably have to check the parent cache all the time and that would be too much of a performance penalty I guess
83 13:23 <Prayank> I have a noob question. Sorry have not read much details but it looked interesting and this is my first meeting in core or review club. Does this FRESH and DIRTY thing in any way affect privacy when a full node interacts with other full nodes.
84 13:24 <fjahr> I don't really know, just speculating
85 13:24 <jnewbery> Prayank: noob questions are welcome here :)
86 13:24 <andrewtoth> FRESH must always be DIRTY. You cannot have a state with FRESH and not DIRTY.
87 13:24 <an4s> I am a noob here as well. Joined late so kind of outta sync
88 13:24 <amiti> andrewtoth: gotcha. thanks
89 13:25 <andrewtoth> because to be FRESH it must necessarily be different than what the parent has, since the parent does not have it
90 13:25 <jnewbery> andrewtoth: Yes, I think that's right. FRESH implies DIRTY
91 13:26 <michaelfolkson> Prayank: I would say not with these definitions of fresh and dirty. There is another context where coins from a coinbase transaction are referred to as fresh and that's where there's a potential privacy benefit
92 13:26 <fjahr> Let me try to answer Prayank: We are discussing internal management of data, this not information that is shared between nodes so it can't impact privacy
94 13:26 <jnewbery> michaelfolkson: fjahr: exactly right
95 13:26 <amiti> so what's the point of additionally marking FRESH? Is there a performance benefit when flushing the cache?
96 13:27 <amiti> (before this PR)
97 13:27 <jnewbery> there's also another meaning of 'dirty' in the wallet for addresses that have already been used, but that is unrelated to what we're talking about here
98 13:28 <jnewbery> can anyone answer amiti? Why do we have FRESH at all?
99 13:28 <willcl_ark> So will nodes from before/after this patch be marking coins in their caches differently to each other, and therefore flushing them differently (risking chainsplit)? That's what I can't quite tell..
100 13:29 <andrewtoth> If it's marked FRESH and is spent, it can be removed instead of flushed to parent
101 13:29 <pinheadmz> jnewbery: i think its bc if that coin is spent, the parent caceh never needs to know about it
102 13:29 <andrewtoth> If it's not spent, it needs to be flushed to parent because it's also DIRTY
103 13:30 <jnewbery> willcl_ark: the logical change in this PR is to do with the node's internal caching strategy. The underlying coins database shouldn't be affected
104 13:30 <amiti> oh right. ah got confused and thought thats what this PR is introducing, but actually the change here is that we don't need to fetch spent coins
106 13:30 <jnewbery> (you could potentially remove the cache entirely and just use the disk's coins database and arrive at the same view of consensus, but performance would be horrendous)
107 13:30 <earpiece> fjahr: the pr is not related to P2P layer but so not information intentionally shared but the privacy impact to be answered is, are the changes creating a side channel etc
108 13:31 <willcl_ark> jnewbery: Ok thanks.
109 13:31 <jnewbery> (but only kinda - an extra layer of cache is used during block connection)
110 13:31 <jkczyz> Are the assumptions then (1) FRESH implies DIRTY and (2) spent coins should not have entries?
111 13:32 <earpiece> could a probing peering observe additional information by using the fresh/dirty/spent/unspent to exfiltrate info about wallet etc, I don't think it could but not certain
112 13:32 <jnewbery> amiti: right. That' the change
113 13:32 <andrewtoth> Would spent and ~Dirty be an invalid state as well?
114 13:32 <jnewbery> earpiece: nice question. What do people think?
115 13:32 <fjahr> earpiece: good question. I don't think there is a chance because this is general chain state and it is not connected to the wallet
116 13:33 <jnewbery> fjahr: I agree. This is logic that all nodes do on all transactions/blocks, so leaks no information about the wallet
118 13:34 <jnewbery> that paper deals with monero/zcash, but I think it's generally interesting
119 13:35 <earpiece> ah yeah, Tramer's work. it's a good recent reference :)
120 13:35 <amiti> jkczyz: re (2) spent coins should not have entries -> if a coin is retrieved from the parent cache then spent in the child cache, it should have a DIRTY entry there until its flushed, right?
121 13:36 <jkczyz> amiti: Yeah, that was my thinking
122 13:36 <jkczyz> In which case I could only see three possible states
123 13:37 <jnewbery> amiti: jkczyz: yes, spent and DIRTY is a valid state. It means the child cache has spent the coin and that spentness needs to be propogated to the parent when the cache is flushed
124 13:37 <nehan_> jkczyz amiti: re(2) I think only coins that are created and spent in the child view should not have entries (with this change. instead of returning an entry with FRESH)
125 13:39 <andrewtoth> spent and not DIRTY is not a valid state though right?
126 13:39 <jnewbery> andrewtoth: it was before this PR. After this PR, it's not
127 13:40 <andrewtoth> ahh gotcha
128 13:40 <jnewbery> before this PR we could fetch a spent coin from the parent and keep it in the child cache. That would be spent and not-DIRTY
129 13:40 <nehan_> jnewbery: does that mean you can remove the line return !coin.IsSpent() in GetCoin?
130 13:40 <nehan_> remove the check, that is.
131 13:41 <andrewtoth> I was looking at all the commenting changes and the actual logic change escaped me. Thanks!
132 13:41 <jkczyz> jnewbery: In that case I'll change my answer to 5 valid states :)
133 13:41 <amiti> jkczyz: before, or after the PR?
135 13:42 <jnewbery> andrewtoth: that's another good reason to split this PR up!
136 13:42 <michaelfolkson> Not DIRTY is FRESH right? There are only two states. DIRTY and FRESH
137 13:43 <michaelfolkson> Oh no sorry, ignore me I'm with you
138 13:43 <nehan_> michaelfolson: no this tripped me up last time. DIRTY is not the opposite of FRESH.
140 13:44 <nehan_> oh, no, because it might be spent in the child view
141 13:44 <jnewbery> right
142 13:45 <jnewbery> yes, FRESH and DIRTY aren't the most intuitive names. Perhaps NEWCOIN and MUSTFLUSH would be better
143 13:45 <amiti> I'm seeing 4 valid states with the PR. 1) spent, not fresh, dirty 2) unspent, fresh, dirty 3) unspent, not fresh, dirty 4) unspent, not fresh, not dirty.
144 13:45 <amiti> jkczyz: hows that match up to your 5?
145 13:46 <amiti> jnewbery: NEWCOIN and MUSTFLUSH would be _so_ much better
146 13:46 <nehan_> amiti: +1
147 13:46 <michaelfolkson> Start with 8 (2^3) and then subtract from that
148 13:46 <jonatack> +1 on better naming
149 13:47 <jnewbery> well they say there are only two hard things in computer science: naming things and cache invalidation, and we've covered both of them today
150 13:47 <andrewtoth> lol
151 13:47 <michaelfolkson> Do "they" say that?! haha
152 13:47 <jkczyz> amiti: also had (5) spent, fresh, dirty since FRESH implies DIRTY. Thinking if that one makes sense now...
153 13:48 <andrewtoth> It does make sense
154 13:48 <jnewbery> jkczyz: after this PR, a spent coin can't be FRESH
155 13:48 <amiti> jkczyz: thats the one that this PR changes. if it's fresh and spent, drop it.
156 13:48 <andrewtoth> It is a candidate to remove
157 13:48 <amiti> wait wait
158 13:49 <nehan_> i got four states as well
159 13:49 <amiti> ok nevermind, don't wait
160 13:49 <andrewtoth> it can't be spent and NOT dirty
162 13:49 <jnewbery> amiti: this PR doesn't change that. The behavior is already 'if it's spent and FRESH, drop it'. The change in this PR is 'don't fetch a spent coin from a parent cache'
163 13:50 <amiti> yeah thats what I mixed up before 🤦♀️
164 13:50 <jnewbery> which introduces a new invariant: a spent coin can now *never* be fresh
165 13:50 <amiti> maybe this time I'll get it :)
166 13:51 <jnewbery> ok 10 minutes left. I have one last question, but now is also your chance to say anything you've been holding back on
167 13:51 <jnewbery> Could this change have a performance impact during block validation or Initial Block Download?
168 13:51 <jkczyz> jnewbery: so 4 then?
169 13:51 <jnewbery> oh yeah, I agree that the answer is 4
170 13:51 <nehan_> ugh but which 4? i think i had the wrong ones too: 1) not dirty & not fresh & not spent (or does this mean it wouldn't be in the child cash?) 2) dirty & not fresh & spent 3) dirty & fresh & spent 4) dirty & not fresh & spent
171 13:51 <nehan_> s/cash/cache
172 13:52 <jnewbery> bitcoin cache?
173 13:52 <amiti> nehan_: #1 would be when a child retrieves a not spent coin from the parent, right?
174 13:52 <nehan_> because i think you're saying after this change #3 will no longer happen
175 13:53 <nehan_> amiti: right. just not sure when that happens?
176 13:53 <jnewbery> My 4 are: (spent & not-FRESH & not-DIRTY), (unspent & not-FRESH & not-DIRTY), (unspent & not-FRESH & DIRTY), (unspent & FRESH & DIRTY)
177 13:54 <jnewbery> nehan_: retrieving unspent coins from a parent happens all the time. It's how we retrieve coins from the disk coins DB
178 13:55 <amiti> jnewbery: why not (spent & not-FRESH & DIRTY)?
179 13:55 <jnewbery> any thoughts about performance impact?
180 13:56 <jonatack> This PR LGTM, but naming-wise I think "FRESH" and "DIRTY" generally imply mutually exclusive states to people.
181 13:56 <michaelfolkson> Agreed
182 13:56 <jnewbery> amiti: sorry you're right. Replace my first one with (spent and not-FRESH and DIRTY)
183 13:56 <nehan_> why wouldn't the first one happen?
184 13:57 <nehan_> that seems like the case where you read a spent UTXO from the parent
186 13:57 <willcl_ark> jonatack: agree, FRESH is too close to (the opposite of DIRTY,) CLEAN
187 13:57 <nehan_> jnewbery: ah right, thanks
188 13:58 <willcl_ark> ...although CCoinsCacheEntry::Flags does explain pretty clearly what the two flags mean
189 13:58 <jnewbery> seems like we have almost unanimous agreement that the FRESH/DIRTY terminology is confusing!
191 13:59 <jnewbery> I think that naming should always be as intuitive as we can make it. Any confusion stemming from other associations with the name can eventually exhibit themselves as bugs
192 13:59 <jnewbery> ok, that's about time. I just realised that I didn't shill my own podcast episode in the notes, which is uncharacteristic of me
195 14:00 <jnewbery> #endmeeting
196 14:00 <jnewbery> thanks all!
197 14:00 <willcl_ark> so can we have an unspent && DIRTY coin?
198 14:00 <Prayank> Thanks 👍
199 14:00 <willcl_ark> thanks jnewbery
200 14:00 <michaelfolkson> I have a backlog to catch up on. The sipa one was immense
201 14:00 <michaelfolkson> Thanks
204 14:00 <emzy> Thanks jnewbery and all!
205 14:00 <jkczyz> thanks!
206 14:01 <pinheadmz> ty all confusing and interesting today
207 14:01 <jonatack> Thanks jnewbery and everyone! Good to see new people too :)
208 14:01 <jnewbery> willcl_ark: yes, a Coin can be uspent and DIRTY
209 14:01 <amiti> pinheadmz: well put 😂
210 14:01 <willcl_ark> jnewbery: is that the re-org case?
211 14:01 <fjahr> jnewbery: Since the spent coins from the parent cache are skipped there could be a small performance impact?
212 14:01 <andrewtoth> Thanks jnewbery!
214 14:02 <jnewbery> fjahr: too late!
215 14:02 <fjahr> jnewbery: i mean small performance improvement
217 14:02 <jnewbery> I don't know the answer for sure. I was just throwing it out there as a conversation starter
219 14:03 <jonatack> fjahr: I'm thinking it's possible but only benchmarking can tell. Have to measure.
220 14:03 <jnewbery> jonatack: agree with benchmarking