Separate UTXO set access from validation functions (validation)

https://github.com/bitcoin/bitcoin/pull/32317

Host: stickies-v  -  PR author: TheCharlatan

The PR branch HEAD was 76a8f22c5c5cf48a9c36cc40db9224f0454917d0 at the time of this review club meeting.

Notes

Motivation

The bitcoinkernel project carves out validation logic into a separate, stateful library, allowing it to be used by other applications. Because the bitcoinkernel project has opted for an incremental approach rather than a complete rewrite, its current interface is still strongly influenced by Bitcoin Core’s architecture, requirements and assumptions.

The UTXO set is a crucial component of Bitcoin Core’s architecture, but it is an implementation detail, and one that other Bitcoin node implementations may choose not to implement. For example:

  • Utreexo nodes, such as Floresta, rely on an accumulator instead of a UTXO set
  • SwiftSync is near-stateless, and does not have the concept of a UTXO set.

The motivation behind this PR is to, in future work, allow kernel users without a UTXO set to validate a transaction by providing the validation function the specific UTXTOs it is spending.

UTXO set

The UTXO set is managed by Chainstate::m_coins_views, which contains an in-memory CCoinsCacheView m_cacheview caching layer and a canonical on-disk CCoinsViewDB m_dbview layer. The m_cacheview is instantiated with a base pointer to m_dbview, allowing CCoinsCacheView::GetCoin() to automatically load a UTXO (or Coin) from disk if it does not exist in the cache.

Approach

In the first 4 commits, this PR reduces coupling between transaction validation functions and the UTXO set by requiring the caller to first fetch the Coins or CTxOuts they require and passing those to the validation function, instead of letting the validation function access the UTXO set directly.

In subsequent commits, ConnectBlock()’s dependency on the UTXO set is removed entirely by carving out the remaining logic that requires UTXO set interaction into a separate SpendBlock() method.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. Why is carving out the new SpendBlock() function from ConnectBlock() helpful for this PR? How would you compare the purpose of the two functions?

  3. Do you see another benefit of this decoupling, besides allowing kernel usage without a UTXO set?

  4. Especially during IBD, transaction validation must be fast. Are there any changes where you have concerns about performance? Which measures can you identify this PR has adopted to minimize the performance impact of this refactor?

  5. SpendBlock() takes a CBlock block, CBlockIndex pindex and uint256 block_hash parameter, all referencing the block being spent. Why do we need 3 parameters to do that?

  6. CCoinsViewCache has two methods AccessCoin() and GetCoin() that both return a Coin-like type. What is the difference between both methods, and when should which be used?

  7. The first commits in this PR refactor CCoinsViewCache out of the function signature of a couple of validation functions. Does CCoinsViewCache hold the entire UTXO set? Why is that (not) a problem? Does this PR change that behaviour?

  8. Why does commit a7e4132 use explicit template instantiation for GetP2SHSigOpCount()?

  9. Following on the previous question, why is there explicit instantiation of the <std::span<std::reference_wrapper<const Coin>>> template when a std::reference_wrapper<T> already implicitly converts to T&?

Meeting Log

  117:00 <stickies-v> #startmeeting
  217:00 <corebot> stickies-v: Meeting started at 2025-05-14T17:00+0000
  317:00 <corebot> stickies-v: Current chairs: stickies-v
  417:00 <corebot> stickies-v: Useful commands: #action #info #idea #link #topic #motion #vote #close #endmeeting
  517:00 <corebot> stickies-v: See also: https://hcoop-meetbot.readthedocs.io/en/stable/
  617:00 <corebot> stickies-v: Participants should now identify themselves with '#here' or with an alias like '#here FirstLast'
  717:00 <stickies-v> hey everyone, welcome to the review club!
  817:00 <stringintech> Hi!
  917:00 <brunoerg> hi
 1017:00 <Davidson> Hi
 1117:00 <monlovesmango> hey
 1217:00 <marcofleon> yo
 1317:00 <JoaoLeal> Hi
 1417:00 <stickies-v> today we'll be covering https://bitcoincore.reviews/32317, titled "Separate UTXO set access from validation functions"
 1517:01 <enochazariah> hello
 1617:01 <marcofleon> woo!
 1717:01 <stickies-v> is anyone here for the first time? feel free to say hi, even if you're just lurking
 1817:01 <TheCharlatan> hello!
 1917:01 <Davidson> stickies-v: I haven't participated in a while. Ho hi :)
 2017:01 <Davidson> so*
 2117:02 <lightlike> hi
 2217:02 <pablomartin_> hello
 2317:02 <stickies-v> well, welcome back Davidson! are you the floresta davidson, by any chance?
 2417:02 <Davidson> Yeap, it's me
 2517:03 <stickies-v> oh nice, we've got kernel users, contributors, reviewers and enthusiasts all in one meeting!
 2617:03 <stickies-v> (reviewing IS contributing, of course)
 2717:03 <stickies-v> did anyone get the chance to review the notes and/or PR (y/n)?
 2817:04 <Davidson> y
 2917:04 <stringintech> y
 3017:04 <marcofleon> y, checked out the notes and an initial code review of the PR
 3117:04 <brunoerg> yes
 3217:04 <monlovesmango> yes
 3317:04 <enochazariah> yes, checked the code out
 3417:04 <TheCharlatan> y :D
 3517:05 <stickies-v> oh my that's a lot of prep, excellent
 3617:05 <stickies-v> let's dive right into the questions, we'll start off with the more conceptual ones and then progress into code questions. as always, review club is async so feel free to continue conversation on previous questions when we move on, or raise any other questions you have!
 3717:06 <stickies-v> 2. Why is carving out the new SpendBlock() function from ConnectBlock() helpful for this PR? How would you compare the purpose of the two functions?
 3817:08 <monlovesmango> it will remove all UTXO set interactions from ConnectBlock which makes ConnectBlock much more modular. SpendBlock encapsulates all the UTXO interactions that are needed when connecting new block
 3917:09 <marcofleon> carving it out is helpful because using ConnectBlock no longer requires the utxo set. So you can do a lot of block validation without a utxo set
 4017:09 <Davidson> Before this PR, ConnectBlock also looked utxos up, and therefore needed access to the utxo set. Now, the feching of utxos is defered to the new function. Making ConnectBlock work without access to the utxo set
 4117:10 <stickies-v> right, not every full node implementation has a UTXO set, so only being able to connect a block by passing UTXO set as a parameter seems rather opinionated for how Core works
 4217:11 <stickies-v> follow-up question: does the new ConnectBlock allow for stateless validation?
 4317:12 <marcofleon> do you count blockundo as non-state? if that makes sense
 4417:12 <monlovesmango> yes..? what does stateless validation mean? not needing the statefulness of utxo set?
 4517:12 <marcofleon> i guess the new Connectblock would still requite some state then?
 4617:13 <marcofleon> but it's prevalidated by SpendBlock
 4717:13 <stickies-v> blockundo is a ConnectBlock parameter, so that does not count as state
 4817:14 <stringintech> I guess using the fJustCheck as true could do this for us?
 4917:14 <Davidson> I would say no. Because it still uses a ref to the block tree index.
 5017:14 <monlovesmango> ah ok, so stateless validation would be any state that is not passed in by parameters?
 5117:15 <stickies-v> what I'm trying to get at is: if someone has a serialized block, and they want to check if it's valid, can they validate it with a function (similar to) ConnectBlock?
 5217:15 <stickies-v> so state in this case would be from the Chainstate instance of which ConnectBlock is a method
 5317:17 <TheCharlatan> there is still extra state require, for example they still need the deployment bits
 5417:18 <marcofleon> those pesky deployment bits
 5517:18 <stickies-v> Davidson: yeah ConnectBlock has side-effects (as clearly implied by the name too) in that it e.g. updates the chainstate, and the block index
 5617:19 <marcofleon> i thought SpendBlock updates the chainstate?
 5717:20 <marcofleon> unless i'm mistaking what the chainstate is here...
 5817:20 <stickies-v> oh sorry yes that's confusing naming 🙈
 5917:22 <TheCharlatan> I guess you meant members of the Chainstate class?
 6017:23 <stickies-v> i was thinking about connecting the block to the chaintip (i.e. state of the Chain) but now i'm not actually sure if that's ahppening in ConnectBlock
 6117:24 <stickies-v> you're right marcofleon that ConnectBlock does not update the UTXO set, commonly called the chainstate
 6217:25 <TheCharlatan> ah, yes, that is moved out of ConnectBlock in the commit " validation: Move SetBestBlock out of ConnectBlocky
 6317:25 <stickies-v> ah right!
 6417:26 <stickies-v> 3. 3. Do you see another benefit of this decoupling, besides allowing kernel usage without a UTXO set?
 6517:27 <marcofleon> maybe easier testing
 6617:27 <Davidson> I would say it's cleaner. Since you have a clear separation of concerns
 6717:27 <enochazariah> i see modularity as another benefit of decoupling
 6817:27 <marcofleon> by splitting up parts of validation into separate functions
 6917:28 <stickies-v> yeah improved testability was the main benefit i was thinking of, reusability/modularity might be a win too even though the potential there is probably a bit more limited since there's only so many places these functions can be used
 7017:28 <monlovesmango> it also helps with code maintainability as there is separation of concerns
 7117:28 <stickies-v> anything else?
 7217:29 <Davidson> Makes it easier to validate blocks in parallel (e.g. swift-sync)
 7317:30 <stickies-v> yes! but isn't that a direct effect of the "allowing kernel usage without a UTXO set" bit?
 7417:30 <Davidson> yeah, I think so
 7517:31 <stickies-v> TheCharlatan observed a few other minor improvements:
 7617:32 <stickies-v> reducing the amount of UTXO set lookups (by just doing it once at the beginning) can have minimal performance improvements
 7717:33 <stickies-v> and then from a maintainability / code clarity perspective: explicitly passing objects (coins) through a callstack is easier to reason about and has less thread safety issues etc than having each frame do its own map lookup
 7817:34 <stickies-v> 4. Especially during IBD, transaction validation must be fast. Are there any changes where you have concerns about performance?
 7917:38 <marcofleon> maybe iterating over the txs in the block twice now? doesn't seem like a big deal though
 8017:38 <stickies-v> Or if you don't have any concerns: Which measures can you identify this PR has adopted to minimize the performance impact of this refactor?
 8117:38 <TheCharlatan> there are a few additional vector allocations, where instead of retrieving elements one-by-one, references to elements are filled into a vector and then passed by spans.
 8217:38 <Davidson> I didn't see anything extraordinary. Maybe use a little bit more memory?
 8317:39 <Davidson> TheCharlatan: oh you were faster than me :D
 8417:39 <TheCharlatan> heh :D
 8517:40 <stickies-v> yeah vector allocations was the main thing I could see too. This PR relies quite heavily on passing references rather than copying, so the overhead there should be quite minimal
 8617:40 <stickies-v> of course, these references can introduce lifetime risks, so that might be something to consider in your review
 8717:41 <lightlike> not so important for the question, but I don't agree with the "especially" in the question : In my opinion, validation / block propagation being fast is much more important at the tip than during IBD, which is a one-time thing.
 8817:42 <Davidson> stickies-v: I assume buildinga spam<T> from a vector<T> is almost free? (Rust bro here :D )
 8917:42 <stickies-v> lightlike: oh, interesting! yeah, i agree that at the tip performance is also crucial, and it's maybe a bit meaningless to compare which is more important - could have phrased that better, sorry!
 9017:42 <monlovesmango> I did have a question about why we pass block_hash as a parameter to ConnectBlock when block.GetHash() is still called when asserting Assume(block.GetHash() == block_hash)
 9117:44 <stickies-v> Davidson: yes, I'm not enough of a C++ expert to give you the sound answer, but my understanding is that a span allows to iterate over the container with almost no overhead, like a view
 9217:44 <Davidson> monlovesmango: I believe this is the topic for question number 5, no?
 9317:45 <stickies-v> it is indeed!
 9417:45 <Davidson> stickies-v: nice, so it's basically one vec allocation per block with this block's coins. And then we pass refs to that vec
 9517:45 <marcofleon> is block_hash just sort of used as sanity check in those Spendblock assertions?
 9617:46 <TheCharlatan> marcofleon: yes :)
 9717:46 <stickies-v> 5. `SpendBlock()` takes a `CBlock block`, `CBlockIndex pindex` and `uint256 block_hash` parameter, all referencing the block being spent. Why do we need 3 parameters to do that?
 9817:46 <monlovesmango> Davidson: haha oops
 9917:47 <marcofleon> the CBlock contains the actual txs that are used to check against and update the utxo set so gotta have that
10017:47 <Davidson> So, mostly sanity check...? I've seen that the pindex arg also gets used to figure out if our ChainState represents the previous block's state, so we need this one
10117:48 <stickies-v> monlovesmango: `Assume` statements are only compiled into debug builds, so this check should not incur any overhead on non-debug builds
10217:48 <stickies-v> otherwise, indeed it would be a bit silly to pass `block_hash` as a performance optimization. good spot!
10317:48 <monlovesmango> I assumed the block_hash parameter was for performance reasons, so that we don't have to re-hash
10417:48 <monlovesmango> stickies-v: ok!! that makes a lot more sense :)
10517:49 <monlovesmango> also good to know that about `Assume` statements
10617:49 <marcofleon> pindex also used for the height to cehck against bip30 stuff
10717:50 <marcofleon> and for UpdateCoins
10817:51 <monlovesmango> isn't bip30 stuff now in SpendBlock?
10917:52 <stickies-v> monlovesmango: yes, the question is about `SpendBlock`
11017:52 <monlovesmango> omg ignore sorry..
11117:53 <marcofleon> "bip30 stuff" is my best summarizing of that whole chunk of code/text
11217:55 <stickies-v> but why do we need to pass `pindex`? Can't it be retrieved from `block`?
11317:56 <stickies-v> (and if it can, should it?)
11417:57 <Davidson> You mean the prevblock? I think it's better to get the prevblock straight from pindex to make sure we actually building on tip?
11517:57 <stickies-v> We're almost at time, so gonna launch the next/last question already
11617:57 <stickies-v> 7. The first commits in this PR refactor `CCoinsViewCache` out of the function signature of a couple of validation functions. Does `CCoinsViewCache` hold the entire UTXO set? Why is that (not) a problem? Does this PR change that behaviour?
11717:58 <stringintech> I guess we need another disk access for that which is not good to do? (previous question)
11817:59 <stickies-v> Davidson: no, I mean dropping the `pindex` argument from the `SpendBlock` fn signature, and just letting `SpendBlock` get a `CBlockIndex` from `block`
11918:00 <stickies-v> stringintech: I don't think we need disk access for that, the entire block index (`BlockManager::m_block_index`) is kept in-memory, it is relatively small
12018:00 <TheCharlatan> Davidson, that is a good reason, but we could also assert that beforehand. Other than that, looking up the block index with a block's hash does incur bit of a performance penalty.
12118:01 <stringintech> stickies-v: oh!! thanks.
12218:02 <stickies-v> TheCharlatan: sure, it's a lookup, but it's a map - do you think that's going to be measurable?
12318:04 <TheCharlatan> I doubt it would be. I think I implemented it that way more because we already have it available at its call sites.
12418:04 <Davidson> stickies-v: Oh, I didn't know you could do that :')
12518:04 <stickies-v> anyway, we're past end time for today, so I'm going to wrap it up here, but as always feel free to share thoughts or follow-up questions - i'll be around for a while longer!
12618:05 <stickies-v> thanks for joining the discussion today everyone, and thanks TheCharlatan for authoring the PR!
12718:05 <marcofleon> yeah wait how are we getting a CBlockindex from a CBlock?
12818:05 <stickies-v> #endmeeting
12918:05 <corebot> stickies-v: Meeting ended at 2025-05-14T18:05+0000
13018:05 <corebot> stickies-v: Raw log: https://achow101.com/ircmeetings/2025/bitcoin-core-pr-reviews.2025-05-14_17_00.log.json
13118:05 <corebot> stickies-v: Formatted log: https://achow101.com/ircmeetings/2025/bitcoin-core-pr-reviews.2025-05-14_17_00.log.html
13218:05 <corebot> stickies-v: Minutes: https://achow101.com/ircmeetings/2025/bitcoin-core-pr-reviews.2025-05-14_17_00.html
13318:05 <marcofleon> just by getting the hash and then looking it up?
13418:05 <TheCharlatan> yes
13518:05 <stickies-v> yeah, `BlockManager::m_block_index` is an unordered_map keyed by block hash
13618:05 <TheCharlatan> thanks for hosting stickies-v!
13718:06 <Davidson> thanks everyone!
13818:06 <marcofleon> got it, thanks
13918:06 <marcofleon> thanks stickies and Charlatan!
14018:06 <stringintech> Thank you everyone
14118:07 <monlovesmango> thanks for hosting stickies-v and TheCharlatan !!
14218:07 <marcofleon> this question: CCoinsViewCache has two methods AccessCoin() and GetCoin() that both return a Coin-like type. What is the difference between both methods, and when should which be used?
14318:07 <marcofleon> I'd like to know when they should be used?
14418:08 <marcofleon> I can see the differences in return value
14518:08 <marcofleon> and what happens if the coin isn't found or not spent
14618:10 <stickies-v> yeah there are 2 main differences between those 2 functions:
14718:11 <marcofleon> I guess if you're not gonna modify the coin or store it, use Access?
14818:11 <stickies-v> `GetCoin` ensures it only returns a `Coin` if it's not spent, whereas `AccessCoin` does not do that
14918:11 <stickies-v> second: `GetCoin` returns a copy, whereas `AccessCoin` returns a reference to the `Coin` in `cacheCoins`
15018:12 <stickies-v> so `AccessCoin` is naturally going to be more performant, but at the cost of introducing lifetime risk, you can end up with a dangling reference if you're not careful
15118:13 <stickies-v> neither is going to allow you to modify the coin, `AccessCoin` returns a `const` reference
15218:13 <TheCharlatan> yes, their lifetime is only valid for as long as the cache is not mutated in between, so you should not use it if you plan on doing any other calls to the coins in between.
15318:14 <monlovesmango> marcofleon: thanks for asking that I was also interested in that answer
15418:15 <marcofleon> perfect, yeah I see it now. thanks for going through that
15518:15 <marcofleon> monlovesmango: no problem :) it was a good question, had to know
15618:15 <stickies-v> TheCharlatan: I think for unordered_map the references should remain valid as long as not that specific member is erased, though
15718:16 <stickies-v> "References and pointers to either key or data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated." from https://en.cppreference.com/w/cpp/container/unordered_map
15818:18 <TheCharlatan> Mmh, right. I guess we could be erasing though when we spend, right?
15918:18 <stickies-v> yeah, i think so
16018:20 <stickies-v> and i actually had a question on the `IsSpent` check in `GetCoin` - I think in normal operation that should always be false except for when we haven't flushed yet, right?
16118:22 <TheCharlatan> mmh, good question, I'll have to check again
16218:52 <marcofleon> TheCharlatan: little nit, but the name ConnectBlock doesn't match what it's doing in the PR. I think it just does validation now (minus utxo set check)
16318:52 <marcofleon> especially because setbestblock got moved as well
16418:53 <marcofleon> maybe i'll come up with a briliant new function name tomorrow. I'll keep you posted
16518:59 <stickies-v> it doesn't just do validation though, it also updates the block index (bumping validity)