Bitcoin Core PR Review Club

Separate UTXO set access from validation functions (validation)

May 14, 2025

https://github.com/bitcoin/bitcoin/pull/32317

Host: stickies-v - PR author: TheCharlatan

The PR branch HEAD was 76a8f22c5c5cf48a9c36cc40db9224f0454917d0 at the time of this review club meeting.

Notes

Motivation

The bitcoinkernel project carves out validation logic into a separate, stateful library, allowing it to be used by other applications. Because the bitcoinkernel project has opted for an incremental approach rather than a complete rewrite, its current interface is still strongly influenced by Bitcoin Core’s architecture, requirements and assumptions.

The UTXO set is a crucial component of Bitcoin Core’s architecture, but it is an implementation detail, and one that other Bitcoin node implementations may choose not to implement. For example:

Utreexo nodes, such as Floresta, rely on an accumulator instead of a UTXO set
SwiftSync is near-stateless, and does not have the concept of a UTXO set.

The motivation behind this PR is to, in future work, allow kernel users without a UTXO set to validate a transaction by providing the validation function the specific UTXTOs it is spending.

UTXO set

The UTXO set is managed by Chainstate::m_coins_views, which contains an in-memory CCoinsCacheView m_cacheview caching layer and a canonical on-disk CCoinsViewDB m_dbview layer. The m_cacheview is instantiated with a base pointer to m_dbview, allowing CCoinsCacheView::GetCoin() to automatically load a UTXO (or Coin) from disk if it does not exist in the cache.

Approach

In the first 4 commits, this PR reduces coupling between transaction validation functions and the UTXO set by requiring the caller to first fetch the Coins or CTxOuts they require and passing those to the validation function, instead of letting the validation function access the UTXO set directly.

In subsequent commits, ConnectBlock()’s dependency on the UTXO set is removed entirely by carving out the remaining logic that requires UTXO set interaction into a separate SpendBlock() method.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
Why is carving out the new SpendBlock() function from ConnectBlock() helpful for this PR? How would you compare the purpose of the two functions?
Do you see another benefit of this decoupling, besides allowing kernel usage without a UTXO set?
Especially during IBD, transaction validation must be fast. Are there any changes where you have concerns about performance? Which measures can you identify this PR has adopted to minimize the performance impact of this refactor?
SpendBlock() takes a CBlock block, CBlockIndex pindex and uint256 block_hash parameter, all referencing the block being spent. Why do we need 3 parameters to do that?
CCoinsViewCache has two methods AccessCoin() and GetCoin() that both return a Coin-like type. What is the difference between both methods, and when should which be used?
The first commits in this PR refactor CCoinsViewCache out of the function signature of a couple of validation functions. Does CCoinsViewCache hold the entire UTXO set? Why is that (not) a problem? Does this PR change that behaviour?
Why does commit a7e4132 use explicit template instantiation for GetP2SHSigOpCount()?
Following on the previous question, why is there explicit instantiation of the <std::span<std::reference_wrapper<const Coin>>> template when a std::reference_wrapper<T> already implicitly converts to T&?

Meeting Log

17:00

<stickies-v> #startmeeting

17:00

<corebot> stickies-v: Meeting started at 2025-05-14T17:00+0000

17:00

<corebot> stickies-v: Current chairs: stickies-v

17:00

<corebot> stickies-v: Useful commands: #action #info #idea #link #topic #motion #vote #close #endmeeting

17:00

<corebot> stickies-v: See also: https://hcoop-meetbot.readthedocs.io/en/stable/

17:00

<corebot> stickies-v: Participants should now identify themselves with '#here' or with an alias like '#here FirstLast'

17:00

<stickies-v> hey everyone, welcome to the review club!

17:00

<stringintech> Hi!

17:00

<brunoerg> hi

17:00

<Davidson> Hi

17:00

<monlovesmango> hey

17:00

<marcofleon> yo

17:00

<JoaoLeal> Hi

17:00

<stickies-v> today we'll be covering https://bitcoincore.reviews/32317, titled "Separate UTXO set access from validation functions"

17:01

<enochazariah> hello

17:01

<marcofleon> woo!

17:01

<stickies-v> is anyone here for the first time? feel free to say hi, even if you're just lurking

17:01

<TheCharlatan> hello!

17:01

<Davidson> stickies-v: I haven't participated in a while. Ho hi :)

17:01

<Davidson> so*

17:02

<lightlike> hi

17:02

<pablomartin_> hello

17:02

<stickies-v> well, welcome back Davidson! are you the floresta davidson, by any chance?

17:02

<Davidson> Yeap, it's me

17:03

<stickies-v> oh nice, we've got kernel users, contributors, reviewers and enthusiasts all in one meeting!

17:03

<stickies-v> (reviewing IS contributing, of course)

17:03

<stickies-v> did anyone get the chance to review the notes and/or PR (y/n)?

17:04

<Davidson> y

17:04

<stringintech> y

17:04

<marcofleon> y, checked out the notes and an initial code review of the PR

17:04

<brunoerg> yes

17:04

<monlovesmango> yes

17:04

<enochazariah> yes, checked the code out

17:04

<TheCharlatan> y :D

17:05

<stickies-v> oh my that's a lot of prep, excellent

17:05

<stickies-v> let's dive right into the questions, we'll start off with the more conceptual ones and then progress into code questions. as always, review club is async so feel free to continue conversation on previous questions when we move on, or raise any other questions you have!

17:06

<stickies-v> 2. Why is carving out the new SpendBlock() function from ConnectBlock() helpful for this PR? How would you compare the purpose of the two functions?

17:08

<monlovesmango> it will remove all UTXO set interactions from ConnectBlock which makes ConnectBlock much more modular. SpendBlock encapsulates all the UTXO interactions that are needed when connecting new block

17:09

<marcofleon> carving it out is helpful because using ConnectBlock no longer requires the utxo set. So you can do a lot of block validation without a utxo set

17:09

<Davidson> Before this PR, ConnectBlock also looked utxos up, and therefore needed access to the utxo set. Now, the feching of utxos is defered to the new function. Making ConnectBlock work without access to the utxo set

17:10

<stickies-v> right, not every full node implementation has a UTXO set, so only being able to connect a block by passing UTXO set as a parameter seems rather opinionated for how Core works

17:11

<stickies-v> follow-up question: does the new ConnectBlock allow for stateless validation?

17:12

<marcofleon> do you count blockundo as non-state? if that makes sense

17:12

<monlovesmango> yes..? what does stateless validation mean? not needing the statefulness of utxo set?

17:12

<marcofleon> i guess the new Connectblock would still requite some state then?

17:13

<marcofleon> but it's prevalidated by SpendBlock

17:13

<stickies-v> blockundo is a ConnectBlock parameter, so that does not count as state

17:14

<stringintech> I guess using the fJustCheck as true could do this for us?

17:14

<Davidson> I would say no. Because it still uses a ref to the block tree index.

17:14

<monlovesmango> ah ok, so stateless validation would be any state that is not passed in by parameters?

17:15

<stickies-v> what I'm trying to get at is: if someone has a serialized block, and they want to check if it's valid, can they validate it with a function (similar to) ConnectBlock?

17:15

<stickies-v> so state in this case would be from the Chainstate instance of which ConnectBlock is a method

17:17

<TheCharlatan> there is still extra state require, for example they still need the deployment bits

17:18

<marcofleon> those pesky deployment bits

17:18

<stickies-v> Davidson: yeah ConnectBlock has side-effects (as clearly implied by the name too) in that it e.g. updates the chainstate, and the block index

17:19

<marcofleon> i thought SpendBlock updates the chainstate?

17:20

<marcofleon> unless i'm mistaking what the chainstate is here...

17:20

<stickies-v> oh sorry yes that's confusing naming 🙈

17:22

<TheCharlatan> I guess you meant members of the Chainstate class?

17:23

<stickies-v> i was thinking about connecting the block to the chaintip (i.e. state of the Chain) but now i'm not actually sure if that's ahppening in ConnectBlock

17:24

<stickies-v> you're right marcofleon that ConnectBlock does not update the UTXO set, commonly called the chainstate

17:25

<TheCharlatan> ah, yes, that is moved out of ConnectBlock in the commit " validation: Move SetBestBlock out of ConnectBlocky

17:25

<stickies-v> ah right!

17:26

<stickies-v> 3. 3. Do you see another benefit of this decoupling, besides allowing kernel usage without a UTXO set?

17:27

<marcofleon> maybe easier testing

17:27

<Davidson> I would say it's cleaner. Since you have a clear separation of concerns

17:27

<enochazariah> i see modularity as another benefit of decoupling

17:27

<marcofleon> by splitting up parts of validation into separate functions

17:28

<stickies-v> yeah improved testability was the main benefit i was thinking of, reusability/modularity might be a win too even though the potential there is probably a bit more limited since there's only so many places these functions can be used

17:28

<monlovesmango> it also helps with code maintainability as there is separation of concerns

17:28

<stickies-v> anything else?

17:29

<Davidson> Makes it easier to validate blocks in parallel (e.g. swift-sync)

17:30

<stickies-v> yes! but isn't that a direct effect of the "allowing kernel usage without a UTXO set" bit?

17:30

<Davidson> yeah, I think so

17:31

<stickies-v> TheCharlatan observed a few other minor improvements:

17:32

<stickies-v> reducing the amount of UTXO set lookups (by just doing it once at the beginning) can have minimal performance improvements

17:33

<stickies-v> and then from a maintainability / code clarity perspective: explicitly passing objects (coins) through a callstack is easier to reason about and has less thread safety issues etc than having each frame do its own map lookup

17:34

<stickies-v> 4. Especially during IBD, transaction validation must be fast. Are there any changes where you have concerns about performance?

17:38

<marcofleon> maybe iterating over the txs in the block twice now? doesn't seem like a big deal though

17:38

<stickies-v> Or if you don't have any concerns: Which measures can you identify this PR has adopted to minimize the performance impact of this refactor?

17:38

<TheCharlatan> there are a few additional vector allocations, where instead of retrieving elements one-by-one, references to elements are filled into a vector and then passed by spans.

17:38

<Davidson> I didn't see anything extraordinary. Maybe use a little bit more memory?

17:39

<Davidson> TheCharlatan: oh you were faster than me :D

17:39

<TheCharlatan> heh :D

17:40

<stickies-v> yeah vector allocations was the main thing I could see too. This PR relies quite heavily on passing references rather than copying, so the overhead there should be quite minimal

17:40

<stickies-v> of course, these references can introduce lifetime risks, so that might be something to consider in your review

17:41

<lightlike> not so important for the question, but I don't agree with the "especially" in the question : In my opinion, validation / block propagation being fast is much more important at the tip than during IBD, which is a one-time thing.

17:42

<Davidson> stickies-v: I assume buildinga spam<T> from a vector<T> is almost free? (Rust bro here :D )

17:42

<stickies-v> lightlike: oh, interesting! yeah, i agree that at the tip performance is also crucial, and it's maybe a bit meaningless to compare which is more important - could have phrased that better, sorry!

17:42

<monlovesmango> I did have a question about why we pass block_hash as a parameter to ConnectBlock when block.GetHash() is still called when asserting Assume(block.GetHash() == block_hash)

17:44

<stickies-v> Davidson: yes, I'm not enough of a C++ expert to give you the sound answer, but my understanding is that a span allows to iterate over the container with almost no overhead, like a view

17:44

<Davidson> monlovesmango: I believe this is the topic for question number 5, no?

17:45

<stickies-v> it is indeed!

17:45

<Davidson> stickies-v: nice, so it's basically one vec allocation per block with this block's coins. And then we pass refs to that vec

17:45

<marcofleon> is block_hash just sort of used as sanity check in those Spendblock assertions?

17:46

<TheCharlatan> marcofleon: yes :)

17:46

<stickies-v> 5. `SpendBlock()` takes a `CBlock block`, `CBlockIndex pindex` and `uint256 block_hash` parameter, all referencing the block being spent. Why do we need 3 parameters to do that?

17:46

<monlovesmango> Davidson: haha oops

17:47

<marcofleon> the CBlock contains the actual txs that are used to check against and update the utxo set so gotta have that

17:47

<Davidson> So, mostly sanity check...? I've seen that the pindex arg also gets used to figure out if our ChainState represents the previous block's state, so we need this one

17:48

<stickies-v> monlovesmango: `Assume` statements are only compiled into debug builds, so this check should not incur any overhead on non-debug builds

17:48

<stickies-v> otherwise, indeed it would be a bit silly to pass `block_hash` as a performance optimization. good spot!

17:48

<monlovesmango> I assumed the block_hash parameter was for performance reasons, so that we don't have to re-hash

17:48

<monlovesmango> stickies-v: ok!! that makes a lot more sense :)

17:49

<monlovesmango> also good to know that about `Assume` statements

17:49

<marcofleon> pindex also used for the height to cehck against bip30 stuff

17:50

<marcofleon> and for UpdateCoins

17:51

<monlovesmango> isn't bip30 stuff now in SpendBlock?

17:52

<stickies-v> monlovesmango: yes, the question is about `SpendBlock`

17:52

<monlovesmango> omg ignore sorry..

17:53

<marcofleon> "bip30 stuff" is my best summarizing of that whole chunk of code/text

17:55

<stickies-v> but why do we need to pass `pindex`? Can't it be retrieved from `block`?

17:56

<stickies-v> (and if it can, should it?)

17:57

<Davidson> You mean the prevblock? I think it's better to get the prevblock straight from pindex to make sure we actually building on tip?

17:57

<stickies-v> We're almost at time, so gonna launch the next/last question already

17:57

<stickies-v> 7. The first commits in this PR refactor `CCoinsViewCache` out of the function signature of a couple of validation functions. Does `CCoinsViewCache` hold the entire UTXO set? Why is that (not) a problem? Does this PR change that behaviour?

17:58

<stringintech> I guess we need another disk access for that which is not good to do? (previous question)

17:59

<stickies-v> Davidson: no, I mean dropping the `pindex` argument from the `SpendBlock` fn signature, and just letting `SpendBlock` get a `CBlockIndex` from `block`

18:00

<stickies-v> stringintech: I don't think we need disk access for that, the entire block index (`BlockManager::m_block_index`) is kept in-memory, it is relatively small

18:00

<TheCharlatan> Davidson, that is a good reason, but we could also assert that beforehand. Other than that, looking up the block index with a block's hash does incur bit of a performance penalty.

18:01

<stringintech> stickies-v: oh!! thanks.

18:02

<stickies-v> TheCharlatan: sure, it's a lookup, but it's a map - do you think that's going to be measurable?

18:04

<TheCharlatan> I doubt it would be. I think I implemented it that way more because we already have it available at its call sites.

18:04

<Davidson> stickies-v: Oh, I didn't know you could do that :')

18:04

<stickies-v> anyway, we're past end time for today, so I'm going to wrap it up here, but as always feel free to share thoughts or follow-up questions - i'll be around for a while longer!

18:05

<stickies-v> thanks for joining the discussion today everyone, and thanks TheCharlatan for authoring the PR!

18:05

<marcofleon> yeah wait how are we getting a CBlockindex from a CBlock?

18:05

<stickies-v> #endmeeting

18:05

<corebot> stickies-v: Meeting ended at 2025-05-14T18:05+0000

18:05

<corebot> stickies-v: Raw log: https://achow101.com/ircmeetings/2025/bitcoin-core-pr-reviews.2025-05-14_17_00.log.json

18:05

<corebot> stickies-v: Formatted log: https://achow101.com/ircmeetings/2025/bitcoin-core-pr-reviews.2025-05-14_17_00.log.html

18:05

<corebot> stickies-v: Minutes: https://achow101.com/ircmeetings/2025/bitcoin-core-pr-reviews.2025-05-14_17_00.html

18:05

<marcofleon> just by getting the hash and then looking it up?

18:05

<TheCharlatan> yes

18:05

<stickies-v> yeah, `BlockManager::m_block_index` is an unordered_map keyed by block hash

18:05

<TheCharlatan> thanks for hosting stickies-v!

18:06

<Davidson> thanks everyone!

18:06

<marcofleon> got it, thanks

18:06

<marcofleon> thanks stickies and Charlatan!

18:06

<stringintech> Thank you everyone

18:07

<monlovesmango> thanks for hosting stickies-v and TheCharlatan !!

18:07

<marcofleon> this question: CCoinsViewCache has two methods AccessCoin() and GetCoin() that both return a Coin-like type. What is the difference between both methods, and when should which be used?

18:07

<marcofleon> I'd like to know when they should be used?

18:08

<marcofleon> I can see the differences in return value

18:08

<marcofleon> and what happens if the coin isn't found or not spent

18:10

<stickies-v> yeah there are 2 main differences between those 2 functions:

18:11

<marcofleon> I guess if you're not gonna modify the coin or store it, use Access?

18:11

<stickies-v> `GetCoin` ensures it only returns a `Coin` if it's not spent, whereas `AccessCoin` does not do that

18:11

<stickies-v> second: `GetCoin` returns a copy, whereas `AccessCoin` returns a reference to the `Coin` in `cacheCoins`

18:12

<stickies-v> so `AccessCoin` is naturally going to be more performant, but at the cost of introducing lifetime risk, you can end up with a dangling reference if you're not careful

18:13

<stickies-v> neither is going to allow you to modify the coin, `AccessCoin` returns a `const` reference

18:13

<TheCharlatan> yes, their lifetime is only valid for as long as the cache is not mutated in between, so you should not use it if you plan on doing any other calls to the coins in between.

18:14

<monlovesmango> marcofleon: thanks for asking that I was also interested in that answer

18:15

<marcofleon> perfect, yeah I see it now. thanks for going through that

18:15

<marcofleon> monlovesmango: no problem :) it was a good question, had to know

18:15

<stickies-v> TheCharlatan: I think for unordered_map the references should remain valid as long as not that specific member is erased, though

18:16

<stickies-v> "References and pointers to either key or data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated." from https://en.cppreference.com/w/cpp/container/unordered_map

18:18

<TheCharlatan> Mmh, right. I guess we could be erasing though when we spend, right?

18:18

<stickies-v> yeah, i think so

18:20

<stickies-v> and i actually had a question on the `IsSpent` check in `GetCoin` - I think in normal operation that should always be false except for when we haven't flushed yet, right?

18:22

<TheCharlatan> mmh, good question, I'll have to check again

18:52

<marcofleon> TheCharlatan: little nit, but the name ConnectBlock doesn't match what it's doing in the PR. I think it just does validation now (minus utxo set check)

18:52

<marcofleon> especially because setbestblock got moved as well

18:53

<marcofleon> maybe i'll come up with a briliant new function name tomorrow. I'll keep you posted

18:59

<stickies-v> it doesn't just do validation though, it also updates the block index (bumping validity)