The bitcoinkernel project carves out validation logic into a separate, stateful library, allowing it
to be used by other applications. Because the bitcoinkernel project has opted for an incremental
approach rather than a complete rewrite, its current interface is still strongly influenced by
Bitcoin Core’s architecture, requirements and assumptions.
The UTXO set is a crucial component of Bitcoin Core’s architecture, but it is an implementation
detail, and one that other Bitcoin node implementations may choose not to implement. For example:
Utreexo nodes, such as
Floresta, rely on an accumulator instead of a UTXO set
SwiftSync is
near-stateless, and does not have the concept of a UTXO set.
The motivation behind this PR is to, in future work, allow kernel users without a UTXO set to
validate a transaction by providing the validation function the specific UTXTOs it is spending.
UTXO set
The UTXO set is managed by Chainstate::m_coins_views, which contains an in-memory CCoinsCacheView
m_cacheview caching layer and a canonical on-disk CCoinsViewDB m_dbview layer. The m_cacheview
is instantiated with a base
pointer
to m_dbview, allowing CCoinsCacheView::GetCoin() to automatically load a UTXO (or Coin) from
disk if it does not exist in the cache.
Approach
In the first 4 commits, this PR reduces coupling between transaction validation
functions and the UTXO set by requiring the caller to first fetch the Coins or CTxOuts they
require and passing those to the validation function, instead of letting the validation function
access the UTXO set directly.
In subsequent commits, ConnectBlock()’s dependency on the UTXO set is removed entirely by carving
out the remaining logic that requires UTXO set interaction into a separate SpendBlock() method.
Why is carving out the new SpendBlock() function from ConnectBlock() helpful for this PR? How
would you compare the purpose of the two functions?
Do you see another benefit of this decoupling, besides allowing kernel usage without a UTXO set?
Especially during IBD, transaction validation must be fast. Are there any changes where you have
concerns about performance? Which measures can you identify this PR has adopted to minimize the
performance impact of this refactor?
SpendBlock() takes a CBlock block, CBlockIndex pindex and uint256 block_hash parameter,
all referencing the block being spent. Why do we need 3 parameters to do that?
CCoinsViewCache has two methods AccessCoin() and GetCoin() that both return a Coin-like
type. What is the difference between both methods, and when should which be used?
The first commits in this PR refactor CCoinsViewCache out of the function signature of a couple
of validation functions. Does CCoinsViewCache hold the entire UTXO set? Why is that (not) a
problem? Does this PR change that behaviour?
Why does commit
a7e4132
use explicit template instantiation for GetP2SHSigOpCount()?
Following on the previous question, why is there explicit instantiation of the
<std::span<std::reference_wrapper<const Coin>>> template when a std::reference_wrapper<T>
already implicitly
converts
to T&?
<stickies-v> let's dive right into the questions, we'll start off with the more conceptual ones and then progress into code questions. as always, review club is async so feel free to continue conversation on previous questions when we move on, or raise any other questions you have!
<stickies-v> 2. Why is carving out the new SpendBlock() function from ConnectBlock() helpful for this PR? How would you compare the purpose of the two functions?
<monlovesmango> it will remove all UTXO set interactions from ConnectBlock which makes ConnectBlock much more modular. SpendBlock encapsulates all the UTXO interactions that are needed when connecting new block
<marcofleon> carving it out is helpful because using ConnectBlock no longer requires the utxo set. So you can do a lot of block validation without a utxo set
<Davidson> Before this PR, ConnectBlock also looked utxos up, and therefore needed access to the utxo set. Now, the feching of utxos is defered to the new function. Making ConnectBlock work without access to the utxo set
<stickies-v> right, not every full node implementation has a UTXO set, so only being able to connect a block by passing UTXO set as a parameter seems rather opinionated for how Core works
<stickies-v> what I'm trying to get at is: if someone has a serialized block, and they want to check if it's valid, can they validate it with a function (similar to) ConnectBlock?
<stickies-v> Davidson: yeah ConnectBlock has side-effects (as clearly implied by the name too) in that it e.g. updates the chainstate, and the block index
<stickies-v> i was thinking about connecting the block to the chaintip (i.e. state of the Chain) but now i'm not actually sure if that's ahppening in ConnectBlock
<stickies-v> yeah improved testability was the main benefit i was thinking of, reusability/modularity might be a win too even though the potential there is probably a bit more limited since there's only so many places these functions can be used
<stickies-v> and then from a maintainability / code clarity perspective: explicitly passing objects (coins) through a callstack is easier to reason about and has less thread safety issues etc than having each frame do its own map lookup
<stickies-v> Or if you don't have any concerns: Which measures can you identify this PR has adopted to minimize the performance impact of this refactor?
<TheCharlatan> there are a few additional vector allocations, where instead of retrieving elements one-by-one, references to elements are filled into a vector and then passed by spans.
<stickies-v> yeah vector allocations was the main thing I could see too. This PR relies quite heavily on passing references rather than copying, so the overhead there should be quite minimal
<lightlike> not so important for the question, but I don't agree with the "especially" in the question : In my opinion, validation / block propagation being fast is much more important at the tip than during IBD, which is a one-time thing.
<stickies-v> lightlike: oh, interesting! yeah, i agree that at the tip performance is also crucial, and it's maybe a bit meaningless to compare which is more important - could have phrased that better, sorry!
<monlovesmango> I did have a question about why we pass block_hash as a parameter to ConnectBlock when block.GetHash() is still called when asserting Assume(block.GetHash() == block_hash)
<stickies-v> Davidson: yes, I'm not enough of a C++ expert to give you the sound answer, but my understanding is that a span allows to iterate over the container with almost no overhead, like a view
<stickies-v> 5. `SpendBlock()` takes a `CBlock block`, `CBlockIndex pindex` and `uint256 block_hash` parameter, all referencing the block being spent. Why do we need 3 parameters to do that?
<Davidson> So, mostly sanity check...? I've seen that the pindex arg also gets used to figure out if our ChainState represents the previous block's state, so we need this one
<stickies-v> 7. The first commits in this PR refactor `CCoinsViewCache` out of the function signature of a couple of validation functions. Does `CCoinsViewCache` hold the entire UTXO set? Why is that (not) a problem? Does this PR change that behaviour?
<stickies-v> Davidson: no, I mean dropping the `pindex` argument from the `SpendBlock` fn signature, and just letting `SpendBlock` get a `CBlockIndex` from `block`
<stickies-v> stringintech: I don't think we need disk access for that, the entire block index (`BlockManager::m_block_index`) is kept in-memory, it is relatively small
<TheCharlatan> Davidson, that is a good reason, but we could also assert that beforehand. Other than that, looking up the block index with a block's hash does incur bit of a performance penalty.
<stickies-v> anyway, we're past end time for today, so I'm going to wrap it up here, but as always feel free to share thoughts or follow-up questions - i'll be around for a while longer!
<marcofleon> this question: CCoinsViewCache has two methods AccessCoin() and GetCoin() that both return a Coin-like type. What is the difference between both methods, and when should which be used?
<stickies-v> so `AccessCoin` is naturally going to be more performant, but at the cost of introducing lifetime risk, you can end up with a dangling reference if you're not careful
<TheCharlatan> yes, their lifetime is only valid for as long as the cache is not mutated in between, so you should not use it if you plan on doing any other calls to the coins in between.
<stickies-v> "References and pointers to either key or data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated." from https://en.cppreference.com/w/cpp/container/unordered_map
<stickies-v> and i actually had a question on the `IsSpent` check in `GetCoin` - I think in normal operation that should always be false except for when we haven't flushed yet, right?
<marcofleon> TheCharlatan: little nit, but the name ConnectBlock doesn't match what it's doing in the PR. I think it just does validation now (minus utxo set check)