The PR branch HEAD was ab3fef71e at the time of this review club meeting.
Notes
A circular dependency exists when two or more
modules depend on each other. PR#13695 added a
linter which statically analyzes files to detect circular dependencies. At the time, a few already
existed and were listed in
EXPECTED_CIRCULAR_DEPENDENCIES.
One of the circular dependencies in the codebase is validation <-> txmempool. These two modules
are each heavily depended on by other modules, so their interdependency creates a “domino effect”
into the rest of the codebase. To experience this for yourself, compile (with ccache), make a small
edit to protocol.h (where P2P protocol messages are defined), then run make again and watch it
rebuild txmempool and everything mempool-related.
PR#22677 removes this dependency by removing
txmempool’s dependency on validation. It creates a GetFiltered() function that allows
validation and other clients to pass in an arbitrary predicate, iterates through all of its entries,
applies the predicate function to each of them, and returns the set of filtered iterators to mempool
entries. This allows validation to apply consensus rules to all of the mempool entries without
access to the mempool’s internals.
What is a circular dependency? Why should we avoid circular dependencies?
What functionality lives in validation? What functionality lives in txmempool?
Why should/shouldn’t validation depend on txmempool? Why should/shouldn’t txmempool depend on
validation?
One of the functions that causes txmempool to rely on validation is the check() function. What
checks does it perform, and how does this PR split the checks into two categories?
Another function that causes txmempool to rely on validation is the removeForReorg() function,
which calls CheckSequenceLocks() and CheckFinalTx(). Why do we need to re-check sequence
locks in a reorg? How is it possible for a coinbase spend to become premature in a reorg?
<pg2> Removing circular dependencies is desirable, but are there any downside or risks for this kind of refactoring? (e.g. could this break any downstream systems built on top of Bitcoin Core?)
<glozow> pg2: good answer. i might add (for the context of this PR) that txmempool is a "dumb" data structure that shouldn't know any details about consensus rules
<glozow> There would theoretically be 2 ways to remove this: make txmempool not depend on validation anymore, or make validation not depend on txmempool anymore
<pg2> Txmempool shouldn't depend on validation, because the transactions in mempool are already validated (and therefore donot need to depend on anything in validation.{h.cpp}).
<glozow> i also think an argument could be made for neither depending on the other - you don't really need to know about the existence of a mempool in order to apply validation rules to a transaction
<pg2> glozow: thanks for your answers. after this PR, is txmempool "complete dumb", or "dumb enough"? Or there are places where it still knows about consensus rules (unnecessarily)?
<glozow> since i've already started saying possibly controversial things... another thing the mempool probably doesn't need to be responsible for is the fee estimator
<_aj_> glozow: "neither depending on the other" -- isn't validation the *action* of accepting a block (which means the txs in the block should no longer be in the mempool since they're no longer valid on top of the new tip) ; the consensus rules themselves are in consensus/ and script/ and the like?
<glozow> _aj_: ah true, i completely agree. I guess in general I mean that consensus rules themselves (i.e. consensus/ and script/) and the mempool are not conceptually related
<glozow> OK next question. One of the functions that causes txmempool to rely on validation is the `check()` function. What checks does it perform, and how does this PR split the checks into two categories?
<pg2> For every transaction input in mempool, `check()` checks if the input refers to any other mempool transactions, otherwise it checks if the previous transaction output is unspent.
<glozow> so I've (arbitrarily) split the `check()` assertions into 2 categories: contextual checks and internal consistency checks. Contextual is what pg2 already mentioned - we go through and make sure all the transactions' inputs refer to something available in our mempool or UTXO set.
<glozow> Internal consistency checks are things like making sure the ancestor/descendant counts of each entry add up correctly, checking that our total fee accounting and dynamic memory usage are accurate, etc.
<glozow> Moving on to next question: Another function that causes txmempool to rely on validation is the `removeForReorg()` function, which calls `CheckSequenceLocks()` and `CheckFinalTx()`. Why do we need to re-check sequence locks in a reorg?
<glozow> larryruane: right! but... how is it possible for a coinbase spend in the mempool to become premature in a reorg? Don't we only reorg when we see a longer chain?!
<_aj_> the first block in a multi-block reorg will be lower than the last block before the reorg even if difficulty doesn't change? [A B C D] -> [A E F G H], E has lower height than D?