The PR branch HEAD was f87e07c6f at the time of this review club meeting.
Notes
CChainState
represents the node’s knowledge of a block chain, with an (optional)
mempool that is synchronized with that chain. It was introduced in
PR 10279 with the eventual goal of
becoming an exposed “libconsensus”; initially, it served to clarify the
internal interfaces within bitcoind.
ChainstateManager
is the class in validation that manages one or more CChainState objects. It
was introduced in PR 17737
in order to provide a unified interface to the multiple chainstates required by
the AssumeUTXO project. Now
that a validation instance can contain multiple chainstates, it is eventually
intended to be the main interface into validation from the rest of the code.
The ChainstateManager::ProcessNewBlock() and
ChainstateManager::ProcessNewBlockHeaders() public functions were added in
PR 18698 as high-level
interface methods into validation. The callers (most notably net_processing) do
not need to be aware of validation’s chainstates (or even whether there are
multiple chainstates), and simply pass the block or block headers to be
validated.
AcceptToMemoryPool()
(often called ATMP) is a free function in the global namespace. Prior to this PR, it was the
interface through which the rest of the code could call into validation to
submit a transaction to the mempool. The calling code would need to provide the
chainstate and mempool as arguments to the function, which breaks the
abstraction: calling code shouldn’t need to know about validation’s internals.
This PR adds a new
ChainstateManager::ProcessTransaction()
interface function, similar to ProcessNewBlock() and
ProcessNewBlockHeaders(), which provides a high-level interface method for
submitting transactions to be validated. The caller no longer needs to know
about the chainstate or mempool inside validation.
The PR also removes the responsibility of calling the CTxMemPool::check()
consistency check function from the caller (net_processing) and
moves it to validation.
Which components currently call AcceptToMemoryPool()? List all the places from which ATMP is called.
What does CTxMemPool::check() do? Whose responsibility is it to call that function?
What does the bypass_limits argument in ATMP do? In which circumstances is ATMP called with bypass_limits set to true?
One of the commit logs quotes an earlier commit message:
“This logic is not necessary for memory pool transactions, as
AcceptToMemoryPool already refuses previously-known transaction ids
entirely.” That quote refers to the logic at
https://github.com/bitcoin/bitcoin/blob/a206b0ea12eb4606b93323268fc81a4f1f952531/src/main.cpp#L484-L486,
which was later removed in commit 450cbb0944cd20a06ce806e6679a1f4c83c50db2.
What was the change that removed that logic?
The last commit makes validation always call mempool.check() after processing a new transaction. Does that add any overhead to processing/relaying transactions? Why/why not?
<stickies-v> code outside of validation.cpp should not need to care about chainstate management, so through ChainstateManager::ProcessTransaction those callers can now add transactions without needing a reference to mempool or chaintip
<jnewbery> Kaizen_Kintsugi: AssumeUTXO is a huge project that touches lots of parts of validation and init. This PR wasn't motivated specifically by AssumeUTXO, but I'd argue that clarifying interfaces should help add new features if done right
<stickies-v> a mutex - a bit fuzzy on the details but i think it tries to ensure validation consistency by only allowing a single thread at the same time to modify crucial components like chainstate etc?
<jnewbery> Kaizen_Kintsugi: being pedantic for a second, cs_main is a *mutex*, and that mutex is *locked* in code paths, which excludes other threads from reading/writing data at the same time
<jnewbery> so to summarize, cs_main is the big lock in Bitcoin Core. It _should_ be used predominantly to protect chainstate and validation, but there's also a lot of state in net_processing and other places that is guarded by cs_main
<glozow> (1) loading mempool.dat from disk, (2) receiving a new tx on p2p, (3) adding transactions from disconnected blocks (4) from clients like wallet and rpc via BroadcastTransaction
<jnewbery> follow up question. Out of (1), (2), (3), (4) and (4b), which are calls from inside validation, and which are from external client code (where by client I mean other components within bitcoin core, not external programs)
<jnewbery> stickies-v: yes, the same logic is involved in package validation, although that doesn't actually use the ATMP function. Package validation would be from the same call sites as (2) and (4) when it gets merged
<jnewbery> yeah, mempool.dat is only written on shutdown and read on startup usually. I think there may be an rpc to dump it manually but I'm not 100% sure about that
<jnewbery> I'd call net_processing a client of validation. It calls into various functions to either provide new data for validation to validate, or to request what the current state is
<stickies-v> CTxMemPool::check() verifies that all the transactions in the mempool are consistent, e.g. they don't spend the same inputs twice. Since it's by default not used on mainnet, I think this is mostly a dev/debug feature?
<stickies-v> Previously, it was the responsibility of anyone interacting with the mempool. Since this PR, ChainstateManager::ProcessTransaction takes care of this automatically.
<jnewbery> stickies-v's point about it being off by default is important. If that wasn't the case, then this PR might be a performance pessimization. But since check() is generally only used for testing, it's not a problem.
<jnewbery> during a reorg, we'll disconnect one or more blocks and then connect one or more competing blocks. The blocks that are disconnected will probably have a bunch of transactions in them.
<Kaizen_Kintsugi> oh neat, so there is a disconnect pool inside the mempool if I understand correctly, there is a 'partition' for this stuff to happen?
<jnewbery> lsilva_: "connecting" a block is applying its transactions to the UTXO set (removing outputs that are spent by txs in the block and adding new outputs that are created) and updating the chainstate
<stickies-v> hmm so during reorg we ignore the default mempool size of 300MB but instead use a (default) MAX_DISCONNECTED_TX_POOL_SIZE of 20MB of disconnected transactions? Then doesn't it make more sense to just keep the 300MB mempool limit?
<jnewbery> "disconnecting" a block is the reverse: removing the UTXOs that were created in the block and re-adding those that were spent in the block (which were stored in the 'undo data' for that block on disk)
<pg156> so for those transactions from disconnected blocks (but not in new connected blocks), are they added to mempool while bypassing the limits? If so, what if the fee rate of them is too low?
<stickies-v> although I suppose pushing out 20MB of transactions from the mempool (assuming it's full) is much more costly than just temporarily allocating an extra 20MB
<jnewbery> stickies-v: that's a good question! I'm not entirely clear about why we should ignore the 300MB limit when we're putting transactions into the mempool during a re-org
<jnewbery> I think it may be due to ancestor/descendant chains. Since the transactions can only be added sequentially, we can't look at the ancestor/descendant feerates when trying to add the transactions back, so it's perhaps better to add everything, and then limit the mempool size if necessary
<jnewbery> No, I don't think it's anything to do with a super-large re-org. In that case we'd lose most of the transactions, since the MAX_DISCONNECTED_TX_POOL_SIZE is limited to 20MB
<jnewbery> to be honest, I haven't figured out exactly what the rationale is, but this PR doesn't change the behaviour for bypass_limits at all so that's ok :)
<glozow> well if you were at 290mb and disconnectpool is 20mb, your only choices are (1) cut down to 280mb and then accept or (2) allow going to 310mb and then cut right? the former means you might end up with lower fee transactions
<michaelfolkson> sipa: What is the UTXO set implied by the mempool? There's no predictive element of what will be in the next mined block is there? So the mempool is treated by the codebase as irrelevant to any future changes to the UTXO set?