The PR branch HEAD was ebca80d at the time of this review club meeting.
Definitions
A package is a group of transactions that have some dependency
relationship, i.e., some transaction spends the outputs of another
transaction in the package.
A test accept refers to an invocation of mempool validation that checks a
transaction's validity but does not submit it to the mempool; this is what the
testmempoolaccept RPC calls.
A Coin (CCoin)
is a UTXO. In our context, coins don't necessarily need to exist on-chain;
they can be outputs from transactions in the mempool or some other valid but
unconfirmed transaction. A Coins View is a data structure that contains
the available coins. The typical usage is to query a Coins View with a
COutPoint.
Background
The testmempoolaccept RPC was introduced in
#11742 and allows users to
test raw transactions for validity. While the return object is an array (with
the intention to allow multiple transactions), testmempoolaccept is currently
limited to one transaction at a time, which means users cannot test
transactions that depend on outputs that the mempool has not seen yet. The main
reason for this limitation is that our validation code only uses the coins
available in the current chain state and mempool (and not anything else), to
validate transaction inputs.
#20833 is also intended to be
a first step towards Package Mempool Accept and hopefully, eventually, Package
Relay. We have discussed
Package Relay in a previous review club,
#16401. You can read more about Package
Mempool Accept
here.
Notes
#20833 enables dry runs of
package validation. It does this by having CCoinsViewMemPool keep track of
the Coins created and spent by transactions in the package.
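To make this concrete, here is a minimal, self-contained sketch of the dry-run idea. It is not the PR's actual code: the names package_added, package_removed, and PackageInputsAvailable are illustrative, and the real work happens inside CCoinsViewMemPool::GetCoin() at the Coin/COutPoint level. The ordering, however, mirrors what the notes and questions below describe: an output already spent by an earlier package transaction must be treated as unavailable before we look anywhere else.

```cpp
// Illustrative sketch: walk the package in order, resolving each input against
// (a) outputs spent by earlier package txs, (b) outputs created by earlier
// package txs, and (c) everything the normal mempool/chain view already knows.
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

struct OutPoint { std::string txid; unsigned int n; };
bool operator<(const OutPoint& a, const OutPoint& b) {
    return std::tie(a.txid, a.n) < std::tie(b.txid, b.n);
}
struct TxIn { OutPoint prevout; };
struct Tx { std::string txid; std::vector<TxIn> vin; unsigned int n_outputs; };

// Stand-in for the backing mempool + chain state view.
using BaseView = std::map<OutPoint, long long /*value*/>;

bool PackageInputsAvailable(const std::vector<Tx>& package, const BaseView& base)
{
    std::map<OutPoint, long long> package_added;  // outputs created by package txs
    std::set<OutPoint> package_removed;           // outputs spent by package txs

    for (const Tx& tx : package) {
        for (const TxIn& in : tx.vin) {
            // Checked in this order: a coin spent by an earlier package tx is
            // gone, even if it would still be found in package_added or base.
            if (package_removed.count(in.prevout)) return false;
            if (!package_added.count(in.prevout) && !base.count(in.prevout)) return false;
        }
        // Record this tx's effects so later package txs can (or cannot) spend them.
        for (const TxIn& in : tx.vin) package_removed.insert(in.prevout);
        for (unsigned int i = 0; i < tx.n_outputs; ++i) {
            package_added.emplace(OutPoint{tx.txid, i}, /*value=*/0);
        }
    }
    return true;
}
```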
The
MemPoolAccept
class represents a "validation session" with the current mempool and chain
state; it holds cs_main and pool.cs mutexes throughout validation. Its job
is to figure out whether a transaction (received from peers or clients like
RPC) is a valid candidate for submission to the mempool. To do so, it applies
consensus and policy rules.
Importantly, if a transaction is invalid, MemPoolAccept is designed such that
it fails as quickly as possible to mitigate DoS risks.
Each MemPoolAccept instance creates a Coins View from the current chain tip
and mempool. CCoinsView is a class with a hierarchical structure (see
diagram below).
A CCoinsViewBacked points to another CCoinsView as its "backend." When
CCoinsViewBacked.GetCoin() cannot find a coin in its own data structures,
it calls backend.GetCoin() to continue searching. Thanks to the magic of
polymorphism, CCoinsViewBacked Coins Views can be "layered" to allow
searching for coins in multiple locations without changing the underlying chain
state. We have discussed the layered UTXO cache in a previous review
club, #17487.
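The sketch below shows the layering idea with simplified, assumed types (View, UtxoSetView, BackedView are invented here, not the real CCoinsView interface): each layer answers GetCoin() from its own data and otherwise defers to its backend, so layers can be stacked without modifying the underlying chain state.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

using OutPoint = std::pair<std::string, unsigned int>; // (txid, output index)
struct Coin { long long value; };

class View {
public:
    virtual std::optional<Coin> GetCoin(const OutPoint& out) const = 0;
    virtual ~View() = default;
};

// Bottom layer: the confirmed UTXO set (in Bitcoin Core this is disk-backed).
class UtxoSetView : public View {
    std::map<OutPoint, Coin> m_utxos;
public:
    explicit UtxoSetView(std::map<OutPoint, Coin> utxos) : m_utxos(std::move(utxos)) {}
    std::optional<Coin> GetCoin(const OutPoint& out) const override {
        auto it = m_utxos.find(out);
        return it == m_utxos.end() ? std::nullopt : std::optional<Coin>(it->second);
    }
};

// A layer that knows about extra coins (e.g. outputs of mempool transactions)
// and otherwise falls through to its backend, in the spirit of CCoinsViewBacked.
class BackedView : public View {
    const View& m_backend;
    std::map<OutPoint, Coin> m_extra;
public:
    BackedView(const View& backend, std::map<OutPoint, Coin> extra)
        : m_backend(backend), m_extra(std::move(extra)) {}
    std::optional<Coin> GetCoin(const OutPoint& out) const override {
        auto it = m_extra.find(out);
        if (it != m_extra.end()) return it->second;
        return m_backend.GetCoin(out); // keep searching one layer down
    }
};

int main() {
    UtxoSetView chain({{OutPoint{"confirmed_tx", 0}, Coin{50'000}}});
    BackedView mempool_view(chain, {{OutPoint{"mempool_tx", 1}, Coin{10'000}}});
    // A lookup through the top layer still finds confirmed coins: it falls
    // through the mempool layer to the chain layer without modifying either.
    return mempool_view.GetCoin(OutPoint{"confirmed_tx", 0}) ? 0 : 1;
}
```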
Since the UTXO set is quite large and stored on disk, Bitcoin Core uses a
cache to improve validation performance. To prevent cache thrashing caused
by an attacker broadcasting many invalid transactions spending a wide variety
of UTXOs (a cheap DoS attack), we keep track of which Coins are brought in from
disk and uncache them if the transaction turns out to be invalid.
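A rough sketch of that uncaching idea follows. The names coins_to_uncache, HaveCoinInCache, and Uncache echo the concepts discussed here, but the surrounding types are invented for this example and the validation step is a stub.

```cpp
// Track which outpoints *we* pulled into the cache during validation, and
// evict exactly those if the transaction turns out to be invalid.
#include <set>
#include <string>
#include <utility>
#include <vector>

using OutPoint = std::pair<std::string, unsigned int>;

struct CoinsCache {
    std::set<OutPoint> cached;
    bool HaveCoinInCache(const OutPoint& o) const { return cached.count(o) > 0; }
    void Fetch(const OutPoint& o) { cached.insert(o); } // pretend "read from disk"
    void Uncache(const OutPoint& o) { cached.erase(o); }
};

bool ValidateTx(const std::vector<OutPoint>& inputs, CoinsCache& cache)
{
    std::vector<OutPoint> coins_to_uncache;
    for (const OutPoint& in : inputs) {
        if (!cache.HaveCoinInCache(in)) {
            coins_to_uncache.push_back(in); // we are the ones warming the cache
            cache.Fetch(in);
        }
    }
    const bool valid = /* run the real checks here */ false;
    if (!valid) {
        // An attacker shouldn't be able to fill our cache with useless coins
        // just by broadcasting invalid transactions.
        for (const OutPoint& in : coins_to_uncache) cache.Uncache(in);
    }
    return valid;
}
```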
Validation roughly consists of PreChecks, PolicyScriptChecks,
ConsensusScriptChecks, and Finalize. We've touched on these functions in
previous review clubs, e.g. #19339. This
Stack Exchange
post
about consensus vs policy rules may also be helpful.
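As a rough sketch of the fail-fast ordering these stages imply (the function bodies are stubs and this is not the real MemPoolAccept interface): cheap contextual and policy checks run first, the expensive script checks later, and only a fully successful, non-test run reaches the step that actually changes the mempool.

```cpp
#include <string>

struct TxCandidate { /* transaction plus per-tx validation state */ };

bool PreChecks(TxCandidate&)             { return true; } // cheap policy + context checks
bool PolicyScriptChecks(TxCandidate&)    { return true; } // scripts with policy (standard) flags
bool ConsensusScriptChecks(TxCandidate&) { return true; } // scripts with consensus flags only
bool Finalize(TxCandidate&)              { return true; } // actually add to the mempool

bool AcceptSingle(TxCandidate& tx, bool test_accept)
{
    if (!PreChecks(tx)) return false;             // fail as early and cheaply as possible
    if (!PolicyScriptChecks(tx)) return false;    // most expensive part: signatures/scripts
    if (!ConsensusScriptChecks(tx)) return false; // belt-and-suspenders with consensus flags
    if (test_accept) return true;                 // dry run: never touch the mempool
    return Finalize(tx);
}
```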
In an effort to avoid API-breaking changes, #20833 returns a results array
with one result per transaction, in the same order they were passed in, when
the whole package is valid; otherwise, it returns only the first failure
result.
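For illustration only (the types and function below are invented, not the RPC's real return type), this is the shape of result the paragraph above describes:

```cpp
#include <string>
#include <vector>

struct TxResult { std::string txid; bool allowed; std::string reject_reason; };

std::vector<TxResult> TestPackageAccept(const std::vector<std::string>& txids)
{
    std::vector<TxResult> results;
    for (const std::string& txid : txids) {
        bool ok = /* validate against chain + mempool + earlier package txs */ true;
        if (!ok) {
            // Validation is atomic: stop at the first failure and report only it.
            return {TxResult{txid, false, "some-reject-reason"}};
        }
        results.push_back(TxResult{txid, true, ""});
    }
    return results; // all succeeded: one entry per tx, same order as the input
}
```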
Questions
What are some use cases for enabling packages through testmempoolaccept?
What are some limitations of this implementation of package validation?
From a code quality standpoint, what is the benefit of moving the
validation state, replaced transactions, and fee to Workspace? Should we
add coins_to_uncache to Workspace and make ATMPArgs a const? Why or why
not?
Why do we run all PreChecks before script checks? Would it still be
correct if we didn't do this?
In
AcceptSingleTransaction
(with test_accept=true), both PolicyScriptChecks and
ConsensusScriptChecks are run. However, in
AcceptMultipleTransactions,
we don't run ConsensusScriptChecks. Why is that okay? (Followup question:
why do we run ConsensusScriptChecks at all in ATMP?)
When we fetch a coin from m_viewmempool, what 4 places do we look and in
what order? Why do we check cache_package_remove before
cache_package_add in CCoinsViewMemPool::GetCoin?
If a block arrives while we're running PreChecks, do we have to worry
about our CoinsView becoming inconsistent?
Why do we set our CoinsView backend to dummy
(code)?
What bug is this trying to prevent?
What are your thoughts on the results array (returning one for each tx if
successful or the first one if failed)? Do you have any suggestions for
improvement, and is it API-breaking?
<glozow> ok let's start with some conceptual questions. What are some use cases for enabling packages through testmempoolaccept? (non-package-relay related please, since that's very far off from this PR)
<sishir> This allows you to test validity for second-layer application transaction chains (i.e. with multiple spending paths and where you don't want to broadcast yet). It could be helpful to return more information to the client, but don't know what that would look like concretely
<stacie> A non-package-relay answer: With packages you can now validate multiple txs with one testmempoolaccept call (which previously accepted an array but only expected it to be size one)
<glozow> yeah! we can check transactions that build on each other, although there are some limitations - would anyone like to describe some limitations with what this PR enables?
<stacie> Another limitation is the atomic validation - the response is only for the first failure, so if you have multiple, you'll need to go back and correct each one before calling testmempoolaccept again
<ariard> stacie: in practice, that's hard to do because it assumes your txn generation backend to fully consume the errors and be able to correct them while still enforcing application semantic
<glozow> The PR starts off with a few refactors. From a code quality standpoint, what is the benefit of moving the validation state, replaced transactions, and fee to `Workspace`? Should we add `coins_to_uncache` to `Workspace` and make `ATMPArgs` a const? Why or why not?
<AnthonyRonning> For code quality, it's more explicit than relying on `AcceptToMemoryPool` to fill in memory addresses for passed in state, replaced txs, fee.
<stacie> ariard - agree, and I think the atomic validation is a good thing, but I wonder if people will use this new feature to try validating a set of unrelated transactions. The only use case I can see is to save on the network overhead of multiple RPC calls, but it does seem strange to need to batch/package up a bunch of unrelated transactions ¯\_(ツ)_/¯
<glozow> stacie ariard: yeah, this would still work for unrelated transactions, although that's not what it's intended for. we have a question about API later so we could discuss it further :)
<glozow> cool! So I'll answer the first part, ATMPArgs = arguments for validation, e.g. test_accept. The Workspace gets updated as we go through the steps of validation.
<lightlike> I can see that it makes the code cleaner and easier to read if we separate input arguments and results for ATMP - not sure if that is the only reason for the refactor though?
<jnewbery> sishir: the PolicyScriptChecks function is validating the scripts and signatures. That's the most computationally expensive part of validating a transaction
<glozow> sishir: policy includes non-consensus rules that Bitcoin Core nodes enforce on unconfirmed transactions, mostly to protect from DoS. Some of the checks are in `PreChecks`, and some are in `PolicyScriptChecks`
<glozow> yeah you're all correct. Policy checks rules in addition to consensus. PreChecks is a subset of policy checks because it does some of the policy checks haha
<lightlike> do we need to be concerned about DOS though for a RPC? Or is that in preparation for reusing this code for getting packages from actual peers?
<glozow> Next question: In `AcceptSingleTransaction` (with test_accept=true), both `PolicyScriptChecks` and `ConsensusScriptChecks` are run. However, in `AcceptMultipleTransactions`, we don't run `ConsensusScriptChecks`. Why is that okay?
<AnthonyRonning> Since `AcceptMultipleTransactions` does not broadcast the transactions, even with `test_accept=false`, there's no reason to do consensus checks.
<ariard> glozow: I think we don't want to run `ConsensusScriptChecks` because for now those txn aren't added to the mempool and thus there is no matter caching their script results for a potential block validation?
<michaelfolkson> You don't want invalid transactions in your mempool. Takes up space, inefficient, could lose you connections if you try to propagate it?
<glozow> Ok. In a for-realsies transaction validation through MemPoolAccept.AcceptSingleTransaction() (e.g. a tx we receive on P2P), we run ConsensusScriptChecks.
<glozow> When we fetch a coin from a `CCoinsViewMemPool` (in this PR), what 4 places do we look and in what order? Why do we check `cache_package_remove` before `cache_package_add`?
<robot-dreams> s/check/comment* "This is also useful in case of bugs in the standard flags that cause transactions to pass as valid when they're actually invalid."
<AnthonyRonning> `cache_package_remove`, `cache_package_add`, mempool, and then backend (disk?). We check removed first because a package tx before could have already spent another package tx's output.
<glozow> robot-dreams: I think that's a "just in case there is a bug even though we don't think there's a bug" type of thing. policy flags as a superset of consensus flags should guarantee this. (btw we had a whole review club on this topic hehe https://bitcoincore.reviews/19698)
<AnthonyRonning> just a general code question, could there be a single `cache_package_coin` that both adds and removes as needed? Instead of having 2 lists and iterating through both?
<jnewbery> robot-dreams: a consensus-invalid transaction getting into the mempool is bad news for the miners. They'd construct a block including that transaction and then later find out that the block is invalid.
<jnewbery> so this is just an extra check to make sure that doesn't happen. It shouldn't since policy rules should be stricter than consensus rules, but it's good to double check here.
<jnewbery> glozow: on your question about TestBlockValidity, yes it looks like this is called before getblocktemplate returns a block template. Although I don't know what the advice would be to a miner who saw a "TestBlockValidity() failed" error.
<glozow> we use our cache for performance, but if the attacker got us to pull a bunch of useless coins into our cache, we'd get very little use out of it
<glozow> last question before we wrap up for today: What are your thoughts on the results array (returning one for each tx if successful or the first one if failed)? Do you have any suggestions for improvement?
<lightlike> glozow: would that be an actual attack that would do something worse than slow us down a bit? I'm not so familiar with the topic, but if we have useless full cache, wouldn't we just load more txes from disk until the cache gets better again?
<AnthonyRonning> It seems like the API now does what it intended to do, so ACK. Not API-breaking because nobody could do more than 1 before anyways, this expands functionality.
<robot-dreams> glozow: Another alternative (not necessarily improvement): return an array of the same size, first failure gets input reason, any dependents get `input-missing`?
<jnewbery> We don't ban peers for sending us transactions that are consensus valid but fail our policy checks, or have a fee too low to get into our mempool for example
<sishir> im still trying to understand this PR, will we be able to do something like this (node.testmempoolaccept([rawtx1, rawtx2, rawtx3, rawtx4])) once it's merged?
<robot-dreams> murtyjones: agreed though similar to how compiler writers want to avoid spamming users with errors if they missed a semicolon (and caused the next 500 lines to all have errors), maybe we want some way to show only "uncorrelated" errors
<ariard> Murch: committing to a pairing policy which would ban/evict peers for policy divergences would be a violation of the concept itself of policy :)
<jnewbery> there's not time to get into this in great detail now, but the UTXO cache has a slightly unusual access pattern compared to many caches that people are familiar with. The main thing it's saving us is _writes_, not _reads_
<jnewbery> if a coin gets spent shortly after it gets created, we never have to write it to disk at all, and many coins get spent in the same block or the next few blocks after they get created (see page 14 of https://eprint.iacr.org/2019/611.pdf)
<stacie> ah! I see it in that comment, "A future pull can extend this rpc to take lists of more than one raw transaction and feed them into an ephemeral mempool (e.g. based on the current mempool)." thanks!
<dhruvm> jnewbery: interesting. does the low frequency of flushing UTXOs to disk have implications on chainstate corruption in case of unexpected termination?
<thomasb06> jnewbery: by the way, would it be useful for Bitcoin to have an error correction algorithm at bit level? My guess is no, but I'm trying to plug in something I know
<ariard> michaelfolkson: a coinjoin counterparty might use its output as the starting outpoint of a new chain without other coinjoin counterparies being aware of it ?
<robot-dreams> Following up on something I didn't understand from earlier, how come we do `ConsensusScriptChecks` in the testmempoolaccept -> AcceptSingleTransaction path, but not in the AcceptMultipleTransactions path?
<michaelfolkson> My understanding is that the use case for package relay is where you construct a chain of transactions that you will broadcast onchain at some point in the future (days, weeks, months)
<michaelfolkson> It is pretty unlikely that you'd construct a Coinjoin with a fee and then just before you broadcast it the network fees jump massively. I guess it is possible
<ariard> michaelfolkson: in the context of multi-party protocol: it's either a) a convenience mechanism if the bumped tx isn't time-sensitive or b) a security mechanism if they're
<michaelfolkson> Yeah maybe that happens. I'd have thought Coinjoins would want to confirm immediately given the number of participants and how hard it is to organize
<robot-dreams> Here's my current understanding. If we're adding a transaction to the mempool, we bring the inputs into cache, validate, then remove them (because they're spent). If we're TESTING though, we bring the inputs into cache, validate, then manually uncache them?
<robot-dreams> ariard: Earlier I wasn't referring to fee bumping a justice transaction with CPFP, I was referring to double checking that the chain of 2 transactions would be accepted together even though the commitment transaction hadn't been broadcast yet. Would that use case still make sense in the face of malleable justice transactions?
<glozow> robot-dreams: if we added it successfully to the mempool, we keep them in cache because we expect to see them again. but if it was invalid, we uncache them.
<robot-dreams> glozow: I think I have some confusion over what is "it" and "them" :-P if we add transaction T with inputs vin and outputs vout to the mempool, here's my understanding of what happens:
<ariard> robot-dreams: I think I understand your question better, you mean being sure you can punish correctly a revoked commitment. Yes it makes sense to use testmempoolaccept when you manually assert correctness of your transaction generation code
<robot-dreams> glozow: 1. bring everything in vin into the cache, 2. validate, 3. remove everything from vin from the cache / add everything from vout to the cache, is that right?
<ariard> but not for production as you don't have to tx-standardness-verify a counterparty contribution, all justice transaction fields are controlled by the punisher
<glozow> oops, yeah so we care about all the coins that are spent by a tx, and we don't mark them as spent if we're just adding to mempool. but we do want to keep them in the cache for later.
<robot-dreams> In `ProcessNewPackage` it looks like you unconditionally uncache, but in `AcceptToMemoryPoolWithTime` it looks like you only uncache in the `!result.m_accepted` case
<michaelfolkson> "In most L2s, chain of transactions validity is more liberal than mempool one. Integrating this new testmempoolaccept might break it..."
<robot-dreams> You mean "move fast and break things" is not the motto of bitcoin core? :surprised-pikachu: anyway thanks, that all makes a lot of sense
<michaelfolkson> So this is like a stepping stone advancing testmempoolaccept in the hope that a few more stepping stones along we'll start solving real problems
<michaelfolkson> Maybe testmempoolaccept doesn't have any use cases but a few more stepping stones along a use case or multiple use cases start getting solved