The PR branch HEAD was 35dd8dfcaf at the time of this review club meeting.
Notes
A package is an ordered list of transactions, representable as a Directed Acyclic Graph (a
directed edge exists between a transaction and any transaction whose output it spends).
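Since the list is ordered, a valid package must be topologically sorted: each transaction may only spend outputs of transactions that appear earlier in the list (or confirmed coins outside the package). A minimal sketch of that check, using an illustrative dict-based transaction model (the field names `txid` and `input_txids` are not Bitcoin Core's):

```python
def is_topologically_sorted(package):
    """Return True if no transaction spends a package transaction
    that appears later in the list. `package` is a list of dicts with
    'txid' and 'input_txids' (txids spent by the tx's inputs)."""
    all_txids = {tx["txid"] for tx in package}
    seen = set()
    for tx in package:
        for dep in tx["input_txids"]:
            # Inputs referring to txids outside the package (confirmed
            # coins) are fine; in-package dependencies must come earlier.
            if dep in all_txids and dep not in seen:
                return False
        seen.add(tx["txid"])
    return True

parent = {"txid": "a", "input_txids": ["confirmed"]}
child = {"txid": "b", "input_txids": ["a"]}
assert is_topologically_sorted([parent, child])
assert not is_topologically_sorted([child, parent])
```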
PR #22674 is part of a series of PRs to
implement this proposal. It implements validation and mempool submission of packages consisting
of a single child with its unconfirmed parents.
Future work such as PR #22290 will enable
fee-bumping by CPFP and RBF within packages.
We have discussed Package Mempool Accept in previous review clubs, #20833 and
#21800.
When a transaction fails mempool validation, we categorize the failure as one of a few
TxValidationResult
types. Most notably, we distinguish between consensus rule violations and local policy-based
rejections so that we can inform the P2P layer about peer misbehavior (see
PeerManagerImpl::MaybePunishNodeForTx).
In a similar vein, this PR distinguishes between PCKG_BAD and PCKG_POLICY.
Miners seek to maximize the total transaction fees while ensuring that their blocks are within
consensus-enforced weight and sigops limits. To simplify this 2-dimensional knapsack problem,
in the mempool, the virtual size of a transaction is calculated as the maximum of its BIP141
serialized size and its “sigop weight”.
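The calculation can be sketched as follows. This is a simplified model, not Bitcoin Core's exact code, though the shape follows `GetVirtualTransactionSize()` and `BYTES_PER_SIGOP` reflects the default `-bytespersigop` value of 20:

```python
WITNESS_SCALE_FACTOR = 4
BYTES_PER_SIGOP = 20  # default -bytespersigop

def virtual_size(weight, sigop_cost, bytes_per_sigop=BYTES_PER_SIGOP):
    """Virtual size in vbytes: the max of the BIP141 weight and the
    "sigop weight" (sigop cost scaled by bytes_per_sigop), divided by
    the witness scale factor, rounding up."""
    return (max(weight, sigop_cost * bytes_per_sigop)
            + WITNESS_SCALE_FACTOR - 1) // WITNESS_SCALE_FACTOR

# A 400-weight tx with a small sigop cost: serialized size dominates.
assert virtual_size(400, 4) == 100
# Same weight, but a sigop cost of 80: sigop weight (1600) dominates,
# so the tx pays for 400 vbytes despite its smaller serialized size.
assert virtual_size(400, 80) == 400
```

This is why a transaction stuffed with signatures can have a larger mempool vsize than its BIP141 size alone suggests, one of the scenarios raised in the questions below.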
What does IsChildWithParents() check? How does it do this?
What criteria must a package meet in order to be considered a child-with-unconfirmed-parents
package? Is it possible to verify without looking at the current chain? Is it possible to verify
without looking at our mempool?
How does this PR implement checking that a package is child-with-unconfirmed-parents? (Hint: code
here).
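As discussed in the meeting log below, `IsChildWithParents()` checks that every transaction in the package except the last one corresponds to an input of the last transaction (the child). A minimal Python sketch of that check, using an illustrative dict-based transaction model rather than Bitcoin Core's types:

```python
def is_child_with_parents(package):
    """Return True if every tx except the last is spent by the last tx.
    `package` is a list of dicts with 'txid' and 'input_txids'
    (the txids spent by the tx's inputs)."""
    if len(package) < 2:
        return False
    child = package[-1]
    parent_txids = {tx["txid"] for tx in package[:-1]}
    # Every supposed parent must appear among the child's inputs.
    return parent_txids.issubset(set(child["input_txids"]))

tx1 = {"txid": "a", "input_txids": ["confirmed1"]}
tx2 = {"txid": "b", "input_txids": ["confirmed2"]}
tx3 = {"txid": "c", "input_txids": ["a", "b"]}
assert is_child_with_parents([tx1, tx2, tx3])
assert not is_child_with_parents([tx2, tx1])  # tx1 doesn't spend tx2
```

Note this topology check alone says nothing about whether the parents are unconfirmed, or whether all unconfirmed parents are present; those conditions require looking beyond the package itself.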
Why do we add the child’s
inputs to coins_to_uncache beforehand?
Why do we distinguish between PCKG_BAD and PCKG_POLICY? Within this PR, do we do anything
differently based on the result type?
In what scenarios could the virtual sizes obtained from GetVirtualTransactionSize() here and
the MempoolAcceptResult be different? (Hint: is it possible for the tx to be different? Is it
possible for PreChecks to calculate the virtual size differently?)
Quiz: given a multi-parent-1-child package of Tx1, Tx2, and Tx3 (where Tx3 is the child and there
are no dependencies between the parents), which of the following groups of transactions may be in
the mempool at the end of ProcessNewPackage()?
(A) None
(B) Tx1 only
(C) Tx3 only
(D) Tx1 and Tx2
(E) Tx1 and Tx3
(F) Tx1, Tx2 and Tx3
Under what circumstances is the “mempool full” error returned as the validation result for an individual
transaction? (Hint: the code is
here)
This code
prevents LimitMempoolSize() from being called after each transaction is submitted. What could
happen if we didn’t do this?
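One hazard can be seen with a toy model (all names here are hypothetical, not Bitcoin Core's): if the mempool is trimmed to its size limit after each individual submission, a low-feerate parent can be evicted before its high-feerate child is even considered, so the child then fails for missing inputs even though the package as a whole would have paid for the space.

```python
class ToyMempool:
    """Toy mempool with a max tx count; trims by evicting lowest feerate."""
    def __init__(self, max_txs):
        self.max_txs = max_txs
        self.txs = {}  # txid -> feerate

    def submit(self, txid, feerate, parents=()):
        # Reject if an unconfirmed parent is missing (a "missing inputs"
        # style failure in this toy model).
        if any(p not in self.txs for p in parents):
            return False
        self.txs[txid] = feerate
        return True

    def trim(self):
        while len(self.txs) > self.max_txs:
            lowest = min(self.txs, key=self.txs.get)
            del self.txs[lowest]

pool = ToyMempool(max_txs=2)
pool.submit("tx_a", feerate=5); pool.trim()
pool.submit("tx_b", feerate=4); pool.trim()
# Trimming right after submission evicts the low-feerate parent...
pool.submit("parent", feerate=1); pool.trim()
# ...so the fee-bumping child then fails for missing inputs.
assert not pool.submit("child", feerate=50, parents=("parent",))
```

Trimming once, after the whole package is submitted, lets the child's fee bump be taken into account before anything is evicted.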
This commit adds a descendant limit for each transaction Workspace and changes the MemPoolAccept
limit to const. Given that our mempool policy does not change during a validation instance, how is
it possible for different transactions to have different descendant limits?
<stickies-v> it adds validation logic for a specific type of package (1 child with multiple parents) so that it can be accepted into the mempool if valid, plus some improved error handling/messaging
<glozow> Kaizen_Kintsugi: partially correct, yes - it defines a package (group of dependent transactions) and implements package acceptance for packages of a specific topology. And it's in preparation for fee-bumping within packages in the future
<larryruane> just to be sure, this concept of a package isn't exposed in any way to the outside world (outside the node) yet, right? No P2P support at all for it? (But will come later?)
<davidbak> one thing unclear to me is whether this is the _first_ change to B.C. dealing with packages or if packages already exist in some way. The gist starts by saying this is for "packages consisting of multiple parents and 1 child" which leaves me wondering if there was a previous one, accepted, dealing with, say, 1 parent and 1 child.
<Kaizen_Kintsugi> cool cool, it seems to relate to the review club a few weeks ago about topological sorting of transactions, this sounds like a pre-sort.
<stickies-v> it verifies that the submitted package has 1 child and that all inputs of that child are transactions that are either submitted in the package (parents) if they are unconfirmed, or transactions that are already confirmed and thus in the UTXO set
<stickies-v> no sorry my earlier explanation is wrong - the UTXO checking happens somewhere else. IsChildWithParents just checks that all the parent txs are used as inputs for the child
<glozow> stickies-v's second answer is correct - we're making sure that all of the transactions correspond to an input of the child (last transaction). though not all parents need to be present, as larryruane says
<stickies-v> I think it's not possible to verify without looking at the current chain, but it is possible to verify without looking at the mempool since all confirmed parent transactions need to be in the package
<stickies-v> all transactions in the package except for the last one need to be used as inputs for the last one (i.e. the child), and all the unconfirmed inputs of the child need to be provided in the package
<larryruane> to answer that, i'm looking at the definition of `IsChildWithParents()` and there's some fancy c++ `std` stuff going on there! I'm not quite familiar with it
<glozow> A child-with-unconfirmed-parents package is a topologically sorted package that consists of exactly one child and all of its unconfirmed parents (no other transactions may be present). The last transaction in the package is the child; each of its inputs must refer to a UTXO in the current chain tip or some preceding transaction in the package.
<larryruane> By recursive I meant, like D has 3 inputs, which refer to outputs of A,B,C (all 4 in the mempool) ... if A has an input that refers to an X output, (X also in mempool), then can (should) X be in the package?
<glozow> How does this PR implement checking that a package is child-with-unconfirmed-parents? Why do we add the child’s inputs to coins_to_uncache beforehand?
<glozow> davidbak: yes, and because arbitrary packages = boundless complexity. we'd either be allowing DoS attacks or using imperfect heuristics that could open up pinning vectors
<stickies-v> glozow we check that each input tx is either in parent_txids or in the UTXO set with m_view.HaveCoin. We keep the coins_to_uncache vector to remove newly added outputs from the cache if the package gets rejected
<glozow> stickies-v: good question. yes it's basically performance. our cache is of limited size, and we want it to store coins that we might need again in the near future
<glozow> for example, if we add a transaction to our mempool, it's a good idea to keep it in the cache because we'll probably look up those inputs when we see the tx in a block later
<glozow> remember that any attacker on P2P can send us a package (and we'll look up all the inputs to validate the transactions), so they could be trying to make us cache thrash
<jnewbery> larryruane: disconnecting/banning is dangerous. If one of those damn attackers is able to make you send a transaction to your peer that would cause them to disconnect you, then they could use that to isolate you, or do it across the network to cause a network split
<larryruane> (for the newer people here, if a peer misbehaves in some possibly-innocuous way, we bump up its "ban score" which means we don't ban it yet, but if the ban score gets too high, then we do ban it (disconnect, i think?))
<glozow> Next question: why might we distinguish between `PCKG_BAD` and `PCKG_POLICY`? Within this PR, do we do anything differently based on the result type?
<jnewbery> if we gave a peer misbehavior points for causing us to fetch UTXOs from disk for a transaction that turned out to fail because of policy, then that could open up those eclipse/netsplit vectors.
<stickies-v> glozow you want to differentiate between two categories of package validation failure, one is network consensus failure (PCKG_BAD) and should not be accepted by anyone - so maybe bad faith? The other is local policy failure (PCKG_POLICY), i.e. the rules each node can tweak, which don't necessarily indicate malicious intent?
<Kaizen_Kintsugi> so a package has to be one child and multiple parents all unconfirmed. If it is anything else it is rejected and coins are removed from the utxo set
<stickies-v> well and I suppose my labeling of malicious intent is a bit too harsh as well - could just be that a peer has a buggy implementation of what's an allowed package, for example
<jnewbery> I think in this PR, we don't do anything differently based on whether the package fails for `PCKG_BAD` or `PCKG_POLICY`, but we could do in future when we update p2p to be able to relay packages
<jnewbery> larryruane: there's no RPC for package submission either. This PR is only making the internal changes to the mempool submission logic (and it's already a large PR!)
<davidbak> there was a way in a previous PR to "dry run" packages (at that time) - could that have been done in this case? or ... is it or is it not a B.C. dev policy to modify RPC handling to account for new-but-incompletely exposed features?
<davidbak> so in the implementation of a feature over multiple PRs (what the Agile people would call stories in an epic) it is ok (from the POV of the maintainers) to have individual PRs that do not include their own testing as long as you get it in a subsequent PR? (trying to learn the dev philosophy here as well as this individual PR)
<jnewbery> There's a distinction between unit tests, which are testing the logic in individual units of code (in this case the mempool submission/acceptance logic), and functional tests, which test end-to-end user functionality.
<jnewbery> in general, I think this project relies too much on the python functional tests, because they're easy to write and our code is not structured in a way that makes it easy to isolate and test individual components
<glozow> that's one possibility. right now we don't have witness replacement ;) so the RPC will just return the transaction that's already in the mempool. if they have different witness sizes then this would be inaccurate!
<glozow> there's another way the virtual size could be different, and it's if the transaction has a toooon of signatures so its "sigop weight" is higher than its BIP141 weight
<jnewbery> larryruane: exactly right. They're _slow_ as well, since they're spinning up one or more full bitcoind nodes. They run in seconds instead of milliseconds
<davidbak> jnewbery: i appreciate that comment, thanks; a different POV, by the way, is that if this PR already has PCKG_BAD then that _is_ an end-to-end user functionality; won't be _exposed_ until the P2P stuff is in later, but if not tested here it adds to the test burden _then_. But it's just an argument, I accept your conclusion that unit tests are appropriate here.
<larryruane> glozow: hope this isn't a sidetrack (ignore if you'd like) but is sigop weight a little like gas in ethereum, so that there's some consideration of the CPU time needed to evaluate scripts? I'm really not familiar with sigop weight
<glozow> Miners seek to maximize the total transaction fees while ensuring that their blocks are within consensus-enforced weight and sigops limits. To simplify this 2-dimensional knapsack problem, in the mempool, virtual size of a transaction is calculated as the maximum between its BIP141 serialized size and its “sigop weight”.
<stickies-v> larryruane consensus limits the total amount of sigops in a block though, so miners need to be careful not to include too many sigops in a small/cheap transaction
<stickies-v> hence why it's being bundled in the same heuristic to, like glozow said, make constructing the optimal block more straightforward computationally
<davidbak> not just extensive prep for this meeting, but extensive prep for the PR: the additional written documentation _with excellent drawings_ is a great example of a way to get a PR accepted (to my mind anyway, correct me if I'm wrong)