Mempool validation and submission for packages of 1 child + parents (tx fees and policy, validation)

Host: glozow  -  PR author: glozow

The PR branch HEAD was 35dd8dfcaf at the time of this review club meeting.


  • A package is an ordered list of transactions, representable by a Directed Acyclic Graph (a directed edge exists between a transaction that spends the output of another transaction).

  • Package Mempool Accept is a project implementing mempool validation and policy for packages. It is a prerequisite for package relay.

    • PR #22674 is part of a series of PRs to implement this proposal. It implements validation and mempool submission of packages consisting of a single child with its unconfirmed parents.

    • Future work such as PR #22290 will enable fee-bumping by CPFP and RBF within packages.

    • We have discussed Package Mempool Accept in previous review clubs, #20833 and #21800.

  • If a node sends a consensus-invalid transaction or violates P2P protocol, we should disconnect them in favor of nodes that are following network rules. However, overzealous banning and disconnecting can lead to network partitions.

  • When a transaction fails mempool validation, we categorize the failure as one of a few TxValidationResult types. Most notably, we distinguish between consensus rule violations and local policy-based rejections so that we can inform the P2P layer about peer misbehaviors (see PeerManagerImpl::MaybePunishNodeForTx). In a similar vein, this PR distinguishes between PCKG_BAD and PCKG_POLICY.

  • Miners seek to maximize the total transaction fees while ensuring that their blocks are within consensus-enforced weight and sigops limits. To simplify this 2-dimensional knapsack problem, in the mempool, virtual size of a transaction is calculated as the maximum between its BIP141 serialized size and its “sigop weight”.


  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What does IsChildWithParents() check? How does it do this?

  3. What criteria must a package meet in order to be considered a child-with-unconfirmed-parents package? Is it possible to verify without looking at the current chain? Is it possible to verify without looking at our mempool?

  4. How does this PR implement checking that a package is child-with-unconfirmed-parents? (Hint: code here). Why do we add the child’s inputs to coins_to_uncache beforehand?

  5. Why do we distinguish between PCKG_BAD and PCKG_POLICY? Within this PR, do we do anything differently based on the result type?

  6. In what scenarios could the virtual sizes obtained from GetVirtualTransactionSize() here and the MempoolAcceptResult be different? (Hint: is it possible for the tx to be different? Is it possible for PreChecks to calculate the virtual size differently?)

  7. Quiz: given a multi-parent-1-child package of Tx1, Tx2, and Tx3 (where Tx3 is the child and there are no dependencies between the parents), which of the following groups of transactions may be in the mempool at the end of ProcessNewPackage()?

    (A) None
    (B) Tx1 only
    (C) Tx3 only
    (D) Tx1 and Tx2
    (E) Tx1 and Tx3
    (F) Tx1, Tx2 and Tx3
  8. Under what circumstances is the “mempool full” error returned as the validation result for an individual transaction? (Hint: the code is here)

  9. This code prevents the LimitMempoolSize() from being called after each transaction is submitted. What could happen if we didn’t do this?

  10. This commit adds a descendant limit for each transaction Workspace and changes the MemPoolAccept limit to const. Given that our mempool policy does not change during a validation instance, how is it possible for different transactions to have different descendant limits?

Meeting Log

  117:00 <glozow> #startmeeting
  217:01 <Kaizen_Kintsugi> awww yisss
  317:01 <Kaizen_Kintsugi> learning time :)
  417:01 <jnewbery> hi!
  517:01 <Kaizen_Kintsugi> in the right place
  617:01 <glozow> Welcome to PR Review Club!
  717:01 <davidbak> hi (lurking today)
  817:01 <schmidty> hi
  917:01 <hernanmarino> Hi !
 1017:01 <glozow> Today we're looking at #22674: Mempool validation and submission for packages of 1 child + parents
 1117:01 <glozow> Notes in the usual place:
 1217:01 <glozow> More background on package mempool accept here:
 1317:01 <SandipanDey[m]> hey there!
 1417:02 <stickies-v> hi everyone
 1517:02 <brunoerg> hi
 1617:02 <larryruane> hi
 1717:02 <lightlike> hi
 1817:02 <glozow> Did anyone get a chance to review the PR, look through the notes, or read the gist? How about a y/n for each?
 1917:02 <Kaizen_Kintsugi> y
 2017:02 <stickies-v> y/y/y
 2117:02 <jnewbery> y/y/y
 2217:02 <davidbak> n/y/y
 2317:03 <larryruane> n/y/n
 2417:03 <glozow> stickies-v: jnewbery: wowowow, extra bonus points! you'll surely ace the quiz
 2517:03 <hernanmarino> n/y/y
 2617:03 <Kaizen_Kintsugi> y/y/y sry
 2717:03 <glozow> Can anyone summarize what this PR does?
 2817:04 <Kaizen_Kintsugi> From what I can tell this creates an object for multiple dependent transactions
 2917:04 <Kaizen_Kintsugi> like to send transactions one after the other so things like fee bumping can be done on them easier in the future
 3017:05 <hernanmarino> it defines mempool validation (criteria) for packages with one child
 3117:05 <stickies-v> it adds validation logic for a specific type of package (1 child with multiple parents) so that it can be accepted into the mempool if valid, plus some improved error handling/messaging
 3217:05 <glozow> Kaizen_Kintsugi: partially correct, yes - it defines a package (group of dependent transactions) and implements package acceptance for packages of a specific topology. And it's in preparation for fee-bumping within packages in the future
 3317:06 <glozow> stickies-v: hernanmarino: right
 3417:06 <glozow> What does `IsChildWithParents()` check? How does it do this?
 3517:07 <larryruane> just to be sure, this concept of a package isn't exposed in any way to the outside world (outside the node) yet, right? No P2P support at all for it? (But will come later?)
 3617:07 <davidbak> one thing unclear to me is whether this is the _first_ change to B.C. dealing with packages or if packages already exist in some way. The gist starts by saying this is for "packages consisting of multiple parents and 1 child" which leaves me wondering if there was a previous one, accepted, dealing with, say, 1 parent and 1 child.
 3717:07 <Kaizen_Kintsugi> cool cool, it seems to relate to the review club a few weeks ago about topoloogical sorting of transactions, this sounds like a pre-sort.
 3817:07 <glozow> larryruane: correct, no P2P support yet
 3917:07 <z9z0b3t1c> what problem does this change solve?
 4017:07 <stickies-v> it verifies that the submitted package has 1 child and that all inputs of that child are transactions that are either submitted in the package (parents) if they are unconfirmed, or transactions that are already confirmed and thus in the UTXO set
 4117:07 <glozow> but this is intended to naturally inform a P2P package
 4217:08 <Kaizen_Kintsugi> i would assume that ischildwith parents checks the mempool for the relevant transactions
 4317:08 <larryruane> if anyone here would like to play around with topological sort, `man tsort`
 4417:08 <glozow> I'm referring to `IsChildWithParents` as defined here
 4517:09 <Kaizen_Kintsugi> looks like it validates the package
 4617:09 <stickies-v> no sorry my earlier explanation is wrong - the UTXO checking happens somewhere else. IsChildWithParents just checks that all the parent txs are used as inputs for the child
 4717:10 <Kaizen_Kintsugi> and it orders them?
 4817:10 <glozow> stickies-v: yeah, you were close! that's indeed implemented in this PR, but the `IsChildWithParents` function only does context-free checking
 4917:11 <larryruane> stickies-v: ".. all inputs of that child are (...) in the package .." I don't think so, all parents don't need to be present in a package
 5017:11 <glozow> no validation is done at this point. we're just checking the package topology given the transaction objects themselves
 5117:11 <Kaizen_Kintsugi> oh cool, it validates the graph of parent child relationships?
 5217:12 <larryruane> I think any single tx in the mempool (regardless of parents) can be a legal package, right?
 5317:12 <Kaizen_Kintsugi> looks like this package cant have a single transaction
 5417:12 <glozow> stickies-v's second answer is correct - we're making sure that all of the transactions correspond to an input of the child (last transaction). though not all parents need to be present, as larryruane says
 5517:12 <larryruane> oh wait, no, sorry, there has to be at least 2 tx present
 5617:13 <glozow> yes, a package is at least 2 transactions
 5717:13 <glozow> next conceptual question: What criteria must a package meet in order to be considered a child-with-unconfirmed-parents package?
 5817:13 <glozow> Is it possible to verify without looking at the current chain? Is it possible to verify without looking at our mempool?
 5917:13 <larryruane> I think the doxygen comment in packages.h should say there must be at least 2
 6017:14 <stickies-v> I think it's not possible to verify without looking at the current chain, but it is possible to verify without looking at the mempool since all confirmed parent transactions need to be in the package
 6117:14 <Kaizen_Kintsugi> damn, That's tough, i think you would have to look at the mempool
 6217:14 <stickies-v> *confirmed -> unconfirmed
 6317:14 <glozow> larryruane: noted :)
 6417:14 <glozow> stickies-v: correct!
 6517:14 <Kaizen_Kintsugi> what if the parents are in a different package?
 6617:15 <glozow> Kaizen_Kintsugi: we wouldn't need to look at the mempool, no
 6717:15 <glozow> er, how would the parents be in a different package?
 6817:15 <Kaizen_Kintsugi> someone put them their maliciously to screw with the system?
 6917:15 <Kaizen_Kintsugi> there*
 7017:15 <glozow> let's start by defining what a child-with-unconfirmed-parents package means
 7117:15 <glozow> can anyone tell us?
 7217:16 <Kaizen_Kintsugi> looks like tis related to child pays for parent
 7317:16 <glozow> hint:
 7417:17 <stickies-v> all transactions in the package except for the last one need to be used as inputs for the last one (i.e. the child), and all the unconfirmed inputs of the child need to be provided in the package
 7517:17 <larryruane> to answer that, i'm looking at the definition of `IsChildWithParents()` and there's some fancy c++ `std` stuff going on there! I'm not quite familiar with it
 7617:17 <glozow> ah, that function wouldn't provide the full answer
 7717:17 <glozow> stickies-v: bingo
 7817:18 <glozow> does that definition make sense to everybody?
 7917:18 <Kaizen_Kintsugi> i read here that this is for transaction batching
 8017:19 <glozow> batching?
 8117:19 <Kaizen_Kintsugi> like sending a buch of transactions at once
 8217:19 <Kaizen_Kintsugi> and then if you need to do a fee bump, this will allow to fee bump on the child transaction only
 8317:20 <Kaizen_Kintsugi> so the parents don't get left dangling
 8417:20 <larryruane> "stickies-v bingo" ... is this recursive?
 8517:21 <stickies-v> larryruane no it's not recursive, but it's using lambda functions which has a bit of a different syntax
 8617:21 <stickies-v> larryruane see for some examples
 8717:21 <glozow> A child-with-unconfirmed-parents package is a topologically sorted package that consists of exactly one child and all of its unconfirmed parents (no other transactions may be present). The last transaction in the package is the child; each of its inputs must refer to a UTXO in the current chain tip or some preceding transaction in the package.
 8817:22 <Kaizen_Kintsugi> okay, I can picture that in my head
 8917:22 <larryruane> By recursive I meant, like D has 3 inputs, which refer to outputs of A,B,C (all 4 in the mempool) ... if A has an input that refers to an X output, (X also in mempool), then can (should) X be in the package?
 9017:23 <davidbak> (transitive)
 9117:23 <larryruane> i.e. a tree, whose "root" is the last tx in the package? (t-sorted)
 9217:23 <larryruane> davidbak: +1
 9317:23 <MarcoFalke> larryruane: That wouldn't be a package, IIUC. You'd have a package of [X,A]
 9417:24 <glozow> larryruane: ah i see what you're saying. no, we're only considering 2-generation packages
 9517:24 <larryruane> glozow: +1 thanks
 9617:24 <glozow> but in terms of the looser definition of a package, yes, any tree like that is a package
 9717:24 <davidbak> by "looser" you mean the long-term vision of which this is a stepping-stone towards?
 9817:25 <glozow> uhhh i don't think we are necessarily stepping towards allowing all packages
 9917:27 <glozow> How does this PR implement checking that a package is child-with-unconfirmed-parents? Why do we add the child’s inputs to coins_to_uncache beforehand?
10017:27 <glozow> hint: implemented here
10117:28 <Kaizen_Kintsugi> is it to make sure you get the child inputs from the utxo set?
10217:28 <davidbak> ok ... is that because the use-case is mainly CPFP at this time and this topology (multi-parent 1-child) is sufficient for that?
10317:30 <glozow> davidbak: yes, and because arbitrary packages = boundless complexity. we'd either be allowing DoS attacks or using imperfect heuristics that could open up pinning vectors
10417:30 <davidbak> tnx
10517:31 <stickies-v> glozow we check that each input tx is either in parent_txids or in the UTXO set with m_view.HaveCoin. We keep the coins_to_uncache vector to remove newly added outputs from the cache if the package gets rejected
10617:31 <glozow> stickies-v: exactly!
10717:32 <Kaizen_Kintsugi> oh cool
10817:32 <glozow> we might be pulling coins from disk into our UTXO set cache. there's no reason to keep them cached
10917:32 <stickies-v> glozow I was confused as to why we check if it's in the cache in the first place though - is this purely for performance reasons?
11017:32 <stickies-v> okay yeah your previous message seems to indicate that
11117:33 <glozow> stickies-v: good question. yes it's basically performance. our cache is of limited size, and we want it to store coins that we might need again in the near future
11217:33 <glozow> for example, if we add a transaction to our mempool, it's a good idea to keep it in the cache because we'll probably look up those inputs when we see the tx in a block later
11317:34 <larryruane> and there's no way to access the coins details _without_ bringing it into cache
11417:34 <stickies-v> alright yeah that makes sense, thanks!
11517:35 <glozow> remember that any attacker on P2P can send us a package (and we'll look up all the inputs to validate the transactions), so they could be trying to make us cache thrash
11617:35 <glozow> larryruane: right exactly
11717:35 <Kaizen_Kintsugi> these damn attackers are always up to no good
11817:35 <larryruane> do we do something like, increase the peer's ban score if that happens a lot?
11917:36 <stickies-v> it looks so easy to miss those vulnerabilities if you're not very familiar with the codebase at large, damn
12017:36 <Kaizen_Kintsugi> for real
12117:37 <jnewbery> larryruane: disconnecting/banning is dangerous. If one of those damn attackers is able to make you send a transaction to your peer that would cause them to disconnect you, then they could use that to isolate you, or do it across the network to cause a network split
12217:37 <larryruane> (for the newer people here, if a peer misbehaves in some possibly-innocuous way, we bump up its "ban score" which means we don't ban it yet, but if the ban score gets too high, then we do ban it (disconnect, i think?))
12317:37 <glozow> larryruane: they can just disconnect and reconnect as a new peer so assigning a ban score wouldn't do much
12417:37 <larryruane> jnewbery: +1 thanks
12517:37 <Kaizen_Kintsugi> man these are interesting subtle problems
12617:38 <glozow> Next question: why might we distinguish between `PCKG_BAD` and `PCKG_POLICY`? Within this PR, do we do anything differently based on the result type?
12717:38 <jnewbery> if we gave a peer misbehavior points for causing us to fetch UTXOs from disk for a transaction that turned out to fail because of policy, then that could open up those eclipse/netsplit vectors.
12817:39 <Kaizen_Kintsugi> Bad packages seem to return a invalid package_state
12917:39 <davidbak> jnewbery: what does "eclipse" mean here?
13017:39 <glozow> there's a discussion here:
13117:40 <jnewbery> davidbak:
13217:40 <stickies-v> glozow you want to differentiate between two categories of package validation failure, one is network consensus failure (PCKG_BAD) and should not be accepted by anyone - so maybe bad faith? The other is local policy failure (PCKG_POLICY), i.e. the rules each node can tweak, which don't necessarily indicate malicious intent?
13317:40 <glozow> stickies-v: yes! :))))
13417:41 <jnewbery> stickies-v: very close, except that PCKG_BAD isn't necessarily a consensus failure
13517:41 <jnewbery> it's more a violation of a (future) p2p protocol rule
13617:41 <glozow> network protocol or consensus
13717:41 <jnewbery> that a peer should only send us a package that is one-child-all-unconfirmed-parents
13817:42 <glozow> consensus would be `PCKG_TX_CONSENSUS`
13917:42 <stickies-v> okay yep makes sense - it's easy to confuse terms sometimes, thanks!
14017:43 <Kaizen_Kintsugi> so a package has to be one child and multiple parents all unconfirmed. If it is anything else it is rejected and coins are removed from the utxo set
14117:43 <stickies-v> well and I suppose my labeling of malicious intent is a bit too harsh as well - could just be that a peer has a buggy implementation of what's an allowed package, for example
14217:43 <jnewbery> I think in this PR, we don't do anything differently based on whether the package fails for `PCKG_BAD` or `PCKG_POLICY`, but we could do in future when we update p2p to be able to relay packages
14317:44 <larryruane> Kaizen_Kintsugi: from the cache, not the UTXO set
14417:44 <Kaizen_Kintsugi> ah ty
14517:44 <glozow> jnewbery: correct
14617:44 <glozow> in the future we would punish/disconnect for `PCKG_BAD` and `PCKG_TX_CONSENSUS`
14717:44 <jnewbery> stickies-v: right, it's sometimes not possible to tell the two apart, and our response to either should be the same in any case
14817:44 <larryruane> I noticed there are no functional (python) tests, is that because we don't have the P2P messages for packages yet? The unit test is nice
14917:45 <glozow> because there is no way to hit this code path through the functional tests :P
15017:46 <jnewbery> larryruane: there's no RPC for package submission either. This PR is only making the internal changes to the mempool submission logic (and it's already a large PR!)
15117:46 <glozow> if you want to look ahead to future commits in #22290, i've added the submitrawpackage RPC and then functional tests
15217:46 <davidbak> there was a way in a previous PR to "dry run" packages (at that time) - could that have been done in this case? or ... is it or is it not a B.C. dev policy to modify RPC handling to account for new-but-incompletely exposed features?
15317:47 <davidbak> (or maybe I got confused between future and past PRs, sorry)
15417:47 <glozow> davidbak: yes that's correct, you're referring to #20833
15517:48 <glozow> that added about 1/2 of the validation logic here
15617:48 <glozow> and lots of testing
15717:48 <glozow> but we wouldn't be able to test package submission through testmempoolaccept, for instance
15817:49 <glozow> Okay next question: let's look at this line of code which is taking the virtual size of a transaction to be returned in testmempoolaccept
15917:50 <glozow> In what scenarios could the virtual sizes obtained from GetVirtualTransactionSize() here and the MempoolAcceptResult be different?
16017:50 <glozow> (Hint: there are at least 2 ways)
16117:50 <davidbak> so in the implementation of a feature over multiple PRs (what the Agile people would call stories in an epic) it is ok (from the POV of the maintainers) to have individual PRs that do not include their own testing as long as you get it in a subsequent PR? (trying to learn the dev philosophy here as well as this individual PR)
16217:51 <glozow> davidbak: the code added in this PR _is_ tested, in the unit tests
16317:51 <davidbak> i get that, yes
16417:52 <glozow> i would say that it's more appropriate to test internal mempool validation from a unit test
16517:52 <jnewbery> There's a distinction between unit tests, which are testing the logic in individual units of code (in this case the mempool submission/acceptance logic), and functional tests, which test end-to-end user functionality.
16617:52 <jnewbery> glozow: +1, it's much more appropriate to use unit tests here
16717:53 <Kaizen_Kintsugi> glozow: if there was a fee bump? shit I'm grasping at straws here
16817:53 <glozow> in this case there is no change in end-to-end user functionality
16917:53 <davidbak> ok, i do know the difference, and I see that POV fine
17017:53 <Kaizen_Kintsugi> or maybe a invalid package is submitted?
17117:53 <larryruane> just a wild guess, maybe if the witness is different?
17217:53 <glozow> larryruane: bingo!
17317:53 <larryruane> haha just lucky :)
17417:53 <jnewbery> in general, I think this project relies too much on the python functional tests, because they're easy to write and our code is not structured in a way that makes it easy to isolate and test individual components
17517:54 <glozow> that's one possibility. right now we don't have witness replacement ;) so the RPC will just return the transaction that's already in the mempool. if they have different witness sizes then this would be inaccurate!
17617:54 <Kaizen_Kintsugi> oh shit, i was thinking it was related to the 'weight' from segwit
17717:55 <larryruane> jnewbery: +1 when a python functional test fails, it can be hell to figure out why (so much code is looped in)
17817:55 <glozow> there's another way the virtual size could be different, and it's if the transaction has a toooon of signatures so its "sigop weight" is higher than its BIP141 weight
17917:55 <jnewbery> larryruane: exactly right. They're _slow_ as well, since they're spinning up one or more full bitcoind nodes. They run in seconds instead of milliseconds
18017:56 <davidbak> jnewbery: i appreciate that comment, thanks; a different POV, by the way, is that if this PR already has PCKG_BAD then that _is_ an end-to-end user functionality; won't be _exposed_ until the P2P stuff is in later, but if not tested here it adds to the test burden _then_. But it's just an argument, I accept your conclusion that unit tests are appropriate here.
18117:57 <stickies-v> glozow but it looks like both GetVirtualTransactionSize and MempoolAcceptResult consider the sigop cost? so how would they differ?
18217:57 <glozow> `GetVirtualTransactionSize` with just a tx as argument doesn't use sigop cost AFAIK
18317:57 <larryruane> glozow: hope this isn't a sidetrack (ignore if you'd like) but is sigop weight a little like gas in ethereum, so that there's some consideration of the CPU time needed to evaluate scripts? I'm really not familar with sigop weight
18417:58 <stickies-v> I looked at policy.cpp and it looks like it does?
18517:58 <glozow> yeah it's just a wrapper for `GetVirtualTransactionSize(tx, 0, 0)`
18617:59 <glozow> yes, that's the one called within `PreChecks`
18717:59 <glozow> the one being called in the RPC code is the one with 1 argument:
18817:59 <stickies-v> ah, dang! shouldn't just have done a simple github search, my bad
18918:00 <glozow> larryruane: I agree that limiting sigops is way of limiting CPU expenditure
19018:00 <glozow> sigop "weight" is just a heuristic used in the mempool
19118:01 <glozow> Miners seek to maximize the total transaction fees while ensuring that their blocks are within consensus-enforced weight and sigops limits. To simplify this 2-dimensional knapsack problem, in the mempool, virtual size of a transaction is calculated as the maximum between its BIP141 serialized size and its “sigop weight”.
19218:01 <jnewbery> larryruane: I think this is where the sigops limit was first introduced:
19318:01 <larryruane> okay I see, not consensus (like ETH gas)
19418:01 <glozow> #endmeeting
19518:02 <jnewbery> larryruane: it is a consensus rule
19618:02 <jnewbery> thanks glozow. Great meeting!
19718:02 <glozow> v sad that we didn't get to all the questions, but thank you all for the engaging conversation :)
19818:02 <gene> more optimized for space on chain, right?
19918:02 <stickies-v> larryruane consensus limits the total amount of sigops in a block though, so miners need to be careful not to include too many sigops in a small/cheap transaction
20018:02 <davidbak> big important PR, only 1hr, that's to be expected!
20118:03 <stickies-v> hence why it's being bundled in the same heuristic to, like glozow said, make constructing the optimal block more straightforward computationally
20218:03 <Kaizen_Kintsugi> man this was so awesome
20318:03 <Kaizen_Kintsugi> looking forward to next week
20418:03 <Kaizen_Kintsugi> thx!
20518:04 <stickies-v> agreed, very interesting PR again, thanks for the extensive prep glozow - the questions were really on point to dig through the code!
20618:05 <davidbak> not just extensive prep for this meeting, but extensive prep for the PR: the additional written documentation _with excellent drawings_ is a great example of a way to get a PR accepted (to my mind anyway, correct me if I'm wrong)
20718:07 <hernanmarino> thanks glozow, great meeting and interesting topic
20818:08 <stickies-v> +1 davidbak , amazing documentation indeed