Mempool/validation: mempool ancestor/descendant limits for packages (mempool, tests, validation)

https://github.com/bitcoin/bitcoin/pull/21800

Host: glozow  -  PR author: glozow

The PR branch HEAD was b80cd25 at the time of this review club meeting.

Notes

  • Apart from coinbase transactions, every Bitcoin transaction has at least one parent: a transaction whose output it spends as an input. We call the spending transaction the child. Ancestors include parents, their parents, and so forth; descendants include children, their children, and so forth.

  • Each mempool entry caches information about its ancestors and descendants in the mempool. This speeds up some operations (e.g. selecting which transactions to include in a block and which to evict for low feerate) and slows down others (adding and removing transactions, since the cached state of all affected relatives must be updated). A simplified sketch of this cached state is shown below.
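
A minimal sketch of the kind of aggregate state an entry caches, using assumed, abbreviated field names rather than the exact CTxMemPoolEntry members:

```cpp
#include <cstdint>

// Illustrative stand-in for the aggregate state a mempool entry caches about
// its in-mempool relatives. All counts and sizes include the entry itself.
struct MempoolEntrySketch {
    // This entry plus all of its in-mempool descendants.
    uint64_t count_with_descendants{1};
    int64_t vsize_with_descendants{0};   // virtual bytes
    int64_t fees_with_descendants{0};    // modified fees, in satoshis

    // This entry plus all of its in-mempool ancestors.
    uint64_t count_with_ancestors{1};
    int64_t vsize_with_ancestors{0};     // virtual bytes
    int64_t fees_with_ancestors{0};      // modified fees, in satoshis
};
```

Keeping these running totals means block template building and eviction can sort by ancestor/descendant feerate without re-walking the transaction graph, at the cost of updating every affected relative whenever an entry is added or removed.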

  • We have ancestor/descendant limits as part of our mempool policy with the goal of constraining the maximum computation needed for such operations. For example, if seeing a conflicting transaction in a block could cause us to remove a transaction with 10,000 descendants in the mempool, our mempool could become a sink of computational resources (which could also be triggered intentionally by an attacker) rather than a useful cache of unconfirmed transactions.

  • The current default ancestor/descendant limits are: no transaction, together with its in-mempool ancestors (or, separately, its in-mempool descendants), may exceed 25 transactions or 101 KvB of total virtual size; the transaction itself counts toward both limits. The mempool member function CalculateMemPoolAncestors() calculates a transaction's in-mempool ancestors and enforces these limits, and it can be called both for transactions already in the mempool and for transactions being considered for acceptance. A sketch of the check follows.
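
To make the rule concrete, here is a minimal sketch of the per-transaction check with the default limits (names and structure are illustrative only; the real enforcement lives inside CTxMemPool::CalculateMemPoolAncestors()):

```cpp
#include <cstdint>

// Aggregate statistics for a transaction together with its in-mempool
// ancestors (or, separately, its descendants); both include the tx itself.
struct RelativeStats {
    uint64_t count{1};  // number of transactions
    int64_t vsize{0};   // total virtual size in vbytes
};

// Default policy limits: 25 transactions / 101 KvB in each direction.
bool WithinDefaultLimits(const RelativeStats& ancestors, const RelativeStats& descendants)
{
    constexpr uint64_t COUNT_LIMIT = 25;
    constexpr int64_t SIZE_LIMIT_VBYTES = 101'000;
    return ancestors.count <= COUNT_LIMIT && ancestors.vsize <= SIZE_LIMIT_VBYTES &&
           descendants.count <= COUNT_LIMIT && descendants.vsize <= SIZE_LIMIT_VBYTES;
}
```

The defaults can be overridden at runtime with the -limitancestorcount, -limitancestorsize, -limitdescendantcount, and -limitdescendantsize options.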

  • PR #21800 defines and implements an ancestor/descendant limit for packages (groups of transactions being validated together for submission to the mempool). The following figure illustrates some examples of why we cannot simply call CalculateMemPoolAncestors() on each transaction individually.

    • In Scenario [A], calling CalculateMemPoolAncestors() on transaction A shows that it is within limits (24 in-mempool ancestors). Calling CalculateMemPoolAncestors() on transaction B also shows that it is within limits (0 in-mempool ancestors). However, if both are accepted, B (a child of A) ends up with 25 ancestors, or 26 transactions counting itself, which exceeds the default limit.

    • In Scenario [B], transactions A and B each have 13 in-mempool ancestors, but they are the same 13 transactions. If we naively summed their ancestor counts, we would grossly overestimate.

    • In Scenario [H], transaction C's ancestor limit is only exceeded when transactions A and B are considered together; it is not sufficient to test pair combinations or to simply subtract the total package size from the limits. An aggregate check over the whole package is needed (see the sketch below).
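
The following is a minimal sketch of an aggregate package check in the spirit of the PR (only the ancestor side is shown; the descendant side is analogous, and all names here are illustrative, not the actual Bitcoin Core functions). Every package member is treated as an ancestor and descendant of every other member, so the in-mempool ancestors of all members are pooled into one set and the limits are applied using the package's total count and size:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Illustrative package member: its virtual size plus the txids of its
// in-mempool ancestors.
struct PackageTx {
    int64_t vsize{0};
    std::set<std::string> mempool_ancestors;
};

// Pool the in-mempool ancestors of all package members (the set union avoids
// double-counting shared ancestors, as in Scenario [B]), then check the
// pooled ancestors plus the entire package against the limits. This can
// overestimate, since unrelated package members count toward each other, but
// it never underestimates a member's true ancestor count or size.
bool PackageWithinAncestorLimits(const std::vector<PackageTx>& package,
                                 const std::map<std::string, int64_t>& mempool_vsizes,
                                 uint64_t count_limit, int64_t size_limit_vbytes)
{
    std::set<std::string> pooled_ancestors;
    int64_t package_vsize = 0;
    for (const PackageTx& tx : package) {
        package_vsize += tx.vsize;
        pooled_ancestors.insert(tx.mempool_ancestors.begin(), tx.mempool_ancestors.end());
    }
    int64_t ancestor_vsize = 0;
    for (const std::string& txid : pooled_ancestors) ancestor_vsize += mempool_vsizes.at(txid);

    return pooled_ancestors.size() + package.size() <= count_limit &&
           ancestor_vsize + package_vsize <= size_limit_vbytes;
}
```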

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. Why do we have ancestor/descendant limits in the mempool? Why do we use 25 transactions and 101K virtual bytes as our limit?

    2a. Is it possible for a block to have a chain of 100 transactions (assuming all other consensus rules are met)?

    2b. Imagine a “family” to be any group of transactions connected by parent-child relationships, including descendants-of-ancestors (e.g. “siblings,” two transactions sharing a parent in the group) and ancestors-of-descendants (e.g. “co-parents,” two transactions sharing a child in the group). What is the largest possible family that can exist in the mempool, assuming the default limits are applied? (Hint: it’s not 25).

  3. In our package validation code, we run CalculateMemPoolAncestors() on each transaction individually during PreChecks(). Why is this insufficient?

  4. What is the package ancestor/descendant policy proposed by this PR? What are your thoughts? Do you have any alternative proposals?

  5. How might we test this algorithm? What are some edge cases we might want to consider?

  6. This PR turns the original CalculateMemPoolAncestors() operating on a CTxMemPoolEntry into a wrapper function, constructing a vector from the singular entry. What could happen if the wrapper function used a copy of the entry instead of the entry itself? (Hint: what happens here if the precondition is not met?).
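
For intuition on the hint: the mempool stores its entries in a Boost.MultiIndex container, and iterator_to() only works on an object that actually lives inside the container. The following standalone illustration (not Bitcoin Core code) shows the difference between the stored element and a copy of it:

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/identity.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <cassert>

// iterator_to() maps a reference to an element back to an iterator, but the
// reference must denote the object stored *inside* the container. Calling it
// on a copy (an object that merely compares equal) is undefined behaviour.
using Container = boost::multi_index_container<
    int,
    boost::multi_index::indexed_by<
        boost::multi_index::ordered_unique<boost::multi_index::identity<int>>>>;

int main()
{
    Container c;
    const auto [it, inserted] = c.insert(42);
    assert(inserted);

    const int& stored = *it;  // the element living inside the container
    int copy = stored;        // an equal value living outside the container
    (void)copy;

    assert(c.iterator_to(stored) == it);  // fine: stored is in the container
    // c.iterator_to(copy);               // wrong: copy is not in the container
    return 0;
}
```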

  7. What is a CTxMemPoolEntryRef? Why don’t we just operate on a std::vector<CTxMemPoolEntry> instead?

Meeting Log

  17:00 <glozow> #startmeeting
  17:00 <murch> Hi
  17:00 <FelixWeis> hi
  17:00 <dunxen> Hi!
  17:00 <emzy> hi
  17:00 <glozow> Welcome to PR Review Club!!
  17:00 <dergoegge> hi
  17:00 <schmidty> hi
  17:00 <michaelfolkson> hi
  17:00 <svav> Hi
  17:00 <otto_a> hi
  17:00 <glozow> Today we're looking at #21800 mempool/validation: mempool ancestor/descendant limits for packages
  17:00 <lightlike> hi
  17:00 <jnewbery> helloo!
  17:00 <dariusp> hi
  17:00 <glozow> Notes (and a pretty diagram) here: https://bitcoincore.reviews/21800
  17:00 <shaunapps> hi!
  17:01 <dunxen> glozow: love the diagram!
  17:01 <raj_> hi..
  17:01 <glozow> Did any of you get a chance to review the PR or check out the notes?
  17:01 <glozow> dunxen: thank you!
  17:01 <raj_> yes..
  17:01 <dunxen> yes
  17:01 <sipa_> hi
  17:02 <FelixWeis> very pretty diagrams!
  17:02 <murch> y
  17:02 <svav> y
  17:02 <dariusp> checked out notes but not PR, and yes, great diagrams!
  17:02 <dergoegge> review not yet, notes yes
  17:02 <jnewbery> yes! Love the transaction family trees
  17:02 <glozow> wonderful! :D
  17:02 <michaelfolkson> ~y
  17:03 <emzy> n
  17:03 <glozow> Let's start with our first question then (about single transactions, we'll get to packages in abit). Why do we have ancestor/descendant limits in the mempool?
  17:03 <glozow> Why do we use 25 transactions and 101K virtual bytes as our limit?
  17:04 <RebelOfBabylon> Not to clog up the mempool?
  17:04 <otto_a> 24 is too little and 26 is too much
  17:04 <svav> So it can't become a sink of computational resources and an attack site
  17:04 <glozow> <RebelOfBabylon> could you elaborate on what "clogged" means?
  17:04 <raj_> to remove DOS attack vector by removing expensive computation if transactions later gets removed from mepool.
  17:04 <lightlike> protect against dos attacks
  17:05 <glozow> svav raj_ lightlike: yeah! what specifically might get expensive?
  17:05 <dunxen> we want to have a limit to the computation we'd need to perform for a chain of txs. things like adding and removing txs from our mempool can be expensive
  17:05 <raj_> Not sure on the rationale on exact limits. Curious..
  17:05 <michaelfolkson> 25 seems very high. What use cases need 25 ancestors/descendants?
  17:05 <glozow> dunxen: yep! What do we need to do for tx chain when we're adding and removing?
  17:06 <raj_> glozow, removing large number of descendant can be expensive..
  17:06 <FelixWeis> its not that high if you think of very simple 1 input, 2 output (1 being change)
  17:06 <RebelOfBabylon> Well transactions in the mempool use RAM right? So too many large ones would fill up ram. Also you have to verify all parent transactions which can be computationally expensive?
  17:06 <RebelOfBabylon> Or not use RAM but sit in RAM*
  17:06 <svav> If you try to remove a conflicting transaction that has a large number of descendants in the mempool
  17:06 <glozow> to be honest I don't know why 25 exactly. maybe sipa_ knows?
  17:06 <FelixWeis> its obviously very inefficinet to do 25 transactions that way in a ~ 10 minute timeframe
  17:07 <dunxen> i think we need to check all the descendants and probably remove them if they're not valid anymore
  17:07 <glozow> RebelOfBabylon: in terms of space, we usually set a hard limit on how much space it can take up, and will trim if we get too many transactions
  17:07 <glozow> here, we're worried about a high level of ancestor/descendant connectivity within the mempool
  17:07 <jnewbery> Fun fact: the ancestor/descendant limits were originally 100/1000 txs: https://github.com/bitcoin/bitcoin/commit/5add7a74 https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-August/010221.html
  17:07 <glozow> jnewbery: waow!
  17:08 <dunxen> wow
  17:08 <michaelfolkson> You'd need to really screw up CPFP to need 25 attempts :)
  17:08 <FelixWeis> if you know you have to create these 25 outputs, you can do it in 1 batched payment. electrum-wallet even supports "amend by fee" so if you have already an unconfirmed tx in the mempool, you can replace that one with the additional output
  17:08 <glozow> svav: yeaah that's a good example! you'd need to go through and make sure you remove all of those descendants
  17:08 <murch1> glozow: It's a DOS protection and the value is in the range of reasonable?
  17:09 <glozow> btw, these limits are configurable using `-limitancestorsize`, etc.
  17:09 <michaelfolkson> Or maybe you are desperate not to pay too high fees even a little bit. So you keep CPFP with a tiny fee increase
  17:10 <glozow> well, you could be trying to CPFP 25 transactions at once?
  17:10 <murch1> Also ancestor and descendant limits don't effectively limit clusters to be very broad, we found some components in old mempools that were several hundred connected transactions
  17:10 <glozow> murch1: yep exactly, that's one of my followup questions.
  17:10 <harding> For the vsize limit, there's also a fundamental limit at the size of a block's transaction space. If the last descendant of a package is further away from its oldest unconfirmed ancestor, it can't contribute fee to that ancestor because they can't be included in the same block.
  17:10 <glozow> Imagine a "family" to be any group of transactions connected by parent-child relationships, including descendants-of-ancestors and ancestors-of-descendants. What is the largest possible family that can exist in the mempool, assuming the default limits are applied?
  17:10 <murch1> glozow: oops.
  17:12 <raj_> glozow, it seems there isn't a theoretical limit to that in mempool, other than max mempool size itself? Because a chain can be formed via multiple packages?
  17:12 <lightlike> harding: but is this a sufficient reason not to accept larger packages? After all, a miner could just mine the package in two batches, adding the newest descendants in a later block.
  17:12 <murch1> hint: https://twitter.com/murchandamus/status/1352464020813524994
  17:12 <svav> Largest possible family is 25?
  17:12 <glozow> harding: mm that's interesting! so if we're building a block and we only have 100KvB of space left, we wouldn't be able to fit a 102KvB package that relies on the lowest descendant's feerate?
  17:13 <glozow> svav: nope :)
  17:13 <harding> lightlike: the miner of block n doesn't know that they'll mine block n+1, so by including less profitable ancestor transactions in block n, their competitior mining block n+1 profits more. I don't think that's economically rational.
  17:13 <jnewbery> lightlike: the miner is not incentivized to mine the first batch of transactions, since it doesn't expect that it'll mine the next block (unless it has a majority of hash rate on the network)
  17:13 <murch1> glozow: It's not limited.
  17:13 <glozow> any other guesses?
  17:14 <harding> glozow: MAX_MEMPOOL_SIZE number of transactions.
  17:14 <murch1> Because transaction graphs can extend sideways by descendants having new unrelated ancestors and vice versa
  17:14 <glozow> murch1: I agree with you answer, and thanks for providing a diagram :P
  17:14 <murch1> harding: good point ;)
  17:14 <glozow> harding and raj_ too
  17:14 <lightlike> jnewbery: ok, but only if the feerate of the partial package is not sufficient to be included otherwise
  17:15 <glozow> ok, so we've established that families in mempools can get really hairy
  17:15 <jnewbery> A related question is "what's the maximum number of ancestors + descendants a single transaction can have", and I think the answer there is 46
  17:15 <svav> 25 to the 25?
  17:15 <glozow> so we try to limit it, but it's hard to do so
  17:15 <jnewbery> or 47 if you include the tx itself
  17:16 <glozow> Let's move on to packages: In our package validation code, we already run `CalculateMemPoolAncestors()` on each transaction individually during PreChecks(). Why is this insufficient?
  17:16 <jnewbery> lighlike: right - the only point is that the fees associated with txs at the end of that large package don't incentivize the miner to include the txs at the top of that package
  17:17 <dergoegge> CalculateMemPoolAncestors() only takes txs into account that are already in the mempool, a tx in a package that has all its parents within the package might exceed the limits but CalculateMemPoolAncestors() would be fine with it because it has 0 ancestors in the mempool (example A)
  17:17 <glozow> dergoegge: yes! it needs to know about both in-mempool and in-package ancestors
  17:18 <glozow> What is the package ancestor/descendant policy proposed by this PR?
  17:18 <harding> lightlike: beyond having a mempool whose size is greater by default than the size of a block, we don't included transactions in the mempool that can't be included in the next block (e.g. timelocked transactions). This makes the transaction selection algorithm simple. A descendent transaction that can't be included in the next block because it had >1e6 vbytes of ancestors would violate that rule.
  17:19 <michaelfolkson> glozow: Treating every transaction in the package as each other's ancestor and descendant
  17:19 <lightlike> harding: thanks, that makes sense.
  17:20 <glozow> michaelfolkson: correct. <insert incest joke here> what does that look like in the code?
  17:21 <raj_> glozow, considering the union of the sets of ancestors and descendant of all transactions in a package and then checking weather this giant set exceeds limit.
  17:21 <harding> It's same policy enforced by Bitcoin Core for standard relay and mempool inclusion, but applied using a simple one-line heuristic that each transaction in the package is counted as having as many more ancestors and descendants as there are transactions in the package being evaluated.
  17:22 <jnewbery> A bit more historical context around the number 25 being chosen for the ancestor/descendant limit: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-October/011401.html https://github.com/bitcoin/bitcoin/pull/6771
  17:22 <glozow> harding: well said!
  17:23 <michaelfolkson> jnewbery: Nice, thanks
  17:23 <glozow> raj_: yes! implementation-wise, we're putting everyone's ancestors in the same pot while calculating, and using package_count instead of 1 and total_package_size instead of 1 transaction's size
  17:23 <glozow> I'm really curious to hear if anyone came up with alternative proposals? :)
  17:25 <michaelfolkson> Well the obvious one is to ditch that assumption :)
  17:25 <harding> One alternative would be to do the full computation on the package just like we would do if each tx was submitted individually, but for the sake of limiting the CPU/memory needed for an eventual P2P package relay, severely restrict the number of transactions allowed in a package from 25 currently to, say, 5.
  17:25 <harding> s/needed for/allowed in the worst case/
  17:26 <glozow> harding: right! that's an interesting idea
  17:27 <murch> raj_: I think that might make it very difficult to assess whether a transaction you're building wuold be able to be put in the mempool
  17:27 <raj_> murch, sorry didn't get that.. how so?
  17:28 <sipa> glozow: so it's really only considering whether the entire package can be accepted at once, and not subsets of it?
  17:28 <lightlike> Maybe one could try to think of an algorithm to precisely find the set(s) of actually related transactions? Obviously this could easily become too complicated/time-consuming.
  17:28 <glozow> sipa: yeah, atomic submission
  17:28 <harding> Another alternative, also aimed at allowing P2P package relay, would be allowing the submitter to include a proof of the longest transaction chain in the package. I *think* it would be possible to take the txids and input outpoints of all the transactions in the package, put them into a tree-like structure, and then to create compact proofs about the shape of that tree. That's a ridiculously heavyweight solution if we expect most
  17:28 <harding> packages to be just a parent and child.
  17:28 <murch> The union of all sets of ancestors and descendants is basically what Clara and I recently described as "Clusters" in our block template improvement proposal
  17:29 <murch> They can be rather large. The tweet I linked shows a cluster of over 200 txs, but is followed by another that shows a cluster of 881 txs in the mempool together
  17:30 <glozow> harding: are there use cases for packages with more than 2 or 3 transactions where it isn't all 1 chain?
  17:30 <murch> I think it would be hard to set a meaningful limit that doesn't restrict naturally occuring clusters. In turn a relatively low limit might be easy to leverage to do things akin from transaction pinning or restricting other parties from bumping their txs
  17:32 <murch> *akin to
  17:32 <harding> glozow: zmnSCPxj has a proposal on the LN mailing list this week that would very long transaction chains with maybe two children at the top and definitely two children at the bottom.
  17:33 <glozow> harding: ahh that's good to know!
  17:34 <harding> (The children at the top would be of a confirmed ancestor, so scratch that.)
  17:35 <glozow> I am looking forward to those meetings that ariard is hosting, would be good to be able to gather more info on what kinds of packages people are looking to make...
  17:35 <harding> Oof, realized that I hadn't put that on my calendar yet.
  17:36 <michaelfolkson> Why should diagram G not be a package?
  17:36 <glozow> michaelfolkson: because they're not connected
  17:36 <murch> harding, glozow: Is there any usecase for transaction packages that are not in the same cluster/family?
  17:36 <glozow> there's no reason why you would submit any of those transactions as a package instead of individually
  17:36 <michaelfolkson> Can't you combine two independent packages into one to try to save on fees?
  17:36 <glozow> michaelfolkson: no?
  17:36 <michaelfolkson> One package effectively subsidizes the other package?
  17:37 <glozow> how would they do that without being dependent?
  17:37 <murch> michaelfolkson: Since there is no dependency, there is no reasons for miners to include both at once.
  17:37 <harding> michaelfolkson: I think that's the technique being called "transaction sponsorship". It would require a consensus change.
  17:37 <harding> (And it wouldn't save on fees, just allow more flexible CPFP-like behavior.)
  17:38 <michaelfolkson> Hmm interesting, thanks
  17:39 <dariusp> similar question: why is [D] a package? Aren't A and B independent, spending different outputs of '1'?
  17:39 <glozow> the only reason you would put transactions in a package instead of submitting them individually is if some package-specific policy will get you something that individual transactions won't. imo the only policy that you'd want is for a tx to be considered for its descendant feerate (or more generally, i guess it's feerate with sponsors in the package) rather than base feerate
  17:40 <dariusp> ohh i guess they could both affect the average fee rate of '1'?
  17:40 <murch> dariusp: Only if the miner considers candidate sets rather than ancestor sets. ^^
  17:40 <glozow> dariusp: good question! yes, they both affect the descendant feerate of 1, although that's not a very good reason to submit them as a package
  17:40 <glozow> i suppose they could be preventing 1 from getting evicted together haha
  17:41 <dariusp> ah i see. So if they were submitted independently, does that mean one of them would be accepted and the other not?
  17:41 <dariusp> got it, thanks!
  17:41 <otto_a> dariusp: it might mean that none would be accepted
  17:41 <murch> yeah, after the first got submitted, the other would be beyond the limit
  17:42 <otto_a> actually I'm not sure that is true
  17:42 <dariusp> but i guess the miner could still split the package and accept just one?
  17:42 <murch> glozow: I don't see how they'd collaborate to keep 1 from being evicted in D
  17:42 <murch> Unless https://gist.github.com/Xekyo/5cb413fe9f26dbce57abfd344ebbfaf2
  17:42 <murch> :p
  17:43 <glozow> murch: eviction is by descendant feerate, but this is after it's already accepted to mempool. them being in a package wouldn't really affect whether or not they're accepted
  17:43 <glozow> anyway, I'll continue with the review club questions?
  17:43 <murch> haha, yeah
  17:43 <michaelfolkson> Sorry :)
  17:43 <glozow> How might we test package ancestor/descendant limits? What are some edge cases we might want to consider?
  17:43 <michaelfolkson> Another PR review club on candidate set based block building
  17:44 <michaelfolkson> The edge cases are effectively the diagrams?
  17:44 <harding> Testing is a bit tricky because it is more restrictive than the actual relay policy, so we can't be lazy and just spin up a synced node with no mempool and just feed it random packages from another node with a current mempool.
  17:46 <michaelfolkson> A shapes and V shapes in the tests
  17:46 <glozow> harding: right. I'm trying to write a fuzzer, but it's so hard to get a package accepted 😂
  17:46 <murch> michaelfolkson: There is no PR to review yet
  17:47 <harding> Maybe one edge case to watch out for is CPFP carve outs, where we can get a chain of transactions with length 26 (instead of 25) or combined size of 111,000 vbytes (instead of 101 vbytes).
  17:47 <michaelfolkson> murch: We've had "conceptual" PR review clubs before. I think we had one on package relay when there was no code (before glozow got to work down this path)
  17:48 <glozow> harding: yes, thank you! will work on writing a test for that
  17:48 <michaelfolkson> glozow: What do you mean hard to get a package accepted?
  17:49 <otto_a> edge cases: anything that would incentivise miners to accept suboptimal packages
  17:49 <michaelfolkson> It seems specific edge scenarios would be better approach than fuzzing for this
  17:49 <dariusp> @otto_a what do you mean by suboptimal packages?
  17:50 <glozow> michaelfolkson: so the hope is to be able to create random packages, submit them to mempool, and make sure that we aren't exceeding limits (which would indicate that our algorithm underestimated). but i've had a hard time getting it to create valid random packages
  17:51 <otto_a> dariusp: one with a lower feerate that a competing package. However I just realized that block acceptance is a different topic from mempool acceptance.
  17:51 <michaelfolkson> glozow: So you are waiting a long time for the randomness to create what you need. Yeah deterministic scenarios like murch's crazy tweet picture seems more useful
  17:52 <dariusp> otto_a ah right
  17:53 <otto_a> are reorgs an edge case?
  17:54 <glozow> otto_a: good thinking! no they aren't, because we'll only be using this for package validation. reorgs affect transactions that are already in the mempool, and we process those one at a time
  17:55 <michaelfolkson> glozow otto_a: A re-org could cause your package to be dropped from a full mempool
  17:55 <michaelfolkson> But that's not initial acceptance
  17:55 <glozow> Ok next question: This PR turns the original `CalculateMemPoolAncestors()` operating on a `CTxMemPoolEntry` into a wrapper function, constructing a vector from the singular entry. What could happen if the wrapper function used a copy of the entry instead of the entry itself?
  17:56 <glozow> (other than "we need to make a copy which takes extra space and time")
  17:56 <otto_a> is there a lock on the mempool?
  17:56 <dergoegge> iterator_to only accepts a ref and a ref to a copy would not find what we are looking for?
  17:56 <glozow> otto_a: yep
  17:58 <glozow> dergoegge: yes, bonus points to you for being well prepared ^_^ the container's `iterator_to` function requires it to be inside the container, so we'd get a bad value for the copy since it's not there
  17:59 <glozow> Last question: What is a `CTxMemPoolEntryRef`?
  18:00 <glozow> okie dokie that is left as an exercise to the reader
  18:00 <murch> kthxbye
  18:00 <glozow> thanks for coming everybody! I really appreciate the feedback and discussions today :) it helped me a lot
  18:00 <glozow> #endmeeting