The PR branch HEAD was b80cd25 at the time of this review club meeting.
Notes
Apart from coinbase transactions, every Bitcoin transaction has at least one
parent: another transaction whose outputs it spends as its inputs. We call
the spending transaction the child. Ancestors include parents, their
parents, and so forth; descendants include children, their children, and so
forth.
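As a toy illustration of those definitions (not Bitcoin Core code), one could collect a transaction's ancestor set by repeatedly following parents:

```cpp
// Toy sketch: given a map from each transaction id to the ids of its direct
// parents, collect the full ancestor set by walking parents, their parents,
// and so on.
#include <map>
#include <set>
#include <string>
#include <vector>

std::set<std::string> CollectAncestors(
    const std::string& txid,
    const std::map<std::string, std::vector<std::string>>& parents)
{
    std::set<std::string> ancestors;
    std::vector<std::string> to_visit{txid};
    while (!to_visit.empty()) {
        const std::string current = to_visit.back();
        to_visit.pop_back();
        const auto it = parents.find(current);
        if (it == parents.end()) continue;
        for (const std::string& parent : it->second) {
            // Only enqueue parents we haven't seen yet, so an ancestor shared
            // by several descendants is counted once.
            if (ancestors.insert(parent).second) to_visit.push_back(parent);
        }
    }
    return ancestors;
}
```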
Each mempool entry caches information about its ancestors and descendants in the
mempool. This speeds up some operations (e.g. selecting which transactions to
include in a block and which to evict for low feerates) and slows down others
(adding and removing transactions, which must keep this metadata up to date).
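A rough sketch of the per-entry bookkeeping this implies is below; the field names loosely mirror those in CTxMemPoolEntry (e.g. nCountWithAncestors), but this is an illustration only, and the real class also caches fees, sigop costs, and more.

```cpp
// Illustrative only: a simplified stand-in for the cached ancestor/descendant
// state kept on each mempool entry.
#include <cstdint>

struct MemPoolEntrySketch {
    // Counts and virtual sizes include the entry itself.
    uint64_t count_with_ancestors{1};
    int64_t  vsize_with_ancestors{0};
    uint64_t count_with_descendants{1};
    int64_t  vsize_with_descendants{0};
};

// Block assembly and eviction can read these cached totals directly instead of
// walking the transaction graph, but every add/remove must update them for all
// affected ancestors and descendants, which is the cost noted above.
```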
We have ancestor/descendant limits as part of our mempool policy with the goal
of constraining the maximum computation needed for such operations. For
example, if seeing a conflicting transaction in a block could cause us to
remove a transaction with 10,000 descendants in the mempool, our mempool could
become a sink of computational resources (which could also be triggered
intentionally by an attacker) rather than a useful cache of unconfirmed
transactions.
The current default ancestor/descendant limits are: no transaction, together
with its in-mempool ancestors (itself included), may exceed 25 transactions or
101KvB in total virtual size, and the same count and size limits apply to a
transaction together with its in-mempool descendants. The mempool member function
CalculateMemPoolAncestors()
calculates and enforces these limits for a transaction, whether it is already
in the mempool or being evaluated for acceptance.
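A much-simplified sketch of the kind of check this performs is below. The real function also walks the in-mempool graph and enforces the descendant limits of each ancestor; the constant names here are invented for the sketch, with values matching the defaults above.

```cpp
// Simplified limit check, assuming the ancestor count and total virtual size
// (both including the candidate transaction itself) are already known.
#include <cstdint>
#include <optional>
#include <string>

static constexpr uint64_t ANCESTOR_COUNT_LIMIT{25};        // transactions, including the tx itself
static constexpr int64_t  ANCESTOR_SIZE_LIMIT_VB{101'000}; // virtual bytes (101KvB)

// Returns an error string if the candidate transaction, together with its
// in-mempool ancestors, exceeds either limit.
std::optional<std::string> CheckAncestorLimits(uint64_t ancestor_count_with_self,
                                               int64_t ancestor_vsize_with_self)
{
    if (ancestor_count_with_self > ANCESTOR_COUNT_LIMIT) {
        return "too many unconfirmed ancestors";
    }
    if (ancestor_vsize_with_self > ANCESTOR_SIZE_LIMIT_VB) {
        return "exceeds ancestor size limit";
    }
    return std::nullopt;
}
```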
PR#21800 defines and
implements an ancestor/descendant limit for packages (groups of transactions
being validated together for submission to the mempool). The following figure
illustrates some examples of why we cannot simply call
CalculateMemPoolAncestors() on each transaction individually.
In Scenario [A], calling CalculateMemPoolAncestors() on transaction A
will show that it is within limits (24 in-mempool ancestors). Calling
CalculateMemPoolAncestors() on transaction B will also show that it is
within limits (0 in-mempool ancestors).
In Scenario [B], transactions A and B both have 13 in-mempool ancestors,
but they’re the same transactions. If we naively added their ancestor
counts together, we would grossly overestimate.
In Scenario [H], transaction C’s ancestor limits are only exceeded when
transactions A and B are considered together; it isn’t sufficient to
only try pair combinations or to simply subtract the total package size from
the limits.
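As discussed in the meeting log below, the PR's check instead pools everything: the union of every package member's in-mempool ancestors is counted once, and the whole package's transaction count and total size are counted against each member's limits. The following is only a rough sketch of that idea with invented helper names, not the PR's code.

```cpp
// Rough sketch (invented names) of the pooled package check described in the
// notes and meeting log.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct PackageTx {
    std::string txid;
    int64_t vsize{0};
    // This transaction's in-mempool ancestors: txid -> virtual size.
    std::map<std::string, int64_t> mempool_ancestors;
};

bool PackageWithinAncestorLimits(const std::vector<PackageTx>& package,
                                 uint64_t count_limit, int64_t size_limit)
{
    std::map<std::string, int64_t> pooled_ancestors;
    int64_t package_vsize{0};
    for (const PackageTx& tx : package) {
        package_vsize += tx.vsize;
        // Union of ancestor sets: shared ancestors (Scenario [B]) count once.
        pooled_ancestors.insert(tx.mempool_ancestors.begin(),
                                tx.mempool_ancestors.end());
    }
    int64_t ancestor_vsize{0};
    for (const auto& ancestor : pooled_ancestors) ancestor_vsize += ancestor.second;

    // Every package member is counted toward every member's limits, which also
    // catches in-package parents (Scenario [A]) and combined effects (Scenario [H]).
    const uint64_t total_count = pooled_ancestors.size() + package.size();
    const int64_t total_vsize = ancestor_vsize + package_vsize;
    return total_count <= count_limit && total_vsize <= size_limit;
}
```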
Questions
1. Why do we have ancestor/descendant limits in the mempool? Why do we use 25
transactions and 101K virtual bytes as our limits?
2a. Is it possible for a block to have a chain of 100 transactions (assuming
all other consensus rules are met)?
2b. Imagine a “family” to be any group of transactions connected by
parent-child relationships, including descendants-of-ancestors
(e.g. “siblings,” two transactions sharing a parent in the group) and
ancestors-of-descendants (e.g. “co-parents,” two transactions sharing a
child in the group). What is the largest possible family that can exist
in the mempool, assuming the default limits are applied? (Hint: it’s not
25).
3. In our package validation
code,
we run CalculateMemPoolAncestors() on each transaction individually during
PreChecks(). Why is this insufficient?
4. What is the package ancestor/descendant policy proposed by this PR? What are
your thoughts? Do you have any alternative proposals?
5. How might we test this algorithm? What are some edge cases we might want to consider?
6. This PR turns the original CalculateMemPoolAncestors() operating on a
CTxMemPoolEntry into a wrapper function, constructing a vector from the
singular entry. What could happen if the wrapper function used a copy of
the entry instead of the entry itself? (Hint: what happens
here
if the precondition is not met?) A sketch illustrating this appears after these
questions.
7. What is a
CTxMemPoolEntryRef?
Why don’t we just operate on a std::vector<CTxMemPoolEntry> instead?
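For question 6, here is a minimal, self-contained illustration (not the PR's code) of the iterator_to() precondition hinted at above: Boost.MultiIndex's iterator_to() must be given a reference to an element that actually lives inside the container, so passing a copy yields an invalid iterator.

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/identity.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <cassert>

// A trivial multi_index container of ints, standing in for the mempool's
// boost::multi_index-based transaction index.
using IntSet = boost::multi_index_container<
    int,
    boost::multi_index::indexed_by<
        boost::multi_index::ordered_unique<boost::multi_index::identity<int>>>>;

int main()
{
    IntSet set;
    auto [pos, inserted] = set.insert(42);
    assert(inserted);

    const int& in_container = *pos;  // reference to the stored element
    int copy = in_container;         // a copy living outside the container
    (void)copy;

    // Valid: the argument is a reference to an element of this container.
    auto it = set.iterator_to(in_container);

    // set.iterator_to(copy) would violate the documented precondition (the
    // argument must reference an element of the container), producing an
    // iterator that is invalid to use, much like handing a copied
    // CTxMemPoolEntry to the mempool's iterator_to() would.
    return (*it == 42) ? 0 : 1;
}
```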
Meeting Log
<glozow> Let's start with our first question then (about single transactions, we'll get to packages in a bit). Why do we have ancestor/descendant limits in the mempool?
<dunxen> we want to have a limit to the computation we'd need to perform for a chain of txs. things like adding and removing txs from our mempool can be expensive
<RebelOfBabylon> Well transactions in the mempool use RAM right? So too many large ones would fill up ram. Also you have to verify all parent transactions which can be computationally expensive?
<glozow> RebelOfBabylon: in terms of space, we usually set a hard limit on how much space it can take up, and will trim if we get too many transactions
<FelixWeis> if you know you have to create these 25 outputs, you can do it in 1 batched payment. electrum-wallet even supports "amend by fee" so if you have already an unconfirmed tx in the mempool, you can replace that one with the additional output
<murch1> Also ancestor and descendant limits don't effectively keep clusters from being very broad; we found some components in old mempools that were several hundred connected transactions
<harding> For the vsize limit, there's also a fundamental limit at the size of a block's transaction space. If the last descendant of a package is further away from its oldest unconfirmed ancestor than one block's worth of space, it can't contribute fee to that ancestor because they can't be included in the same block.
<glozow> Imagine a "family" to be any group of transactions connected by parent-child relationships, including descendants-of-ancestors and ancestors-of-descendants. What is the largest possible family that can exist in the mempool, assuming the default limits are applied?
<raj_> glozow, it seems there isn't a theoretical limit to that in mempool, other than max mempool size itself? Because a chain can be formed via multiple packages?
<lightlike> harding: but is this a sufficient reason not to accept larger packages? After all, a miner could just mine the package in two batches, adding the newest descendants in a later block.
<glozow> harding: mm that's interesting! so if we're building a block and we only have 100KvB of space left, we wouldn't be able to fit a 102KvB package that relies on the lowest descendant's feerate?
<harding> lightlike: the miner of block n doesn't know that they'll mine block n+1, so by including less profitable ancestor transactions in block n, their competitor mining block n+1 profits more. I don't think that's economically rational.
<jnewbery> lightlike: the miner is not incentivized to mine the first batch of transactions, since it doesn't expect that it'll mine the next block (unless it has a majority of hash rate on the network)
<jnewbery> A related question is "what's the maximum number of ancestors + descendants a single transaction can have", and I think the answer there is 46
<glozow> Let's move on to packages: In our package validation code, we already run `CalculateMemPoolAncestors()` on each transaction individually during PreChecks(). Why is this insufficient?
<jnewbery> lightlike: right - the only point is that the fees associated with txs at the end of that large package don't incentivize the miner to include the txs at the top of that package
<dergoegge> CalculateMemPoolAncestors() only takes txs into account that are already in the mempool, a tx in a package that has all its parents within the package might exceed the limits but CalculateMemPoolAncestors() would be fine with it because it has 0 ancestors in the mempool (example A)
<harding> lightlike: beyond having a mempool whose size is greater by default than the size of a block, we don't include transactions in the mempool that can't be included in the next block (e.g. timelocked transactions). This makes the transaction selection algorithm simple. A descendant transaction that can't be included in the next block because it had >1e6 vbytes of ancestors would violate that rule.
<raj_> glozow, considering the union of the sets of ancestors and descendants of all transactions in a package and then checking whether this giant set exceeds the limit.
<harding> It's the same policy enforced by Bitcoin Core for standard relay and mempool inclusion, but applied using a simple one-line heuristic that each transaction in the package is counted as having as many more ancestors and descendants as there are transactions in the package being evaluated.
<glozow> raj_: yes! implementation-wise, we're putting everyone's ancestors in the same pot while calculating, and using package_count instead of 1 and total_package_size instead of 1 transaction's size
<harding> One alternative would be to do the full computation on the package just like we would do if each tx was submitted individually, but for the sake of limiting the CPU/memory needed for an eventual P2P package relay, severely restrict the number of transactions allowed in a package from 25 currently to, say, 5.
<lightlike> Maybe one could try to think of an algorithm to precisely find the set(s) of actually related transactions? Obviously this could easily become too complicated/time-consuming.
<harding> Another alternative, also aimed at allowing P2P package relay, would be allowing the submitter to include a proof of the longest transaction chain in the package. I *think* it would be possible to take the txids and input outpoints of all the transactions in the package, put them into a tree-like structure, and then to create compact proofs about the shape of that tree. That's a ridiculously heavyweight solution if we expect most
<murch> The union of all sets of ancestors and descendants is basically what Clara and I recently described as "Clusters" in our block template improvement proposal
<murch> They can be rather large. The tweet I linked shows a cluster of over 200 txs, but is followed by another that shows a cluster of 881 txs in the mempool together
<murch> I think it would be hard to set a meaningful limit that doesn't restrict naturally occurring clusters. In turn a relatively low limit might be easy to leverage to do things akin to transaction pinning or restricting other parties from bumping their txs
<harding> glozow: zmnSCPxj has a proposal on the LN mailing list this week that would create very long transaction chains with maybe two children at the top and definitely two children at the bottom.
<glozow> I am looking forward to those meetings that ariard is hosting, would be good to be able to gather more info on what kinds of packages people are looking to make...
<glozow> the only reason you would put transactions in a package instead of submitting them individually is if some package-specific policy will get you something that individual transactions won't. imo the only policy that you'd want is for a tx to be considered for its descendant feerate (or more generally, i guess it's feerate with sponsors in the package) rather than base feerate
<glozow> murch: eviction is by descendant feerate, but this is after it's already accepted to mempool. them being in a package wouldn't really affect whether or not they're accepted
<harding> Testing is a bit tricky because it is more restrictive than the actual relay policy, so we can't be lazy and just spin up a synced node with no mempool and just feed it random packages from another node with a current mempool.
<harding> Maybe one edge case to watch out for is CPFP carve outs, where we can get a chain of transactions with length 26 (instead of 25) or combined size of 111,000 vbytes (instead of 101,000 vbytes).
<michaelfolkson> murch: We've had "conceptual" PR review clubs before. I think we had one on package relay when there was no code (before glozow got to work down this path)
<glozow> michaelfolkson: so the hope is to be able to create random packages, submit them to mempool, and make sure that we aren't exceeding limits (which would indicate that our algorithm underestimated). but i've had a hard time getting it to create valid random packages
<otto_a> dariusp: one with a lower feerate than a competing package. However I just realized that block acceptance is a different topic from mempool acceptance.
<michaelfolkson> glozow: So you are waiting a long time for the randomness to create what you need. Yeah deterministic scenarios like murch's crazy tweet picture seems more useful
<glozow> otto_a: good thinking! no they aren't, because we'll only be using this for package validation. reorgs affect transactions that are already in the mempool, and we process those one at a time
<glozow> Ok next question: This PR turns the original `CalculateMemPoolAncestors()` operating on a `CTxMemPoolEntry` into a wrapper function, constructing a vector from the singular entry. What could happen if the wrapper function used a copy of the entry instead of the entry itself?
<glozow> dergoegge: yes, bonus points to you for being well prepared ^_^ the container's `iterator_to` function requires it to be inside the container, so we'd get a bad value for the copy since it's not there