Detect and ignore transactions that were CPFP'd in the fee estimator (mempool)

Jul 13, 2022

https://github.com/bitcoin/bitcoin/pull/25380

Host: glozow - PR author: darosior

The PR branch HEAD was 200a49f at the time of this review club meeting.

Notes

Fees for the wallet are calculated using a fee estimator attached to the node’s mempool, CBlockPolicyEstimator.
- At a high level, it records the feerate and time at which transactions enter the mempool and how many blocks later they confirm. Later, it uses this information to provide the opposite result: it provides a feerate estimate based on a given confirmation target (measured by how many blocks a user is willing to wait for their transaction to be included in a block).
- To learn more about how the fee estimator works, this article and this gist are good places to start. Then, read the comments in the source code and verify that the behavior matches the description!
The estimatesmartfee RPC provides feerate estimates based on the user’s confirmation target. The test-only estimaterawfee RPC provides more detailed information specific to the implementation.
Since the fee estimator is data-based, it’s crucial to accurately record what feerate a transaction is “offered” at. For example, if a transaction is fee-bumped using Child Pays for Parent (CPFP), the parent’s individual feerate would be an underestimation and the child’s would be an overestimation of their “actual” feerate.
The fee estimator skips transactions entering the mempool with unconfirmed inputs, alleviating potential overestimation due to a high-feerate transaction sponsoring its ancestors. However, since it’s impossible to know whether a transaction will have unconfirmed descendants, a low-feerate parent in a CPFP is still included. This means the fee estimator may underestimate feerates.
One solution is to calculate a transaction’s “actual” feerate, taking ancestors and descendants into consideration, (see #23074), but the calculation is not simple.
PR #25380 prevents the fee estimator from underestimating feerates due to inaccurately recording feerates for CPFP’d transactions.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
What issue does this PR address?
At a high level, how does the fee estimator work?

(a) What information is recorded for new transactions entering the mempool? (Hint: what does mapMemPoolTxs store?)

(b) What information is recorded for transactions in blocks? (Hint: see processBlockTx)

(c) Does the fee estimator track any transactions that are not in the mempool, were removed, or were mined but not in the mempool beforehand?
In this PR, how do we detect which transactions to drop? Do you agree with this approach? (Hint: see commit 5485acf).
Why is the value for totalunconfirmed, described as a “number of txs,” equal to 1.96 here?
(Bonus) How would you modify the fee estimator to accurately account for CPFP fee-bumping?

Meeting Log

17:00

<glozow> #startmeeting

17:00

<larryruane> Yes, only for tx that are in the mempool, and that needs much more metadata since it's in the mempool. A CTransaction describes a transaction no matter where it is

17:00

<glozow> hi everyone, this is Bitcoin Core PR Review Club

17:01

<larryruane> (+1 glozow)

17:01

<larryruane> hi!

17:01

<effexzi> Hi every1

17:01

<svav> Hi

17:01

<ishaanam[m]> hi

17:01

<Amirreza> glozow: Thanks

17:01

<Amirreza> Hi

17:01

<glozow> feel free to say hi, or lurk if you want :) and let us know if this is your first time review clubbing

17:01

<glozow> today we're looking at #25380, "Detect and ignore transactions that were CPFP'd in the fee estimator"

17:01

<Lov3r_Of_Bitcoin> hello

17:01

<glozow> notes in the usual place: https://bitcoincore.reviews/25380

17:02

<glozow> PR at https://github.com/bitcoin/bitcoin/pull/25380

17:03

<schmidty_> hi!

17:03

<glozow> Did everyone get a chance to review the PR and/or look at the notes? How about a y/n

17:03

<larryruane> y, concept and approach ACK

17:03

<Amirreza> y

17:03

<ishaanam[m]> y

17:03

<svav> Looked at the notes

17:03

<Lov3r_Of_Bitcoin> y

17:04

<glozow> wonderful! could one of you describe to us what this PR does?

17:04

<lightlike> Hi

17:04

<Amirreza> glozow: Sorry for disrupting the meeting flow, so mempool entries are not only for unconfirmed txs? even confirmed ones can be an object of mempool entry?

17:04

<BlueMoon> Hello!!

17:05

<glozow> @ everyone, please never about disrupting the meeting, the purpose of this meeting is to learn more about Bitcoin Core so feel free to ask questions whenever :)

17:06

<glozow> Amirreza: mempool entries are only for unconfirmed transactions.

17:06

<Amirreza> glozow: even unrelated to the PR?

17:06

<Amirreza> glozow: Got it, Thanks!

17:06

<ishaanam[m]> This PR addresses the issue of our fee estimator under-estimating fees because of CPFP. Fees are under-estimated because we ignore transactions with unconfirmed parents, which means that, in the case of CPFP, we only account for the lower feerate of the parent even though miners are also incentivized to include that transaction because of the the child's higher feerate, which is the one getting ignored.

17:06

<svav> The PR is correcting fee underestimations that arise from CPFP transactions

17:07

<glozow> Amirreza: all questions are fine by default, and I'll redirect the conversation if I feel like we're spending too much time off topic.

17:09

<larryruane> And I think the problem with fee estimation being too low is that if the user uses this too-low estimate to submit a transaction, it may not confirm as quickly as expected

17:09

<larryruane> (not to submit a tx, but to construct a tx)

17:09

<glozow> ishaanam: svav: larryruane: yes thank you! My only correction is that the underestimation doesn't arise because we ignore txns with unconfirmed parents, but because we aren't accurately assessing a transaction's "actual" feerate.

17:10

<glozow> What does the fee estimator do? I.e. when a user calls `estimatesmartfee`, what are they inputting and what do they expect as the result?

17:10

<larryruane> glozow: I did have a question on your Notes, "However, since it’s impossible to know whether a transaction will have unconfirmed descendants" ... could you elaborate on what you mean?

17:10

<larryruane> so when a tx first enters the mempool, obviously it can't have any descendants,

17:11

<Amirreza> Does the estimation is for a special txs? Isn't it for a given number of blocks? (what feerate pay to include the tx in n blocks) So what why it under-estimates the CPFP'd txs?

17:11

<bitcoinbassist> are fees for the child txns as part of cpfp generally overestimated (to ensure that it will make it into the block)? Could this introduce overestimation of fees?

17:11

<larryruane> so do you mean that if descendants arrive later (into the mempool), then we don't really "know" that there's an upstream transaction in the mempool?

17:11

<glozow> larryruane: i merely mean the first part, i.e. we can't predict if a tx will be spent in the future

17:12

<glozow> while it's unconfirmed

17:12

<larryruane> ok so when a tx first enters the mempool, we adjust the fee estimate based on it ... and later when (if) it gets a descendant, we don't go back and like correct it?

17:13

<glozow> Amirreza: The estimation is not for special txs. Yes, when a user calls `estimatesmartfee`, it inputs a confirmation target (a number of blocks) and gets a feerate.

17:13

<svav> estimatesmartfee - Are entering a confirmation target in number of blocks

17:13

<glozow> As for why it underestimates CPFP transactions, I think we'll answer that throughout the course of this meeting!

17:14

<glozow> larryruane: yes exactly! if a transaction later gets a descendant, we don't change the data in the fee estimator.

17:14

<Amirreza> glozow: the input I think is the number of blocks they want to wait for the tx to be confirmed. The output is the feerate that is more probable to be include in the given number of blocks.

17:14

<glozow> Let's move on to the questions about how the fee estimator works. What information is recorded for new transactions entering the mempool?

17:15

<glozow> (Hint: grep `mapMemPoolTxs`)

17:15

<Amirreza> glozow, the height of the blockchain when tx is entered the mempool

17:15

<glozow> Amirreza: correct, input a number of blocks, output a feerate.

17:15

<glozow> Amirreza: yes. and what else?

17:16

<larryruane> bucket index (i don't know quite what that means)

17:16

<Amirreza> the bucket index, indicating in which feerate bucket does this tx is.

17:18

<glozow> yep. so essentially, when it entered the mempool and at what feerate

17:18

<glozow> And what do we do when we see a transaction in a block?

17:18

<glozow> hint: https://github.com/bitcoin/bitcoin/blob/194710d8ff398838e4e5bb87b56e19ebed1d6c52/src/policy/fees.cpp#L596

17:19

<larryruane> well of course at a high level, we must remove it from the mempool

17:19

<Amirreza> we update the bucket for that feerate

17:20

<larryruane> why does `_removeTx` have that underscore prefix? Should we attach some meaning to that?

17:20

<Amirreza> To see how long it took for that tx in that feerate to be included in a block.

17:21

<glozow> larryruane: I think that's just a naming convention for a private method

17:21

<glozow> Does the fee estimator track any transactions that are not in the mempool, were removed, or were mined but not in the mempool beforehand?

17:23

<Amirreza> I think no, not for mined but not in the mempool beforehand.

17:23

<glozow> Amirreza: exactly. if it wasn't in our mempool, we have no idea how long it took to get confirmed.

17:23

<larryruane> I would say no also ... why isn't this an assert? https://github.com/bitcoin/bitcoin/blob/194710d8ff398838e4e5bb87b56e19ebed1d6c52/src/policy/fees.cpp#L611

17:24

<larryruane> (it's okay if you want to ignore, if too much of a side-trip)

17:26

<glozow> larryruane: no idea. looks like it was written 8 years ago

17:26

<Amirreza> Can someone explain to me what does ClearCurrent do? https://github.com/bitcoin/bitcoin/blob/194710d8ff398838e4e5bb87b56e19ebed1d6c52/src/policy/fees.cpp#L203

17:31

<glozow> Amirreza: tbh not 100% sure. looks like we're shifting the data on how many unconfirmed txs for each bucket.

17:32

<glozow> Next question: this PR detects which transactions in a block were CPFP'd. How does it do this?

17:32

<glozow> hint: https://github.com/bitcoin-core-review-club/bitcoin/commit/5485acfe88051234a09862819ddc2953f9d42058

17:33

<ishaanam[m]> When we encounter a transaction with an unconfirmed parent whose feerate is lower than its own, we drop the parent transaction.

17:34

<lightlike> We check if the individual feerate of a child is higher than its ancestor score. If that is a case, we remove all parents with a feerate lower than that of the child from the fee calculation.

17:34

<bitcoinbassist> glozow: it checks if its individual fee rate is greater than the collective fee rate of all it's ancestors?

17:34

<bitcoinbassist> collective being the average over all ancestors?

17:35

<ishaanam[m]> What is the difference between the ancestor score and the parent feerate?

17:36

<bitcoinbassist> looks like ancestor score is an average feerate over all ancestors? (including the txn itself?)

17:36

<glozow> ishaanam: the ancestor score is the aggregate feerate for a transaction and all of its ancestors. the parent feerate is the individual feerate of the parent alone.

17:36

<lightlike> I wonder why we do the first check, comparing with the ancestor score. Why not just drop this check and remove parents that have a lower fee than their child?

17:36

<glozow> lightlike: yes exactly, there are 2 conditions: individual is higher than ancestor score and the parent's individual feerate is lower than the child feerate.

17:37

<glozow> as for why,

17:37

<glozow> i think it's possible the parent is bumped by a different child?

17:37

<ishaanam[m]> glozow: ok, thanks

17:38

<lightlike> glozow: but that different child would need to be in the block too, so the parent would be removed regardless?

17:38

<glozow> oh i guess we should still drop the parent in that case, even if this isn't the sponsor.

17:38

<glozow> yeah

17:40

<glozow> will think about this

17:41

<lightlike> i think, with the current rule we'd not drop a CPFP parent if its child happens to depend on another high-paying parent (that would have been included in the block regardless of the CPFP'ing with the low-fee parent). That other parent could increase the ancestor score of the child, right?

17:42

<glozow> lightlike: yeah. the high-feerate parent would increase the ancestor score and we wouldn't drop the low-feerate parent. maybe not good.

17:43

<lightlike> glozow: yes, unless there is some other reason why we'd really need that ancestor rule. I'll ask in the PR, maybe I'm missing something.

17:43

<bitcoinbassist> does ancestor here mean only ancestors which are still in the mempool?

17:43

<glozow> i also wonder if it's too drop-happy if we always ignore transactions with a slightly-higher-feerate child

17:44

<glozow> another Question: does anybody have an alternate approach they'd like to share?

17:44

<glozow> bitcoinbassist: yes absolutely. only the unconfirmed ancestors.

17:45

<glozow> Also a good time to discuss the bonus question: How would you modify the fee estimator to more accurately account for CPFP fee-bumping?

100

17:45

<lightlike> i think the nuclear approach would be to simply drop all txes that are parents with a child in the block. That way we wouldn't make any mistakes, but the downside is we'd have less samples for fee estimation.

101

17:46

<glozow> lightlike: indeed. i wonder what % of mempool transactions have no ancestors/descendants

102

17:47

<bitcoinbassist> I wonder if the check of being greater than ancestor_score is saying that the fee of the child must be sufficiently higher than its ancestors, otherwise its ancestors would have already been confirmed? (so it's likely not paying for it's parent?)

103

17:47

<lightlike> glozow: Also, what % of transactions have descendants in the same block they are mined.

104

17:48

<larryruane> I wonder if anyone saves (bitcoin core) mempool snapshots at each new block so that question (that glozow asked) could be answered

105

17:48

<bitcoinbassist> at least one of its unconfirmed ancestors*

106

17:48

<larryruane> (you know just for analytics)

107

17:49

<glozow> bitcoinbassist: yes that's the idea. lower ancestor score = it must have at least one ancestor with a lower feerate

108

17:50

<glozow> larryruane: i'm sure somebody does

109

17:51

<glozow> Did anybody come up with an answer to "How would you modify the fee estimator to more accurately account for CPFP fee-bumping?" ?

110

17:52

<lightlike> it's convenient that since all transaction that are not on the top level (=parents) are already excluded from the fee estimation anyway, so I think we might really need to take complicated parent-child relationships into account where an ancestor score helps.

111

17:53

<lightlike> *NOT need to take into account

112

17:53

<larryruane> glozow: I think that would be here https://github.com/bitcoin/bitcoin/pull/23074 but I can't say I understand very well

113

17:55

<glozow> lightlike: hm. I think you can still definitely have a fee-bumping tx at the bottom of a n-generation chain

114

17:56

<glozow> oh i think maybe that's why it has both parent and ancestor conditions. the second condition would have been "iterate through each ancestor and drop the one that's low feerate," but we only have access to parents from the `CTxMemPoolEntry`.

115

17:58

<glozow> ok, last question: Why is the value for totalunconfirmed, described as a “number of txs,” equal to 1.96 here https://github.com/bitcoin-core-review-club/bitcoin/blob/200a49f7e7197cfa5ba8b8123d3597e84eab0aa1/test/functional/feature_fee_estimation.py#L323?

116

17:58

<lightlike> glozow: right, but the algorithm in this PR wouldn't detect it as CPFP, right? E.g. if the Parent has one low-fee child C1, and C1 has a large-fee child C2, then we wouldn't remove the parent because we don't loop through all ancestors.

117

17:59

<glozow> lightlike: right, the current approach wouldn't catch that.

118

18:00

<glozow> Ok that's all the time we have for today!

119

18:01

<glozow> The last question was secretly a question about how the decay works

120

18:01

<glozow> left as an exercise to the reader

121

18:01

<glozow> #endmeeting