Detect and ignore transactions that were CPFP'd in the fee estimator (mempool)

https://github.com/bitcoin/bitcoin/pull/25380

Host: glozow  -  PR author: darosior

The PR branch HEAD was 200a49f at the time of this review club meeting.

Notes

  • Fees for the wallet are calculated using a fee estimator attached to the node’s mempool, CBlockPolicyEstimator.

    • At a high level, it records the feerate and time at which transactions enter the mempool and how many blocks later they confirm. Later, it uses this information to provide the opposite result: it provides a feerate estimate based on a given confirmation target (measured by how many blocks a user is willing to wait for their transaction to be included in a block).

    • To learn more about how the fee estimator works, this article and this gist are good places to start. Then, read the comments in the source code and verify that the behavior matches the description!

  • The estimatesmartfee RPC provides feerate estimates based on the user’s confirmation target. The test-only estimaterawfee RPC provides more detailed information specific to the implementation.

  • Since the fee estimator is data-based, it’s crucial to accurately record what feerate a transaction is “offered” at. For example, if a transaction is fee-bumped using Child Pays for Parent (CPFP), the parent’s individual feerate would be an underestimation and the child’s would be an overestimation of their “actual” feerate.

  • The fee estimator skips transactions entering the mempool with unconfirmed inputs, alleviating potential overestimation due to a high-feerate transaction sponsoring its ancestors. However, since it’s impossible to know whether a transaction will have unconfirmed descendants, a low-feerate parent in a CPFP is still included. This means the fee estimator may underestimate feerates.

  • One solution is to calculate a transaction’s “actual” feerate, taking ancestors and descendants into consideration, (see #23074), but the calculation is not simple.

  • PR #25380 prevents the fee estimator from underestimating feerates due to inaccurately recording feerates for CPFP’d transactions.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What issue does this PR address?

  3. At a high level, how does the fee estimator work?

    (a) What information is recorded for new transactions entering the mempool? (Hint: what does mapMemPoolTxs store?)

    (b) What information is recorded for transactions in blocks? (Hint: see processBlockTx)

    (c) Does the fee estimator track any transactions that are not in the mempool, were removed, or were mined but not in the mempool beforehand?

  4. In this PR, how do we detect which transactions to drop? Do you agree with this approach? (Hint: see commit 5485acf).

  5. Why is the value for totalunconfirmed, described as a “number of txs,” equal to 1.96 here?

  6. (Bonus) How would you modify the fee estimator to accurately account for CPFP fee-bumping?

Meeting Log

  117:00 <glozow> #startmeeting
  217:00 <larryruane> Yes, only for tx that are in the mempool, and that needs much more metadata since it's in the mempool. A CTransaction describes a transaction no matter where it is
  317:00 <glozow> hi everyone, this is Bitcoin Core PR Review Club
  417:01 <larryruane> (+1 glozow)
  517:01 <larryruane> hi!
  617:01 <effexzi> Hi every1
  717:01 <svav> Hi
  817:01 <ishaanam[m]> hi
  917:01 <Amirreza> glozow: Thanks
 1017:01 <Amirreza> Hi
 1117:01 <glozow> feel free to say hi, or lurk if you want :) and let us know if this is your first time review clubbing
 1217:01 <glozow> today we're looking at #25380, "Detect and ignore transactions that were CPFP'd in the fee estimator"
 1317:01 <Lov3r_Of_Bitcoin> hello
 1417:01 <glozow> notes in the usual place: https://bitcoincore.reviews/25380
 1517:02 <glozow> PR at https://github.com/bitcoin/bitcoin/pull/25380
 1617:03 <schmidty_> hi!
 1717:03 <glozow> Did everyone get a chance to review the PR and/or look at the notes? How about a y/n
 1817:03 <larryruane> y, concept and approach ACK
 1917:03 <Amirreza> y
 2017:03 <ishaanam[m]> y
 2117:03 <svav> Looked at the notes
 2217:03 <Lov3r_Of_Bitcoin> y
 2317:04 <glozow> wonderful! could one of you describe to us what this PR does?
 2417:04 <lightlike> Hi
 2517:04 <Amirreza> glozow: Sorry for disrupting the meeting flow, so mempool entries are not only for unconfirmed txs? even confirmed ones can be an object of mempool entry?
 2617:04 <BlueMoon> Hello!!
 2717:05 <glozow> @ everyone, please never about disrupting the meeting, the purpose of this meeting is to learn more about Bitcoin Core so feel free to ask questions whenever :)
 2817:06 <glozow> Amirreza: mempool entries are only for unconfirmed transactions.
 2917:06 <Amirreza> glozow: even unrelated to the PR?
 3017:06 <Amirreza> glozow: Got it, Thanks!
 3117:06 <ishaanam[m]> This PR addresses the issue of our fee estimator under-estimating fees because of CPFP. Fees are under-estimated because we ignore transactions with unconfirmed parents, which means that, in the case of CPFP, we only account for the lower feerate of the parent even though miners are also incentivized to include that transaction because of the the child's higher feerate, which is the one getting ignored.
 3217:06 <svav> The PR is correcting fee underestimations that arise from CPFP transactions
 3317:07 <glozow> Amirreza: all questions are fine by default, and I'll redirect the conversation if I feel like we're spending too much time off topic.
 3417:09 <larryruane> And I think the problem with fee estimation being too low is that if the user uses this too-low estimate to submit a transaction, it may not confirm as quickly as expected
 3517:09 <larryruane> (not to submit a tx, but to construct a tx)
 3617:09 <glozow> ishaanam: svav: larryruane: yes thank you! My only correction is that the underestimation doesn't arise because we ignore txns with unconfirmed parents, but because we aren't accurately assessing a transaction's "actual" feerate.
 3717:10 <glozow> What does the fee estimator do? I.e. when a user calls `estimatesmartfee`, what are they inputting and what do they expect as the result?
 3817:10 <larryruane> glozow: I did have a question on your Notes, "However, since it’s impossible to know whether a transaction will have unconfirmed descendants" ... could you elaborate on what you mean?
 3917:10 <larryruane> so when a tx first enters the mempool, obviously it can't have any descendants,
 4017:11 <Amirreza> Does the estimation is for a special txs? Isn't it for a given number of blocks? (what feerate pay to include the tx in n blocks) So what why it under-estimates the CPFP'd txs?
 4117:11 <bitcoinbassist> are fees for the child txns as part of cpfp generally overestimated (to ensure that it will make it into the block)? Could this introduce overestimation of fees?
 4217:11 <larryruane> so do you mean that if descendants arrive later (into the mempool), then we don't really "know" that there's an upstream transaction in the mempool?
 4317:11 <glozow> larryruane: i merely mean the first part, i.e. we can't predict if a tx will be spent in the future
 4417:12 <glozow> while it's unconfirmed
 4517:12 <larryruane> ok so when a tx first enters the mempool, we adjust the fee estimate based on it ... and later when (if) it gets a descendant, we don't go back and like correct it?
 4617:13 <glozow> Amirreza: The estimation is not for special txs. Yes, when a user calls `estimatesmartfee`, it inputs a confirmation target (a number of blocks) and gets a feerate.
 4717:13 <svav> estimatesmartfee - Are entering a confirmation target in number of blocks
 4817:13 <glozow> As for why it underestimates CPFP transactions, I think we'll answer that throughout the course of this meeting!
 4917:14 <glozow> larryruane: yes exactly! if a transaction later gets a descendant, we don't change the data in the fee estimator.
 5017:14 <Amirreza> glozow: the input I think is the number of blocks they want to wait for the tx to be confirmed. The output is the feerate that is more probable to be include in the given number of blocks.
 5117:14 <glozow> Let's move on to the questions about how the fee estimator works. What information is recorded for new transactions entering the mempool?
 5217:15 <glozow> (Hint: grep `mapMemPoolTxs`)
 5317:15 <Amirreza> glozow, the height of the blockchain when tx is entered the mempool
 5417:15 <glozow> Amirreza: correct, input a number of blocks, output a feerate.
 5517:15 <glozow> Amirreza: yes. and what else?
 5617:16 <larryruane> bucket index (i don't know quite what that means)
 5717:16 <Amirreza> the bucket index, indicating in which feerate bucket does this tx is.
 5817:18 <glozow> yep. so essentially, when it entered the mempool and at what feerate
 5917:18 <glozow> And what do we do when we see a transaction in a block?
 6017:18 <glozow> hint: https://github.com/bitcoin/bitcoin/blob/194710d8ff398838e4e5bb87b56e19ebed1d6c52/src/policy/fees.cpp#L596
 6117:19 <larryruane> well of course at a high level, we must remove it from the mempool
 6217:19 <Amirreza> we update the bucket for that feerate
 6317:20 <larryruane> why does `_removeTx` have that underscore prefix? Should we attach some meaning to that?
 6417:20 <Amirreza> To see how long it took for that tx in that feerate to be included in a block.
 6517:21 <glozow> larryruane: I think that's just a naming convention for a private method
 6617:21 <glozow> Does the fee estimator track any transactions that are not in the mempool, were removed, or were mined but not in the mempool beforehand?
 6717:23 <Amirreza> I think no, not for mined but not in the mempool beforehand.
 6817:23 <glozow> Amirreza: exactly. if it wasn't in our mempool, we have no idea how long it took to get confirmed.
 6917:23 <larryruane> I would say no also ... why isn't this an assert? https://github.com/bitcoin/bitcoin/blob/194710d8ff398838e4e5bb87b56e19ebed1d6c52/src/policy/fees.cpp#L611
 7017:24 <larryruane> (it's okay if you want to ignore, if too much of a side-trip)
 7117:26 <glozow> larryruane: no idea. looks like it was written 8 years ago
 7217:26 <Amirreza> Can someone explain to me what does ClearCurrent do? https://github.com/bitcoin/bitcoin/blob/194710d8ff398838e4e5bb87b56e19ebed1d6c52/src/policy/fees.cpp#L203
 7317:31 <glozow> Amirreza: tbh not 100% sure. looks like we're shifting the data on how many unconfirmed txs for each bucket.
 7417:32 <glozow> Next question: this PR detects which transactions in a block were CPFP'd. How does it do this?
 7517:32 <glozow> hint: https://github.com/bitcoin-core-review-club/bitcoin/commit/5485acfe88051234a09862819ddc2953f9d42058
 7617:33 <ishaanam[m]> When we encounter a transaction with an unconfirmed parent whose feerate is lower than its own, we drop the parent transaction.
 7717:34 <lightlike> We check if the individual feerate of a child is higher than its ancestor score. If that is a case, we remove all parents with a feerate lower than that of the child from the fee calculation.
 7817:34 <bitcoinbassist> glozow: it checks if its individual fee rate is greater than the collective fee rate of all it's ancestors?
 7917:34 <bitcoinbassist> collective being the average over all ancestors?
 8017:35 <ishaanam[m]> What is the difference between the ancestor score and the parent feerate?
 8117:36 <bitcoinbassist> looks like ancestor score is an average feerate over all ancestors? (including the txn itself?)
 8217:36 <glozow> ishaanam: the ancestor score is the aggregate feerate for a transaction and all of its ancestors. the parent feerate is the individual feerate of the parent alone.
 8317:36 <lightlike> I wonder why we do the first check, comparing with the ancestor score. Why not just drop this check and remove parents that have a lower fee than their child?
 8417:36 <glozow> lightlike: yes exactly, there are 2 conditions: individual is higher than ancestor score and the parent's individual feerate is lower than the child feerate.
 8517:37 <glozow> as for why,
 8617:37 <glozow> i think it's possible the parent is bumped by a different child?
 8717:37 <ishaanam[m]> glozow: ok, thanks
 8817:38 <lightlike> glozow: but that different child would need to be in the block too, so the parent would be removed regardless?
 8917:38 <glozow> oh i guess we should still drop the parent in that case, even if this isn't the sponsor.
 9017:38 <glozow> yeah
 9117:40 <glozow> will think about this
 9217:41 <lightlike> i think, with the current rule we'd not drop a CPFP parent if its child happens to depend on another high-paying parent (that would have been included in the block regardless of the CPFP'ing with the low-fee parent). That other parent could increase the ancestor score of the child, right?
 9317:42 <glozow> lightlike: yeah. the high-feerate parent would increase the ancestor score and we wouldn't drop the low-feerate parent. maybe not good.
 9417:43 <lightlike> glozow: yes, unless there is some other reason why we'd really need that ancestor rule. I'll ask in the PR, maybe I'm missing something.
 9517:43 <bitcoinbassist> does ancestor here mean only ancestors which are still in the mempool?
 9617:43 <glozow> i also wonder if it's too drop-happy if we always ignore transactions with a slightly-higher-feerate child
 9717:44 <glozow> another Question: does anybody have an alternate approach they'd like to share?
 9817:44 <glozow> bitcoinbassist: yes absolutely. only the unconfirmed ancestors.
 9917:45 <glozow> Also a good time to discuss the bonus question: How would you modify the fee estimator to more accurately account for CPFP fee-bumping?
10017:45 <lightlike> i think the nuclear approach would be to simply drop all txes that are parents with a child in the block. That way we wouldn't make any mistakes, but the downside is we'd have less samples for fee estimation.
10117:46 <glozow> lightlike: indeed. i wonder what % of mempool transactions have no ancestors/descendants
10217:47 <bitcoinbassist> I wonder if the check of being greater than ancestor_score is saying that the fee of the child must be sufficiently higher than its ancestors, otherwise its ancestors would have already been confirmed? (so it's likely not paying for it's parent?)
10317:47 <lightlike> glozow: Also, what % of transactions have descendants in the same block they are mined.
10417:48 <larryruane> I wonder if anyone saves (bitcoin core) mempool snapshots at each new block so that question (that glozow asked) could be answered
10517:48 <bitcoinbassist> at least one of its unconfirmed ancestors*
10617:48 <larryruane> (you know just for analytics)
10717:49 <glozow> bitcoinbassist: yes that's the idea. lower ancestor score = it must have at least one ancestor with a lower feerate
10817:50 <glozow> larryruane: i'm sure somebody does
10917:51 <glozow> Did anybody come up with an answer to "How would you modify the fee estimator to more accurately account for CPFP fee-bumping?" ?
11017:52 <lightlike> it's convenient that since all transaction that are not on the top level (=parents) are already excluded from the fee estimation anyway, so I think we might really need to take complicated parent-child relationships into account where an ancestor score helps.
11117:53 <lightlike> *NOT need to take into account
11217:53 <larryruane> glozow: I think that would be here https://github.com/bitcoin/bitcoin/pull/23074 but I can't say I understand very well
11317:55 <glozow> lightlike: hm. I think you can still definitely have a fee-bumping tx at the bottom of a n-generation chain
11417:56 <glozow> oh i think maybe that's why it has both parent and ancestor conditions. the second condition would have been "iterate through each ancestor and drop the one that's low feerate," but we only have access to parents from the `CTxMemPoolEntry`.
11517:58 <glozow> ok, last question: Why is the value for totalunconfirmed, described as a “number of txs,” equal to 1.96 here https://github.com/bitcoin-core-review-club/bitcoin/blob/200a49f7e7197cfa5ba8b8123d3597e84eab0aa1/test/functional/feature_fee_estimation.py#L323?
11617:58 <lightlike> glozow: right, but the algorithm in this PR wouldn't detect it as CPFP, right? E.g. if the Parent has one low-fee child C1, and C1 has a large-fee child C2, then we wouldn't remove the parent because we don't loop through all ancestors.
11717:59 <glozow> lightlike: right, the current approach wouldn't catch that.
11818:00 <glozow> Ok that's all the time we have for today!
11918:01 <glozow> The last question was secretly a question about how the decay works
12018:01 <glozow> left as an exercise to the reader
12118:01 <glozow> #endmeeting