Txmempool -/-> validation 1/2: improve performance of check() and remove dependency on validation (mempool)

https://github.com/bitcoin/bitcoin/pull/23157

Host: jnewbery  -  PR author: glozow

The PR branch HEAD was 082c5bf099 at the time of this review club meeting.

We recently looked at PR #22677, which cuts the circular dependency between validation and txmempool. This week’s PR was split off from that PR, taking a slightly different approach to simplify the CTxMemPool::check() function.

Notes

  • The CTxMemPool class is a data structure that stores the set of unconfirmed transactions, along with various methods to update or query that set. In addition to the transactions themselves, the mempool stores metadata about each transaction (such as its ancestor/descendant dependencies), which makes it highly optimized for operations such as constructing blocks. A simplified sketch of a mempool entry follows.

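    As a rough illustration, here is a hypothetical, heavily simplified mempool entry. The names below are invented for this sketch; the real CTxMemPoolEntry (src/txmempool.h) carries more state, and CTxMemPool keeps its entries in a boost::multi_index container with several sort orders.

    ```cpp
    // Hypothetical, heavily simplified sketch of a mempool entry. The real
    // CTxMemPoolEntry carries more state (entry time, height, sigop cost, ...).
    #include <cstdint>
    #include <memory>
    #include <vector>

    struct TxSketch { std::vector<uint8_t> bytes; };  // stand-in for CTransaction

    struct MempoolEntrySketch {
        std::shared_ptr<const TxSketch> tx;  // the transaction itself
        int64_t fee;                         // fee paid by this transaction
        // Cached ancestor/descendant state, updated on every mempool mutation so
        // that queries and block assembly never have to re-walk the dependency
        // graph (cf. nCountWithAncestors etc. on the real entry):
        uint64_t count_with_ancestors;    // this tx plus all in-mempool ancestors
        uint64_t size_with_ancestors;     // total size of tx + ancestors
        int64_t  fees_with_ancestors;     // total fees of tx + ancestors
        uint64_t count_with_descendants;  // this tx plus all in-mempool descendants
    };
    ```
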
  • The mempool must only contain non-conflicting, valid transactions which could theoretically be included in the next block. There are therefore many invariants such as:

    • a transaction may only appear in the mempool if all of its inputs are either in the UTXO set or are the outputs of other mempool transactions
    • no two transactions in the mempool may spend the same output
    • the {descendant|ancestor} {fees|count|size} cached for the transaction must be the correct value
  • There are several public methods that can alter the contents of the mempool. These methods are called in many different circumstances, such as:

    • removing transactions when a block is connected
    • expiring old transactions or evicting transactions by feerate when limiting the mempool size
    • replacing transactions for RBF
    • re-inserting transactions from disconnected blocks during a re-org
  • Maintaining the mempool’s invariants during all of those operations is very delicate, and failure to do so can lead to subtle bugs such as those fixed in PR #2876 or PR #5267.

  • For that reason, PR #2876 introduced a CTxMemPool::check() method, which asserts many of the mempool’s invariants. PR #5267 later extended those checks. The checks are computationally expensive, so the -checkmempool command line option controls how frequently they’re run. By default, mempool checks are disabled for mainnet, signet and testnet nodes, and enabled for regtest nodes. A sketch of this frequency gate follows.

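    As I understand the gate, -checkmempool=<n> makes the checks run roughly one time in n, so regtest's default of 1 means "every time" and the default of 0 elsewhere means "never". A minimal sketch of that sampling logic, using invented names (ShouldRunMempoolChecks, GetRandSketch) rather than the real ones:

    ```cpp
    // Minimal sketch of the -checkmempool sampling gate; names are invented and
    // the real code differs in detail.
    #include <cstdint>
    #include <random>

    // Stand-in for Bitcoin Core's RNG: uniform value in [0, n).
    uint64_t GetRandSketch(uint64_t n)
    {
        static std::mt19937_64 rng{std::random_device{}()};
        return n == 0 ? 0 : rng() % n;
    }

    // With check_ratio == 0 the checks never run; with check_ratio == n they run
    // roughly one time in n (so 1 means "always").
    bool ShouldRunMempoolChecks(int check_ratio)
    {
        if (check_ratio == 0) return false;
        return GetRandSketch(check_ratio) == 0;
    }
    ```
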
  • This PR significantly improves the performance of CTxMemPool::check(), and removes that function’s dependency on validation.h. A simplified sketch of the new single-pass approach is given below.

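    To give a flavour of the new approach, here is a heavily simplified sketch (not the PR's code): check() can walk the transactions in a parents-before-children order, obtained by sorting on each entry's cached ancestor count, and spend every transaction's inputs out of a temporary view of available coins. If each input is either a confirmed UTXO or an output created earlier in the walk, the "inputs available" and "no double spends" invariants hold, and a single pass suffices; avoiding the old retry loop over waitingOnDependants is where much of the speedup comes from. The types and names below are invented for the sketch; the real code works on a CCoinsViewCache backed by the active chainstate.

    ```cpp
    // Heavily simplified sketch of a parents-before-children consistency walk.
    // Identifiers are ints instead of txids/COutPoints, and a plain std::set
    // stands in for the real coins view (mempoolDuplicate in the actual code).
    #include <algorithm>
    #include <cassert>
    #include <set>
    #include <utility>
    #include <vector>

    struct MempoolTxSketch {
        int txid;
        std::vector<std::pair<int, int>> inputs;  // (funding txid, output index)
        int num_outputs;
        int ancestor_count;  // cached: this tx plus all of its in-mempool ancestors
    };

    void CheckMempoolSketch(std::vector<MempoolTxSketch> mempool,
                            const std::set<std::pair<int, int>>& confirmed_utxos)
    {
        // Sort by ascending cached ancestor count. A child always has a strictly
        // larger count than any of its parents, so no descendant can come before
        // one of its ancestors.
        std::sort(mempool.begin(), mempool.end(),
                  [](const MempoolTxSketch& a, const MempoolTxSketch& b) {
                      return a.ancestor_count < b.ancestor_count;
                  });

        // Coins that are currently spendable: confirmed UTXOs plus the outputs of
        // mempool transactions processed earlier in the walk.
        std::set<std::pair<int, int>> available = confirmed_utxos;

        for (const MempoolTxSketch& tx : mempool) {
            for (const auto& outpoint : tx.inputs) {
                // Every input must be available; because spent coins are erased,
                // this also catches two transactions spending the same output.
                assert(available.count(outpoint));
                available.erase(outpoint);
            }
            for (int i = 0; i < tx.num_outputs; ++i) {
                available.insert({tx.txid, i});  // this tx's outputs become spendable
            }
        }
    }
    ```
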
Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. Did you run the bench tests before and after the changes to CTxMemPool::check()? Were your results similar to those reported in the PR?

  3. Is it important for these checks to be fast? Why/why not?

  4. What does the UpdateCoins() function do? What is it replaced with in this PR?

  5. What was waitingOnDependants used for before this PR? Explain what the while loop that drained it was doing. Why can it now be removed?

  6. How does the GetSortedDepthAndScore() function work? Why does the returned vector not contain any descendants before their ancestors?

  7. GetSortedDepthAndScore() uses the CTxMemPoolEntry’s cached ancestor count to sort the transactions. What would happen if those cached values were incorrect?

Meeting Log

  17:00 <jnewbery> #startmeeting
  17:00 <jnewbery> Hi folks! Welcome to Bitcoin Core PR Review club!
  17:00 <jnewbery> Feel free to say hi to let people know that you're here.
  17:00 <raj> hi..
  17:01 <theStack> hi!
  17:01 <svav> Hi
  17:01 <KaizenKintsugi> hello!
  17:01 <gene> hi
  17:01 <KaizenKintsugi> This is my first time here so nice to meet everyone
  17:01 <Azorcode> Hello Everyone
  17:02 <jnewbery> KaizenKintsugi: Welcome! We love new review clubbers :)
  17:02 <schmidty> hi
  17:02 <jnewbery> anyone else here for the first time?
  17:02 <KaizenKintsugi> jnewbery: ty
  17:02 <jnewbery> (there are some tips here for first timers: https://bitcoincore.reviews/your-first-meeting)
  17:02 <larryruane> hi!
  17:02 <KaizenKintsugi> I have read
  17:02 <jnewbery> Notes and questions for this week are here at https://bitcoincore.reviews/23157
  17:03 <jnewbery> Who had a chance to read the notes / review the PR? (y/n)
  17:03 <gene> y/y
  17:03 <raj> y
  17:03 <theStack> y/0.5y
  17:04 <jnewbery> lots of 'y's. Great!
  17:04 <jnewbery> ok, let's get into it. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
  17:04 <raj> Concept ACK. Yet to run the benches..
  17:05 <raj> Also seems like a very clever approach to reduce computation load..
  17:06 <jnewbery> did anyone run the bench?
  17:06 <gene> tested ACK
  17:06 * gene ran bench
  17:07 <larryruane> y
  17:07 <jnewbery> gene: great. I see you posted your results to the PR as well. Thank you!
  17:07 <jnewbery> larryruane: excellent. I see you also posted your results. 18x is a pretty good optimization.
  17:07 <jnewbery> ok, next question. Is it important for these checks to be fast? Why/why not?
  17:08 <gene> performance gains are definitely good
  17:08 <raj> Because if the checks are fast enough, we can do it more often? Or maybe set a default interval for mainnet and not have it disable by default?
  17:09 <jnewbery> gene: I agree that ceteris paribus, more performant is better
  17:09 <schmidty> Encourages running the tests if they are faster :)
  17:09 <jnewbery> schmidty: I agree. I think that's the most compelling justification
  17:09 <jnewbery> Does anyone know when CTxMemPool::check() is run?
  17:10 <larryruane> If this check is enabled via `-checkmempool`, does that mean the check code runs on each and every modification to the mempool? I would think that's too much for mainnet (even with these improvements)
  17:10 <KaizenKintsugi> It looks like it is run on an interval?
  17:11 <gene> by default turned off for all but regtest?
  17:12 <jnewbery> larryruane: I agree that running on mainnet is probably not what we want to do. Even with the performance gains, I think it'd add too much load (and we maybe don't want to be adding asserts to nodes running mainnet)
  17:12 <jnewbery> gene: right, only on by default for regtest
  17:13 <jnewbery> KaizenKintsugi: not on an interval. Only after certain events
  17:13 <KaizenKintsugi> ty
  17:14 <raj> jnewbery, I can't seem to find the caller of CTxMemPool::check(), any pointer on that?
  17:14 <jnewbery> It's actually only called in two places from net_processing, and once in validation. If you grep for ".check(" or ">check(" you'll find them
  17:15 <theStack> one place seems to be the reception of a tx that was successfully accepted to the mempool
  17:15 <jnewbery> theStack: correct. And the other one in net_processing is after processing an orphan transaction. The call from validation is at the end of ActivateBestChainStep()
  17:16 <raj> jnewbery, Thanks..
  17:16 <jnewbery> but again, these only actually do anything if checks are enabled, which only defaults to on for regtest
  17:17 <jnewbery> Did anyone try running the functional tests? I wonder if these changes make any difference to runtime (possibly not, since the mempools in the functional tests are mostly empty)
  17:17 <jnewbery> ok, next question. What does the UpdateCoins() function do? What is it replaced with in this PR?
  17:18 <raj> UpdateCoin marks the input of tx as spent, and update the coinsview with that transaction.
  17:19 <raj> After this PR we just spend the coins from mempoolDuplicate directly.
  17:19 <jnewbery> raj: exactly right
  17:20 <jnewbery> UpdateCoin also populates a CTxUndo reference, which we don't need in the mempool check call
  17:20 <jnewbery> next question. What was waitingOnDependants used for before this PR? Explain what this while loop was doing. Why can it now be removed?
  17:21 <raj> waitingonDependant seems like just a list to store txs whose parent's haven't been validated yet.
  17:22 <raj> The loop iterates over the waitline, check if the inputs are in mempoolDuplicate, if yes apply the tx, if not put it back into the list.
  17:22 <jnewbery> raj: exactly right
  17:22 <jnewbery> so what's the worst case here?
  17:23 <raj> That none of the inputs are validated? and the tx has a long ancestry list?
  17:25 <jnewbery> check() is iterating over the entries in mapTx, which may be in any order. Say my mempool contains 5 transactions A -> B -> C -> D -> E. What's the worst order they could be in?
  17:25 <KaizenKintsugi> I will guess reverse?
  17:25 <raj> E D C B A?
  17:26 <gene> ^
  17:27 <jnewbery> right, so in the first iteration, we'll go through all the transactions and add (E,D,C,B) to the waitingOnDependants list, and then verify A
  17:27 <larryruane> jnewbery: can you clarify the meaning of the arrows .. is A the parent of B?
  17:27 <jnewbery> then in the while loop, we'll iterate over E,D,C and process B, then iterate over E,D and process C, and so on, then iterate over E and process D, and finally process E.
  17:27 <KaizenKintsugi> larryruane: that is how I interpret it
  17:28 <jnewbery> larryruane: apologies. B spends A's output (A is the parent of B)
  17:28 <KaizenKintsugi> I assume we want to sort first, then process
  17:28 <jnewbery> KaizenKintsugi: excellent idea! What do we sort by?
  17:29 <raj> Number of ancestor, in ascending order. :)
  17:29 <KaizenKintsugi> I will guess the amount of dependents, I read in the git comments that we want to sort by "topological order" but I dont know what that means
  17:30 <larryruane> KaizenKintsugi: run `man tsort`
  17:30 <KaizenKintsugi> I think it has something to do with a graph
  17:30 <larryruane> (actually `info tsort` is better)
  17:31 <jnewbery> raj: exactly right. What does CTxMemPoolEntry.nCountWithAncestors cache?
  17:31 <KaizenKintsugi> larryruane: thanks
  17:31 <raj> KaizenKintsugi, https://www.geeksforgeeks.org/topological-sorting/
  17:32 <theStack> larryruane: heh, nice. TIL a new unix command
  17:32 <raj> jnewbery, assuming you are asking what it does, it simply stores the number of ancestor including the tx itself?
  17:33 <jnewbery> raj: correct
  17:33 <jnewbery> and so if we sort by ascending ancestor count, then we guarantee that no child appears before its parent. Does that make sense to everyone?
  17:34 <KaizenKintsugi> are the number of ancestors strictly in the mempool? or do these ancestors trace back to the coinbase?
  17:34 <raj> parallel question, why is the ancestor count is termed as topology? Its just a linear graph right?
  17:34 <jnewbery> KaizenKintsugi: an 'ancestor' here is an unconfirmed transaction that's in the mempool
  17:34 <KaizenKintsugi> raj: I think it can branch
  17:35 <KaizenKintsugi> jnewbery: ty
  17:35 <gene> raj: you could also see the graph as a tree, with ancestors being parent nodes
  17:35 <raj> But isn't a two transaction not suppose to have same parent? Thats a double spend..
  17:36 <jnewbery> raj: sorting by ascending ancestor count gives a partial ordering. The output of the topo sort is a total ordering that doesn't violate any of the partially ordered pairs
  17:36 <jnewbery> raj: two transactions can have the same parent, if that parent created two transaction outputs
  17:36 <gene> ^
  17:37 <raj> jnewbery, yes, but the inputs would be different (vout value). So that would still be considered as same parent?
  17:37 <gene> but if you meant two parents, then yeah that would be double spend
  17:37 <jnewbery> raj: yes, that's the same parent transaction
  17:37 <raj> oh ok.. Thanks..
  17:38 <jnewbery> gene: no, a transaction can have inputs from multiple transactions. In that case, it has multiple parents
  17:38 <gene> :) you're right
  17:38 <KaizenKintsugi> ah yes, we can have transactions with N inputs and M outputs
  17:38 <gene> so double spend == multiple parents with same UTXO input?
  17:39 <jnewbery> a double spend is where two transactions are spending the same transaction output
  17:39 <jnewbery> We've mostly done the next question, but perhaps someone could add some more detail. How does the GetSortedDepthAndScore() function work? Why does the returned vector not contain any descendants before their ancestors?
  17:40 <raj> The sort works by, first sorting by ancestor count in ascending order, then by fee in descending order, then by hash value in descending order.
  17:41 <raj> The last two parts are termed as "score", and the first part is termed as "depth".
  17:41 <jnewbery> raj: right. We actually only care about the ancestor count for our purposes here
  17:42 <jnewbery> but this SortedDepthAndScore() is also used elsewhere
  17:42 <theStack> hope to not being to off-topic, but any specific reason why the hash value is included as third criteria for sorting? i can't think of a scenario where this really matters
  17:42 <sipa> to make it unambiguous
  17:42 <jnewbery> theStack: just as a tie-break I expect
  17:43 <theStack> ok, makes sense
  17:43 <raj> theStack, the sort will only take place when there's a tie in previous sort.
  17:43 <larryruane> theStack: probably so the results are deterministic, may make testing easier (direct comparison of results to expected)
  17:43 <raj> as far as I understand.
  17:43 <sipa> what other tie breaker would you use? memory addresses? that might be a privacy leak even
  17:43 <jnewbery> it's possible for depth and fee to be identical for two mempool transactions. txid breaks that tie
  17:44 <jnewbery> ok, final question. GetSortedDepthAndScore() uses the CTxMemPoolEntry’s cached ancestor count to sort the transactions. What would happen if those cached values were incorrect?
  17:45 <raj> jnewbery, the sort will not be correct, which will lead to trying validation of descendant before parent, and that will cause some error?
  17:45 <gene> txs with chained inputs might fail
  17:45 <raj> Also there is an sanity assertion check, that seems like it would fail.
  17:45 <jnewbery> raj: yes, exactly right. https://github.com/bitcoin/bitcoin/blob/66d11b14357c474416181227e197406ca8fb5dee/src/txmempool.cpp#L750
  17:46 <jnewbery> part of check() is checking that those cached values are correct. If any of those are wrong, then the mempool will be indexed incorrectly, which could lead to all kinds of problems
  17:46 <jnewbery> ok, those were the only questions I'd prepared. Did anyone else have any other questions or comments?
  17:47 <raj> as we are not doing the check() by default, so in regular cases we are just hoping that these kind of bad stuffs of mempool inconsistency doesn't happen?
  17:47 <raj> What does the node do when it does happen? will i see a crash?
  17:48 <jnewbery> we run check() by default in the functional tests. It's pretty usual to have these kinds of sanity checks enabled in test and disabled in production
  17:48 <gene> how many fuzzers target the mempool code?
  17:48 <jnewbery> gene: good question! The code is in src/test/fuzz/tx_pool.cpp
  17:49 <gene> jnewbery: thanks!
  17:49 <raj> jnewbery, ya makes sense. so its more like consistency for the protocol, not a running node. would that be a correct way to put it?
  17:49 <raj> *consistency check
  17:50 <jnewbery> I wouldn't use the word protocol there, since the mempool is an internal implementation detail.
  17:50 <jnewbery> it's a consistency check that we run in our tests, but not (by default) in production
  17:50 <raj> understood.. thanks..
  17:51 <gene> jnewbery: so if another node impl has different mempool code, would that cause any net/consensus issues?
  17:51 <larryruane> so this PR modifies checking code ... that could possibly break it ... Anyone have ideas on how to check the checking code?
  17:51 <raj> just curious, in case of some other impls (say btcd) would the consistency check logic look same? Or thats dependent on mempool implementation?
  17:51 <jnewbery> gene: no! That's basically what I mean by implementation detail
  17:52 <jnewbery> nodes are free to implement whatever mempool they want, and whatever policy rules for accepting transactions into their mempool that they want.
  17:52 <gene> awesome, that is counterintuitive. good to know
  17:52 <jnewbery> I expect at some point pretty soon, we'll have a -nomempool option that disables the mempool entirely
  17:52 <KaizenKintsugi> yea this is a surprise to me too
  17:53 <larryruane> jnewbery: yes, I thought I'd heard that miners likely have made proprietary changes to the mempool code (even if running core), but we don't know for sure
  17:53 <jnewbery> larryruane: very good question! Anyone have any suggestions on how to check the checking code?
  17:53 <gene> run against test vectors from other impls
  17:54 <jnewbery> gene: other impls won't have the same mempool code
  17:54 <jnewbery> the mempool is an implementation detail
  17:54 * gene facepalms
  17:55 <raj> ya this was the biggest surprise to me too. Took some time to absorb.. :D
  17:55 <larryruane> if i may offer an answer ... you must break the code and verify that the checking code catches it
  17:55 <jnewbery> larryruane: one way might be to introduce changes to the mempool code itself, and then see if both the new check and old check assert
  17:55 <jnewbery> snap!
  17:55 <larryruane> (obviously this is manual testing, no need to automate this!)
  17:56 <gene> automated mutations could be intereting test code
  17:56 <jnewbery> yes, this is called mutation testing: https://en.wikipedia.org/wiki/Mutation_testing
  17:57 <sipa> mutation testing led to the discovery of a 20% faster variant of the safegcd modular inverse algorithm ;)
  17:57 <jnewbery> sipa: fun fact!
  17:57 <larryruane> oh so you _can_ automate this type of testing!
  17:57 <sipa> jnewbery: very fun
  17:57 <raj> Whats the difference between mutation and fuzzing?
  17:58 <BlockHead> I'm assuming mutation tests are a type of fuzzy test?
  17:58 <jnewbery> raj: mutation is mutating the source code. Fuzzing is providing random inputs to the code
  17:58 <jnewbery> ~random
  17:58 <sipa> (or better than random)
  17:58 <gene> fuzzing dictionaries are nice!
  17:58 <jnewbery> right, not at all random in fact :)
  17:58 <raj> oh.. so its like the code text itself is being changed?
  17:59 <sipa> and mutation testing is a way of testing *the tests*
  17:59 <sipa> fuzz testing is testing the code
  17:59 <sipa> (at least the forms i'] familiar with)
  17:59 <jnewbery> raj: exactly. Break the code in lots of different ways and test that the tests catch them
  18:00 <raj> Got it.. thanks..
  18:00 <jnewbery> This was a fun tangent. Thanks larryruane!
  18:00 <jnewbery> ok, that's time
  18:00 <jnewbery> #endmeeting