Cut the validation <-> txmempool circular dependency (p2p, validation, mempool)

https://github.com/bitcoin/bitcoin/pull/22677

Host: glozow  -  PR author: glozow

The PR branch HEAD was ab3fef71e at the time of this review club meeting.

Notes

  • A circular dependency exists when two or more modules depend on each other. PR#13695 added a linter which statically analyzes files to detect circular dependencies. At the time, a few already existed and were listed in EXPECTED_CIRCULAR_DEPENDENCIES.

  • One of the circular dependencies in the codebase is validation <-> txmempool. These two modules are each heavily depended on by other modules, so their interdependency creates a “domino effect” into the rest of the codebase. To experience this for yourself, compile (with ccache), make a small edit to protocol.h (where P2P protocol messages are defined), then run make again and watch it rebuild txmempool and everything mempool-related.

  • PR#22677 removes this dependency by removing txmempool’s dependency on validation. It creates a GetFiltered() function that allows validation and other clients to pass in an arbitrary predicate, iterates through all of its entries, applies the predicate function to each of them, and returns the set of filtered iterators to mempool entries. This allows validation to apply consensus rules to all of the mempool entries without access to the mempool’s internals.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What is a circular dependency? Why should we avoid circular dependencies?

  3. What functionality lives in validation? What functionality lives in txmempool?

  4. Why should/shouldn’t validation depend on txmempool? Why should/shouldn’t txmempool depend on validation?

  5. One of the functions that causes txmempool to rely on validation is the check() function. What checks does it perform, and how does this PR split the checks into two categories?

  6. Another function that causes txmempool to rely on validation is the removeForReorg() function, which calls CheckSequenceLocks() and CheckFinalTx(). Why do we need to re-check sequence locks in a reorg? How is it possible for a coinbase spend to become premature in a reorg?

  7. Clang Thread Safety Analysis annotations give compile-time warnings about potential race conditions. How do we let the compiler know whether cs_main is held here?

  8. Do you have any suggestions for alternative approaches to removing this circular dependency?

Meeting Log

  118:00 <glozow> #startmeeting
  218:00 <glozow> Hello friends! Welcome to PR Review Club! Feel free to say hi to let us know you're here
  318:00 <sandipndev> hi!
  418:01 <glozow> We're looking at PR#22677 today: "Cut the validation <-> txmempool circular dependency"
  518:01 <emzy> hi
  618:01 <glozow> PR: https://github.com/bitcoin/bitcoin/pull/22677
  718:01 <glozow> Notes: https://bitcoincore.reviews/22677
  818:01 <schmidty> hi
  918:02 <glozow> did anyone get a chance to look at the notes or review the PR?
 1018:02 <emzy> n
 1118:03 <sandipndev> n
 1218:03 <pg2> 0.5y
 1318:03 <glozow> no problem. we'll start conceptual: What is a circular dependency, and why should we avoid circular dependencies?
 1418:04 <pg2> Two or more modules depend on each other
 1518:04 <larryruane> hi
 1618:04 <glozow> pg2: indeed!
 1718:04 <larryruane> testing is much easier
 1818:04 <glozow> and why are circular dependencies bad?
 1918:04 <sandipndev> we can't seperately use one module individually if another is linked with it (they might recursively call each other)
 2018:05 <glozow> sandipndev: right, it's much harder to isolate and test our code if it's tangled up with a bunch of other stuff
 2118:06 <glozow> What functionality lives in the validation module? What lives in txmempool?
 2218:06 <larryruane> with circular dependencies present, it's harder to get a clear mental model
 2318:06 <pg2> valiation.{h.cpp} update local knowledge of the current best chain and corresponding UTXO set, and process new blocks
 2418:06 <pg2> txmempool.{h.cpp} store the actual transaction in the pool (class `CTxMemPool`), and metadata about the transactions (class `CTxMemPoolEntry`)
 2518:07 <glozow> larryruane: yeah, i agree with that too!
 2618:08 <pg2> Removing circular dependencies is desirable, but are there any downside or risks for this kind of refactoring? (e.g. could this break any downstream systems built on top of Bitcoin Core?)
 2718:09 <glozow> pg2: good answer. i might add (for the context of this PR) that txmempool is a "dumb" data structure that shouldn't know any details about consensus rules
 2818:09 <glozow> (that's my opnion though of course)
 2918:10 <sandipndev> i see there are a few expected circular dependencies, since circular deps are bad, why are they even present and expected?
 3018:11 <glozow> pg2: good question, I suppose forks of bitcoin core could have some trouble incorporating our changes if our architectures diverge
 3118:11 <sipa> because the codebase is old, and works, so we don't want to throw it out
 3218:12 <glozow> for context, the circular dependencies linter was added in https://github.com/bitcoin/bitcoin/pull/13695
 3318:13 <glozow> listing them explicitly probably helps us see when they're removed and be warned if someone adds a new one
 3418:14 <glozow> having a list of expected circular dependencies allows us to use the linter without ripping everything out
 3518:14 <glozow> sandipndev: does that answer your question?
 3618:15 <sandipndev> yes, absolutely!
 3718:15 <glozow> cool! :)
 3818:15 <glozow> So, in case it wasn't clear, we currently have a circular dependency between txmempool and validation.
 3918:16 <glozow> There would theoretically be 2 ways to remove this: make txmempool not depend on validation anymore, or make validation not depend on txmempool anymore
 4018:16 <glozow> Why should/shouldn’t validation depend on txmempool? Why should/shouldn’t txmempool depend on validation?
 4118:17 <pg2> Validation should depend on txmempool, because we need to know what is in the mempool to validate new transactions
 4218:17 <glozow> pg2: very logical answer, I agree :D
 4318:17 <pg2> Txmempool shouldn't depend on validation, because the transactions in mempool are already validated (and therefore donot need to depend on anything in validation.{h.cpp}).
 4418:18 <glozow> i also think an argument could be made for neither depending on the other - you don't really need to know about the existence of a mempool in order to apply validation rules to a transaction
 4518:19 <pg2> glozow: thanks for your answers. after this PR, is txmempool "complete dumb", or "dumb enough"? Or there are places where it still knows about consensus rules (unnecessarily)?
 4618:20 <glozow> pg2: ooh good question
 4718:21 <glozow> since i've already started saying possibly controversial things... another thing the mempool probably doesn't need to be responsible for is the fee estimator
 4818:21 <_aj_> glozow: "neither depending on the other" -- isn't validation the *action* of accepting a block (which means the txs in the block should no longer be in the mempool since they're no longer valid on top of the new tip) ; the consensus rules themselves are in consensus/ and script/ and the like?
 4918:22 <_aj_> (also the action of accepting a tx into the mempool)
 5018:23 <glozow> _aj_: ah true, i completely agree. I guess in general I mean that consensus rules themselves (i.e. consensus/ and script/) and the mempool are not conceptually related
 5118:24 <glozow> but yes, the action of validating transactions would depend on both
 5218:26 <glozow> OK next question. One of the functions that causes txmempool to rely on validation is the `check()` function. What checks does it perform, and how does this PR split the checks into two categories?
 5318:27 <pg2> For every transaction input in mempool, `check()` checks if the input refers to any other mempool transactions, otherwise it checks if the previous transaction output is unspent.
 5418:28 <pg2> I don't find anywhere in `check()` checks the sender has enough balance to cover the sum of output amounts. Is there such a check somewhere?
 5518:29 <glozow> pg2: see `CheckTxInputs` in src/consensus/tx_verify
 5618:29 <pg2> glozow: thanks
 5718:29 <glozow> https://github.com/bitcoin/bitcoin/blob/2161a058552ac938f2079b311a2d12f5d1772d01/src/consensus/tx_verify.cpp#L201-L206
 5818:30 <glozow> pg2: yes, that's one thing that `check()` does. what else?
 5918:30 <glozow> (oops - the link I sent has the wrong line numbers. it should be a few lines above, the `bad-txns-in-belowout` check)
 6018:33 <pg2> glozow: that's the only part where I finished reading the code, that's why I can only give a partial answer
 6118:33 <glozow> no worries. just seeing if anyone is willing to answer the rest :)
 6218:34 <glozow> so I've (arbitrarily) split the `check()` assertions into 2 categories: contextual checks and internal consistency checks. Contextual is what pg2 already mentioned - we go through and make sure all the transactions' inputs refer to something available in our mempool or UTXO set.
 6318:36 <glozow> Internal consistency checks are things like making sure the ancestor/descendant counts of each entry add up correctly, checking that our total fee accounting and dynamic memory usage are accurate, etc.
 6418:36 <glozow> It may or may not make sense to split `check()` up this way; up to you as a reviewer
 6518:37 <glozow> Moving on to next question: Another function that causes txmempool to rely on validation is the `removeForReorg()` function, which calls `CheckSequenceLocks()` and `CheckFinalTx()`. Why do we need to re-check sequence locks in a reorg?
 6618:38 <larryruane> Yes, I think this conceptual distinction between contextual and non-contextual checks appears also with respect to blocks
 6718:39 <larryruane> glozow: because time is "backing up" so a tx that used to be valid is no longer valid (yet)
 6818:40 <glozow> larryruane: right! but... how is it possible for a coinbase spend in the mempool to become premature in a reorg? Don't we only reorg when we see a longer chain?!
 6918:41 <larryruane> a reorg could result in a shorter chain, because it may have more _work_
 7018:41 <glozow> larryruane: aha!!!!
 7118:43 <_aj_> the first block in a multi-block reorg will be lower than the last block before the reorg even if difficulty doesn't change? [A B C D] -> [A E F G H], E has lower height than D?
 7218:44 <_aj_> (or are we only readding txs after processing all the reorg blocks somewhere?)
 7318:45 <sipa> _aj_: i believe so
 7418:46 <glozow> _aj_: righto. i think in a multi-block reorg we fill up a disconnectpool and then call `removeForReorg` after
 7518:46 <glozow> not 100% sure tho?
 7618:47 <glozow> will check
 7718:51 <glozow> next question is about lock annotations. what are we saying to the compiler when we annotate a callable with `EXCLUSIVE_LOCKS_REQUIRED`?
 7818:55 <_aj_> glozow: (hmmmm, i think i had a bug way back whenever when i was investigating something related to this then!)
 7918:56 <glozow> _aj_: (i am... about 16% confident in that statement)
 8018:58 <glozow> hok so the answer to my previous question about lock annotations, the answer is here: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#requires-requires-shared
 8118:59 <glozow> the rest of the questions are left as an exercise to the reader
 8218:59 <glozow> thanks for coming!
 8318:59 <glozow> #endmeeting