Skip this section if you are just interested in reviewing the code and don't need to hear the pitch for why.
The primary goal of Cluster Mempool
(tracked here) is to enable the mempool to more
intelligently assess which transactions would be most incentive compatible to keep. In simple terms: "which transactions
would earn a miner the most in fees?" The complexity of incentive compatibility assessment comes from dependencies
between transactions, i.e. where a transaction spends the output of another.
If the motivations for this project are unclear to you, try to spend up to 60min on these reading materials:
"But aren't fees a trivially small portion of block rewards? Why do we care so much?"
An original motivation for trying to come up with a metric for "incentive compatibility" was within the context of
RBF pinning. Our
current feerate-based approach causes us to reject transactions that would be more incentive compatible, creating
censorship problems.
"Why is cluster-tracking the solution for incentive compatibility?"
This Delving post explains why incentive
compatibility is complicated and why tracking and limiting clusters is necessary for a complete solution.
We talked about clusters in previous review club meetings about auto-bumping unconfirmed UTXOs in coin selection
(which added CTxMemPool::GatherClusters) and Topologically Restricted Until Confirmation transactions. As we discussed,
one of the benefits of TRUC is to have an effective cluster limit within the existing framework of
ancestor/descendant limits (a cluster count limit of 2 is equivalent to limiting both ancestor and
descendant counts to 2). However, a limit of 2 is quite prohibitive.
Mempool transactions and their dependency relationships can naturally be represented as a directed graph
where nodes represent transactions and edges represent UTXO spending relationships. This graph may contain multiple connected
components, each called a cluster. A transaction's cluster includes all of the transactions it is connected to, which
can be defined recursively: all parents and children (nodes with an edge to or from this one), each of their parents and children, etc.
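The recursive definition above amounts to finding a connected component. As a minimal sketch (the `Tx` struct and `GetCluster` function are illustrative names, not Bitcoin Core code), a transaction's cluster can be computed by breadth-first search over parent/child edges:

```cpp
// Illustrative sketch: a transaction's cluster is the connected component
// containing it, found by BFS over parent and child edges.
#include <cassert>
#include <queue>
#include <set>
#include <vector>

struct Tx {
    std::vector<int> parents;   // indices of transactions this one spends from
    std::vector<int> children;  // indices of transactions spending this one
};

// Returns the set of transaction indices in the same cluster as `start`.
std::set<int> GetCluster(const std::vector<Tx>& txs, int start)
{
    std::set<int> cluster{start};
    std::queue<int> todo;
    todo.push(start);
    while (!todo.empty()) {
        int cur = todo.front();
        todo.pop();
        // Visit every neighbor (parent or child) not yet in the cluster.
        for (int nb : txs[cur].parents) {
            if (cluster.insert(nb).second) todo.push(nb);
        }
        for (int nb : txs[cur].children) {
            if (cluster.insert(nb).second) todo.push(nb);
        }
    }
    return cluster;
}
```

Note that the traversal must follow edges in both directions: two transactions with no ancestor/descendant relationship still share a cluster if they have a common child.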
Cluster Mempool includes some large work components, including (1) adding cluster linearization algorithms which are
used to order (and thus compare) transactions by incentive compatibility, (2) changing the mempool's internal structure
to enable cluster tracking and always maintain a linearization order, and (3) upgrading RBF, eviction, and mining to
use the improved incentive compatibility assessments. This PR is pretty much only relevant to the second component; you
can review it without knowing anything about the cluster linearization algorithm or specific RBF
rules. The code for the first component has been merged (see PR #30126
and its followups), and work-in-progress code for the third component can be found at PR #28676.
Here is everything you need to know about linearization for this PR:
A linearization is topologically valid if it ensures that a parent transaction appears somewhere before its child.
There are often multiple valid linearizations for the same cluster, and linearization L1 is better than L2 if taking the
first N bytes of L1 would yield higher fees than taking the first N bytes of L2. The optimal linearization of a
cluster is better than or as good as all the others.
Computation needed for linearization scales with the size of the cluster, which is why cluster size needs to be limited.
A cluster requires re-linearization each time a transaction is added or removed.
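The prefix-based comparison of linearizations can be sketched as follows. This is illustrative only (`TxInfo`, `FeeAtSize`, and `AtLeastAsGood` are hypothetical names, and the sketch counts a transaction's fee only once the whole transaction fits, a simplification of the chunk-based feerate diagrams the real code uses):

```cpp
// Illustrative sketch: comparing two linearizations of the same cluster by
// cumulative fee collected within equal-size prefixes.
#include <cassert>
#include <cstdint>
#include <vector>

struct TxInfo { int64_t fee; int32_t size; };

// Fee collected after taking the first `bytes` of the linearization.
// Simplification: a transaction's fee counts only once it fully fits.
int64_t FeeAtSize(const std::vector<TxInfo>& lin, int32_t bytes)
{
    int64_t fee = 0;
    int32_t used = 0;
    for (const auto& tx : lin) {
        if (used + tx.size > bytes) break;
        used += tx.size;
        fee += tx.fee;
    }
    return fee;
}

// l1 is at least as good as l2 if it never collects less fee at any prefix.
bool AtLeastAsGood(const std::vector<TxInfo>& l1, const std::vector<TxInfo>& l2,
                   int32_t total_size)
{
    for (int32_t b = 0; b <= total_size; ++b) {
        if (FeeAtSize(l1, b) < FeeAtSize(l2, b)) return false;
    }
    return true;
}
```

For example, with a parent A (fee 1) and two children B (fee 10) and C (fee 2), both [A, B, C] and [A, C, B] are topologically valid, but [A, B, C] is better because its prefixes collect fees at least as fast everywhere.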
If you have more time to spend on background concepts or need more convincing, here is additional optional reading material.
BIP 431 provides an overview of why RBF pinning
is problematic and some of the other pieces that come together to address its issues.
This Delving post provides an overview of the
algorithms used in cluster linearization.
This Delving post discusses how to evaluate
replacements using feerate diagrams, which are enabled with cluster mempool.
Background on Mempool Architecture and Interfaces
Skip this section if you are already familiar with current mempool code.
"The Mempool" usually refers to the class CTxMemPool.
This class holds a map of
CTxMemPoolEntrys
and other supporting data structures. It is responsible for keeping itself
internally consistent. Much of the code is dedicated to dealing with dependency relationships.
Outside of the CTxMemPool class, here are the major pieces of mempool functionality to think about
while reviewing cluster mempool:
Mempool validation
decides what unconfirmed transactions to keep in memory. This code also enforces transaction relay policies.
Block connection
removes mempool transactions that would be inconsistent with chainstate.
Block disconnection
(aka reorganization) resubmits the transactions from the previous chain that are no longer in the new chain.
Block assembly
(aka mining) fills a block template using mempool transactions, optimizing for fee revenue.
Mempool eviction
is an internal process that decides what transactions to delete when the memory usage exceeds configured limits. It is
triggered in multiple places, including Mempool validation and Block disconnection.
On master, Bitcoin Core's mempool does not track clusters, but it does track ancestor and descendant sets.
CTxMemPoolEntry
stores references to its parents and children and tracks the {count, total vsize, total fees} of its ancestor and
descendant sets. Each time a transaction is added or removed, CTxMemPool uses the references to traverse ancestors
and/or descendants recursively, updating the cached ancestor and descendant information.
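To make the cost of this bookkeeping concrete, here is a sketch of what computing a transaction's ancestor statistics by traversal looks like. The `Entry`, `AncestorStats`, and `ComputeAncestorStats` names are illustrative, not Bitcoin Core's; on master the totals are cached in each CTxMemPoolEntry and updated incrementally, but every add or remove still requires a traversal like this one:

```cpp
// Illustrative sketch: recomputing ancestor {count, vsize, fees} for one
// entry by walking its parent references.
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

struct Entry {
    int64_t fee;
    int32_t vsize;
    std::vector<int> parents;  // indices of direct parents in the pool
};

struct AncestorStats { int count; int32_t vsize; int64_t fees; };

// Ancestor set includes the entry itself, matching mempool convention.
AncestorStats ComputeAncestorStats(const std::vector<Entry>& pool, int idx)
{
    std::set<int> seen{idx};
    std::vector<int> todo{idx};
    while (!todo.empty()) {
        int cur = todo.back();
        todo.pop_back();
        for (int p : pool[cur].parents) {
            if (seen.insert(p).second) todo.push_back(p);
        }
    }
    AncestorStats stats{0, 0, 0};
    for (int i : seen) {
        ++stats.count;
        stats.vsize += pool[i].vsize;
        stats.fees += pool[i].fee;
    }
    return stats;
}
```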
Design and Usage with Existing Mempool
While the current (on master) CTxMemPool class does not contain a data structure dedicated to storing the graph representation
itself, "the graph already exists" in the form of CTxMemPoolEntry's parent/child references and cached
ancestor/descendant information.
The responsibility of keeping this graph up-to-date and within its specified limits largely belongs to CTxMemPool, but
occasionally bleeds out to other areas. For example, in mempool validation, MemPoolAccept queries CTxMemPool for the
ancestors of a transaction in order to:
Calculate whether this transaction would exceed the ancestor/descendant limits. However, sometimes, when evaluating a
replacement, we apply an RBF Carve Out,
literally increasing the limit by 1, to avoid double-counting the replacement and to-be-replaced transactions. Yuck!
When evaluating a package, the precise ancestor/descendant counts of each package transaction and connected mempool
transactions are not calculated before submission. Instead, the aggregate package count and
size are used. This
approximation avoids complexity, but can result in overestimation for certain topologies.
These examples also illustrate current accuracy problems with enforcing topology limits.
Design and Usage with TxGraph
PR #31363
adds a TxGraph class, used by CTxMemPool to encapsulate the transaction graph. The
PR description provides an
overview of its interface and implementation. TxGraph supports "main" and "staged" changes:
transaction additions and removals can be staged and later discarded or applied. Additionally,
removals and additions are applied lazily, allowing them to be batched.
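To illustrate the staging concept (not the real TxGraph API; `ToyGraph` and its methods are a hypothetical, much-simplified analogy), changes can be buffered in a staged copy that is later committed to main or discarded:

```cpp
// Toy sketch of main/staged change sets: additions go into a staging copy
// that can be committed to main or discarded. The real TxGraph is far more
// sophisticated (lazy batching, per-cluster state, conflict resolution).
#include <cassert>
#include <set>
#include <string>

class ToyGraph {
    std::set<std::string> m_main;
    std::set<std::string> m_staged;
    bool m_staging{false};

public:
    void StartStaging() { m_staging = true; m_staged = m_main; }
    void Add(const std::string& tx) { (m_staging ? m_staged : m_main).insert(tx); }
    void CommitStaging() { m_main = m_staged; m_staging = false; }
    void AbortStaging() { m_staged.clear(); m_staging = false; }
    // Queries observe staged changes while a staging session is active.
    bool Exists(const std::string& tx) const {
        return (m_staging ? m_staged : m_main).count(tx) > 0;
    }
};
```

This is the pattern mempool validation wants for replacements: stage the new transactions, inspect the result, and commit only if it is an improvement and within limits.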
You can view the full cluster mempool implementation PR to see the usage of TxGraph
and how these new functions are used in the surrounding logic and contrast them with the existing code:
This commit
makes CTxMemPoolEntry derive from TxGraph::Ref.
Mempool validation stages the addition of the new transaction(s) to a changeset,
which stages changes to the mempool TxGraph. To check cluster limits, it just
queries
whether it is oversized. It no longer needs to know what the ancestor/descendant limits are or apply carveouts in a
replacement.
RBF incentive compatibility simply entails
asking
the TxGraph if the staged subgraph's feerate diagram improves upon the original feerate diagram.
PrioritiseTransaction now
just calls TxGraph::SetTransactionFee() and expects it to handle everything from there.
Before,
it needed to iterate through all ancestors and descendants to update their cached fees.
Additionally, no cluster calculation or linearization is performed until the graph is actually queried, e.g. for oversizedness or a feerate diagram.
Taking a look at PR #28676 can help illustrate some of the benefits of having TxGraph, including:
The mechanical details of updating the dependency graph and its impact on incentive compatibility are abstracted away
from higher level mempool logic.
CTxMemPool's code is less cluttered with graph-wrangling logic. Lots of code is removed, and much of the
remaining ancestor/descendant tracking can be cleaned up in the future.
TxGraph doesn't know that it represents a mempool. It doesn't know what a transaction is, and
doesn't care whether elements are being added and removed for replacements or block (dis)connection. It has a
smaller scope and well-specified behavior.
We now have the ability to reason accurately about the addition of multiple transactions and removal of their
collective replacements.
Batched updates and linearization improve the overall performance of mempool operations.
Deferred linearization can also allow slower linearization improvements to be applied later, e.g. in the
background, giving us the benefits of optimal linearizations without sacrificing performance.
Implementation
The rest of the notes are about reviewing the PR itself. Advice on review approach:
The author has written recommendations for reviewing the code in this comment.
The PR is largely split into feature and optimization commits. The features are introduced iteratively, so it
may be easiest to review the PR in chunks of commits for each feature if the overall changes seem intimidating.
Another approach is to first read through the overall changes to TxGraph's and TxGraphImpl's
respective interfaces, and then review implementation. You could also review the interface, then
the fuzzer implementation, then the real implementation.
These notes are split by feature to make it easier to review the PR iteratively.
A TxGraph has a vector of Entrys (where each entry is a single transaction), and a vector of Clusters (which
represent clusters as defined earlier). Clients can AddTransaction, RemoveTransaction, AddDependency, and
SetTransactionFee. When a transaction is added, it starts in a cluster of its own. When AddDependency is called for
two transactions that are not already in the same cluster, their clusters are merged.
It also has queues of "work" to do, which may or may not need to be completed in order to perform certain tasks or answer
external queries. Some tasks have other tasks as prerequisites.
This is the order in which things have to be resolved:
ApplyRemovals: Apply all of the queued removals.
SplitAll: Split up clusters if they contain components that are no longer connected to each other.
GroupClusters: Calculate what clusters would need to be merged if dependencies are applied. However, these changes are not yet applied.
ApplyDependencies: Apply new dependencies and merge clusters as computed in the previous step.
Relinearize: Relinearize clusters.
Much of the work is done lazily. For example, calling RemoveTransaction does not actually mean its entry is deleted, just that
the index has been added to m_to_remove. However, TxGraph cannot answer a call to Exists without first calling ApplyRemovals.
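The lazy-removal pattern can be sketched as follows (a simplified, hypothetical `LazySet`, mirroring the m_to_remove idea rather than the real implementation). It also shows why a query like Exists cannot be const: answering it may first flush the pending removal queue, mutating internal state.

```cpp
// Illustrative sketch of lazy removal: Remove only queues an index;
// queries flush the queue before answering, so they mutate state.
#include <cassert>
#include <set>
#include <vector>

class LazySet {
    std::set<int> m_entries;
    std::vector<int> m_to_remove;  // queued removals, applied in batch

    void ApplyRemovals() {
        for (int i : m_to_remove) m_entries.erase(i);
        m_to_remove.clear();
    }

public:
    void Add(int i) { m_entries.insert(i); }
    void Remove(int i) { m_to_remove.push_back(i); }  // cheap, deferred
    // Not const: must apply queued removals before answering.
    bool Exists(int i) {
        ApplyRemovals();
        return m_entries.count(i) > 0;
    }
};
```

Batching removals this way lets many cheap Remove calls share one pass of expensive cleanup (in TxGraph: cluster splitting and relinearization) instead of paying that cost per call.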
A TxGraph is oversized if it has at least one cluster that violates the cluster count or size limits.
A TxGraph only knows if it is oversized after it has completed steps 1-3.
When a TxGraph is oversized, mutation functions (like RemoveTransaction and AddDependency) and
GetIndividualFeerate can still be called, but inspection functions like GetChunkFeerate are unavailable because we won't
merge or linearize clusters that are oversized. (GetChunkFeerate becomes GetMainChunkFeerate in a later commit).
Additionally, step 4 will not proceed if the TxGraph is oversized; step 5 is thus also impossible.
The GroupClusters function implements union-find, a
favorite amongst software engineering interviewers.
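For readers unfamiliar with the structure, here is a minimal union-find sketch (illustrative, not the PR's implementation, which adds more machinery). Each element points to a parent; following parents reaches a representative, and merging two groups just links their representatives:

```cpp
// Minimal union-find (disjoint-set) sketch: the structure GroupClusters
// uses to compute which clusters would merge once dependencies are applied.
#include <cassert>
#include <numeric>
#include <vector>

class UnionFind {
    std::vector<int> m_parent;  // m_parent[i] == i means i is a representative

public:
    explicit UnionFind(int n) : m_parent(n) {
        std::iota(m_parent.begin(), m_parent.end(), 0);  // everyone alone
    }
    // Find the representative of i's group, with path halving.
    int Find(int i) {
        while (m_parent[i] != i) {
            m_parent[i] = m_parent[m_parent[i]];
            i = m_parent[i];
        }
        return i;
    }
    // Merge the groups containing a and b.
    void Union(int a, int b) { m_parent[Find(a)] = Find(b); }
};
```

In GroupClusters' terms: each cluster starts as its own group, each new dependency is a Union of the two clusters it connects, and the resulting groups are the clusters that would exist after ApplyDependencies.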
This feature allows users of TxGraph to stage changes and then apply or discard them, much like git.
Many of TxGraph's members are grouped together into a ClusterSet. This refactor is done so that a TxGraph can have multiple subgraphs or levels represented as
ClusterSets: "main" (level 0) and "staged" (level 1).
By default, mutations like AddTransaction, RemoveTransaction, AddDependency are applied directly to main, but
clients of TxGraph can create a staging session:
To StartStaging,
a new subgraph is added containing copies of information from the main graph. It starts with no clusters and no
transactions.
To CommitStaging, everything is copied from staged to main after finding conflicting clusters and deleting them from
main. Conflicting clusters include those belonging to all staged removals and any transactions the subgraphs have in common.
Then, clusters, work queues, and statistics are copied from staged to main.
AbortStaging largely consists of simply deleting the staged subgraph.
There are various functions added to TxGraph to obtain specialized information.
It may be useful to look at the code surrounding mempool to understand why:
CountDistinctClusters and GetMainStagingDiagram are used in RBF logic?
GetBlockBuilder is used in block assembly?
GetWorstMainChunk is used in mempool eviction?
Trim is used in reorgs? It instructs TxGraph to delete transactions until it is no longer oversized,
and allows it to decide which transactions to evict.
What is the mempool "graph" and to what extent does it exist in the mempool code on master?
What are the benefits of having a TxGraph, in your own words? Can you think of drawbacks?
Can TxGraph help to more accurately calculate (and thus enforce) topology limits, or is it just a refactor?
Basic TxGraph, Lazy and Oversized
What is the difference between TxGraph and Cluster?
What does it mean for a TxGraph to be oversized? Is that the same as the mempool being full?
If a TxGraph is oversized, which functions can still be called, and which ones canât?
Under what circumstances could GetIndividualFeerate and GetChunkFeerate return an empty FeeFrac?
Is it possible that one returns an empty FeeFrac and the other doesnât? How?
(Note we are talking about pre-staging support code).
What are the 4 mutation functions? Which ones necessitate relinearization?
After calling TxGraph::RemoveTransaction, has the transaction been removed from m_entries? If not, what happens
instead, and when are transactions removed from m_entries?
Why canât ApplyDependencies continue if the TxGraph is oversized?
Why isnât Exists a const function like one would expect?
What is the benefit of processing removals, dependencies, merges, and linearizations lazily?
Why is AddTransaction annotated with [[nodiscard]]?
For the QualityLevels NEEDS_RELINEARIZE, ACCEPTABLE, and OPTIMAL, what can / canât {TxGraph internally, a
client of TxGraph} do with a cluster that is linearized to that quality?
Staging Support
What is the purpose of TxGraph supporting the staging of update operations? In what situation
would we use it, and do you think the benefits warrant the extra code complexity?
What are the TxGraph functions analogous to git add, git commit, and git stash? In the
other direction, what is analogous to AddTransaction, RemoveTransaction, and AddDependency?
What does PullIn do, and when is it necessary? How many times can the while (level < to_level) loop run?
In the commit to "cache oversizedness," why is it ok that the m_oversized value for staged is set to that of main? Why is calling SplitAll() necessary?
How many levels are possible?
Which of the four combinations are possible {main, staged} {oversized, not oversized}?
Specialized Functionality
Why are these specialized functions (CountDistinctClusters,
GetMainStagingDiagram, GetBlockBuilder, GetWorstMainChunk, Trim) necessary? Can you think
of a slimmer interface to serve these purposes?
What are all of the places where the mempool may need to evict transactions in order to stay
within cluster limits?
Should TxGraph be allowed to decide which transactions to evict when it is oversized? Why or
why not - does it depend on the situation?
<sipa> marcofleon: well, arguably txgraph doesn't change that... there is still just the set of ancestors and descendants for each transaction there, but it's better encapsulated, and uses more efficient data structures
<marcofleon> Just more structured and better encapsulated as sipa said previously. Makes it easier to reason about. Also the atomic operations with staging seems like a good approach. I'm not sure if we do something like that today?
<stringintech> First thing I noticed when looking at the code was how relatively easy it was to understand things. Even not knowing the previous logic.
<abubakarsadiq> many benefits, we now group connected transactions into clusters, we maintain the list of this clusters, which make it easy for getting ancestors of transaction, linearization, maintaining cluster size, mining and eviction
<sipa> as for comparison with *today*, i think the answer is primarily that it brings in all of the cluster mempool goodness: chunk feerates, linearizations, the ability to observe a full ordering of transactions in the mempool
<sipa> as for comparison with a theoretical alternative where cluster mempool was brought it, without an intermediary layer that models the transaction graph: encapsulation, and the fact that it's fully specified (almost...) which significantly increases the power of tests for it
<glozow> I really like that it abstracts away all the topological details so e.g. validation isn't trying to figure out how to not double-count replacements and combine package/mempool ancestors etc
<marcofleon> 0 drawbacks for sure... no i guess some overhead with all the additional data structures? but I think the encapsulation and the fact that its much easier to test makes it worth it
<stickies-v> wrt drawbacks: I suppose encapsulation sometimes doesn't allow you to make certain (hacky) performance optimizations without changing the interface?
<sipa> jasan: in the PR description i state that TxGraph doesn't know about prioritization; that doesn't mean it's incompatible with it - the point is just that it doesn't *care* why the mempool things a particular transaction has a particular fee, but it's perfectly possible for the mempool to apply prioritization before handing fees to txgraph (and will)
<sipa> monlovesmango: there is indeed some overhead, though it's tiny (and probably well-compensated by using more efficient data structures internally)... the fact that transaction references always need to be translated from/to TxGraph::Ref* pointers is a runtime cost throughout the design
<sipa> glozow: no reason why functionality for testing for particular topologies can't be added to txgraph, but the abstraction does mean there is an extra cost
<sipa> schmidty: no, because oversized clusters internally just cannot be _represented_, so if there are batched operations present which would require an oversized Cluster to be created, they just don't get applied, and instead the TxGraph goes into "oversized" state
<sipa> instagibbs: i think the threshold is anything that requires O(n^2) work or more (which includes computing the ancestors/descendants of a transaction)
<glozow> Under what circumstances could GetIndividualFeerate and GetChunkFeerate return an empty FeeFrac? Is it possible that one returns an empty FeeFrac and the other doesn't?
<sipa> monlovesmango: GetChunkFeerate is just *not allowed* to be called when oversized, so talking about its correctness or what it would return isn't really well-defined
<sipa> marcofleon: or rather, if it might fail, you're just not allowed to call anything that needs MakeAcceptable (because that implies you're oversized)
<glozow> the answer I was looking for is essentially: txgraph is oversized but also lazy (wow that's kind of mean to say) and none of the mutation functions synchronously do cluster merging or linearization
<stringintech> Earlier glozow: asked if `main` can be oversized too which became my question too :)) I guess it was not answered or I missed the answer.
<stringintech> As far as I saw in the code (not deep enough though), operations were happening on the upmost clusterset. So if we do not have staging the main could be oversized.
<glozow> stringintech: yes, it depends on usage. Intended usage is to first stage changes and discard them if they result in an oversized txgraph. But you can also apply things directly to main and make it oversized
<sipa> stringintech: in practice (after follow-up PRs to make the mempool code actually use txgraph), staging will be used for rbf evaluations, while main will be used directly from blocks... and a reorg can add transactions back to the mempool, which could temporarily cause main to be oversized
<sipa> marcofleon: sorry, it was a trick question (actually, i just forgot that had changed recently), Entrys are not removed from m_entries until the Ref is destructed