Block building (done by the
BlockAssembler)
is the process of selecting ~4MWU worth of mempool transactions while trying to maximize the total fees.
Non-mining nodes might never use the block builder, but are still interested in comparing the incentive compatibility
of transactions as a metric for whether they are worth keeping: if the mempool exceeds its maximum memory allowance, it
should evict the transactions that are least likely to be mined soon.
The block building algorithm
on master uses ancestor set-based sort to dynamically linearize mempool transactions by including ancestor sets in order of
highest ancestor score (minimum between ancestor feerate and individual feerate). As transactions are included,
their descendants’ ancestor scores can change; the BlockAssembler keeps a copy of mempool entries with their updated
ancestor information in a separate map, which avoids modifying the mempool itself. We covered this algorithm in more
detail in a previous meeting.
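To make the scoring concrete, here is a minimal sketch of that ancestor-score comparison; the struct and field names are illustrative, not the actual CTxMemPool/BlockAssembler code. An entry's score is the lower of its individual feerate and its ancestor feerate, and feerates are compared by cross-multiplication to avoid division:

```cpp
#include <cstdint>
#include <utility>

struct MempoolEntry {
    int64_t fee;                  // modified fee of this transaction
    int64_t size;                 // virtual size of this transaction
    int64_t fee_with_ancestors;   // fee including all unconfirmed ancestors
    int64_t size_with_ancestors;  // size including all unconfirmed ancestors
};

// "Ancestor score": the lower of the individual feerate (fee/size) and the
// ancestor feerate (fee_with_ancestors/size_with_ancestors). Returns the
// (fee, size) pair whose ratio is that lower feerate.
std::pair<int64_t, int64_t> AncestorScore(const MempoolEntry& e)
{
    // fee/size <= fee_with_ancestors/size_with_ancestors, cross-multiplied
    if (e.fee * e.size_with_ancestors <= e.fee_with_ancestors * e.size) {
        return {e.fee, e.size};
    }
    return {e.fee_with_ancestors, e.size_with_ancestors};
}

// True if a would be selected for mining before b (higher ancestor score).
bool MinedBefore(const MempoolEntry& a, const MempoolEntry& b)
{
    auto [a_fee, a_size] = AncestorScore(a);
    auto [b_fee, b_size] = AncestorScore(b);
    // a_fee/a_size > b_fee/b_size, again without floating point
    return a_fee * b_size > b_fee * a_size;
}
```

Eviction uses the analogous descendant score described below, taking the maximum instead of the minimum.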
Ancestor set-based sorting can be used to linearize the entire mempool to find the “worst” transaction for eviction,
but this algorithm would be too slow to use in practice.
Instead, eviction runs a similar algorithm with ascending descendant score (maximum between descendant
feerate and individual feerate). This also linearizes transactions in an approximate order of least to most likely
to be mined, but the linearization is not exactly opposite to the result of ancestor set-based sorting. This asymmetry
is problematic:
What if the first transaction selected for eviction is also the transaction that would be selected first for
mining?
Similarly, what if we have transactions that are “junk” (e.g. fall below the node’s configured -blockmintxfee
and would thus never be selected for mining) but can’t be kicked out because they have a high descendant score?
This mempool limitation necessitated the requirement that package transactions be above the minimum relay feerate
(see PR #26933).
Due to similar limitations, Replace-by-Fee code cannot properly determine whether the proposed replacement(s) are more incentive compatible, so
it uses approximations like individual feerate comparisons. These imperfect heuristics are a major cause of pinning problems and
fee-bumping inefficiencies.
We have reviewed TxGraph basic functionality in a previous meeting. If TxGraph is new to you, the host
recommends reading some of those notes and getting a feel for TxGraph by reviewing SimTxGraph in the fuzzer, which
has similar functionality but a far simpler design:
It uses a single DepGraph to represent all transactions across all clusters, which means it doesn’t need to implement merges or track much information about the set of clusters.
It implements the staging level by creating a copy of main and deleting either the main or the staging copy to commit or abort staged changes. This approach is much more memory- and CPU-intensive, but doesn’t need to track the differences between the levels.
A cluster mempool essentially keeps the entire mempool linearized at all times (lazily using TxGraph), which
makes it easy to quickly determine a transaction’s linearization position (including what the highest and lowest
transactions are) and compare the current mempool with a potential one.
PR #31444 adds TxGraph functionality for these purposes:
TxGraph adds a chunk index, a total ordering of all chunks in the graph across all clusters. A
ChunkOrder
comparator
defines the order in which chunks would be mined.
BlockBuilder can just iterate through the chunks in order. Also, since a total ordering of all graph entries
exists, we can get the “least likely to be mined” transaction without building a full mempool’s worth of block
templates: simply take the last chunk in the index.
TxGraph exposes a BlockBuilder to iterate over the chunks from best to worst. BlockBuilder can Skip chunks (e.g.
if they are too large to fit in the remaining block space), and those chunks’ descendants will subsequently be
skipped as well; see the sketch after this list.
TxGraph also exposes a GetWorstMainChunk method to identify the transactions that would be included in a block last.
The PR also adds a GetMainStagingDiagrams method to make it possible to compare the feerate diagram of the TxGraph
with and without its staged changes.
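As an illustration of how a client might drive this interface, here is a self-contained sketch with hypothetical stand-in types (ChunkInfo, ChunkSource); the real TxGraph::BlockBuilder exposes GetCurrentChunk, Include, and Skip, but its exact signatures may differ, and it also stops considering a cluster once one of its chunks has been skipped, which is omitted here for brevity:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct ChunkInfo {
    uint64_t weight;  // total weight of the chunk's transactions
    int64_t fee;      // total fee of the chunk's transactions
};

// Yields chunks in the order they would be mined (best feerate first).
class ChunkSource {
    std::vector<ChunkInfo> m_chunks;  // assumed already sorted best-to-worst
    std::size_t m_pos{0};
public:
    explicit ChunkSource(std::vector<ChunkInfo> chunks) : m_chunks(std::move(chunks)) {}
    std::optional<ChunkInfo> GetCurrentChunk() const {
        if (m_pos >= m_chunks.size()) return std::nullopt;
        return m_chunks[m_pos];
    }
    void Include() { ++m_pos; }  // current chunk was taken; advance
    void Skip() { ++m_pos; }     // current chunk was rejected; advance
};

// Greedily fill a weight budget with the best chunks; returns the fee collected.
int64_t BuildTemplate(ChunkSource& builder, uint64_t max_weight)
{
    uint64_t weight_used{0};
    int64_t fees{0};
    while (auto chunk = builder.GetCurrentChunk()) {
        if (weight_used + chunk->weight <= max_weight) {
            weight_used += chunk->weight;
            fees += chunk->fee;
            builder.Include();
        } else {
            // Too large for the remaining space: skip it (the real builder
            // would then also skip the rest of this chunk's cluster).
            builder.Skip();
        }
    }
    return fees;
}
```

Eviction works from the other end of the same total ordering: GetWorstMainChunk returns the chunk this loop would reach last.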
This Delving post discusses the theory behind
evaluating replacements using feerate diagrams.
Feerate diagram comparison is already used in package RBF. Package RBF is limited to clusters of size 2, which are easy to linearize even without cluster mempool.
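To see what a diagram comparison involves, here is a simplified, hypothetical sketch (not the code used by package RBF): a feerate diagram is the piecewise-linear curve through the cumulative (size, fee) points of the chunks, sorted from highest to lowest feerate, and one diagram is “not worse” than another if it is at least as high at every size breakpoint of either diagram.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Chunk {
    int64_t fee;
    int64_t size;  // assumed strictly positive
};

// Cumulative (fee, size) points of the diagram, starting at (0, 0).
std::vector<Chunk> DiagramPoints(const std::vector<Chunk>& chunks)
{
    std::vector<Chunk> points{{0, 0}};
    for (const auto& c : chunks) {
        points.push_back({points.back().fee + c.fee, points.back().size + c.size});
    }
    return points;
}

// Fee of the diagram at a given size, linearly interpolating between points;
// past the end of the diagram the fee stays flat.
double FeeAt(const std::vector<Chunk>& pts, int64_t size)
{
    for (std::size_t i = 1; i < pts.size(); ++i) {
        if (size <= pts[i].size) {
            double frac = double(size - pts[i - 1].size) / double(pts[i].size - pts[i - 1].size);
            return pts[i - 1].fee + frac * (pts[i].fee - pts[i - 1].fee);
        }
    }
    return double(pts.back().fee);
}

// True if new_chunks' diagram is at least as high as old_chunks' diagram at
// every size breakpoint appearing in either diagram.
bool NotWorse(const std::vector<Chunk>& new_chunks, const std::vector<Chunk>& old_chunks)
{
    const auto np = DiagramPoints(new_chunks);
    const auto op = DiagramPoints(old_chunks);
    for (const auto* pts : {&np, &op}) {
        for (const auto& p : *pts) {
            if (FeeAt(np, p.size) < FeeAt(op, p.size)) return false;
        }
    }
    return true;
}
```

An actual replacement check would typically also require a strict improvement at some point, as discussed in the Delving post.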
Why are block building and eviction relevant to each other? Wouldn’t it be easier to evict transactions by the order
they entered the mempool?
True / false: if all clusters are singletons (have 1 transaction each), m_main_chunkindex would just be sorting
the transactions by feerate.
In English, using the approach in this PR, what is the algorithm for selecting transactions in order for block building? And for eviction?
How would a client of BlockBuilder use it to build a block? When would GetCurrentChunk, Include, and Skip be called?
What is the expected lifetime of BlockBuilder (is it similar to CTxMemPool’s or very different)?
Can you create a BlockBuilder when staging exists? Can you build a block using the TxGraph’s state with its staged changes?
Does BlockBuilder modify TxGraph (a “yes and no”-style answer is ok)?
Why does BlockBuilder need to remember the set of skipped transactions? Why can it be represented as a set of Clusters?
This commit
adds new fields in data structures that need to point to each other: Entry now contains an iterator to the
transaction’s ChunkData in m_main_chunkindex, and ChunkData references Entrys by their position in m_entries. In
your review, how did you check that these pointers are always kept up-to-date?
Conceptually, what are all the ways that an entry’s chunk index can change?
In the ChunkOrder comparator,
when cmp_feerate != 0, why can it be returned directly without comparing position within the cluster?
m_main_chunkindex_observers indicates the existence of a BlockBuilder. Why is it an integer instead of a boolean?
This
call
to GetClusterRefs gets the vector of Refs corresponding to the chunk. Why are the arguments ret.first and
start_pos = chunk_end_entry.m_main_lin_index + 1 - chunk_data.m_chunk_count correct?
Continuing from the last question, why is std::reverse called on the result?
What is GetMainStagingDiagrams useful for? Why might we exclude the clusters that are identical in main and staging?
<glozow> Why are block building and eviction relevant to each other? Wouldn’t it be easier to evict transactions by the order they entered the mempool?
<sipa> monlovesmango: just in case you're not aware - running fuzz tests generally runs indefinitely; it keeps trying to make randomized changes to the input, and seeing if those trigger code coverage change and (even better) assertion failures
<monlovesmango> they are both looking for the same data (ordering of tx clusters by fee rate), just with opposite goals. one wants the top fee rates, the other the lowest fee rate.
<pseudoramdom> I see. Yeah, it makes sense to have them accurately ordered. Block building can select the "best" end of the list. And eviction removes from the "worst" end
<Murch[m]> monlovesmango: You can just interrupt fuzz tests any time, or you can set a `max_total_time` or `runs` limit. You can ping me later if you want
<abubakarsadiq> @monlovesmango: I ran the fuzz test to verify my understanding: when I make a modification to a commit, I expect an assertion I added to be triggered, and it always is.
<sipa> glozow: is your question more about what BlockBuildImpl does internally, or how you imagine high-level code would use BlockBuilder based on its interface?
<monlovesmango> the algorithm for selecting transactions is to group related tx into clusters, and then linearize each cluster into chunks by their fee rate (this part i'm still fuzzy on), and then order all chunks by fee rate, and then pick chunks by decreasing fee rate (skipping chunks from clusters that have had a chunk skipped)
<abubakarsadiq> It's a linear algorithm that just gets the current chunk recommended by the block builder and includes it when it satisfies a condition, or skips it when it doesn't
<sipa> monlovesmango: once any chunk in a cluster has been skipped, nothing else from the cluster can be included anymore, because the later transactions in the cluster may depend on transactions from the skipped chunk
<sipa> abubakarsadiq: depends what you mean with correct; it won't be a linearization for the full cluster anymore as some transactions will clearly be missing, but it may still be a valid linearization for what remains of the cluster after removing the skipped chunk
<sipa> but BlockBuilderImpl (currently) doesn't try to reason about that because it's hard, and breaks the abstraction offered by linearizations/chunks
<sipa> abubakarsadiq: even the fact that we're only considering transactions in just a single order (the linearization) may result in small amounts of lost fees
<monlovesmango> bc everytime TxGraph is updated it will need re-linearization, which you don't want to do while something is actively observing the ordering
<pseudoramdom> Why should `TxGraph` disallow changes while an observer exists? - the ordering might change, possibility of missing a tx if the underlying graph changed?
<sipa> specifically, if you were making changes to main while a block builder exists, what would go wrong (ignoring the fact that it's not allowed, and might trigger Assume failures)
<glozow> This commit adds new fields in data structures that need to point to each other: Entry now contains an iterator to the transaction’s ChunkData in m_main_chunkindex, and ChunkData references Entrys by their position in m_entries. In your review, how did you check that these pointers are always kept up-to-date?
<monlovesmango> wasn't able to finish trying to answer this question, but I would imagine that you want to check all the places where txgraph mutates (ApplyDependencies and Relinearize)
<glozow> Yeah, this is more of a "how do you review stuff?" question. I counted up the pointers and checked that there were assertions for them in `SanityCheck()`
<glozow> monlovesmango: sure. generally I think that if pointer consistency is checked in `SanityCheck` and the fuzzer is good at finding possible (mis)uses of `TxGraph`, I can feel reasonably confident that `TxGraph` is updating those pointers correctly
<sipa> pseudoramdom: that's the same thing... chunk feerates are a function of the linearization of a cluster, so anything that changes the linearization can affect it... and that includes RBF, but that is effectively through the cluster changes by adding/removing transactions from it
<sipa> glozow: and as abubakarsadiq already mentioned, to get confidence that the fuzzer is indeed capable of finding such problems, it's often good to just try introducing a bug you expect SanityCheck (or other assertion) to find, and see if it actually does