A cluster is a group of transactions where some transactions depend on others, forming parent-child relationships. The simplest cluster is a single transaction, and most clusters consist of just a few transactions. However, there can be more complex configurations that form large directed acyclic graphs (with grandparents, siblings, uncles, etc.). The current limit on cluster size is 64 transactions.
At a high level, cluster linearization is the process of ordering the transactions within a given cluster so that dependencies are respected (parents before children) while the fee rates of the “chunks” of the resulting ordered list are maximized. The goal is to produce an optimal linearization: an ordering with the best possible sequence of chunk fee rates (the fee rate profile). This linearization allows miners to include the most profitable transactions first. (for more in-depth reading, see here)
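To make chunking concrete, here is a minimal sketch of the merging rule, assuming an illustrative `FeeFrac` struct (Bitcoin Core's real `FeeFrac` lives in util/feefrac.h and differs in detail): each transaction starts its own chunk, and a chunk is merged into its predecessor whenever it pays a higher fee rate.

```cpp
// A minimal sketch (not the actual ChunkLinearization code) of grouping a
// linearized list of (fee, size) transactions into chunks whose fee rates
// are non-increasing: the fee rate profile.
#include <cstdint>
#include <vector>

struct FeeFrac {
    int64_t fee{0};
    int64_t size{0};
    // Compare fee rates a.fee/a.size > b.fee/b.size without division.
    friend bool operator>(const FeeFrac& a, const FeeFrac& b)
    {
        return a.fee * b.size > b.fee * a.size;
    }
};

std::vector<FeeFrac> ChunkFeerates(const std::vector<FeeFrac>& linearized)
{
    std::vector<FeeFrac> chunks;
    for (const auto& tx : linearized) {
        chunks.push_back(tx); // each tx starts as its own chunk
        // Merge backwards while the newest chunk beats its predecessor:
        // pulling the higher-feerate chunk forward raises the predecessor.
        while (chunks.size() >= 2 && chunks.back() > chunks[chunks.size() - 2]) {
            chunks[chunks.size() - 2].fee += chunks.back().fee;
            chunks[chunks.size() - 2].size += chunks.back().size;
            chunks.pop_back();
        }
    }
    return chunks; // fee rates are now monotonically non-increasing
}
```

For example, if child B (fee 3, size 1) depends on parent A (fee 1, size 1), the only valid ordering is [A, B], and since B's fee rate beats A's they merge into one chunk with fee rate 2: the child's fees lift the parent, which is exactly what a miner filling a block wants to see first.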
The data structures and algorithms for cluster linearization can be found in cluster_linearize.h.
The linearization code is complex and therefore relies on fuzz testing to ensure correctness. Using the wide variety of data generated by the fuzzer to build dependency graphs (DepGraph instances) and produce/update linearizations is probably our best bet for finding any subtle bugs or edge cases in the code.
The cluster_linearize fuzz test includes a few helper classes and functions that are used to compare against the actual implementation. The motivation for this PR is to construct new fuzz targets that are used to make sure these helpers work properly. By having dedicated targets that test the test, we can be even more confident that our cluster linearization code is bug-free. Woo!
Note: the current cluster linearization algorithm may be replaced by a new and improved spanning-forest linearization (SFL) algorithm in the future. SFL is essentially a drop-in replacement, so while the underlying algorithm changes, it doesn’t change the tests much. It’s outside of the scope of this review club, but you can look at PR 32545 or this delving post if you’re interested.
Exercise
Try fuzzing the clusterlin_linearize target. You can refer to the fuzzing docs for assistance. Using --preset=libfuzzer-nosan will likely make building easier.
Now change chunking to simple_chunking at this line and run the target again.
After a few minutes, you should see a crash. Welcome to fuzzing. This is a simple example of a tiny bug in the test itself which fuzz testing catches quickly. A lot of the time, a crash happens because something is wrong in the fuzz harness itself, as opposed to a bug in the production code. This is why it can be useful to have targets or assertions that ensure that the fuzz test itself is correct.
What are the two new fuzz targets introduced and what are they testing? In terms of what they’re testing, how are they different from the other targets?
What are the benefits of separating out the fuzzing of the internal test helpers?
Were you able to run the clusterlin_linearize target? How many iterations (the first number libFuzzer shows at each line) did it take to produce a crash after making the s/chunking/simple_chunking/ change?
In clusterlin_simple_finder, when finding a topologically valid subset to remove from the graph, why is it important to set non_empty to true? What could happen if we allowed empty sets?
In clusterlin_simple_linearize, why does the code only verify against all possible permutations when depgraph.TxCount() <= 8? What would happen if we tried to do this for larger transaction counts?
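(For scale: 8! = 40,320 orderings is trivial to enumerate, but the count grows factorially: 16! ≈ 2.1 × 10^13, and a maximal 64-transaction cluster would have 64! ≈ 1.3 × 10^89 orderings.)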
In commit 1e4f345, when a non-topological permutation is found, the code now fast-forwards by reversing part of the permutation. Why does this optimization work, and how many permutations can it skip in the best case?
In SimpleCandidateFinder::FindCandidateSet(), the algorithm keeps track of two sets for each work unit: inc (transactions definitely included) and und (transactions that can still be added). Why does it need to track both sets instead of just tracking undecided transactions?
<marcofleon> What are the two new fuzz targets introduced and what are they testing? In terms of what they’re testing, how are they different from the other targets?
<sliv3r__> here I have a question bc I thought some were testing the functionality and the others were testing the tests, but when I checked the functions I saw that they were just simpler re-implementations of the ones found in cluster_linearize.h
<monlovesmango> clusterlin_simple_finder and clusterlin_simple_linearize. they are different than other targets bc they are fuzz testing functions implemented for fuzz tests
<sipa> sliv3r__: the short answer is that SimpleCandidateFinder, ExhaustiveCandidateFinder, SimpleLinearize are part of the test suite, not of production code, even though they mimic the functionality of production code functions/classes
<sipa> one tool that we (or i, as author, in this case) have available to make it easier on reviewers is writing a simpler, more obviously correct, but potentially slower/less feature-rich reimplementation that does the same thing, plus tests that the two are identical
<sipa> so then if you as a reviewer have confidence in the reimplementation + the equivalence test, you also gain confidence in the original implementation
<sipa> sliv3r__: well that is what the fuzz tests do: run a scenario (controlled by the fuzzer) of operations on both the production version and the simpler version, and see that they produce the same result
<sipa> and in this case, even the "simpler" version is fairly nontrivial, so in addition, there is a third, *even* simpler and *even* slower version implemented
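This pattern can be sketched as a fuzz target. Everything below except the FUZZ_TARGET/FuzzedDataProvider boilerplate is hypothetical; the real harnesses in the cluster_linearize fuzz test are considerably more involved.

```cpp
// A hypothetical equivalence harness: run the production implementation and
// the simple reference side by side and assert they agree. DecodeScenario,
// ProductionLinearize, SimpleLinearize, and FeeRateProfile are illustrative
// stand-ins, not real functions.
#include <test/fuzz/FuzzedDataProvider.h>
#include <test/fuzz/fuzz.h>

#include <cassert>

FUZZ_TARGET(equivalence_sketch)
{
    FuzzedDataProvider provider(buffer.data(), buffer.size());

    // Decode a scenario (e.g. a dependency graph) from the fuzz input.
    const auto scenario{DecodeScenario(provider)};

    // Fast-but-complex production code vs. slow-but-obvious reference.
    const auto fast{ProductionLinearize(scenario)};
    const auto slow{SimpleLinearize(scenario)};

    // Any disagreement means a bug in one of the two implementations
    // (and, as noted earlier, often in the test helper itself).
    assert(FeeRateProfile(fast) == FeeRateProfile(slow));
}
```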
<monlovesmango> I think that makes sense for clusterlin_simple_finder bc it doesn't seem to compare with other prod code, but for clusterlin_simple_linearize it is comparing to ChunkLinearization which doesn't necessarily feel simpler
<sipa> marcofleon: there is the candidate-finding logic, where we have SearchCandidateFinder (prod), SimpleCandidateFinder (simpler), ExhaustiveCandidateFinder (simplest)
<sipa> and there is the linearization logic on top, where there are also 3 versions: Linearize (prod), SimpleLinearize (simpler), and the std::next_permutation based exhaustive search (simplest, but not a separate function)
<sipa> abubakarsadiq: there is no SearchCandidateFinder anymore post-SFL; SFL is a replacement for Linearize directly, which doesn't involve candidate finding anymore
<sipa> SimpleCandidateFinder and ExhaustiveCandidateFinder stay, but they're just used in/for SimpleLinearize, which the SFL-based Linearize is then compared against
<marcofleon> monlovesmango if you ran the fuzz tests a bit, question 4: Were you able to run the clusterlin_linearize target? How many iterations (the first number libFuzzer shows at each line) did it take to produce a crash after making the s/chunking/simple_chunking/ change?
<sipa> abubakarsadiq: but i don't think it gains us very much, as it's pretty non-trivial anyway, so the "added confidence" isn't as much as compared with the much simpler reimplementations gives us
<monlovesmango> marcofleon: I actually had a question about that. I did get the failure, but I was using the python test runner and it doesn't show any output with the counts, just the error
<sipa> sliv3r__: it is the input to the fuzz harness, which interprets bytes from it (each of the provider.ConsumeIntegral... functions decodes some bytes from it)
<monlovesmango> sliv3r__: thank you I see that example now... I think I didn't really know what that param did bc I didn't realize 'process_message' was a test
<monlovesmango> it was moved bc otherwise it decrements for splits that don't actually need to be considered (which I imagine might also mess with the assumption that the result is optimal)
<sliv3r__> If I didn't understand wrong we want to only reduce the counter when we add a new "valid" subset. So the number of iterations is proportional to the number of different connected subsets that can be created, instead of being related to the total number of elements in the queue
<sipa> so the idea is that we want some kind of "cost metric" to control how much time is spent inside SimpleCandidateFinder, just so that it does not run forever, but can stop it after "too much" work has been performed
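A rough sketch of the kind of bounded search sipa describes, assuming a hypothetical TxMask bitset and moving only single transactions where the real SimpleCandidateFinder moves whole ancestor/descendant sets:

```cpp
// Illustrative work-queue search tracking, per work item, both inc
// (transactions committed to the candidate) and und (still undecided).
// TxMask and all names here are hypothetical stand-ins for the real bitsets.
#include <cstdint>
#include <vector>

using TxMask = uint64_t; // one bit per transaction (clusters hold <= 64 txs)

struct WorkItem {
    TxMask inc; // definitely included in this branch's candidate set
    TxMask und; // may still be added in this branch
};

void BoundedSearch(TxMask all_txs, uint64_t max_iterations)
{
    std::vector<WorkItem> queue{{/*inc=*/0, /*und=*/all_txs}};
    while (!queue.empty() && max_iterations > 0) {
        WorkItem item{queue.back()};
        queue.pop_back();
        if (item.und == 0) continue; // nothing left to decide on
        --max_iterations; // cost metric: budget is spent per actual split

        TxMask tx{item.und & (~item.und + 1)}; // lowest undecided bit
        // Branch 1: include tx (the real code also pulls in its ancestors).
        queue.push_back({item.inc | tx, item.und & ~tx});
        // Branch 2: exclude tx (the real code also drops its descendants).
        queue.push_back({item.inc, item.und & ~tx});
        // Each branch's candidate is evaluated from inc: und alone cannot
        // tell us *which* transactions a branch committed to, so neither
        // feerates nor topology could be checked from a work item alone.
    }
}
```

Decrementing the budget only at a real split, rather than per queue pop, ties the work bound to the number of distinct subsets explored, which is the point made in the two messages above.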
<marcofleon> In clusterlin_simple_finder, when finding a topologically valid subset to remove from the graph, why is it important to set non_empty to true? What could happen if we allowed empty sets?
<sipa> abubakarsadiq: (bonus) how likely would it be that we *keep* removing nothing, because just occasionally not removing anything wouldn't be an infinite loop?
<marcofleon> In clusterlin_simple_linearize, why does the code only verify against all possible permutations when depgraph.TxCount() <= 8? What would happen if we tried to do this for larger transaction counts?
<abubakarsadiq> I was wondering why you chose to evict randomly. For testing weird and all kinds of behaviors, yeah. Because the found txs can also be the read transaction
<marcofleon> We can throw in the next question, it's related. In commit 1e4f345, when a non-topological permutation is found, the code now fast-forwards by reversing part of the permutation. Why does this optimization work, and how many permutations can it skip in the best case?
<sipa> so the number of permutations tried is equal to the number of topologically-valid ones, plus the number of topologically-invalid minimal prefixes
<sipa> sliv3r__: every dependency is a "parent must come before child in topological orderings" constraint, so each additional dependency reduces the number of valid ones
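The fast-forward can be sketched like this, assuming a simplified single-parent dependency table (parent_of) in place of the real DepGraph ancestor sets:

```cpp
// Illustrative permutation enumeration with fast-forwarding. Because
// std::next_permutation walks orderings in lexicographic order, all
// permutations sharing a prefix form one contiguous block; reversing the
// (ascending) suffix jumps to the last member of the block, so the next
// std::next_permutation call leaves the whole block behind.
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical: parent_of[t] is t's single parent, or -1 for none.
std::vector<int> parent_of;

// Length of the shortest prefix that places a tx before its parent,
// or perm.size() if the whole permutation is topologically valid.
size_t InvalidPrefixLength(const std::vector<int>& perm)
{
    std::vector<bool> seen(perm.size(), false);
    for (size_t i = 0; i < perm.size(); ++i) {
        int parent{parent_of[perm[i]]};
        if (parent != -1 && !seen[parent]) return i + 1;
        seen[perm[i]] = true;
    }
    return perm.size();
}

void EnumerateTopological(std::vector<int> perm)
{
    std::sort(perm.begin(), perm.end()); // start at the first permutation
    do {
        const size_t bad{InvalidPrefixLength(perm)};
        if (bad < perm.size()) {
            // Invalid prefix: skip every permutation that shares it.
            std::reverse(perm.begin() + bad, perm.end());
            continue; // proceeds to the next_permutation call below
        }
        // ... evaluate this valid linearization's fee rate profile ...
    } while (std::next_permutation(perm.begin(), perm.end()));
}
```

With this in place, the loop visits exactly the topologically valid permutations plus one representative per minimal invalid prefix (matching sipa's count above): for an n-element permutation with an invalid prefix of length k, one reversal skips the remaining (n−k)! − 1 permutations of that block.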
<sipa> monlovesmango: maybe, but nothing really changes there... take Search/Simple/ExhaustiveCandidateFinder... before we had one fuzz test that compared all three (subject to cluster-not-too-large constraints etc)
<sipa> if you're already confident in SimpleCandidateFinder, you have no "need" to waste computation effort on running the Simple/Exhaustive comparison too
<sliv3r__> My idea when checking the algorithm was that, for inclusion, if you don't have inc you may try to add a tx without an ancestor being in the anc set
<marcofleon> sure, maybe Abubakar has a clearer way to look at it, but I sort of initially thought that it was possible to get the inc with only having und
<monlovesmango> one of the sets that you add back to the queue has und - descendants(split), so I don't know how you would figure out inc from that queue set
<sliv3r__> I think keeping inc explicitly is needed bc txs have dependencies between them; not explicitly knowing what is inc may cause you to try to add a tx without some ancestor in the inc
<marcofleon> And that's good to know actually, I'll keep that in mind. I did realize this review club was trying to do two things at once, which didn't help
<sliv3r__> The answer to the last question was: just release the code and let users complain and say that Core introduces bugs, right (as that's the new trend)? :P
<marcofleon> glad you two were able to fuzz it a bit. But yeah monlovesmango I recommend doing the individual targets like how we said earlier. The test runner is fine for running through a corpus of inputs once