Cluster linearization: separate tests from tests-of-tests (tests)

https://github.com/bitcoin/bitcoin/pull/30605

Host: marcofleon  -  PR author: sipa

The PR branch HEAD was d2fc10b64505512a70bed05e7bb21d76c8f748cd at the time of this review club meeting.

Notes

  • A cluster is a group of transactions where some transactions depend on others, forming parent-child relationships. The simplest cluster is a single transaction, and most clusters consist of just a few transactions. However, there can be more complex configurations that form large directed acyclic graphs (with grandparents, siblings, uncles, etc). The current limit on cluster size is 64 transactions.

  • At a high level, cluster linearization is the process of sorting the transactions within a given cluster in a way that respects dependencies (parents before children) while maximizing the effective fee rate when considering “chunks” of this ordered list. The goal is to produce an optimal linearization: an ordering with the best possible sequence of chunk fee rates (fee rate profile). This linearization allows miners to include the most profitable transactions first. (for more in-depth reading, see here)
    • The data structures and algorithms for cluster linearization can be found in cluster_linearize.h.
  • The linearization code is complex and therefore relies on fuzz testing to ensure correctness. Using the wide variety of data generated by the fuzzer to build dependency graphs (DepGraph instances) and produce/update linearizations is probably our best bet for finding any subtle bugs or edge cases in the code.

  • The cluster_linearize fuzz test includes a few helper classes and functions that are used to compare against the actual implementation. The motivation for this PR is to construct new fuzz targets that are used to make sure these helpers work properly. By having dedicated targets that test the test, we can be even more confident that our cluster linearization code is bug-free. Woo!

  • Note: the current cluster linearization algorithm may be replaced by a new and improved spanning-forest linearization (SFL) algorithm in the future. SFL is essentially a drop-in replacement, so while the underlying algorithm changes, it doesn’t change the tests much. It’s outside of the scope of this review club, but you can look at PR 32545 or this delving post if you’re interested.

Exercise

  • Try fuzzing the clusterlin_linearize target. You can refer to the fuzzing docs for assistance. Using --preset=libfuzzer-nosan will likely make building easier.

  • Now change chunking to simple_chunking at this line and run the target again.

  • After a few minutes, you should see a crash. Welcome to fuzzing. This is a simple example of a tiny bug in the test itself which fuzz testing catches quickly. A lot of the time, a crash happens because something is wrong in the fuzz harness itself, as opposed to a bug in the production code. This is why it can be useful to have targets or assertions that ensure that the fuzz test itself is correct.
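The exercise can be run with commands along these lines (the preset and `build_fuzz_nosan` directory follow the fuzzing docs; `crash-<hash>` is a placeholder for whatever crash file libFuzzer writes):

```shell
# Configure and build the fuzz binary without sanitizers (faster build).
cmake --preset=libfuzzer-nosan
cmake --build build_fuzz_nosan

# Run the clusterlin_linearize target. An optional corpus directory
# argument at the end persists interesting inputs across runs.
FUZZ=clusterlin_linearize build_fuzz_nosan/bin/fuzz

# After a crash, reproduce it by passing the crash file directly;
# this runs that single input through the harness without fuzzing.
FUZZ=clusterlin_linearize build_fuzz_nosan/bin/fuzz ./crash-<hash>
```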

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. What are the two new fuzz targets introduced and what are they testing? In terms of what they’re testing, how are they different from the other targets?

  3. What are the benefits of separating out the fuzzing of the internal test helpers?

  4. Were you able to run the clusterlin_linearize target? How many iterations (the first number libFuzzer shows at each line) did it take to produce a crash after making the s/chunking/simple_chunking/ change?

  5. In commit 2763b75, why was --iterations_left moved?

  6. In clusterlin_simple_finder, when finding a topologically valid subset to remove from the graph, why is it important to set non_empty to true? What could happen if we allowed empty sets?

  7. In clusterlin_simple_linearize, why does the code only verify against all possible permutations when depgraph.TxCount() <= 8? What would happen if we tried to do this for larger transaction counts?

  8. In commit 1e4f345, when a non-topological permutation is found, the code now fast forwards by reversing part of the permutation. Why does this optimization work, and how many permutations can it skip in the best case?

  9. In SimpleCandidateFinder::FindCandidateSet(), the algorithm keeps track of two sets for each work unit: inc (transactions definitely included) and und (transactions that can still be added). Why does it need to track both sets instead of just tracking undecided transactions?

  10. What is, no doubt, the coolest way to test code?

Meeting Log

  17:00 <marcofleon> #startmeeting
  17:00 <corebot`> marcofleon: Meeting started at 2025-06-11T17:00+0000
  17:00 <corebot`> marcofleon: Current chairs: marcofleon
  17:00 <corebot`> marcofleon: Useful commands: #action #info #idea #link #topic #motion #vote #close #endmeeting
  17:00 <corebot`> marcofleon: See also: https://hcoop-meetbot.readthedocs.io/en/stable/
  17:00 <corebot`> marcofleon: Participants should now identify themselves with '#here' or with an alias like '#here FirstLast'
  17:00 <sliv3r__> Hi!
  17:01 <stickies-v> hi
  17:01 <monlovesmango> heyy
  17:01 <sipa> hi
  17:01 <marcofleon> welcome! we'll be going over https://bitcoincore.reviews/30605
  17:02 <marcofleon> litte re run of last week, we can actually go over the questions this time
  17:02 <marcofleon> Did you review the PR/notes? Concept ACK, approach ACK, tested ACK, or NACK?
  17:02 <stickies-v> didn't review, just lurking today
  17:03 <marcofleon> lurkers are welcome
  17:03 <sliv3r__> tried to :) was not an easy one
  17:03 <monlovesmango> yes a bit. tested too.
  17:03 <marcofleon> agreed, the cluster linearization code can be tough
  17:04 <sipa> i hope the testing code for it is easier!
  17:04 <marcofleon> monlovesmango: ran the fuzz tests? nice! I guess we can go over that a bit with question 4
  17:04 <marcofleon> okay, let's dive in
  17:04 <marcofleon> What are the two new fuzz targets introduced and what are they testing? In terms of what they’re testing, how are they different from the other targets?
  17:05 <sliv3r__> clusterlin_simple_finder and clusterlin_simple_linearize
  17:06 <sipa> ✔️
  17:06 <sliv3r__> here I have a question bc I though ones were testing the functionality and the other ones were testing the tests but when I checked the functions I saw that were just simpler re-implementations of the ones found in cluster_linearize.h
  17:06 <sliv3r__> why is that?
  17:07 <monlovesmango> clusterlin_simple_finder and clusterlin_simple_linearize. they are different than other targets bc they are fuzz testing functions implemented for fuzz tests
  17:07 <sipa> sliv3r__: let me explain
  17:07 <marcofleon> sliv3r__: That could be related to the next question actually
  17:07 <marcofleon> in terms of what are the benefits of doing that
  17:08 <sliv3r__> marcofleon: that could be the reason why I wrote IDK as an answer to the next one in my notes :)
  17:08 <sipa> sliv3r__: the short answer is that SimpleCandidateFinder, ExhaustiveCandidateFinder, SimpleLinearize are part of the test suite, not of production code, even though they mimick the functionality of production code funcstions/classes
  17:09 <sipa> the point is that the production code is complicated, which means it may be hard for reviewers to get confidence in their correctness
  17:09 <sliv3r__> and how can we be sure that the behaviour is equal or at least equivalent?
  17:10 <marcofleon> and so the simpler implementations provide an oracle to check against
  17:10 <sipa> one tool that we (or i, as author, in this case) have available to make it easier on reviewers, is by writing a simpler, more obviously correct, but potentially slower/less feature-rich reimplementation that does the same thing, plus tests that they are identical
  17:10 <sipa> so then if you as a reviewer have confidence in the reimplementation + the equivalence test, you also gain confidence in the original implementation
  17:11 <sipa> sliv3r__: well that is what the fuzz tests do: run a scenario (controlled by the fuzzer) of operation to both the production version and the simpler version, and see they produce the same result
  17:11 <sipa> that's not a proof of course, but it is one tool that can increase confidence if no such differences are found
  17:11 <sliv3r__> make sense
  17:11 <sipa> if you want to get an idea of how good it is, you can try inserting deliberate bugs and see how long it takes for the fuzzer to find it
  17:11 <abubakarsadiq> hi
  17:12 <marcofleon> abubakarsadiq: welcome
  17:12 <sliv3r__> nice thanks, more clear now
  17:12 <sipa> and in this case, even the "simpler" version is fairly nontrivial, so in addition, there is a third, *even* simpler and *even* slower version implemented
  17:13 <marcofleon> i'll move on to question 4, unless there's more benefits other than increasing confidence of correctness? for question 3 I mean
  17:13 <sipa> where the current fuzz tests compare all 3 with each other
  17:13 <marcofleon> That's the exhaustive one yes?
  17:13 <sipa> marcofleon: yeah
  17:13 <monlovesmango> I think that makes sense for clusterlin_simple_finder bc it doesn't seem to compare with other prod code, but for clusterlin_simple_linearization it is comparing to ChunkLinearization which doesn't necessarily feel simpler
  17:13 <sipa> marcofleon: there are 2 dimensions i think
  17:14 <sipa> marcofleon: there is the candidate-finding logic, where we have SearchCandidateFinder (prod), SimpleCandidateFinder (simpler), ExhaustiveCandidateFinder (simplest)
  17:14 <sipa> and there is the linearization logic on top, where there are also 3 version: Linearize (prod), SimpleLinearize (simpler), and the std::next_permutation based exhaustive search (simplest, but not a separate function)
  17:15 <abubakarsadiq> sipa: haven't look yet, but after SFL simple candidate finder would be changed to match the new algorithm yeah?
  17:16 <sipa> abubakarsadiq: there is no SearchCandidateFinder anymore post-SFL; SFL is a replacement for Linearize directly, which doesn't involve candidate finding anymore
  17:16 <marcofleon> So building layers of trust essentially, makes sense
  17:16 <sipa> SimpleCandidateFinder and ExhaustiveCandidateFinder stay, but they're just used in/for SimpleLinearize, which the SFL-based Linearize is then compared against
  17:16 <sipa> kind of vestigial
  17:17 <abubakarsadiq> So we will use exhaustive search to validate correctness of SFL?
  17:17 <sipa> that would be a possibility!
  17:17 <marcofleon> monlovesmango if you ran the fuzz tests a bit, question 4: Were you able to run the clusterlin_linearize target? How many iterations (the first number libFuzzer shows at each line) did it take to produce a crash after making the s/chunking/simple_chunking/ change?
  17:17 <monlovesmango> sipa: ok std::next_permutation was what I was missing
  17:17 <sipa> abubakarsadiq: but i don't think it gains us very much, as it's pretty non-trivial anyway, so the "added confidence" isn't as much as compared with the much simpler reimplementations gives us
  17:17 <abubakarsadiq> Or the permutation of the graph linearizations I think will be more valid candidate
  17:17 <monlovesmango> marcofleon: I actually had a question about that. I did get the failure, but I was using the python test runner and it doesn't show any output with the counts, just the error
  17:18 <sliv3r__> marcofleon: it took for me 566593 iteartions
  17:18 <monlovesmango> is there another way I should be running individual tests?
  17:18 <sipa> i always run the tests individually
  17:18 <sliv3r__> I did it running: FUZZ=clusterlin_linearize build_fuzz_nosan/bin/fuzz
  17:18 <sipa> that ^
  17:19 <marcofleon> exactly, yeah
  17:19 <monlovesmango> sliv3r__: ok that helps thank you :)
  17:19 <sipa> i usually add -use_value_profile=1 -workers=30 -jobs=30, to get 30 concurrent processes, with more tendency to explore more paths
  17:19 <marcofleon> and yeah ~500k iterations was where i was at too. It crashes pretty quickly
  17:19 <sliv3r__> monlovesmango: in the fuzzing docs you have an example at the very begining https://github.com/bitcoin/bitcoin/blob/master/doc/fuzzing.md
  17:20 <marcofleon> And then running it like how sliv3r__ said you can add a directory at the end to save the inputs if you'd like. Keep a corpus for later
  17:20 <sliv3r__> marcofleon: Then it says something like: Test unit written to ./crash-1cea6b51b877d277ba4d1ba7f522b7d3ac182349
  17:20 <sliv3r__> what's that? bc it's impossible to human-read it :)
  17:20 <sipa> marcofleon: maybe it makes sense to move the std::next_permutation based logic to an ExhaustiveLinearize function
  17:21 <marcofleon> sipa: it could make it clearer, agreed
  17:21 <marcofleon> sliv3r__: that's the input that caused the crash
  17:21 <sipa> sliv3r__: it is the input to the fuzz harness, which interprets bytes from it (each of the provider.ConsumeIntegral... functions decodes some bytes from it)
  17:22 <marcofleon> if you run the fuzz command and that input after, it should reproduce, hopefully
  17:22 <sipa> you can re-run it, using FUZZ=clusterlin_linearize build_fuzz_nosan/bin/fuzz ./crash-1cea6b51b877d277ba4d1ba7f522b7d3ac182349
  17:22 <sipa> which will not do any fuzzing, just run that one fuzz input through the harness
  17:22 <monlovesmango> sipa: I think ExhaustiveLinearize would be good and consistent
  17:22 <sliv3r__> yep runs one iteration and crashes instantly
  17:23 <sipa> but now you can also modify the code, attempt to fix the bug, and retry with just that case
  17:23 <marcofleon> that's when the debugging starts
  17:23 <marcofleon> question 5: In commit 2763b75, why was --iterations_left moved?
  17:23 <monlovesmango> sliv3r__: thank you I see that example now... I think I didn't really know what that param did bc I didn't realize 'process_message' was a test
  17:24 <marcofleon> monlovesmango: sorry if I moved on too quickly! Yeah, the thing after FUZZ= is the fuzz target
  17:24 <monlovesmango> marcofleon: no you're good!
  17:24 <sliv3r__> sipa: right setting it back to chunking works :P
  17:24 <sliv3r__> are we at q 5?
  17:25 <marcofleon> we have a bunch of them. all looking something like "FUZZ_TARGET(clusterlin_depgraph_sim)" for example and then the test itself
  17:25 <marcofleon> yeah, i'll paste again
  17:25 <marcofleon> In commit 2763b75, why was --iterations_left moved?
  17:25 <monlovesmango> it was moved bc otherwise it decrements for splits that don't actually need to be considered (which I imagine might also mess with the result being optimal assumption)
  17:26 <sipa> i think the answer here is kind of nuanced
  17:27 <marcofleon> I'm actually not sure if it was a bug fix per se... I think just an optimization?
  17:27 <sipa> but it also doesn't matter much
  17:27 <sliv3r__> If I didn't understand wrong we want to only reduce the counter when we add a new "valid" subset. So the number of iterations is proportional to the number of different connected subsets that cna be created instead to be realted to the total num of elements in the queue
  17:27 <sipa> so the idea is that we want some kind of "cost metric" to control how much time is spent inside SimpleCandidateFinder, just so that it does not run forever, but can stop it after "too much" work has been performed
  17:28 <sipa> it doesn't really matter what that metric is, as long as it's roughly proportional to runtime
  17:28 <sipa> the metric that was being used so far was "queue elements processed"
  17:28 <sipa> moving the --iterations_left is really changing the metric to something else: the number of candidate sets considered
  17:29 <monlovesmango> interesting
  17:29 <sipa> they're very similar, but each candidate set being considered can give rise to two queue elements
  17:29 <sipa> and there is an initial queue element on the stack that's not a candidate set
  17:29 <sipa> so the old metric is at most 2*N+1, where N is the new metric
  17:30 <marcofleon> In clusterlin_simple_finder, when finding a topologically valid subset to remove from the graph, why is it important to set non_empty to true? What could happen if we allowed empty sets?
  17:30 <sliv3r__> infinite loop
  17:30 <sipa> so why change is: the new one is easier to reason about, because it's easy to count how many candidate sets a cluster can have (2^(N-1))
  17:31 <sipa> so that's something that can be concretely tested for: if the limit is set to at least 2^(N-1), the optimal must be found
  17:31 <sipa> makes sense?
  17:32 <monlovesmango> seems like a better metric. yes.
  17:32 <sipa> it's not a big change, just making things slightly easier to reason about
  17:32 <sliv3r__> it makes more sense
  17:32 <sipa> but the most important thing is that there is some metric, and it's being hit
  17:33 <marcofleon> sure, I do thiink it makes sense to have it be only when a valid transaction is found
  17:33 <marcofleon> a better metric, then just next in the queue
  17:34 <marcofleon> *than
  17:34 <sipa> *candidate set, not transaction
  17:34 <marcofleon> valid set, sorry yeah
  17:35 <abubakarsadiq> We have to find something to remove in the prior commit we evict the found transactions when the it's empty
  17:35 <abubakarsadiq> infinite loop if we don't remove anything
  17:35 <marcofleon> for q6, nice yeah
  17:36 <sliv3r__> yeah so basically we have an empty one the next iteration is equal to the current one so loop loop loop :)
  17:36 <marcofleon> it would just be the same set if nothing is removed
  17:36 <sliv3r__> if we have*
  17:36 <sipa> abubakarsadiq: (bonus) how likely would it be that we *keep* removing nothing, because just occasionally not removing anything wouldn't be an infinite loop?
  17:38 <abubakarsadiq> Yeah highly unlikely
  17:38 <sipa> actually no, extremely likely :p
  17:38 <sliv3r__> I'm lost haha
  17:38 <sipa> it would happen anytime you hit the end of the fuzz input
  17:38 <sipa> because once you hit the end, ConsumeIntegral and friends will keep forever returning 0
  17:38 <sipa> which would be interpreted as the empty set
  17:39 <sipa> fuzz input is **not** random
  17:39 <marcofleon> oh yes of course, there'd be no more data from the provider at some point
  17:39 <sipa> it's hopefully better than random, but it can also be a lot worse
  17:40 <sipa> abubakarsadiq: sorry, that was a trick question :)
  17:40 <marcofleon> In clusterlin_simple_linearize, why does the code only verify against all possible permutations when depgraph.TxCount() <= 8? What would happen if we tried to do this for larger transaction counts?
  17:40 <monlovesmango> very thought provoking trick question
  17:41 <monlovesmango> the exhaustive search would start taking too long
  17:41 <sliv3r__> I think I can answer for TxCount <= 7 bc I don't understand the optimization included in cce29ef63ecf114003b529093fdfd2574b830afc
  17:41 <abubakarsadiq> was wondering the reason you choose to evict randomly. For testing weird and all kind of behaviors yeah. Because the found txs can also be the read transaction
  17:41 <abubakarsadiq> sipa: it's okay I learn something :P
  17:42 <marcofleon> We can throw in the next question, it's related. In commit 1e4f345, when a non-topological permutation is found, the code now fast forwards by reversing part of the permutation. Why does this optimization work, and how many permutations can it skip in the best case?
  17:42 <sliv3r__> So basically the cost would be huge as the possible permutations is N!. For 7 would be 5050 and higher numbers increase the number heavily
  17:42 <marcofleon> i tried with 9 and the execs/sec were about 4x slower
  17:43 <sliv3r__> But with 1e4f345 the worst case still being 8! which is huge right? Why is this acceptable if it wasn't before? (bc we had 7)
  17:43 <sipa> yeah, that's the answer... the number of iterations can grow factorially
  17:44 <sipa> sliv3r__: just 8! = 40320, that's not enormous
  17:44 <sipa> fuzz tests work best when they stay above 100-1000 attempts per second
  17:44 <sliv3r__> why was it 7 before if 8 is ok?
  17:44 <sipa> hmm?
  17:45 <sipa> oh, with the optimization there are just fewer permutations searched through
  17:45 <marcofleon> The optimization of skipping all invalid permutations makes it possible to bump up
  17:45 <marcofleon> sorry, i should be more specific
  17:45 <sipa> it's not actually possible for all 8! permutations to be topological, so many/most of them will be skipped by the optimization
  17:46 <sipa> and the numbers 7 and 8 are just set based on benchmarks... which ones make the test too slow
  17:46 <marcofleon> all the permutations, after having found an invalid prefix yes?
  17:46 <sipa> yes, it'll skip all the permutations with the *same* invalid prefix
  17:46 <sliv3r__> oh ok so there's no "human logic" to that :) I was getting too picky with the numbers
  17:47 <abubakarsadiq> It's human logic :p
  17:47 <sipa> so the number of permutations tried is equal to the number of topologically-valid ones, plus the number of topologically-invalid minimal prefixes
  17:47 <sipa> i don't have numbers for what those are, it just works in practice (TM)
  17:47 <marcofleon> and how many permutations can it skip in the best case then, just to complete the question
  17:48 <marcofleon> also sipa: did you ever figure out how many permutations can be tried in total knowing that there are K valid ones?
  17:48 <sipa> marcofleon: i have not
  17:48 <abubakarsadiq> Depends on the cluster
  17:48 <marcofleon> ^ that's what i was thinking too
  17:48 <sipa> i thought it'd be easy to find a formula, but it does not seem simple at all
  17:48 <sipa> after some experimentation
  17:49 <sipa> but it skips a lot in practice, which is enough :p
  17:49 <monlovesmango> 7! ?
  17:49 <sliv3r__> sipa: why is not possible for all 8! permutations be topological?
  17:49 <sipa> sliv3r__: that would imply there are no dependencies in the cluster, so it isn't a cluster
  17:49 <marcofleon> monlovesmango: basically yeah, although I think you would subtract one from that
  17:49 <sliv3r__> oh ok that was a stupid question :P
  17:50 <sipa> sliv3r__: every dependency is a "parent must come before child in topological orderings" constraint, so easy additional dependency reduces the number of valid ones
  17:50 <monlovesmango> whats the minus 1 for?
  17:50 <sipa> by how much depends on the topology
  17:50 <marcofleon> (n-1)! - 1 for n txs
  17:51 <sipa> you skip from the first topologically-invalid order with a given prefix to the last one with the same prefix
  17:51 <marcofleon> the -1 is the one tx that you did just try
  17:51 <sipa> you don't immediately skip to the next prefix
  17:51 <monlovesmango> marcofleon: makes sense thanks!
  17:52 <sipa> i think we never answered question 3?
  17:52 <marcofleon> What are the benefits of separating out the fuzzing of the internal test helpers?
  17:52 <marcofleon> I thought we touched on it, but that's true, never explicitly answered
  17:53 <monlovesmango> oh also I was able to test the fuzz crash and got 340233
  17:53 <marcofleon> nice. I love fuzz crashes
  17:53 <sliv3r__> 😂
  17:53 <sipa> marcofleon: fwiw, i suspect the vast majority of the fuzz crashes in these PRs happen before the PR is opened
  17:53 <monlovesmango> I think it help instill confidence on what is actually being tested
  17:53 <sipa> as in, i use them to help development
  17:54 <sipa> monlovesmango: maybe, but nothing really changes there... take Search/Simple/ExhaustiveCandidateFinder... before we had one fuzz test that compared all three (subject to cluster-not-too-large constraints etc)
  17:54 <sipa> now we have two tests, one comparing Search with Simple, one comparing Simple with Exhaustive
  17:55 <sipa> what's the benefit of the split?
  17:55 <sliv3r__> Avoid redundancy
  17:55 <sipa> yes!
  17:55 <monlovesmango> yes for me thats much easier to reason about and so for me I can be more confident in comparing Search with Simple
  17:55 <marcofleon> sipa: makes sense, it's a fairly easy and quick way to help initial bugs surface
  17:55 <sliv3r__> and also easier to review the code
  17:55 <sipa> if you're already confident in SimpleCandidateFinder, you have no "need" to waste computation effort on running the Simple/Exhaustive comparison too
  17:56 <sipa> if you're not, you should be focusing on just establishing that confidence first without putting effort into comparing with the production code
  17:56 <sipa> (whether that means more review, running fuzz tests, attending a review club, ...)
  17:57 <sipa> so i feel the goals of what you obtain from these tests is different, and it makes sense to separate them for that reason
  17:57 <marcofleon> Yeah, I think it's good to have the separate targets here
  17:57 <sipa> but yes, clearer code too :0
  17:58 <sipa> i will need to drop off now, but thanks for hosting marcofleon, and thanks all for spending time looking at my PR!
  17:58 <sliv3r__> thanks for the explanations :)
  17:58 <monlovesmango> thanks sipa!
  17:58 <marcofleon> thanks for all your help sipa
  17:58 <abubakarsadiq> thanks for hosting, thanks for your work sipa
  17:58 <marcofleon> okay let's jump into SFL then?
  17:59 <marcofleon> :P
  17:59 <abubakarsadiq> We have txgraph reorg waiting though
  17:59 <marcofleon> another day. Thanks for attending everybody
  17:59 <abubakarsadiq> It's also juicy
  17:59 <marcofleon> oh yeah I kinda forgot about that one, good call
  17:59 <sliv3r__> Could you guys share the answer for the last-1 question? Didn't get a clear answer of that
  18:00 <monlovesmango> I think for last question inc and und are not complementary
  18:01 <sliv3r__> My idea checking the algorithm was that for inclusion if you don't have inc you may try to add a tx without an ancestor being in the anc set
  18:01 <marcofleon> sure, maybe Abubakar has a clearer way to look at it, but I sort of initially thought that it was possible to get the inc with only having und
  18:02 <marcofleon> but then I realized that in order to keep track of feerate, you would need inc I believe
  18:03 <sipa> it's really a tristate: undecided, included (definitely in the candidate set), excluded (definitely not in the candidate set)
  18:03 <monlovesmango> one of the sets that you add back to the queue has und - descendants(split), so I don't know how you would figure out inc from that queue set
  18:03 <sipa> excluded is just implicitly everything not included and not undecided
  18:04 <monlovesmango> tristate is a good way of phrasing it
  18:05 <marcofleon> could you implicitly have something like included = m_todo - undecided - excluded?
  18:07 <monlovesmango> would that be better? I'm not sure
  18:08 <marcofleon> No probably not, that's just what prompted the question for me. Better to keep track of included explicitly
  18:09 <sliv3r__> I think keeping inc explicitly is needed bc tx have dependencies between them, not explicitly knowing what is inc may cause you try to add a tx without some ancestor in the inc
  18:11 <marcofleon> Right that makes sense
  18:11 <marcofleon> still learning a lot about the clusterlin stuff, it's not easy
  18:12 <sliv3r__> no it's not, this was first time I read about it and also about fuzz, too many new things in one pr :)
  18:12 <marcofleon> sliv3r__ i saw your question on the PR and I believe smallest set breaks the tie
  18:12 <marcofleon> And that's good to know actually, I'll keep that mind. I did realize this review club was trying to do two things at once, which didn't help
  18:13 <sliv3r__> marcofleon: if that's the case would it make sense to assert the size?
  18:13 <monlovesmango> this is my second fuzz pr review and still learning... haha
  18:13 <sliv3r__> marcofleon: nono it was great :) double knowledge
  18:14 <monlovesmango> sliv3r__: yes agree
  18:15 <marcofleon> fuzzing is awesome, it's a great tool
  18:16 <sliv3r__> Last question answer was: just release the code and let users complain and say that Core introduces bugs right (as it's the new tred saying that)? :P
  18:16 <marcofleon> sliv3r__ I at least know in cluster_linearize.h FindCandidateSet says smallest set is the tiebreaker
  18:17 <marcofleon> oh yeah haha q10 What is, no doubt, the coolest way to test code?
  18:18 <sliv3r__> marcofleon: Yeah I saw the comment in the code. Just added a line in my prev comment so sipa can see it and say
  18:19 <marcofleon> glad you two were able to fuzz it a bit. But yeah monlovesmango I recommend doing the indivdual targets like how we said earlier. The test runner is fine for running through a corpus of inputs once
  18:19 <monlovesmango> yeah I agree its much better. I learned a lot more about fuzz best practices
  18:20 <monlovesmango> thank you all for walking me through that :)
  18:20 <sliv3r__> me too this was actually really usefull
  18:20 <marcofleon> Glad to hear it. Okay that's it for me. Thanks again for coming, see you both soon!
  18:20 <sliv3r__> Thanks for hosting!
  18:21 <monlovesmango> thanks for hosting marcofleon !! was a good one
  18:26 <sliv3r__> marcofleon: I think you forgot to end the meeting ;)
  18:27 <marcofleon> #endmeeting