Refactor BnB tests (tests)

https://github.com/bitcoin/bitcoin/pull/29532

Host: murchandamus  -  PR author: murchandamus

Notes

Overview of Bitcoin Core’s Coin Selection

Coin selection refers to the process of picking inputs to fund transactions. Bitcoin Core uses currently four different algorithms that each produce candidate input sets, and then picks one of those input sets to fund the transaction. Here is a rough sketch of how the algorithms work:

  • Knapsack” Do 1000 random walks selecting and deselecting available coins, remember the input set that overshot the target the least. Knapsack will usually create a change output, but can create a changeless transaction when the overshoot is exactly 0, or all funds are used.
  • Single Random Draw (SRD) Randomly pick UTXOs from the available coins until enough funds are selected to build a transaction with a change output of a reasonable amount.
  • Branch’n’Bound (BnB) Deterministically search all combinations of the available coins to find the input set that is the least wasteful that does not create a change output.
  • CoinGrinder Deterministically search all combinations of the available coins to find the input set that has the lowest weight that does create a change output.
Algorithm Deterministic Creates Change Outputs Notes on Special Cases
Knapsack No Usually Selects all available UTXOs if target is exceeded, but target + minChange is not reached.
Single Random Draw No Always Will only return a selection if target + minChange is exceeded.
Branch and Bound Yes Never Disabled when SFFO is active. Only returns a selection if a combination produces a changeless transaction (i.e. one combination sums to a value between target and target + cost_of_change.
CoinGrinder Yes Always Disabled below 30 ṩ/vB. Will only return a selection if target + minChange is exceeded.

This pull request is focused on the tests for the Branch and Bound algorithm.

Motivation

The author of this PR noticed that many coin selection tests use coin selection parameters that generally do not appear in production use. The unusual parameters could lead to unexpected coin selection behavior.

The quirky outcomes would cause confusion when writing or reviewing new tests because coin selection results could be heavier or lighter (in transaction weight) than expected, selection could be unstable when it was expected to be deterministic, waste scores could diverge from the obvious values, or changeful solutions could unexpectedly be preferred to changeless solutions.

For example, when the cost_of_change parameter was set to 0, any selection result would be interpreted as resulting in a changeless transaction, i.e., when cost_of_change was 0 and Single Random Draw picked a 1₿-input for a 0.1₿ selection target, the result would be evaluated as if dropping 0.9₿ to fees instead of creating a ~0.9₿ change output.

Additionally, most coin selection tests use a feerate of 0 ṩ/vB, which results in the amount of UTXOs being equal to their effective value (amount - input fee). This makes it easy to predict the change amount from the input amounts, but e.g., means that the fees for a heavy or light transaction are the same (namely 0).

The described issues were partially fixed by removing the double-meaning of cost_of_change. This pull request is the start of an effort to rewrite the coin selection tests to use non-zero feerates, and create UTXO per their effective values, so that coin selection may happen at non-zero feerates and more closely emulate in production use.

How to review the changes

It is recommended to review commit by commit. The first commit adds the new test suite coinselection_tests.cpp including default coin selection parameters, and helper functions for creating tests. Each commit after that transplants one set of tests for the BnB coin selection algorithm from the old coin selection test suite coinselector_tests.cpp to the new suite. The new tests should provide at least the same coverage as the old tests; please keep an eye on whether the test coverage may have been reduced.

If you feel adventurous, you might look into generating a test coverage report before the first commit (before any tests are moved), and after the last commit to verify that the test coverage was not adversely affected. You can find instructions in the developer notes under Compiling for test coverage. My understanding is that you will want to follow the “Using LLVM/Clang toolchain” instructions.

Advantages and Disadvantages

It was considered to move the test and then refactor them, but one confusing aspect of the old test suite was the high number of overloaded helper functions. It was thus decided that building up the new test suite step by step would be easier to follow and might give opportunities for additional clean-up.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. What are the four coin selection algorithms used by Bitcoin Core? How do they differ?

  3. What is the effective_value of a UTXO? How is it calculated?

  4. Bonus question: What criterion is used to pick from among the algorithms’ selection results?

  5. Why might have so many of the original tests used a feerate of 0 ṩ/vB? How does the AddCoins(…) function in the new test suite address the same concern?

  6. What are the advantages and disadvantages of using a helper function like TestBnBSuccess(…)?

  7. How does test: Move BnB feerate sensitivity tests extend the coverage over the original tests?

  8. What does the “clone skipping test” test? How does the expected_result in the original test line up with what you would pick manually at low and at high feerates?

  9. What is the motivation for the “Overshoot upper bound” to use different values than the original?

  10. What characteristic makes a set of UTXOs “expensive to search” for BnB?

Meeting Log

  117:00 <Murch[m]> #startmeeting
  217:00 <Murch[m]> Hello everyone, welcome to the bitcoin core PR review club.
  317:00 <sipa> hi
  417:00 <Murch[m]> Thanks for joining us
  517:00 <glozow> hi
  617:00 <monlovesmango> hi
  717:00 <kevkevin_> hi
  817:00 <Murch[m]> Today we are looking at #29532, which is one of my PRs
  917:01 <Murch[m]> Is there anyone here for the first time? Feel free to jump in regardless!
 1017:01 <Murch[m]> Who got the chance to review the PR or read the notes? (y/n)
 1117:01 <monlovesmango> y
 1217:02 <Murch[m]> If you reviewed the PR, what was your conclusion?
 1317:02 <stringintech> y
 1417:02 <monlovesmango> looks good to me
 1517:02 <kevkevin_> n
 1617:02 <Murch[m]> How did you approach the review?
 1717:03 <Murch[m]> Alright, let’s get to the meat of it
 1817:03 <Murch[m]> What are the four coin selection algorithms used by Bitcoin Core? In a sentence, how do they differ?
 1917:03 <monlovesmango> read some of the background, then reviewed commit by commit as suggested. was pretty easy to follow that way. I also rebased onto master
 2017:04 <Murch[m]> monlovesmango: Cool, glad to hear that
 2117:04 <glozow> monlovesmango: what is your approach to reviewing refactoring commits?
 2217:06 <Santos> ACK. Reviewed commit by commit and reviewed the coin control logic/BNB algorithm
 2317:06 <monlovesmango> well since this was well structured I was first reading through the tests that were removed, and then looked for the same functionality on the tests that were added
 2417:07 <Murch[m]> So regarding the algorithms, BnB is the only algorithm that will only return changeless input sets (except Knapsack which does it only when the excess is exactly 0).
 2517:08 <Murch[m]> This means that a few of the scenarios that BnB should be tested against are unique to BnB.
 2617:09 <monlovesmango> knapsack does 1000 random walks while remembering its best selection (overshoots target the least), selection must be between the target and target+minChange
 2717:09 <Murch[m]> monlovesmango: Yeah, that’s right
 2817:09 <monlovesmango> single random draw will randomly pick coins until target+minChange is exceede
 2917:10 <monlovesmango> d
 3017:10 <Murch[m]> correct
 3117:11 <monlovesmango> coingrinder seems to be similar to BnB, except it finds the input set with the lowest weight that exceeds target+minChange (also disabled below 30s/vB)
 3217:12 <monlovesmango> and BnB looks to find the least wasteful input set that doesn't have a change output
 3317:12 <Murch[m]> monlovesmango: Exactly, they are both branch and bound algorithms that deterministically iterate through all possible relevant combinations
 3417:13 <monlovesmango> I did have a question, what is SFFO?
 3517:13 <Murch[m]> 4 of 4, monlovesmango
 3617:13 <Murch[m]> SFFO stands for subtract_fee_from_output
 3717:13 <Murch[m]> It’s a way to pawn the transaction fee off on the recipient
 3817:13 <monlovesmango> ok thank you!
 3917:13 <Murch[m]> It’s the source of most bugs in coin selection :p
 4017:14 <Murch[m]> Let’s move to the next question: One important concept in the context is the "effective_value" of a UTXO. What is that? How is it calculated?
 4117:14 <monlovesmango> haha i can believe that
 4217:14 <Murch[m]> monlovesmango: The problem is that it turns a few general ideas on its head and it is often forgotten in testing when new functionality is added
 4317:14 <dzxzg38> https://github.com/bitcoin/bitcoin/blob/cdc32994feadf3f15df3cfac5baae36b4b011462/src/wallet/coinselection.h#L30-L31
 4417:14 <dzxzg38>  "The output's value minus fees required to spend it and bump its unconfirmed ancestors to the target feerate. "
 4517:15 <Murch[m]> dzxzg38: Yeah, that’s right
 4617:15 <Santos> effective_value = txout.nValue - fee
 4717:15 <stringintech> it is the actual spendable amount (after excluding required fee to spend it)
 4817:15 <monlovesmango> I believe it is the value of the input prev out minuse fees required to spend
 4917:15 <Santos> Effective Value = UTXO's Nominal Value − Fee to Spend the UTXO
 5017:15 <Murch[m]> Santos, stringintech, monlovesmango : also right
 5117:16 <Murch[m]> So, the effective_value is useful, because it gives us the amount that an input contributes toward the selection target after paying for itself
 5217:17 <Murch[m]> Before we introduced the concept, the selection target would shift with every input that got added, because the cost of the input needed to be added to the target
 5317:17 <Murch[m]> It made the problem multivariable and much harder to solve
 5417:17 <Murch[m]> Now the bonus question: if all the algorithms provide separate candidate input sets, how does the transaction building process pick among the candidates the one to use to build the transaction?
 5517:18 <Santos> It picks the one with the lowest "waste".
 5617:18 <Murch[m]> Aye, we use the waste score to pick the "best" one.
 5717:19 <Murch[m]> For a rather primitive fitness function, it works kinda okay
 5817:19 <Murch[m]> One day, we shoudl have a function that considers more things, like privacy, the wallet’s utxo composition and the wallet’s usage pattern, but that are dreams for the future
 5917:19 <Murch[m]> Why might have so many of the original tests used a feerate of 0 ṩ/vB? How does the AddCoins(…) function in the new test suite address the same concern?
 6017:21 <monlovesmango> I think it was probably bc of the problem you just mentioned, if there was a fee the selection target would shift with every input that gets added
 6117:21 <Murch[m]> monlovesmango: You are thinking in the right direction
 6217:21 <Murch[m]> https://github.com/bitcoin-core-review-club/bitcoin/commit/9773192b833fe0d0e071b0a75f72aab82cb124ef#diff-36088c93368e137d955348aba223985bd4f198f2aaecd626c830f4612ca884c8R56-R62
 6317:22 <monlovesmango> AddCoins function seems to account for this by assigning the value to be the amount+fees
 6417:22 <Murch[m]> What is the "coins" parameter in the AddCoins function?
 6517:22 <Murch[m]> bingo
 6617:23 <Murch[m]> In the old tests, the AddCoins function generated a UTXO with the amount that was provided. In the new tests, the AddCoins function adds the input’s fees, so that the provided value becomes the UTXO’s effective_value
 6717:23 <stringintech> zero feerate was to make things easier by considering "effective value = actual value" I guess
 6817:23 <stringintech> and AddCoins is supposed to do the effective value calculation for us base on configured feerates
 6917:23 <Murch[m]> stringintech: Right on the money
 7017:24 <Murch[m]> Unprepared question: why is it a problem to use a feerate of 0?
 7117:24 <Santos> It seems to me that the zero feerate was used to avoid change complexity in those tests
 7217:25 <Murch[m]> Santos: What do you mean with "change complexity"?
 7317:26 <Santos> "why is it a problem to use a feerate of 0?" The tests should cover all cases (preferably close to real scenarios) - not only the zero feerate case
 7417:27 <Murch[m]> Santos: yeah, absolutely
 7517:27 <Murch[m]> Additionally, there are a few quirks that come out at feerates of 0, because it means that inputs and outputs cost 0
 7617:27 <stringintech> not sure if this is the answer but in some scenarios we need to test the algorithm sensitivity (like bnb) to different feerates. and we have to change the feerate
 7717:27 <Santos> What do you mean with "change complexity"? Without fees, change calculations are less complicated, allowing tests to verify coin selection predictably
 7817:28 <monlovesmango> 0 fee rate is unrealistic. also the behavior of BnB search is partially dependent on fee values so its probably better to have fees as part of the tests..?
 7917:28 <Murch[m]> Santos: Ah, I see what you mean. Yeah, by avoiding fees, you can easily calculate the change from the input amounts and recipient amounts
 8017:28 <Murch[m]> stringintech: that’s part of it! Although the old tests did have some where a feerate was set
 8117:29 <Murch[m]> But generally, transactions with feerates below 1 sat/vB are non-standard (with the new exception of parent TRUC transaction now)
 8217:29 <monlovesmango> Murch: do you plan to allow any tests to have 0 fee rate?
 8317:29 <Murch[m]> monlovesmango: yeah, that’s right
 8417:29 <Murch[m]> monlovesmango: Yes, we should have a couple that test this edge case
 8517:30 <Murch[m]> Because transactions at 0-fee are permitted, just non-standard
 8617:30 <monlovesmango> got it
 8717:30 <monlovesmango> thanks!
 8817:30 <Murch[m]> now that you say that, this may be a regression in the test, because I’m not sure I have any for BnB right now
 8917:30 <Murch[m]> So this edge case was tested before, but might not be at the moment
 9017:31 <Murch[m]> monlovesmango: You could leave a comment on the PR to that effect
 9117:31 <Murch[m]> Whhat are the advantages and disadvantages of using a helper function like TestBnBSuccess(…)?
 9217:31 <monlovesmango> yeah will do!
 9317:31 <Murch[m]> https://github.com/bitcoin-core-review-club/bitcoin/commit/66200b3ffa21605fc3234ccbda7b424381f3319a#diff-36088c93368e137d955348aba223985bd4f198f2aaecd626c830f4612ca884c8R94-R108
 9417:32 <Murch[m]> Thanks :)
 9517:32 <monlovesmango> It is a lot cleaner with less boilerplate, but every scenario needs to be completely/explicitly spec'ed out
 9617:32 <Santos> Code reuse (common test code is centralized) and consistency (all tests for BnB share the same validation logic)
 9717:32 <dzxzg38> The advantage is reuse, and one disadvantage is that your test failure happens far away from the interesting and invidual details of the test case that fails
 9817:33 <dzxzg38> (mitigated in this case by logging)
 9917:33 <monlovesmango> yeah consistency is also much improved
10017:33 <Murch[m]> monlovesmango: Could you elaborate on "every scenario needs to be completely/explicitly spec'ed out"?
10117:34 <Murch[m]> dzxzg38: Yeah, that was a big one on my list too: When you have a function that runs the checks, any failure will show up with the line number in the helper function! So you need additional information to figure out which test actually failed
10217:35 <dzxzg38> I understood monlovesmango to mean generally that every test case needs to have at least as many "parameters" as the most complicated test case, even if you don't really care about some of the parameters in some of the cases
10317:36 <Murch[m]> dzxzg38: Aha, the helper function might get pretty loaded with parameters
10417:36 <monlovesmango> some of the simplier tests didn't necesarily need some of the additional parameters in TestBnBSuccess, but now they also need to account for these parameters
10517:36 <Murch[m]> Although if we pick the order carefully, we can use the defaults often and don’t have to state most explicitly
10617:36 <monlovesmango> dzxzg38: yes you explained it better than me :)
10717:36 <Murch[m]> monlovesmango: Right, gotcha
10817:37 <Murch[m]> monlovesmango: My thinking in that regard was that the boiler plate always writes out the entire test, so it’s often hard to see between two tests what the unique changes are.
10917:37 <Murch[m]> By making each test a single line and having defaults for most things, you can very easily see which values diverge from the defaults and it helps you get a sense what the test is about
11017:37 <monlovesmango> yes I totally agree with your approach
11117:38 <monlovesmango> its much easier to comprehend
11217:38 <Santos> Yes, I agree too
11317:38 <monlovesmango> its just the only disadvantage I could think of
11417:38 <Murch[m]> I do agree that we might need to explicitly define more parameters for some tests, though, e.g., when the custom_spending_vsize is changed, we of course have to give all the prior as well
11517:38 <Murch[m]> Cool, let’s move on
11617:39 <Murch[m]> How does test: Move BnB feerate sensitivity tests extend the coverage over the original tests?
11717:39 <Murch[m]> https://github.com/bitcoin-core-review-club/bitcoin/commit/afd4b807ff1300e4f74ceab6a683f3ff1376369d
11817:39 <monlovesmango> it add test to cover heavy txs, in low fee environment heavy txs should be preferred, and in high fee env lighter txs should be preferred
11917:40 <Murch[m]> monlovesmango: Right!
12017:40 <Murch[m]> In the old tests, more weight was only a function of the count of inputs, but in the new tests, we additionally vary the weight via the "output type"
12117:41 <Murch[m]> What does the “clone skipping test” test?
12217:41 <Murch[m]> https://github.com/bitcoin-core-review-club/bitcoin/commit/9d7db26b7b556784c16e41572ba2d2edc6dd6c24#diff-36088c93368e137d955348aba223985bd4f198f2aaecd626c830f4612ca884c8R132-R136
12317:41 <monlovesmango> Murch: can you explain the "Originally these tests relied on SFFO to work" comment in the git message? since BnB is disabled with SFFO?
12417:42 <Murch[m]> Ah yeah. We had a couple bugs with SFFO and BnB
12517:42 <Murch[m]> When you make the recipient pay the fee, what happens when you drop the excess to fees?
12617:42 <dzxzg38> It tests a performance optimization in CoinGrinder (https://github.com/bitcoin/bitcoin/commit/451be19dc10381122f705bbb2127b083f1d598c6) where equivalent input sets are skipped
12717:43 <Murch[m]> It reduces the fees that get deducted from the recipient’s output (i.e., the recipient gets more)
12817:43 <dzxzg38> It does by creating an input set with 50,000 5 cent outputs, a 2 cent output, and two 7 cent output
12917:44 <Murch[m]> monlovesmango: It could happen that we tried to drop more to fees than the recipient was supposed to pay, and then they’d get more money than we tried to send in th efirst place
13017:44 <Murch[m]> it was weird
13117:44 <dzxzg38> Clone skipping lets us quickly get to the answer of using a 2 cent and two 7 cents to get to 16, but without lcone skipping we would search through the solutions including every "5 cent coin"
13217:44 <monlovesmango> interesting... will need to reread and ponder
13317:44 <Murch[m]> so we removed the SFFO option from BnB and that caused issues with the tests that used SFFO
13417:45 <glozow> ah that's neat
13517:45 <Murch[m]> I actually don’t remember exactly how the test was constructed to need SFFO to work, but yeah
13617:45 <Murch[m]> dzxzg38: We are looking at the BnB tests here, though!
13717:45 <monlovesmango> ok that order of events helps
13817:45 <Murch[m]> dzxzg38: Right otherwise, though
13917:46 <Murch[m]> How does the expected_result in the original test line up with what you would pick manually at low and at high feerates?
14017:47 <monlovesmango> at high fee rates I think it would match what I would manually pick, but at low fee rates I think it would be preferable to use more utxos
14117:48 <Murch[m]> In the old test we had 4 × 7 CENT + 1 × 2 CENT + 50'000 × 5 CENT. We were looking for 4×7+2 = 30.
14217:48 <Murch[m]> monlovesmango: Right, what would that be?
14317:49 <monlovesmango> also, i know this is probably irrelavent to the way the coin selection actually execute in BnB, but was thinking maybe 1 cent and two 7 cents should be used to get 15 instead? so that the 5 cent coins still factor to 15?
14417:50 <monlovesmango> so in the old test I would prefer 6x5 = 30 for coin selection in low fee environment
14517:50 <Murch[m]> In the new test, we have 2 × 7 CENT + 1 × 2 CENT + 50'000 × 5 CENT
14617:50 <Murch[m]> and we are looking for 16 CENT
14717:51 <Murch[m]> monlovesmango: Yeah, exactly!
14817:51 <Murch[m]> So, I reimplemented BnB in the style of CoinGrinder in #32150
14917:51 <monlovesmango> in the new test (for low fee env) i would prob prefer 3x5 + 1x2
15017:52 <Murch[m]> And after I added the CloneSkipping it found the 6 × 5 CENT solution for low feerates
15117:52 <Murch[m]> That’s why I changed it to 16 δρ
15217:52 <Murch[m]> :)
15317:52 <Murch[m]> monlovesmango: 17 CENT is too much, though, that’s not a changeless solution
15417:53 <Murch[m]> It’s a fine selection if you want to create change, though
15517:53 <monlovesmango> ahh true
15617:53 <Murch[m]> Almost done, so let’s keep moving: What is the motivation for the “Overshoot upper bound” to use different values than the original?
15717:53 <Murch[m]> https://github.com/bitcoin-core-review-club/bitcoin/commit/65521465da036616172f4fbeef2855b8ddefd75f#diff-36088c93368e137d955348aba223985bd4f198f2aaecd626c830f4612ca884c8R141-R142
15817:53 <monlovesmango> "And after I added the CloneSkipping it found the 6 × 5 CENT solution for low feerates" - what was the solution for high feerates?
15917:54 <monlovesmango> I think the original was using 0 fee
16017:54 <Murch[m]> monlovesmango: It should still be the 4×7+2, since that needs to pay for five inputs, whereas 6×5 pays for six inputs
16117:54 <Murch[m]> monlovesmango: Yeah, I adapted the test to use effective values and ran it at high and low
16217:55 <monlovesmango> ok maybe i'll try adding some tests to test it out myself :)
16317:55 <Murch[m]> Ah, I remember now
16417:55 <Murch[m]> I first just reimplemented BnB and then it could iterate through more combinations within the 100'000 cutoff
16517:56 <Murch[m]> and with the clone skipping optimization it found the 6×5 at feerate 0, because it prefers more inputs at low feerates
16617:56 <Murch[m]> I later rebased the BnB rewrite on this Test refactor, though
16717:56 <Murch[m]> And I picked the 16 because it was not ambiguous
16817:57 <Murch[m]> i.e. it only has the 2 × 2×7 solution, whereas the 30 had the two solutions
16917:57 <monlovesmango> also, the way that it tested now, it is very explicit that it cannot overshoot by more than cost of change
17017:58 <Murch[m]> Any thoughts on the Overshoot upper bound test?
17117:58 <Murch[m]> Since the question is orthogonal, let me also put out the last question: What characteristic makes a set of UTXOs “expensive to search” for BnB?
17217:58 <Murch[m]> monlovesmango: Right: previously it just tested a single value that overshot the target window.
17317:58 <monlovesmango> I actually had assumed the ambiguity of the old test was intentional, which was the only reason I brought it up for the new test
17417:58 <Murch[m]> The new test tests the last value that is permitted and tests the first value that overshoots: it tests exactly the boundaries
17517:59 <monlovesmango> Murch: yep it is much more concrete now
17617:59 <Murch[m]> monlovesmango: I had first changed the test and then only noticed it when I rewrote BnB ^^
17718:00 <monlovesmango> haha.. its a process
17818:00 <stringintech> I had a question regarding the coverage reports
17918:00 <Murch[m]> Okay, I’ll take the last question myself:
18018:00 <Murch[m]> BnB has a hard time iterating through UTXO pools that have a lot of similar values. I.e., just like the "doppelganger pool" where there are a bunch of UTXOs that only differ by one sat.
18118:00 <Murch[m]> stringintech: Go ahead!
18218:00 <Santos> A set of UTXOs is “expensive to search” for BnB when it has: many coins with nearly identical values, a tight target value relative to the sum of available coins or weight characteristics that force frequent backtracking due to transaction size limits.
18318:01 <stringintech> https://limewire.com/d/6TqGD#2VHPC3MSXx
18418:01 <stringintech> you can see it here; I should be comparing the coinselection.cpp coverage in the two reports right? (above for the PR branch and below the master)
18518:01 <Murch[m]> Santos: Very good, that’s better than my answre :)
18618:02 <Murch[m]> #endmeeting