The PR branch HEAD was 4ac1add at the time of this review club meeting.
Notes
Wallets maintain their balance as a pool of Unspent Transaction Outputs
(UTXOs). An
OutputGroup
collects all UTXOs that were sent to the same public key.
Coin selection refers to the process of picking UTXOs from the walletâs UTXO
pool to fund a transaction.
Beyond the primary goal of funding a transaction, the secondary goals of coin
selection are:
Minimizing short term and long term fees:
We have to balance two needs: we want to pay the smallest fee now to get a
timely confirmation, but we have to keep in mind that all of the walletâs
UTXOs will need to be spent at some point. We shouldnât over-optimize
locally at the detriment of large future costs. E.g. always using
largest-first selection minimizes the current cost, but grinds the
walletâs UTXO pool to dust.
Maintaining financial privacy:
There are a number of heuristics that tracking companies employ to cluster
payments and addresses. For example, using inputs with the same output
script in two transactions indicates that the two transactions involved
the same party. Similarly, it is usually assumed that all inputs of a
transaction were controlled by the same entity. Sometimes the privacy and
economic considerations are opposed. If you want to understand privacy
considerations better, check out the Privacy
article on the Bitcoin wiki.
Reliably move the payment(s) to finalization:
Using unconfirmed inputs can make transactions unreliable. Unconfirmed
transactions received from another wallet may time out or be replaced,
making those funds disappear. Even using self-sent unconfirmed funds may
delay the new transaction if the parent transaction has an extensive
ancestry, is extraordinarily large, or used a lower feerate than targeted
for the child transaction.
The process of creating a transaction in Bitcoin Coreâs wallet roughly
follows these steps:
CreateTransaction() builds the header and recipient outputs, i.e. the
parts of the transaction that will remain fixed throughout the coin
selection process.
The long_term_feerate, effective_feerate, and discard_feerate are
established using GetMinimumFeeRate(), which falls back either to fee
estimation or user input. The feerates remain constant for the remaining
process. The amount to be raised from the inputs, nTargetValue is
calculated from the sum of recipient amounts, the target feerate, and the
size of the fixed transaction parts.
The walletâs UTXO pool is retrieved via AvailableCoins().
Using the above fixed values, SelectCoins() is used to pick an input
set to fund the transaction. If the user manually selected some UTXOs by
means of the CoinControl feature, these UTXOs are used first. The
remaining UTXOs are grouped into OutputGroups. The grouped UTXOs are
then used in one or multiple rounds of SelectCoinsMinConf(), using
increasingly permissive CoinEligibilityFilters to preselect
OutputGroups until an input set is found.
Determine whether to create a change output, finish assembling the
transaction, and sign the transaction.
When a UTXO is selected as an input, the size of the transaction is increased,
which in turn increases the necessary fees to maintain the targeted feerate.
We can preempt the cost added by the input by considering each UTXO at its
effective value, its value reduced by the UTXOâs input cost at the given
feerate.
Bitcoin Core currently uses two different solvers:
The knapsack solver sorts the available UTXOs by effective value, and
then combines candidate input sets by randomly selecting or skipping
inputs until it has sufficient funds for the transaction. When it finds a
viable solution, it keeps the solution if it produces a smaller change
than the prior stored solution. Then, the algorithm deselects the last
input and continues traversing the UTXO list as described, trying to
converge on the smallest change output above MIN_CHANGE. This random
process is repeated up to 1,000 times.
The Branch and Bound solver (BnB) deterministically searches the
complete combination space to find an input set that will avoid the
creation of a change output. It performs a depth-first search on a binary
tree where each branch represents the inclusion or exclusion of a UTXO,
exploring inclusion branches first, and backtracking whenever a subtree
cannot yield a solution. It returns the first discovered input set that
is an exact match for the funding requirement of the transaction. The
qualifier âexact matchâ here refers here to an input set that overshoots
the nTargetValue by less than the cost of a change output. The BnB
algorithm is not guaranteed to find a solution even when there are
sufficient funds since an exact match may not exist.
Using effective values in coin selection means that we can freely add and
remove inputs until we have have raised enough funds to pay for the
transactionâs fixed costs: the payment amounts, the transaction header and the
output data. Whatever final input set we settle for, the increased transaction
size and input costs have already been accounted for.
PR #17331 implements the use
of effective values across all different coin selection strategies. Prior,
it was only used for the BnB selection where it facilitated the search to be
effectively performed. Introducing effective values into the knapsack solver
means it no longer works on a moving target, and thus only needs to be run
once. It also allows some of the pre-selection steps to be shared between BnB
and knapsack.
This PR also changes the behavior of SelectCoinsMinConf() to try BnB first
and then attempt knapsack only in the case that BnB did not find an input set.
Before this PR, the function would only run one or the other strategy
controlled by the boolean flag use_bnb.
Questions
How does the effective valuediffer
from the nValue of a UTXO?
<glozow> murch: because, before, after you ran knapsack with nValues instead of effective values, you could be off by a little bit since you didn't take the inputs into account
<murch> dkf: Right, what we are all getting at here is that by calculating the effective value of UTXOs, we already account for the cost of the inputs when selectign them.
<sipa> it's making coin selection not a circular reasoning: because adding an extra input changes how much fee you have to pay, possbily necessitating trying coin selection again
<marqusat> Effective fee of static vsize overhead + outputs vsize. Itâs needed to calculate selection_target, doesnât need to be passed separately alongside nValue to SelectCoinsBnB.
<lightlike> but isn't the knapsack algorithm non-deterministic with some randomness involved? In that case, couldn't running it multiple times still get a better solution, regardless of effective values used or not?
<dkf> the fee for all extra block data that is not user-provided payload? I am not sure if this amount changes though due to unfamiliarity with this part
<murch> raj__: It could also be five payments in a single transaction, an OP_RETURN, or custom script. Basically all the things that are the intended product of the transaction
<sipa> raj__: at the time coin selection is invoked we already have what you could call an "incomplete" transaction; it can have multiple inputs and outputs already (which must be included); the goal of coin selection is adding (a) additional inputs from the wallet and (b) optionally adding change so that the fee is as intended
<lightlike> doesn't that mean that we should run knapsack multiple times again because we have a moving target again? (could have to substract less from the outputs if we had a better solution)
<murch> The way this was implemented (now amended) in our snapshot we're looking, what subtle issue gets introduced here when we evaluate the effective value as being the whole value?
<murch> glozow: Great question, I hadn't actually thought about it. But I think that is resolved by the coin control inputs getting used first in either selection procedure and their fees getting accounted for then.
<sipa> right; they are still inputs that can be selected; as long as they're treated like other ones in terms of accounting, they shouldn't be counted again throigh no_input_fees
<murch> But essentially what lightlike and glozow both are getting here at is that we no longer filter UTXOs that are uneconomic and that the fees move back to the end of the transaction building
<murch> Even when some wallets e.g. copy the output type of the recipient output(s), when we are starting the coni selection, we know what we are going for as an output typeâif we need one in the first place
<murch> an estimate of the maximum feerate that we will need to pay in the long term to use a UTXO. A reasonable upper bound on what we might need to pay for a low time preference transaction in the long term.
<dkf> is the assumption that the long term fee rate always increases compared to the effective rate? is there no way it could be cheaper than expected?
<lightlike> do we calculate this only to determine whether we should create a change output or not bother and add it to the fees? or are there additional reasons?
<sipa> right, they"re not really upper bounds or lower bounds; they're both jusg estimates - but one is conservatively high, the other is conservatively low
<murch> Actually, true story, I kinda broke that in 2014 with a tiny patch to coin selection that later got reverted when people found that it caused the UTXO set to bloat :p
<lightlike> I don't see why there is a need to keep legacy behavior here. I mean, it's either better or worse to keep it, but why should we care how it used to be?
<Murch> jnewbery: Yes, I expect that Knapsack will go away altogether, but review in Wallet has been pretty slow, so there are efforts that have been waiting for literally years in that regard
<glozow> (2) When we are in the range of an âexact match,â i.e. the difference between the selected coinsâ total effective value and the target value is lower than the cost of a change output
<larryruane_> question, if there's time, are there functional or unit tests that run through these various code paths? i like to watch code in action using a debugger
<lightlike> so that's also to keep the utxo set small? As opposed to creating the change output and just not spending it / hoping for extremely low feerate times?