The PR branch HEAD was 31895fb at the time of this review club meeting.
A bitcoin transaction will often have a payment output and a change output. In order to preserve
transaction privacy and avoid leaking information about a user’s wallet and funds, we want to keep
the payment address and payment amount as private as possible. In other words, we don’t want to leak
information which allows an outside observer to guess which of the two outputs is the payment vs the
change.
One technique used for determining the payment address and amount is the “Payment to different
script type” heuristic.
This allows an outside observer to guess the payment address and amount with reasonable accuracy for
certain types of bitcoin transactions.
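As a rough illustration of how an observer might apply this heuristic, the sketch below guesses the change output of a two-output transaction. It assumes the observer already knows the script type of every input and output as plain strings; the function name `guess_change_index` and the type labels are hypothetical and are not taken from any chain-analysis tool or from Bitcoin Core.

```python
from typing import Optional, Sequence


def guess_change_index(
    input_types: Sequence[str], output_types: Sequence[str]
) -> Optional[int]:
    """Return the index of the output guessed to be change, or None.

    Hypothetical sketch of the "payment to different script type"
    heuristic: if all inputs share one script type and exactly one of
    two outputs has that same type, guess that output is the change.
    """
    if len(output_types) != 2 or not input_types:
        return None  # heuristic only covers simple payment + change txs

    distinct_inputs = set(input_types)
    if len(distinct_inputs) != 1:
        return None  # mixed input types give no clear signal

    (input_type,) = distinct_inputs
    matches = [i for i, t in enumerate(output_types) if t == input_type]

    # Exactly one output matching the inputs is the telltale pattern.
    return matches[0] if len(matches) == 1 else None
```

When both outputs share a type, the function returns `None`, which is the situation that change address type matching aims to create.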
PR #23789 added payment address matching when
generating a change address as a means of breaking the heuristic. This logic can lead to the
wallet having UTXOs of different address types (e.g., bech32m, bech32, P2SH, legacy). Depending on how
these UTXOs are spent in the future, they might still leak information about which is the
change/payment address in the original transaction.
PR #24584 adds logic to avoid mixing different
address types when selecting UTXOs to fund a transaction.
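The idea can be sketched as follows: group the available UTXOs by output type, try to fund the transaction from each group on its own, and only fall back to mixing types when no single group suffices. This is a simplified illustration, not Bitcoin Core's code. The `Utxo` tuple, both function names, the largest-first selection, and the "fewest inputs" tie-breaker are all assumptions made to keep the example short; fees are ignored entirely.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: (output type, amount in satoshis).
Utxo = Tuple[str, int]


def select_largest_first(utxos: Sequence[Utxo], target: int) -> Optional[List[Utxo]]:
    """Stand-in selection algorithm: add the largest UTXOs until target is met."""
    selected: List[Utxo] = []
    total = 0
    for utxo in sorted(utxos, key=lambda u: u[1], reverse=True):
        selected.append(utxo)
        total += utxo[1]
        if total >= target:
            return selected
    return None  # insufficient funds in this set


def select_without_mixing(utxos: Sequence[Utxo], target: int) -> Optional[List[Utxo]]:
    """Prefer a selection drawn from a single output type; mix only as a fallback."""
    buckets: Dict[str, List[Utxo]] = defaultdict(list)
    for utxo in utxos:
        buckets[utxo[0]].append(utxo)

    # Attempt selection within each type separately.
    candidates = []
    for bucket in buckets.values():
        result = select_largest_first(bucket, target)
        if result is not None:
            candidates.append(result)

    if candidates:
        # Placeholder scoring (fewest inputs); a real wallet would use
        # a richer measure to compare candidates.
        return min(candidates, key=len)

    # No single type can fund the transaction, so allow mixing.
    return select_largest_first(utxos, target)
```

With this structure the fallback behaves exactly like selecting from one combined list of all available UTXOs, so a wallet never becomes unable to spend funds it could spend before.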
<lightlike> why does one, in the example of the PR, infer that an output that is being mixed later is likely the change of an earlier tx? If it was the payment instead, couldn't that later be mixed with other outputs as well?
<Murch> If both outputs match type, but one of them is later mixed with more modern UTXOs on a transaction, we can assume that the other output was the one that picked the less modern format—and thus was the receiver.
<josibake> lightlike: great question. if i see a tx with all bech32 inputs and two p2sh outputs, and then in the next tx i see that p2sh output mixed with bech32 inputs to fund the second tx, it is very likely that the p2sh output being mixed was the change from the first, assuming that the wallet is picking a change address to match the payment address (which core does)
<furszy> aside from the extra fee costs, wouldn't it be more confusing for a chain analysis company if the software would be randomly changing output formats? instead of always uniformly using the newest one or using the same provided by the receiver.
<josibake> theStack: P2TR was one of the motivations for this PR! with P2TR adoption, i expect the pay to different script type heuristic to match even more txs as users transition from legacy, p2sh, bech32 to using bech32m
<vnprc> furszy: consider this scenario: a user spends down most of their funds leaving only old address types. They find themselves unable to spend funds that require a newer address type even though their wallet software tells them they have enough BTC. The user would need to consolidate UTXOs into a newer address type. This user doesn't understand why
<sipa> Part of it is a chicken-and-egg problem. Receiving wallets don't want to upgrade before mostly all sending software/sites supports it. Especially enterprise/custodial sending software/sites usually have their hands full supporting the latest dog breed variety ape coin, and won't allocate much engineering resources on bitcoin unless receivers demand it.
<sipa> Especially on a system as public as a blockchain, decent privacy really demands that nearly everyone favors the more private solution. If that solution comes at a significant cost, it just won't be used.
<josibake> antonleviathan, furszy: regarding making it configurable, imo bitcoin core wallet should try to be fairly balanced by default. meaning, reasonable efficiency and reasonable privacy. this leaves room for other wallets to specialize in being a "super efficient wallet" or a "super private wallet"
<josibake> sipa: privacy by default is the only way to actually help users be more private. of course, having more options to allow users to opt in to sacrificing efficiency for more privacy is also good
<vnprc> I recall branch and bound seeks to eliminate change outputs by matching input UTXO values to the amount the user wants to spend. I think it does this by setting a threshold and donating the small excess UTXO value in the form of fees. Just going off my memory here.
<svav> SelectCoinsBnB uses a Branch and Bound algorithm to explore a bounded search tree of potential solutions, scoring them with a metric called “waste.” Notably, the Branch and Bound algorithm looks for an exact solution and never produces a change output. As such, it’s possible for SelectCoinsBnB to fail even though the wallet has sufficient
<josibake> this next question is a bit more open ended (no wrong answers) and is similar to the discussion we just had about privacy vs efficiency: are there other things/metrics we could consider during coin selection besides just the waste metric?
<Murch> Yeah, the `waste metric` compares the cost of the inputs currently selected to a hypothetical cost of spending them later at a longterm feerate estimate. It also adds the cost of creating and spending change, or if there is no change, the excess beyond the target that is dropped to the fees to make the changeless transaction
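
The description above can be written out as a small calculation. The sketch below is an illustration under assumptions, not Bitcoin Core's implementation: the function name `waste`, its parameters, and the use of virtual sizes with feerates in sat/vB are all hypothetical choices made for the example.

```python
from typing import Optional, Sequence


def waste(
    input_vsizes: Sequence[int],
    feerate: int,
    long_term_feerate: int,
    change_cost: Optional[int],
    excess: int = 0,
) -> int:
    """Hypothetical waste calculation for one candidate input set.

    input_vsizes: virtual size of each selected input (vbytes)
    feerate / long_term_feerate: current and long-term feerates (sat/vB)
    change_cost: cost of creating and later spending change (sats),
                 or None if the transaction is changeless
    excess: amount above the target dropped to fees when changeless (sats)
    """
    # Cost of spending the inputs now versus later at the long-term feerate.
    timing_cost = sum(v * (feerate - long_term_feerate) for v in input_vsizes)

    if change_cost is not None:
        return timing_cost + change_cost
    return timing_cost + excess
```

Under these assumptions, the timing term is negative whenever the current feerate is below the long-term estimate, so adding more inputs lowers the score in that case.
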
<Murch> svav: Yep, that's where the waste metric was first introduced, but we've since generalized it to be used as a prioritization tool to pick from multiple input set candidates in transaction building
<josibake> vnprc: thats a good example! so this would be an example of a "privacy metric", perhaps preferring many small inputs and no change vs one giant input with a big change output that says "i have a lot of bitcoin!"
<theStack> vnprc: me too, but also for the reason that smaller UTXOs are more likely to be trapped due to being lower than the "effective dust-limit" in the future (not sure if that term is right, but i'm sometimes wondering if some of my UTXOs are too small to be spent in, let's say 10 years due to permanent exponentially increased fee-rates)
<josibake> svav: this goes back to the txouttype to outputtype mapping: a majority of utxos will fall into p2pkh, p2sh, or bech32. for more complicated script types, rather than have a specific bucket for each (or rather than just use txouttype for the mapping), we are putting them in an others bucket. if we allow mixing, this is no different behavior wise than using one giant vector of all available outputs. hope that
<Murch> theStack: That's a good point. The interesting effect of using the waste metric as described above is that it prefers bigger input sets at low feerates. It also prefers changeless transactions. So if there is an input set that uses small UTXOs and combines to the right value we'll prefer that (unless there is something that scores even better)
<theStack> Murch: seems like a good idea. reaching changeless transactions (if it's not "send-to-myself") is rather rare i guess though in practice? (but maybe i'm thinking at too small a scale, in wallets with a huge number of UTXOs it's probably pretty likely)