The PR branch HEAD was 87de0157 at the time of this review club meeting.
Notes
Today’s PR is a partial fix to a bug report filed last November, #17603
“partial spend avoidance makes partial spends and getbalances doesn’t
notice.” In that issue,
Bitcoin Core contributor dooglus describes two
ways the wallet does not behave as expected when using the avoid_reuse wallet
flag (sidenote: this issue is an excellent example of a good bug report, with
detailed steps to reproduce the issue).
This PR addresses one of the two bugs with a fairly small fix, but there is much
to dig into regarding avoid_reuse, coin selection, and whether this is the
best solution to the problem.
The avoid_reuse wallet flag was introduced in
#13756 by
kallewoof, who subsequently followed it up with
#16239 to mitigate exposure to
dust
attacks. The
avoid_reuse feature achieves this by:
avoiding spending from destinations that the wallet has previously spent from,
and
when it does spend from a destination that has multiple outputs available,
attempting to sweep a larger share of that destination's outputs at once.
Today’s PR addresses the latter of the two. The main keyword to look for in the
code is GroupOutputs.
This PR deals with the broad topic of Coin Selection (here is a recommended
reference
work
on the subject by Murch). Coin selection changes seem
to have a hard time getting reviews. Why do you think that is? What are some bad
things that can happen if something goes wrong in coin selection?
What is the problem with coin reuse? How can it be exploited, and what
approach does the avoid_reuse feature employ to solve it? Are you aware of
other ways to mitigate attacks, perhaps implemented by other privacy-focused
wallets?
Can you describe the problem this PR tries to solve in your own words, including
the necessary pre-conditions?
How does this PR currently attempt to resolve the issue?
How does the current approach compare to the previously proposed
approach
by the author? Can you think of other (better?) ways to solve it?
More generally, do you like the way avoid_reuse currently handles this case
of coin selection? Can you think of different solutions to the problem?
<fjahr> We are talking about a PR I made about 2 months ago. There are not a lot of LOC to review, but personally I learned a lot about avoid_reuse/avoidpartialspends and how it affects coin selection, so I hope you find it interesting. Also I want to pick your brain on whether this is the best possible solution to the problem :)
<fjahr> Let's start with the typical first questions but just ask any questions at any point and I will try to keep up: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
<fjahr> Great, maybe let's move on to my next question: This PR deals with the broad topic of Coin Selection. Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?
<amiti> it's really hard to properly predict the eventual effects that small changes to coin selection can have, since they can compound over time
<jonatack> My impression from talking with achow101 is that few people understand it well, apart from notmurch and I suppose people like instagibbs, kallewoof, achow101, and few have to deal with very large wallets
<jnewbery> pinheadmz: I don't think 'consolidate addresses' is the right way of putting this. The aim is to spend all outputs to the same address in the same tx
<fjahr> Well, next question: What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?
<jnewbery> pinheadmz: those 10 inputs need to be spent eventually! It's only more expensive if either (i) you happen to create the spend during a high-feerate period (ii) somehow using more inputs prevents BnB from finding a solution that doesn't create change
<ecurrencyhodler> Dust attack: It's when someone sends a small amount of Bitcoin to a known Bitcoin address to track it and see if they can link it to other parts of your Bitcoin holdings.
<notmurch> Regarding dusting: the main issue is that these dust UTXOs will be picked up by another transaction leaving the wallet, creating a link between two addresses, in combination with the "single sender" heuristic
<jkczyz> notmurch: but if it's dust, it presumably doesn't (or didn't) have much value and wasn't yours to begin with. Is the only reason not to eventually spend to avoid UTXO bloat?
<fjahr> on the 10 inputs: in terms of privacy, you also only have one chance, the first time you spend from a destination, so ideally you sweep all of it at once, but 10 is the arbitrary cut-off
<ecurrencyhodler> Tying @notmurch's comment in with address reuse: If someone sends dust to an address and it is reused, then the wallet will include it into a spend which would tie together that cluster of addresses.
<platesondeck> could there be a wallet feature that picks a dust utxo and uses it to fill the mining fee for a regular transaction gradually instead of waiting to collect all the dust in one big pile
<notmurch> platesondeck: the link is being created by a transaction referencing the UTXO in its inputs. Hence that's indistinguishable from any other way of spending it ;)
<willcl_ark> fjahr: but if you have more than 10, then using _only_ 10 seems like it has all been for nothing? Might as well choose "up to 10, if <=10 total" or else disregard this policy for this address?
<willcl_ark> e.g.: if you have 12 UTXO at that address, and you use only 10, still another spend will link them, therefore using 10 previously, when you only needed 1, was just wasting tx fees?
<fjahr> How about we discuss this q first which was last in my notes: More generally, do you like the way avoid_reuse/avoidpartialspends currently handles this case of coin selection? Can you think of different solutions to the problem?
<jnewbery> there may be some minor effect from giving BnB less freedom in the way it chooses inputs, but simply using up a UTXO now instead of later does not change the total fee across time that the wallet pays
<willcl_ark> jnewbery: hmmmm. Let me think about this. I agree that the second spend is not so costly, because you already merged those 9 un-needed UTXOs into a single change output, but I will need to "spend them twice" to spend them once, if you see what I mean
<jnewbery> willcl_ark: it's useful to draw out all the inputs/outputs from multiple transactions to get your head around it. If you assume constant feerate (which is a bad assumption), then the only way you can save on tx fees is by avoiding creating change outputs. Any other strategy of spending your UTXOs will result in the same txfees.
<jonatack> "Cases where we have 11+ outputs all pointing to the same destination may result in privacy leaks as they will potentially be deterministically sorted."
<fjahr> So does anyone else have opinions on Will's question? I personally have the feeling that this is kind of an edge case, and while there could be a better algorithm, it might not be worth the effort to implement in this case. Do you agree or disagree?
<willcl_ark> hmmm. seems to me then in response to jnewbery and fjahr's earlier question, that the easiest solution to reason about is to have the policy that "all UTXO at an address always spent together", which could then warn the user if #UTXO > 10, and of course be opted out of... otherwise the user is not sure what they are getting?
<platesondeck> This might be a bad question but is there a way to send small change amounts to your lightning node at the time of transaction? Would that still result in some trail when the channel is closed later on?
<michaelfolkson> platesondeck: You mean increase the size of an existing channel (splicing) or opening a new channel? Currently (before Schnorr) you would still see funds have been sent to a 2-of-2 multisig address and then left that 2-of-2 address
<jnewbery> willcl_ark: exactly. You can think of the fee as already implicitly part of each UTXO. In fact, BnB uses the implicit amount of UTXOs when selecting them (the value of the UTXO minus the fee required to spend it)
<fjahr> Aside from the more broad discussion on coin selection: Can you describe the problem this PR tries to solve in your own words, including the necessary pre-conditions?
<jnewbery> whenever you create a transaction in your wallet, it results in either one new UTXO (change) or zero new UTXOs (no change) in your wallet. You save fees in the long run if you can make transactions that result in no new UTXOs in your wallet.
<chanho> jnewbery: so the crux of the argument assuming constant fee rate seems to be that you essentially want to minimize the number of change outputs when you consider all the ways of spending the total of the given UTXO set?
<fjahr> the case described in the issue, where the number of outputs mod 10 is 1, certainly looks the weirdest to the user, because it seems like the wallet flag is not working
<notmurch> sorry for slow reply, willcl_ark: by sweeping in two transactions, the added costs are a second output, paying for two transaction headers, and having to spend two inputs later instead of one.
<platesondeck> if you already have dust would it be more privacy preserving to collect 1 piece of dust with your next transaction instead of all together?
<nothingmuch> if not, i would have expected limiting the size to have happened afterwards (i.e. first select unbounded group, then only spend up to some limit from it) instead of constructing separate groups ahead of time
<jnewbery> I'm just saying that if the goal is to avoid large txs, then we can go a lot higher than 10 inputs, which would confer these privacy benefits on more users
<jonatack> fjahr: it seems to me that adding more documentation in your PR on these things while working on it and figuring it all out could be a real value-add
<amiti> pinheadmz: `for (auto& it: gmap)` binds `it` to each element of the map in turn (a reference to the element, not an iterator in the strict C++ sense). In this case, gmap is a `std::map<CTxDestination, OutputGroup>`, so `it.first` and `it.second` are the tx destination and the output group respectively
<jnewbery> I think having a limit of 10 also changes the dust attack from "send one dust output to the address" to "send 11 dust outputs to the address", rather than removing it entirely, no?
<willcl_ark> jnewbery: does a dust attack have to send to an "empty" (previously all spent) address? If there are already UTXOs there, does dust make any difference vs tracking a current UTXO?
<jnewbery> Murch: yes, but that's a much more general topic than for destination groups. I want my coin selection to choose few inputs when fees are high and many inputs when fees are low. That's a more general thing than just destination groups.
<Murch> jnewbery: for avoiding partial spends yeah, I would agree that 10 is kinda low. However, if you already got paid ten times to the same address, it sounds a lot more likely you will receive future payments there again…
<jnewbery> before you all go, I'm thinking of moving the meeting back to 17:00 UTC next week. That'd put it back to the same local time for people in the US, but would make it an hour early in local time for people in Europe, until your DST starts.
<Murch> willcl_ark: A lot of wallets actually don't track address reuse, so the expected outcome in many cases would be that people spend it in two different transactions later. Sending to an address that still holds money makes it more likely that someone is still tracking it, though.
<nothingmuch> hmmm, apart from identifying which output is payment and which is change, which is arguably already an issue, is there any downside to later spending marked used outputs only together with change already related to them? trying to figure out if fee considerations and privacy considerations can be considered independently
<nothingmuch> yeah it's related to my previous question... i'm very confused/ambivalent about this behaviour so i'm trying to find equivalences to simplify my mental model
<nothingmuch> "arguably already an issue" - i'm assuming here (perhaps incorrectly) that the payment amount is relatively small, so unnecessary input heuristic could identify the change address
<Murch> nothingmuch: e.g. it would reveal more about what sort of wallet you're running which may allow other guesses such as whether it's possible that there was more than one change
<jnewbery> willcl_ark jonatack: There's always a temptation when faced with a difficult design decision to resort to "add a config option". I think that's usually the wrong decision, because most people never use them, and they lead to combinatorial complexity.
<Murch> Right, but let's say you're spending more in the second tx, and it uses multiple inputs. All of them being related to the address that was dusted would be pretty unlikely, so it tells us more about either your wallet composition or your wallet software
<kallewoof> I've taken the liberty of summarizing the previous meeting's talking points. I'll post that, and we can air our opinions on what's been pointed out so far, or just "that about covers it".
<kallewoof> Question 2: "Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?"
<kallewoof> Summary of responses for 2: willcl_ark notes that "there is no "correct" answer for which tradeoffs are best for any given user situation, does a user want "more private", or "cheaper" for example..."; amiti notes that it's hard to predict what even a small change will result in; jonatack notes that not a lot of people know the code well enough to be confident enough to review it.
<fanquake> > Coin selection changes seem to have a hard time getting reviews. - mainly because there are about 4? active contributors that work on/want to review this.
<kallewoof> Question 3: What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?
<kallewoof> Summary of responses for 3: "mainly a way to prevent dust attack" (amiti), later explained to be when someone sends a small amount of btc to a known address of yours, in the hopes that your wallet will pick it up when sending money next time, so they can tie multiple bitcoin addresses to you.
<kallewoof> right, yeah, i see what you mean. i simply assumed we all skipped to the latter question, which was a bit odd of me, now that i think about it
<Murch> and in more exotic constructions like anyone_can_pay transactions, if you use a UTXO that shares the address with others, you may be signing away the other UTXOs as well
<kallewoof> Continuing: Someone raised the question of why there is a limit on the number of UTXOs at all. What do you think is the reason? Fjahr says 'because it can result in high fees', but that's only partially correct.
<kallewoof> In the same vein: Jonatack asks how the limit (OUTPUT_GROUP_MAX_ENTRIES) was decided; people sort of did find it, but I will answer it for the record: it was arbitrarily decided by me. I figured we would tweak it if it became necessary later.
<Murch> kallewoof: the high fee argument can be sidestepped by making use of the waste metric BnB already uses. A set of 10+ inputs would simply show up as overly costly in the selection at high fees, but would actually be preferred at low fees
<kallewoof> The primary reason why there is a cap on groups is to avoid the risk of the resulting transaction being so large it breaks consensus. Imagine someone who has 10k tiny outputs all to the same address. If they tried to use this feature with no limits, they would end up with a single gigantic transaction which would probably be too large to fit in a block.
<kallewoof> Question 4 (out of order from review notes): "More generally, do you like the way avoid_reuse/avoidpartialspends currently handles this case of coin selection? Can you think of different solutions to the problem?"
<kallewoof> I think personally that the people who end up turning on the avoid reuse feature and the people who get non-adversarial payments repeatedly to the same address are different people, so raising the 10 towards a more realistic number is totally fine. Thoughts?
<kallewoof> I guess my concern right now is, if you have 200 payments to the same address and you spend one of the two 100-entry groups, the 'avoid reuse' feature will mark the other one as used, and remove it from future coin selection (by default; you can still use it by saying 'include used' when doing coin select)
<kallewoof> right. i saw jnewbery's note about preferring to have fewer options. i'm not sure i agree in this particular case. there are clearly two separate groups of users
<Murch> kallewoof: Checking the limit is not that hard: when you build a transaction, you already know the size of all recipient outputs and the transaction overhead. The only variables are whether or not there is change and how many inputs there are
<kallewoof> but it may also be that the group that does excessive address reuse should be encouraged to switch to a more dynamic solution (e.g. btcpay instead of a static donation address)
<Murch> I think for someone privacy-sensitive enough to use avoid_reuse, receiving more than 10 UTXOs to one address is the edge case; going over 100 seems exorbitant
<kallewoof> Murch: over 100 sounds unusual though, yeah, but someone might send a ton just to ensure at least one of their outputs is always included in your coin select
<kallewoof> Anyway, I'm gonna call it a close as I ran out of notes and we're already closing in on an hour. Thanks a lot for coming to hang! Hopefully we can do this regularly. :)