This PR deals with the broad topic of Coin Selection (here is a recommended
on the subject by Murch). Coin selection changes seem
to have a hard time getting reviews. Why do you think that is? What are some bad
things that can happen if something goes wrong in coin selection?
What is the problem with coin reuse? How can it be exploited, and what
approach does the avoid_reuse feature employ to solve it? Are you aware of
other ways to mitigate attacks, perhaps implemented by other privacy-focused
Can you describe the problem this PR tries to solve in your own words, including
the necessary pre-conditions?
How does this PR currently attempt to resolve the issue?
<fjahr> We are talking about a PR from I made about 2 months ago. There are not a lot LOC to review but personally I learned a lot about avoid_reuse/avoidparticalspend and how it effects coin selection so I hope you find it interesting. Also I want to pick your brain if this is the best possible solution to the problem :)
<fjahr> Let's start with the typical first questions but just ask any questions at any point and I will try to keep up: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
<fjahr> Great, maybe let's move on to my next question: This PR deals with the broad topic of Coin Selection. Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?
<jonatack> My impression from talking with achow101 is that few people understand it well, apart from notmurch and I suppose people like instagibbs, kallewoof, achow101, and few have to deal with very large wallets
<fjahr> Well, next question: What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?
<jnewbery> pinheadmz: those 10 inputs need to be spent eventually! It's only more expensive if either (i) you happen to create the spend during a high-feerate period (ii) somehow using more inputs prevents BnB from finding a solution that doesn't create change
<notmurch> Regarding dusting: the main issue is that these dust utxo will be picked up by another transaction leaving the wallet creating a link between two addresses. In combination with the "single sender" heuristic
<ecurrencyhodler> Tying @notmurch's comment in with address reuse: If someone sends dust to an address and it is reused, then the wallet will include it into a spend which would tie together that cluster of addresses.
<willcl_ark> fjahr: but if you have more than 10, then using _only_ 10 seems like it has all been for nothing? Might as well choose "up to 10, if <=10 total" or else disregard this policy for this address?
<fjahr> How about we discuss this q first which was last in my notes: More generally, do you like the way avoid_reuse/avoidpartialspends currently handles this case of coin selection? Can you think of different solutions to the problem?
<jnewbery> there may be some minor effect from giving BnB less freedom in the way it chooses inputs, but simply using up a UTXO now instead of later does not change the total fee across time that the wallet pays
<willcl_ark> jnewbery: hmmmm. Let me think about this. I agree that the second spend is not so costly, because you already merged those 9 un-needed UTXOs into a single change output, but I will need to "spend them twice" to spend them once, if you see what I mean
<jnewbery> willcl_ark: it's useful to draw out all the inputs/outputs from multiple transactions to get your head around it. If you assume constant feerate (which is a bad assumption), then the only way you can save on tx fees is by avoiding creating change outputs. Any other strategy of spending your UTXOs will result in the same txfees.
<fjahr> So does anyone else have opinions on wills question? I personally have the feeling that this is kind of an edge case and while the could be a better algorithm it might not be worth the effort to implement in this case? Do you agree or disagree?
<willcl_ark> hmmm. seems to me then in response to jnewbery and fjahr's earlier question, that the easiest solution to reason about is to have the policy that "all UTXO at an address always spent together", which could then warn the user if #UTXO > 10, and of course be opted out of... otherwise the user is not sure what they are getting?
<platesondeck> This might be a bad question but is there a way to send small change amounts to your lightning node at the time of transaction? Would that still result in some trail when the channel is closed later on?
<michaelfolkson> platesondeck: You mean increase the size of an existing channel (splicing) or opening a new channel? Currently (before Schnorr) you would still see funds have been sent to a 2-of-2 multisig address and then left that 2-of-2 address
<jnewbery> willcl_ark: exactly. You can think of the fee as already implicitly part of the each UTXO. In fact, BnB uses the implicit amount of UTXOs when selecting them (the value of the UTXO minus the fee required to spend it)
<jnewbery> whenever you create a transaction in your wallet, it results in either one new UTXO (change) or zero new UTXOs (no change) in your wallet. You save fees in the long run if you can make transactions that result in no new UTXOs in your wallet.
<chanho> jnewbery: so the crux of the argument assuming constant fee rate seems to be that you essentially want to minimize the number of change outputs when you consider all the ways of spending the total of the given UTXO set?
<notmurch> sorry for slow reply, willcl_ark: by sweeping in two transactions, the cost increases are having a second output, paying for two transaction headers and having to spend two inputs later instead of one.
<nothingmuch> if not, i would have expected limiting the size to have happened afterwards (i.e. first select unbounded group, then only spend up to some limit from it) instead of constructing separate groups ahead of time
<amiti> pinheadmz: `for (auto& it: gmap)` creates an iterator `it` that just points to each element and iterates through. in this case, gmap is a `std::map<CTxDestination, OutputGroup>`, so first and second point to the tx-dest vs output-group respectively
<jnewbery> Murch: yes, but that's a much more general topic than for destination groups. I want my coin selection to choose few inputs when fees are high and many inputs when fees are low. That's a more general thing than just destination groups.
<Murch> jnewbery: for avoiding partial spends yeah, I would agree that 10 is kinda low. However, if you already got paid ten times to the same address, it sounds a lot more likely you will receive future payments there again…
<jnewbery> before you all go, I'm thinking of moving the meeting back to 17:00 UTC next week. That'd put it back to the same local time for people in the US, but would make it an hour early in local time for people in Europe, until your DST starts.
<Murch> willcl_ark: A lot of wallets actually don't track address reuse, so the expected outcome in many cases would be that people spend it in two different transactions later. Sending to an address that has money still, makes it more likely that someone is sill tracking it, though.
<nothingmuch> hmmm, apart from identifying which output is payment and which is change, which is arguably already an issue, is there any downside to later spending marked used outputs only together with change already related to them? trying to figure out if fee considerations and privacy considerations can be considered independently
<jnewbery> willcl_ark jonatack: There's always a temptation when faced with a difficult design decision to resort to "add a config option". I think that's usually the wrong decision, because most people never use them, and they lead to combinatorial complexity.
<Murch> Right, but let's say you're spending more in the second tx, and it uses multiple inputs. All of them being related to the address that was dusted would be pretty unlikely, so it tells us more about either your wallet composition or your wallt software
<kallewoof> Summary of responses for 2: willcl_ark notes that "there is no "correct" answer for which tradeoffs are best for any given user situation, does a user want "more private", or "cheaper" for example..."; amiti notes that it's hard to predict what even a small change will result in; jonatack notes that not a lot of people know the code well enough to be confident enough to review it.
<kallewoof> Question 3: What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?
<kallewoof> Summary of responses for 3: "mainly a way to prevent dust attack" (amiti), later explained to be when someone sends a small amount of btc to a known address of yours, in the hopes that your wallet will pick it up when sending money next time, so they can tie multiple bitcoin addresses to you.
<kallewoof> Continuing: Someone raised the question of why there is a limit on the number of UTXOs at all. What do you think is the reason? Fjahr says 'because it can result in high fees', but that's only partially correct.
<kallewoof> In the same vein: Jonatack asks how the limit (OUTPUT_GROUP_MAX_ENTRIES) was decided; people sort of did find it, but I will answer it for the record: it was arbitrarily decided by me. I figured we would tweak it if it became necessary later.
<Murch> kallewoof: the high fee argument can be sidestepped by making use of the waste metric BnB already uses. A set of 10+ inputs would simply show up as overtly costly in the selection at high fees, but would actually be preferred at low fees
<kallewoof> The primary reason why there is a cap on groups is to avoid the risk of the resulting transaction being so large it breaks consensus. Imagine someone who has 10k tiny outputs all to the same address. If they tried to use this feature with no limits, they would get a single gigantic UTXO which would probably be too large to fit in a block.
<kallewoof> Question 4 (out of order from review notes): "More generally, do you like the way avoid_reuse/avoidpartialspends currently handles this case of coin selection? Can you think of different solutions to the problem?"
<kallewoof> I think personally that the people who end up turning on the avoid reuse feature and the people who get non-adversarial payments repeatedly to the same address are different people, so raising the 10 towards a more realistic number is totally fine. Thoughts?
<kallewoof> I guess my concern right now is, if you have 200 spends to same address and you spend one of the 100 entry groups, the 'avoid reuse' feature will mark the other one as spent, and remove it from future coin selection (by default; you can still use it by saying 'include used' when doing coin select)
<Murch> kallewoof: Checking the limit is not that hard: when you build a transaction, you already know the size of all recipient outputs and the transaction overhead. The only variables are whether or not there is change and how many inputs ther are