Improve coin selection for destination groups >10 (wallet)

https://github.com/bitcoin/bitcoin/pull/17824

Host: fjahr  -  PR author: fjahr

The PR branch HEAD was 87de0157 at the time of this review club meeting.

Notes

Today’s PR is a partial fix for a bug report filed last November, #17603 “partial spend avoidance makes partial spends and getbalances doesn’t notice.” In that issue, Bitcoin Core contributor dooglus describes two ways in which the wallet does not behave as expected when the avoid_reuse wallet flag is set. (Side note: the issue is an excellent example of a good bug report, with detailed steps to reproduce the problem.)

This PR addresses one of the two bugs with a fairly small fix, but there is much to dig into regarding avoid_reuse, coin selection, and whether this is the best solution to the problem.

The avoid_reuse wallet flag was introduced in #13756 by kallewoof, who subsequently followed it up with #16239 to mitigate exposure to dust attacks. The avoid_reuse feature achieves this by:

  1. avoiding spending from destinations that the wallet has previously spent from, and
  2. when it does spend from a destination that has multiple outputs available, attempting to sweep a larger part of those outputs in the same transaction.

Today’s PR addresses the latter of the two. The main keyword to look for in the code is GroupOutputs.
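
To make the grouping idea concrete, here is a minimal sketch in Python. It is not the wallet's C++ code: the function `group_outputs` and the `(destination, value)` tuples are hypothetical stand-ins, and only the cap of 10 entries per group is taken from the discussion of this PR. It shows why a destination with 11 outputs ends up as one group of 10 plus a leftover group of 1, which is the situation the bug report describes.

```python
from collections import defaultdict

# Cap on entries per group, as discussed for this PR; everything else here
# is a simplified, hypothetical model of output grouping.
OUTPUT_GROUP_MAX_ENTRIES = 10


def group_outputs(utxos, max_entries=OUTPUT_GROUP_MAX_ENTRIES):
    """Group (destination, value) pairs by destination, max_entries per group.

    Returns a list of (destination, [values]) tuples.
    """
    by_destination = defaultdict(list)
    for destination, value in utxos:
        by_destination[destination].append(value)

    groups = []
    for destination, values in by_destination.items():
        # Split each destination's outputs into chunks of at most max_entries.
        for start in range(0, len(values), max_entries):
            groups.append((destination, values[start:start + max_entries]))
    return groups


# Eleven outputs to one destination, one output to another.
utxos = [("addr_A", 1_000)] * 11 + [("addr_B", 50_000)]
group_sizes = [(dest, len(values)) for dest, values in group_outputs(utxos)]
print(group_sizes)  # [('addr_A', 10), ('addr_A', 1), ('addr_B', 1)]
```

If coin selection then treats the leftover group of 1 like any other group, it can spend that single output and leave the other 10 behind, which looks to the user as if partial spend avoidance were not working.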

The other error reported in issue #17603 was a misrepresentation of the wallet balances in the RPC getbalances. This was resolved by #17843 “wallet: Reset reused transactions cache”.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. This PR deals with the broad topic of Coin Selection (Murch's work on the subject is a recommended reference). Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?

  3. What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?

  4. Can you describe the problem this PR tries to solve in your own words, including the necessary pre-conditions?

  5. How does this PR currently attempt to resolve the issue?

  6. How does the current approach compare to the previously proposed approach by the author? Can you think of other (better?) ways to solve it?

  7. More generally, do you like the way avoid_reuse currently handles this case of coin selection? Can you think of different solutions to the problem?

Meeting Log

  13:17 <jonatack> git grepping shows EXTRA_DESCENDANT_TX_SIZE_LIMIT = 10000, related?
  13:26 <fjahr> yeah, that's also my understanding but I took it from kalle's suggestion here https://github.com/bitcoin/bitcoin/pull/17824#issuecomment-570548151 and did not explicitly clarify if he chose it for another reason.
  13:27 <jonatack> right, thanks, it's the only place i saw it too... I think the value should be commented and maybe hoisted to a static constant
  13:27 <fjahr> I think at the time the carve-out discussion for lightning was still fresh and I just accepted it as the same number in the back of my head
  13:28 <fjahr> jonatack: good point
  13:28 <jonatack> also, if you have to retouch, can you make this mini-edit to the avoid_reuse test?
  13:28 <jonatack> self.log.info("Test fund send fund send {}".format(second_addr_type))
  13:29 <fjahr> haha, yeah, I have seen that a lot, will do :)
  13:29 <jonatack> :)
  13:30 <jonatack> instead of self.log.info("Test fund send fund send")
  13:30 <jonatack> since it's called several times
  13:30 <kanzure> ah, "an hour and ten minutes from now" was misleading
  13:33 <jonatack> (i don't think anyone will mind if you slip that change into your commit)
  13:46 <emzy> day light saving is crazy. Time zones, too.
  13:47 <fjahr> Bitcoin fixes this, let's just use blockheight :p
  13:48 <pinheadmz> I have a bitcoin block clock over here :-)
  14:00 <fjahr> I think it's time
  14:00 <fjahr> #startmeeting
  14:01 <willcl_ark> hi
  14:01 <jnewbery> hi
  14:01 <fjahr> Hey everyone! Welcome to this weeks edition of the PR Review Club.
  14:01 <lightlike> hi
  14:01 <ajonas> hi
  14:01 <jkczyz> hi
  14:01 <platesondeck> Hello
  14:01 <nothingmuch> hi
  14:01 <pinheadmz> hi
  14:01 <jonatack_> hi
  14:01 <fjahr> We are talking about a PR from I made about 2 months ago. There are not a lot LOC to review but personally I learned a lot about avoid_reuse/avoidparticalspend and how it effects coin selection so I hope you find it interesting. Also I want to pick your brain if this is the best possible solution to the problem :)
  14:01 <emzy> hi
  14:02 <michaelfolkson> hi
  14:02 <notmurch> hello
  14:02 <pinheadmz> yeah i didnt even know this was a wallet feature
  14:03 <fjahr> Let's start with the typical first questions but just ask any questions at any point and I will try to keep up: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
  14:03 <fjahr> pinheadmz: it was also new to me
  14:03 <jonatack> Concept ACK; studying the implementation
  14:04 <fjahr> Maybe also someone who also recreated the bug? The original bug issue had nice command line instructions for regtest.
  14:05 <jonatack> Yes, reproduced it while reviewing https://github.com/bitcoin/bitcoin/pull/17838
  14:05 <willcl_ark> yes concept ACK for me also. I'm always surprised how difficult to implement coin selection seems to be
  14:05 <jonatack> "test: test the >10 UTXO case for output groups"
  14:06 <jonatack> which you fixed with #17843
  14:06 <fjahr> Great, maybe let's move on to my next question: This PR deals with the broad topic of Coin Selection. Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?
  14:07 <pinheadmz> coin selection is responsible for TX size, fee, privacy...
  14:07 <willcl_ark> I think it's because there is no "correct" answer for which tradeoffs are best for any given user situation
  14:07 <willcl_ark> does a user want "more private", or "cheaper" for example...
  14:07 <amiti> its really hard to properly predict the possible eventual effects of small changes to coin selection can have .. since it can compound over time
  14:08 <michaelfolkson> So I can't remember where this was discussed. Maybe on a podcast. But didn't Murch's changes based on his thesis get reverted?
  14:08 <pinheadmz> michaelfolkson: murch mentioned that to me yeah
  14:08 <jonatack> My impression from talking with achow101 is that few people understand it well, apart from notmurch and I suppose people like instagibbs, kallewoof, achow101, and few have to deal with very large wallets
  14:08 <pinheadmz> i was surprised this is feature even a thing -- IIUC, we spend 10 UTXOs just to consolidate addresses?
  14:09 <willcl_ark> achow101 also has some Twitch streams where he talks a lot about coin selection :)
  14:09 <jonatack> and bitcoin core's wallet isn't adapted to large-scale exchange-sized applications afaik
  14:09 <michaelfolkson> What was the reason for reverting Murch's changes? Some impact on the network?
  14:09 <jnewbery> pinheadmz: I don't think 'consolidate addresses' is the right way of putting this. The aim is to spend all outputs to the same address in the same tx
  14:10 <pinheadmz> sure couldnt think of a good term
  14:10 <fjahr> jnewbery: yes!
  14:10 <pinheadmz> but still, you could end up with a TX 10x in size to gain this privacy
  14:10 <jnewbery> (consolidate addresses implies to me the opposite - that you're taking multiple addresses and doing something that would link them)
  14:11 <pinheadmz> its a UTXO consolidation though
  14:11 <notmurch> michaelfolkson: my initial dabblings with Coin Selection got reverted but it was two years before my thesis
  14:11 <pinheadmz> kind of pretending that all outputs to an address are just one big single utxo
  14:11 <pinheadmz> murch is here!
  14:12 <jonatack> michaelfolkson: (iirc that was an livera podcast with achow101 that you are referring to)
  14:12 <fjahr> pinheadmz: yepp, and even more if you spend multiple groups
  14:12 <pinheadmz> fjahr: so this could be an expensive mechanism. Was it controversial?
  14:12 <michaelfolkson> Ah cool. So the primary conclusions of the thesis haven't been discarded notmurch?
  14:13 <achow101> michaelfolkson: murch's thesis is the BnB coin selection algo we use in core now
  14:13 <notmurch> My first attempt to change Bitcoin Core's coin selection made it stop spending uneconomic inputs
  14:13 <achow101> I'm kind of trying to get the second half of his conclusions into core as well (the random selection fallback part)
  14:13 <michaelfolkson> Cool, thanks. And thanks jonatack yup. Transcript is here: https://stephanlivera.com/episode/99/
  14:13 <fjahr> pinheadmz: I didn't go through that again in my prepbut I can not remember a big discussion
  14:13 <notmurch> this caused a notable increase in the UTXO set in the following months and got it reverted
  14:14 <jonatack> yes, see src/wallet/coinselection.cpp::20-
  14:14 <jonatack> for the BnB algo
  14:15 <fjahr> Ok, pinheadmz: I remember there were some comments on the number 10 but it was kind of accepted, can not find it now
  14:15 <fjahr> strike the ok :)
  14:15 <fjahr> Well, next question: What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?
  14:15 <jnewbery> pinheadmz: can you explain why 'this could be an expensive mechanism'?
  14:16 <pinheadmz> jnewbery: adding 10 inputs to a TX when only 1 is sufifcient to cover the output value ?
  14:17 <amiti> my understanding is that the `avoid_reuse` feature is mainly a way to prevent a dust attack?
  14:17 <platesondeck> i'm sorry, but could someone pls explain to me what exactly you mean by coin reuse?
  14:17 <fjahr> amiti: right, someone want to define dust attack?
  14:17 <notmurch> platesondeck: when funds are received to the same address twice
  14:17 <fjahr> address reuse, sorry :)
  14:17 <jnewbery> pinheadmz: those 10 inputs need to be spent eventually! It's only more expensive if either (i) you happen to create the spend during a high-feerate period (ii) somehow using more inputs prevents BnB from finding a solution that doesn't create change
  14:18 <andrewtoth_> jnewbery why can't the other 9 inputs be hodled forever?
  14:18 <ecurrencyhodler> Dust attack: It's when someone sends a small amount of Bitcoin to a known Bitcoin address to track it and see if they can link it to other parts of your Bitcoin holdings.
  14:18 <amiti> a dust attack is where an adversary sends dust to lots of different addresses, then waits to see where its used to link to other addresses
  14:19 <platesondeck> fjahr ok the terminology threw me for a loop
  14:19 <jnewbery> andrewtoth_: :)
  14:19 <fjahr> amiti, ecurrencyhodler: correct
  14:19 <notmurch> andrewtoth_: if you ever want to make use of the value of an unspent, it will need to be spent eventually
  14:19 <fjahr> platesondeck: sorry about that
  14:20 <notmurch> Regarding dusting: the main issue is that these dust utxo will be picked up by another transaction leaving the wallet creating a link between two addresses. In combination with the "single sender" heuristic
  14:20 <notmurch> this allows to tie together a larger cluster of addresses
  14:21 <jkczyz> notmurch: but if it's dust, it presumably doesn't (or didn't) have much value and wasn't yours to begin with. Is the only reason not to eventually spend to avoid UTXO bloat?
  14:21 <notmurch> *a link between the previous transaction and the new transaction
  14:21 <fjahr> on the 10 inputs: in terms of privacy also you have only one chance for the first time you spent from a destination, so ideally you can sweep all of it at once but 10 is the arbitrary cut-off
  14:21 <notmurch> jkczyz what if what's dust today eventually is worth spending?
  14:21 <willcl_ark> why not always spend all from the address?
  14:21 <ecurrencyhodler> Tying @notmurch's comment in with address reuse: If someone sends dust to an address and it is reused, then the wallet will include it into a spend which would tie together that cluster of addresses.
  14:21 <platesondeck> could there be a wallet feature that picks a dust utxo and uses it to fill the mining fee for a regular transaction gradually instead of waiting to collect all the dust in one big pile
  14:22 <pinheadmz> i wonder if this will even be an issue as bip44 wallets dominate
  14:23 <fjahr> willcl_ark: because that might then result in high fees, the 10 outputs group limit is trying to strike a balance
  14:23 <notmurch> platesondeck: the link is being created by a transaction referencing the UTXO in its inputs. Hence that's indistinguishable from any other way of spending it ;)
  14:23 <jkczyz> notmurch: understood but it could still be considered dust relative to the rest of your coins ;)
  14:24 <fjahr> Bonus question: who can give a quick definition of a "destination" as it is used in this context?
  14:24 <willcl_ark> fjahr: but if you have more than 10, then using _only_ 10 seems like it has all been for nothing? Might as well choose "up to 10, if <=10 total" or else disregard this policy for this address?
  14:24 <jonatack> Does anyone know how the outputs groups limit (OUTPUT_GROUP_MAX_ENTRIES = 10 in wallet.cpp::46) was decided?
  14:24 <andrewtoth_> destination would be any address derived from same pubkey, so could be p2pkh, wrapped p2sh, p2wpkh
  14:25 <willcl_ark> e.g.: if you have 12 UTXO at that address, and you use only 10, still another spend will link them, therefore using 10 previously, when you only needed 1, was just wasting tx fees?
  14:25 <fjahr> jonatack: should be in the avoidparticalspend code, but i did not pull it up
  14:27 <jnewbery> willcl_ark: using inputs early is not wasting fees. Those inputs need to be spent eventually
  14:27 <fjahr> willcl_ark: you are already getting towards my later questions :) buy maybe I can pull it up.
  14:28 <fjahr> How about we discuss this q first which was last in my notes: More generally, do you like the way avoid_reuse/avoidpartialspends currently handles this case of coin selection? Can you think of different solutions to the problem?
  14:28 <jnewbery> there may be some minor effect from giving BnB less freedom in the way it chooses inputs, but simply using up a UTXO now instead of later does not change the total fee across time that the wallet pays
  14:28 <ecurrencyhodler> What is BnB short for?
  14:28 <fjahr> Branch'n;Bound
  14:29 <willcl_ark> jnewbery: hmmmm. Let me think about this. I agree that the second spend is not so costly, because you already merged those 9 un-needed UTXOs into a single change output, but I will need to "spend them twice" to spend them once, if you see what I mean
  14:30 <jonatack> fjahr: i see it was added by kallewoof in 18f690e "wallet: shuffle coins before grouping, where warranted"
  14:31 <fjahr> jonatack: ok, I think I remember promag asking about the number and kalle said it was kind of arbitrarily chosen, or am I making that up? :)
  14:31 <jonatack> but he hoisted the pre-existing value of 10 to a constant, it was already previously set to 10
  14:32 <jonatack> yes, seems so
  14:32 <jnewbery> willcl_ark: it's useful to draw out all the inputs/outputs from multiple transactions to get your head around it. If you assume constant feerate (which is a bad assumption), then the only way you can save on tx fees is by avoiding creating change outputs. Any other strategy of spending your UTXOs will result in the same txfees.
  14:32 <jonatack> "Cases where we have 11+ outputs all pointing to the same destination may result in privacy leaks as they will potentially be deterministically sorted."
  14:34 <fjahr> So does anyone else have opinions on wills question? I personally have the feeling that this is kind of an edge case and while the could be a better algorithm it might not be worth the effort to implement in this case? Do you agree or disagree?
  14:34 <jnewbery> here's the comment where 10 was chosen: https://github.com/bitcoin/bitcoin/pull/12257/files#r204171876
  14:35 <willcl_ark> hmmm. seems to me then in response to jnewbery and fjahr's earlier question, that the easiest solution to reason about is to have the policy that "all UTXO at an address always spent together", which could then warn the user if #UTXO > 10, and of course be opted out of... otherwise the user is not sure what they are getting?
  14:35 <jonatack> jnewbery: thanks!
  14:35 <platesondeck> This might be a bad question but is there a way to send small change amounts to your lightning node at the time of transaction? Would that still result in some trail when the channel is closed later on?
  14:36 <willcl_ark> I guess some static donation addresses exist which could have _very_ high numbers of UTXOs though
  14:36 <pinheadmz> 1andreas... :-)
  14:37 <willcl_ark> jnewbery: I see your fee argument now; you are just paying the fee _now_, instead of _later_ :)
  14:39 <michaelfolkson> platesondeck: You mean increase the size of an existing channel (splicing) or opening a new channel? Currently (before Schnorr) you would still see funds have been sent to a 2-of-2 multisig address and then left that 2-of-2 address
  14:39 <jnewbery> willcl_ark: exactly. You can think of the fee as already implicitly part of the each UTXO. In fact, BnB uses the implicit amount of UTXOs when selecting them (the value of the UTXO minus the fee required to spend it)
  14:40 <fjahr> platesondeck: you could also do a submarine swap with someone but that is not something that would implement in the core wallet
  14:41 <fjahr> Aside from the more broad discussion on coin selection: Can you describe the problem this PR tries to solve in your own words, including the necessary pre-conditions?
  14:41 <jnewbery> whenever you create a transaction in your wallet, it results in either one new UTXO (change) or zero new UTXOs (no change) in your wallet. You save fees in the long run if you can make transactions that result in no new UTXOs in your wallet.
  14:41 <chanho> jnewbery: so the crux of the argument assuming constant fee rate seems to be that you essentially want to minimize the number of change outputs when you consider all the ways of spending the total of the given UTXO set?
  14:42 <pinheadmz> fjahr: I think the bug is, if you have 11 utxos for 1 destination, instead of spending the group of 10, it spends the "group" of 1
  14:42 <jnewbery> chanho: yes, that's true in general. Less change outputs => less txfees
  14:43 <pinheadmz> but then it also labels the the group of 10 as already resused
  14:44 <fjahr> pinheadmz: yes, or more generally 'mod 10' could also be 21, 12 etc.
  14:45 <fjahr> the case described in the issue where mod10 is 1 certainly looks the weirdest to the user because it seems likee the wallet flag is not working
  14:45 <jonatack> chanho: minimising tx fees and also maximising privacy by minimising addresses used together and preferring split over several utxos
  14:45 <notmurch> sorry for slow reply, willcl_ark: by sweeping in two transactions, the cost increases are having a second output, paying for two transaction headers and having to spend two inputs later instead of one.
  14:46 <willcl_ark> sorry to be hung up on this, I hope I'm not de-railing... but why not: if (can be done without change); else (spend all from address)?
  14:48 <fjahr> all good, as I said in the beginning it's the point of this review club to learn about this stuff and since we have the experts here... :)
  14:48 <notmurch> chanho: avoiding change also improves privacy and also increases the amount of balance that remains spendable in your wallet
  14:50 <fjahr> On the PR specifically: How does this PR currently attempt to resolve the issue? What 'mechanism' is it using?
  14:52 <pinheadmz> well theres this new vecotr of detinations called full_groups
  14:53 <fjahr> pinheadmz: yes, but what is doing with the group that is not full?
  14:53 <pinheadmz> and i know from context its preferring it
  14:53 <pinheadmz> but the it.second and it.first stuff is out of my scope
  14:54 <pinheadmz> looks like it iterates through the full_groups first
  14:54 <fjahr> the key are the ancestors
  14:55 <pinheadmz> yeah so that comes in to play later in the coin selection algortihm?
  14:55 <pinheadmz> like theres no other "pregerence" property?
  14:55 <fjahr> yes, it is currently using the number of ancestors (max_anchestors - 1) to 'penalize' the 'mod 10 leftovers' group so it gets chosen last
  14:56 <pinheadmz> so maybe this is too low level but "for (auto& it : gmap)" -- creates an iterator `it` ?
  14:57 <pinheadmz> and that has .first and .sceond properties?
  14:57 <fjahr> You can see how it takes effect in 'SelectCoins' in the that starts with 'bool res = value_to_select <= 0 ||'
  14:57 <fjahr> only 3 minutes left, shoot if you have any questions left :)
  14:58 <jnewbery> I have a question: does 10 inputs seem aggressively low to people, given the goal is to avoid transactions becoming too big?
  14:58 <jonatack> I'm curious where the 9999 and 10000 max ancestor values come from, and need to study SelectCoins()
  14:58 <fjahr> pinheadmz: https://github.com/bitcoin/bitcoin/blob/master/src/wallet/wallet.cpp#L2395
  14:59 <Murch> jnewbery:Could you define the goal more specifically?
  14:59 <platesondeck> if you already have dust would it be more privacy preserving to collect 1 piece of dust with your next transaction instead of all together?
  14:59 <jnewbery> a regular P2PKH tx with 10 inputs and 2 outputs is ~1500 vbytes, which isn't enormous. We've seen much larger utxo consolidations than that
  14:59 <jnewbery> Murch: https://github.com/bitcoin/bitcoin/pull/12257/files#r204171876
  15:00 <jnewbery> oh no. What happens when a Murch and a notmurch collide?
  15:00 <Murch> jnewbery: Well, when the fees are high, ten inputs is kinda sizeable already. When the fees are low, 10 isn't all that much
  15:00 <Murch> platesondeck: depends on whether they're on the same address or different ones
  15:00 <nothingmuch> is thinking of n outputs with a single yet unused script as a single output with n times the overhead missing anything?
  15:00 <fjahr> platesondeck: can you be more specific? dust in the same destination?
  15:01 <jnewbery> well if you're going to talk about coin selection being dynamic to prevailing feerates, that's a whole larger topic
  15:01 <nothingmuch> if not, i would have expected limiting the size to have happened afterwards (i.e. first select unbounded group, then only spend up to some limit from it) instead of constructing separate groups ahead of time
  15:01 <jnewbery> I'm just saying that if the goal is to avoid large txs, then we can go a lot higher than 10 inputs, which would confer these privacy benefits on more users
  15:02 <willcl_ark> tbh, 10 seems ok for "regular users" (from personal experience), I guess anything larger than that is just "considered edge case"?
  15:02 <jonatack> fjahr: it seems to me that adding more documentation in your PR on these things while working on it and figuring it all out could be a real value-add
  15:02 <Murch> jnewbery: well, at one sat per vB, I like my txes to have 50+ inputs :p
  15:02 <amiti> pinheadmz: `for (auto& it: gmap)` creates an iterator `it` that just points to each element and iterates through. in this case, gmap is a `std::map<CTxDestination, OutputGroup>`, so first and second point to the tx-dest vs output-group respectively
  15:02 <jnewbery> I think having a limit of 10 also changes the dust attack from "send one dust output to the address" to "send 11 dust outputs to the address", rather than removing it entirely, no?
  15:03 <michaelfolkson> Agreed jonatack
  15:03 <Murch> nothingmuch: you could see it like that if you always enforce avoiding partial spends
  15:03 <fjahr> jonatack: do you mean on the features I am editing or the change itself?
  15:04 <willcl_ark> jnewbery: does a dust attck have to send to an "empty" (previously all spent) address? if there are already UTXOs there, then does dust make any difference vs tracking a current UTXO?
  15:04 <jnewbery> Murch: yes, but that's a much more general topic than for destination groups. I want my coin selection to choose few inputs when fees are high and many inputs when fees are low. That's a more general thing than just destination groups.
  15:05 <Murch> jnewbery: for avoiding partial spends yeah, I would agree that 10 is kinda low. However, if you already got paid ten times to the same address, it sounds a lot more likely you will receive future payments there again…
  15:05 <jonatack> fjahr: the changeset for sure. maybe an extra commit for relevant related code
  15:05 <fjahr> ok, we are 5 minutes over so I will call #endmeeting but feel free to keep the conversation going :)
  15:06 <jnewbery> Thanks fjahr. Great meeting!
  15:06 <Murch> willcl_ark: no, a dust attack can also send to an address that has coins currently already
  15:06 <jonatack> jnewbery: maybe add a config or option arg?
  15:06 <michaelfolkson> Nice work fjahr on your first one :)
  15:06 <fjahr> Thanks everyone for participating, especially murch!
  15:06 <willcl_ark> Murch: what is the difference between just tracking an already-existing UTXO in that case?
  15:07 <Murch> willcl_ark: if the user decides to spend anyway, it may increase the cost of their transaction
  15:07 <willcl_ark> thanks fjahr!
  15:07 <jnewbery> before you all go, I'm thinking of moving the meeting back to 17:00 UTC next week. That'd put it back to the same local time for people in the US, but would make it an hour early in local time for people in Europe, until your DST starts.
  15:07 <amiti> thanks for hosting fjahr! I'm very unfamiliar with this part of the code & this was helpful exposure
  15:07 <jnewbery> any agreement/objections to that?
  15:07 <andrewtoth_> Thanks fjahr and everyone!
  15:07 <pinheadmz> yes thanks everyone!
  15:07 <docallag> Thanks fjahr
  15:07 <andrewtoth_> Either time works for me
  15:08 <jonatack> jnewbery: no objections here to earlier
  15:08 <docallag> In Europe but +1 on the time change
  15:08 <Murch> willcl_ark: A lot of wallets actually don't track address reuse, so the expected outcome in many cases would be that people spend it in two different transactions later. Sending to an address that has money still, makes it more likely that someone is sill tracking it, though.
  15:08 <platesondeck> good meeting fjahr
  15:08 <jonatack> thanks fjahr, great meeting
  15:08 <nothingmuch> hmmm, apart from identifying which output is payment and which is change, which is arguably already an issue, is there any downside to later spending marked used outputs only together with change already related to them? trying to figure out if fee considerations and privacy considerations can be considered independently
  15:09 <Murch> fjahr: it was fun :)
  15:09 <jnewbery> (with apologies to anyone in the southern hemisphere or who don't have DST)
  15:10 <willcl_ark> Murch: I kind of understand
  15:10 <jonatack> Murch: thanks for swinging by, coin selection is a topic we could definitely spend more time thinking about.
  15:10 <Murch> nothingmuch: interesting question
  15:11 <Murch> my pleasure
  15:12 <Murch> nothingmuch: Privacy is definitely the more mind-boggling one of those
  15:13 <nothingmuch> yeah it's related to my previous question... i'm very confused/ambivalent about this behaviour so i'm trying to find equivalences to simplify my mental model
  15:14 <jonatack> nothingmuch: agree, seems worth thinking about
  15:14 <willcl_ark> might be nice to be able to configure the 10 value at spend time witha flag, or a conf option
  15:14 <Murch> I think revealing which one was change might actually be worse than connecting it with another address, still mulling though
  15:15 <nothingmuch> and afaict the consolidation can be delayed indefinitely from a privacy POV so long as it ends up in an equivalent state
  15:15 <nothingmuch> yes, definitely a serious issue
  15:16 <nothingmuch> "arguably already an issue" - i'm assuming here (perhaps incorrectly) that the payment amount is relatively small, so unnecessary input heuristic could identify the change address
  15:16 <Murch> nothingmuch: e.g. it would reveal more about what sort of wallet you're running which may allow other guesses such as whether it's possible that there was more than one change
  15:17 <jnewbery> willcl_ark jonatack: There's always a temptation when faced with a difficult design decision to resort to "add a config option". I think that's usually the wrong decision, because most people never use them, and they lead to combinatorial complexity.
  15:18 <Murch> Right, but let's say you're spending more in the second tx, and it uses multiple inputs. All of them being related to the address that was dusted would be pretty unlikely, so it tells us more about either your wallet composition or your wallt software
  15:18 <jonatack> jnewbery: true, and same for changing the value dynamically WRT fees.
  15:20 <nothingmuch> Murch: good point
  15:20 <nothingmuch> fwiw my mental model for this was suppose don't enable avoid_reuse, and later realize i should have, what would i do manually
  15:22 <willcl_ark> jnewbery: I see your point. It does seem a shame though to have an arbritrary and fixed group value hardcoded IMO
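
The mechanism fjahr describes in the log (assigning the "mod 10 leftovers" group an ancestor count of max_ancestors - 1 so that it gets chosen last) can be modelled with a short Python sketch. This is a toy model under stated assumptions, not the wallet's SelectCoins code: the class `OutputGroup`, the functions `penalize_leftovers` and `select_group`, the ancestor limit of 25, the escalating list of limits, and the "smallest sufficient group wins" rule are all hypothetical stand-ins chosen to keep the example small.

```python
from dataclasses import dataclass

MAX_ENTRIES = 10     # group size cap discussed in the meeting
MAX_ANCESTORS = 25   # assumed ancestor limit, for illustration only


@dataclass
class OutputGroup:
    destination: str
    values: list         # output values in satoshis
    ancestors: int = 0   # ancestor count consulted by the eligibility passes


def penalize_leftovers(groups, max_ancestors=MAX_ANCESTORS):
    """Give leftover groups of destinations that also have a full group a
    high ancestor count, so only the most permissive pass accepts them."""
    full_destinations = {g.destination for g in groups
                         if len(g.values) == MAX_ENTRIES}
    for g in groups:
        if g.destination in full_destinations and len(g.values) < MAX_ENTRIES:
            g.ancestors = max_ancestors - 1
    return groups


def select_group(groups, target,
                 ancestor_limits=(0, 2, 12, MAX_ANCESTORS - 1)):
    """Try increasingly permissive ancestor limits; in each pass pick the
    smallest eligible group that covers the target (toy selection rule)."""
    for limit in ancestor_limits:
        eligible = [g for g in groups
                    if g.ancestors <= limit and sum(g.values) >= target]
        if eligible:
            return min(eligible, key=lambda g: sum(g.values))
    return None


def make_groups():
    # 11 outputs to addr_A, already split into a full group and a leftover.
    return [OutputGroup("addr_A", [100_000] * 10),
            OutputGroup("addr_A", [100_000])]


before = select_group(make_groups(), target=50_000)
after = select_group(penalize_leftovers(make_groups()), target=50_000)
print(len(before.values), len(after.values))  # 1 10
```

Without the penalty the toy selector spends the leftover group of 1, which matches the behaviour pinheadmz describes. With the penalty the leftover is only eligible in the last, most permissive pass, so the full group of 10 is preferred whenever it can cover the target.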

Meeting Log – Asia time zone

Host: kallewoof

  23:58 <fanquake> kallewoof: we're the same time at last week hey?
  02:17 <kallewoof> fanquake: yep!
  02:24 <fanquake> kallewoof 👍
  04:56 <kallewoof> other-side-of-earth meeting starts in a couple minutes
  04:58 ⚡ fanquake races to finish making lunch
  05:00 <kallewoof> weird. my mac clock was a few minutes ahead for some reason. had to open time preferences to make it refresh.
  05:00 <kallewoof> #startmeeting
  05:01 <kallewoof> hi
  05:01 <aj> hi
  05:02 <fanquake> hi
  05:02 <akionak> hi
  05:02 <Murch> hi
  05:02 <anditto> hi
  05:02 <fanquake> Murch: glad we've got the expert here
  05:03 <kallewoof> Isn't it very early/late for you, Murch?? :o
  05:03 ⚡ Murch hides in a shadow
  05:03 <Murch> 9pm only
  05:03 <kallewoof> Oh..!
  05:03 <Murch> I'm in the bay area
  05:03 <Murch> I should finally look at the PR, though :p
25405:03 <kallewoof> I've taken the liberty of summarizing the previous meeting's talking points. I'll post that, and we can air our opinions on what's been pointed out so far, or just "that about covers it".
25505:04 <kallewoof> First off, this is about PR #17824 at https://github.com/bitcoin/bitcoin/pull/17824
25605:04 <kallewoof> Murch: ^ link :)
25705:04 <kallewoof> Question 1: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
25805:05 <fanquake> Have only glanced over the PR. Read some of the meeting logs from last night.
25905:06 <Murch> I have a rudimentary understanding from the previous discussion
26005:07 <kallewoof> I've reviewed an earlier version. I think the PR title could use an improvement.
26105:08 <kallewoof> Concept ACK on my side, fwiw. It's a bug-fix too, IIRC.
26205:08 <Murch> The commit message could be a bit more elaborate
26305:09 <kallewoof> Yeah
26405:10 <fanquake> kallewoof: did you want to post that summary?
26505:10 <kallewoof> yeap
26605:11 <kallewoof> I didn't have one for question 1
05:11 <kallewoof> Question 2: "Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?"
05:11 <kallewoof> Summary of responses for 2: willcl_ark notes that "there is no "correct" answer for which tradeoffs are best for any given user situation, does a user want "more private", or "cheaper" for example..."; amiti notes that it's hard to predict what even a small change will result in; jonatack notes that not a lot of people know the code well enough to be confident enough to review it.
05:12 <jonatack> hi
05:12 <kallewoof> jonatack: hey :)
05:12 <jonatack> (5 am :D)
05:12 <kallewoof> damn..
05:13 <Murch> Yeah, coinselection is fairly multi dimensional and impact is only measurable over a longer period of time
05:13 <Murch> I should make time to review :-/
05:14 <fanquake> > Coin selection changes seem to have a hard time getting reviews. - mainly because there are about 4? active contributors that work on/want to review this.
05:14 <Murch> achow, alex, instagibbs and?
05:14 <kallewoof> Yeah. And it's scary. A screw up could very much lead to a loss of funds.
05:14 <kallewoof> Murch: me, I think
05:15 <Murch> well, not usually
05:15 <fanquake> The other points are all correct. Very hard to propose any changes to this code that are "obviously" correct.
05:15 <Murch> kallewoof: Not clear to me how you'd lose a lot of money with coinselection
05:16 <Murch> you can make it a bit inefficient, then you'll overpay a bit over a long period of time until you fix
05:16 <Murch> or you can make an actual mistake, that'll cause you to use a lot more inputs at high fees, but you should catch that quickly
05:16 <kallewoof> Murch: coin selection specifically is pretty safe, yeah. but it's a central part of the wallet software.
05:17 <Murch> right
05:17 <Murch> and transaction building is scary :D
05:18 <kallewoof> Question 3: What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?
05:18 <kallewoof> Summary of responses for 3: "mainly a way to prevent dust attack" (amiti), later explained to be when someone sends a small amount of btc to a known address of yours, in the hopes that your wallet will pick it up when sending money next time, so they can tie multiple bitcoin addresses to you.
05:19 <kallewoof> I'm personally interested in hearing about alternatives. I don't know of any, myself.
05:20 <Murch> Only have one that is a bit of a stretch
05:20 <Murch> Receiving BCH to Segwit addresses is a lot less safe when the address was used before ;)
05:20 <Murch> but yeah, main drawback is loss of privacy
05:21 <Murch> also, people keep mentioning that the user that reuses the address loses privacy
05:21 <Murch> but also the sender that sends to it and the recipient that later gets paid from those funds lose privacy
05:22 <meshcollider> Oops! Sorry I'm late
05:22 <meshcollider> Hi
05:22 <kallewoof> meshcollider: hi! :)
05:22 <Murch> hey meshcollider!
05:22 <kallewoof> Murch: confused about BCH part. are you talking about the altcoin now?
05:22 <Murch> yeah
05:23 <fanquake> Hi meshcollider
05:23 <kallewoof> oh I think I see what you're saying now. you're talking about another reason why coin reuse is bad
05:23 <kallewoof> right?
05:23 <Murch> right
05:23 <Murch> I mean, the obvious answer to Q3 is that it's a privacy issue.
05:24 <kallewoof> i thought you were talking about alternative solutions to dealing with it
05:24 <kallewoof> right
05:24 <Murch> and you were asking whether there is anything else
05:24 <Murch> nope, sorry, should have been clearer
05:25 <jonatack> meshcollider: hey :)
05:25 <kallewoof> right, yeah, i see what you mean. i simply assumed we all skipped to the latter question, which was a bit odd of me, now that i think about it
05:25 <meshcollider> I definitely agree on the points made above about why coin selection is hard to review
05:26 <meshcollider> The fact there is no correct answer just makes it seem very directionless
05:26 <Murch> oh, I have two more
05:26 <kallewoof> two more reasons why coin reuse is bad?
05:27 <Murch> address reuse makes private key leaks more likely in the case `k` gets reused (apparently this caused the loss of 55 BTC once)
05:29 <Murch> and in more exotic constructions like anyone_can_pay transactions, if you use a UTXO that shares the address with others, you may be signing away the other utxos as well
05:29 <kallewoof> first time i heard of that. would love to read about it if you have a link :)
05:30 <jonatack> same, TIL
05:30 <Murch> right after I figure out how to paste in weechat
05:31 <Murch> bitcoin.stackexchange.com/a/42380/5406
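
The `k` reuse risk Murch mentions at 05:27 follows from the ECDSA signing equation s = k^-1 * (z + r*d) mod n: two signatures made with the same key d and the same nonce k are enough to solve for both k and d. Reusing an address means signing with the same key repeatedly, which gives a faulty nonce generator more chances to repeat. The following is a minimal sketch of the algebra only, not of real signing: the key, nonce and message hashes are made-up values, and `r` is a placeholder (in real ECDSA it is derived from k*G).

```python
# Order of the secp256k1 group (a well-known curve constant).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def sign_with_nonce(d, k, r, z):
    """Return the ECDSA s value: s = k^-1 * (z + r*d) mod N."""
    return pow(k, -1, N) * (z + r * d) % N


def recover_key(r, s1, z1, s2, z2):
    """Recover (k, d) from two signatures that share the same nonce."""
    # s1 - s2 = k^-1 * (z1 - z2), so k = (z1 - z2) / (s1 - s2).
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    # s1 * k = z1 + r*d, so d = (s1*k - z1) / r.
    d = (s1 * k - z1) * pow(r, -1, N) % N
    return k, d


# Hypothetical values, for illustration only.
d = 0x1234567890ABCDEF   # private key
k = 0xDEADBEEFCAFE       # nonce, reused for both signatures
r = 0x1111222233334444   # placeholder; really the x-coordinate of k*G mod N
z1, z2 = 0xAAAA, 0xBBBB  # two different message hashes

s1 = sign_with_nonce(d, k, r, z1)
s2 = sign_with_nonce(d, k, r, z2)
recovered_k, recovered_d = recover_key(r, s1, z1, s2, z2)
print(recovered_d == d)
```

The three-argument `pow` with a negative exponent needs Python 3.8 or later.
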
05:31 <kallewoof> i may be completely off base here, but sighash_noinput would mean address reuse = replayability, no?
05:31 <kallewoof> (sighash_noinput may have a different name these days)
05:32 <aj> kallewoof: yes; you shouldn't use noinput/anyprevout if replayability is a concern
05:33 <kallewoof> aj: this has probably been discussed to death, so i'll ping you after the meeting :P
05:33 <kallewoof> Continuing: Someone raised the question of why there is a limit on the number of UTXOs at all. What do you think is the reason? Fjahr says 'because it can result in high fees', but that's only partially correct.
05:33 <kallewoof> In the same vein: Jonatack asks how the limit (OUTPUT_GROUP_MAX_ENTRIES) was decided; people sort of did find it, but I will answer it for the record: it was arbitrarily decided by me. I figured we would tweak it if it became necessary later.
05:34 <kallewoof> But yeah. Why do you think there's a (currently) 10 UTXO cap per group?
05:37 <Murch> kallewoof: the high fee argument can be sidestepped by making use of the waste metric BnB already uses. A set of 10+ inputs would simply show up as overtly costly in the selection at high fees, but would actually be preferred at low fees
05:38 <Murch> I.e. if you just treat the whole group as one UTXO with a higher cost
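
Murch's point about the waste metric can be illustrated with a simplified model: the waste of spending an input now is its fee at the current feerate minus its fee at an assumed long-term feerate, and a group is treated as one UTXO whose cost is the sum over its inputs. This sketch leaves out the excess and change-cost terms, and the feerates and the 148 vB input size are assumptions chosen for illustration.

```python
def group_waste(n_inputs, input_vbytes, feerate, long_term_feerate):
    """Waste (in satoshis) of spending a whole group as if it were one UTXO.

    Positive: spending now costs more than waiting for the long-term feerate.
    Negative: spending now is cheaper, so consolidating is attractive.
    """
    return n_inputs * input_vbytes * (feerate - long_term_feerate)


P2PKH_INPUT_VBYTES = 148   # approximate size of one P2PKH input
LONG_TERM_FEERATE = 10     # sat/vB, assumed for illustration

# A 12-input group at a high feerate is penalised...
high = group_waste(12, P2PKH_INPUT_VBYTES, 50, LONG_TERM_FEERATE)
# ...and the same group is preferred at a low feerate.
low = group_waste(12, P2PKH_INPUT_VBYTES, 2, LONG_TERM_FEERATE)
print(high, low)  # 71040 -14208
```
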
05:38 <kallewoof> that's a good point, yeah
05:39 <kallewoof> The primary reason why there is a cap on groups is to avoid the risk of the resulting transaction being so large it breaks consensus. Imagine someone who has 10k tiny outputs all to the same address. If they tried to use this feature with no limits, they would get a single gigantic UTXO which would probably be too large to fit in a block.
05:40 <kallewoof> maybe 10k is too low, but you get the point
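
The cap kallewoof describes can be pictured as follows: outputs are grouped by destination, and a destination with more outputs than the cap is split into several groups so that no single group forces an oversized transaction. This is a simplified model, not the wallet's actual C++ `GroupOutputs` code; the `(destination, amount)` tuples and the function name are hypothetical.

```python
from collections import defaultdict

OUTPUT_GROUP_MAX_ENTRIES = 10  # the cap discussed in this meeting


def group_outputs(utxos, max_entries=OUTPUT_GROUP_MAX_ENTRIES):
    """Group (destination, amount) pairs by destination, capping group size.

    Returns a list of (destination, [amounts]) groups; a destination with
    more than max_entries outputs yields several groups.
    """
    by_dest = defaultdict(list)
    for dest, amount in utxos:
        by_dest[dest].append(amount)

    groups = []
    for dest, amounts in by_dest.items():
        # Split each destination's outputs into chunks of at most max_entries.
        for i in range(0, len(amounts), max_entries):
            groups.append((dest, amounts[i:i + max_entries]))
    return groups


# 25 small outputs to one address, 3 larger ones to another.
utxos = [("addr_A", 1000)] * 25 + [("addr_B", 50000)] * 3
groups = group_outputs(utxos)
sizes = [(dest, len(amounts)) for dest, amounts in groups]
print(sizes)  # [('addr_A', 10), ('addr_A', 10), ('addr_A', 5), ('addr_B', 3)]
```

Note that in this model the third `addr_A` group is only partially full, so spending it alone would still be a partial spend of that address.
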
05:40 <Murch> yeah
05:40 <jonatack> jnewbery found the link here https://github.com/bitcoin/bitcoin/pull/12257/files#r204177652 where you introduced it
05:40 <kallewoof> ahh okay, i missed that
05:40 <Murch> yeah, but 100 would be fine
05:41 <Murch> unless your transaction also has thousands of outputs at the same time ;)
05:41 <kallewoof> yeah, i think 100 is a good number
05:41 <kallewoof> Question 4 (out of order from review notes): "More generally, do you like the way avoid_reuse/avoidpartialspends currently handles this case of coin selection? Can you think of different solutions to the problem?"
05:41 <kallewoof> I didn't see any specific responses to this question, but I'm raising it since you guys might have thoughts.
05:41 <kallewoof> There's some interesting fee related discussion in the previous meeting. I recommend reading through it.
05:41 <Murch> p2pkh input has 147 or 148 vB, so you can do more than 600 if you don't have too many outputs.
05:41 <Murch> ;)
05:42 <kallewoof> Murch: nice, yeah
05:42 <Murch> Segwit even more, of course
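
Murch's "more than 600" estimate can be checked with a little arithmetic against Bitcoin Core's standardness limit of 400,000 weight units (100,000 vB) per transaction. The sketch below assumes two outputs and approximate sizes (148 vB per P2PKH input, about 68 vB per P2WPKH input, 34 and 31 vB per output, 10 to 11 vB of fixed overhead); it ignores small effects such as the input-count field growing for large counts.

```python
MAX_STANDARD_TX_WEIGHT = 400_000   # Bitcoin Core policy limit, in weight units
WITNESS_SCALE_FACTOR = 4
MAX_STANDARD_TX_VBYTES = MAX_STANDARD_TX_WEIGHT // WITNESS_SCALE_FACTOR


def max_inputs(input_vbytes, output_vbytes, n_outputs, overhead_vbytes):
    """Largest input count that keeps the transaction within the standard size."""
    budget = MAX_STANDARD_TX_VBYTES - overhead_vbytes - n_outputs * output_vbytes
    return max(budget // input_vbytes, 0)


# Approximate sizes, assumed for illustration: one payment plus one change output.
p2pkh = max_inputs(input_vbytes=148, output_vbytes=34, n_outputs=2, overhead_vbytes=10)
p2wpkh = max_inputs(input_vbytes=68, output_vbytes=31, n_outputs=2, overhead_vbytes=11)
print(p2pkh, p2wpkh)  # 675 1469
```
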
05:43 <kallewoof> I think personally that the people who end up turning on the avoid reuse feature and the people who get non-adversarial payments repeatedly to the same address are different people, so raising the 10 towards a more realistic number is totally fine. Thoughts?
05:43 <Murch> Regarding Q4, I find the 10 limit a bit arbitrary, but 100 is just as arbitrary.
05:43 <jonatack> kallewoof: sgtm
05:43 <Murch> Doesn't Bitcoin Core have a check somewhere whether the transaction size limit is exceeded?
05:44 <kallewoof> 100 is less arbitrary because we've given more thought to it than was put into 10. Or is that not how "arbitrary" works? :)
05:44 <Murch> lol
05:44 <kallewoof> Murch: maybe, but the user will still not be able to spend their gigantic group easily
05:45 <Murch> well, 100 should definitely cover most adversarial dusting attack scenarios
05:45 <Murch> and people deliberately reusing addresses excessively probably don't care as you say
05:46 <jonatack> kallewoof: you mention in the original comment "Perhaps this should be an option"
05:46 <kallewoof> I guess my concern right now is, if you have 200 spends to same address and you spend one of the 100 entry groups, the 'avoid reuse' feature will mark the other one as spent, and remove it from future coin selection (by default; you can still use it by saying 'include used' when doing coin select)
05:47 <jonatack> in the previous session willcl_ark liked that idea, and jnewbery less so (more combinatorial complexity)
05:47 <kallewoof> right. i saw jnewbery's note about preferring to have fewer options. i'm not sure i agree in this particular case. there are clearly two separate groups of users
05:48 <Murch> kallewoof: Checking the limit is not that hard: when you build a transaction, you already know the size of all recipient outputs and the transaction overhead. The only variables are whether or not there is change and how many inputs there are
05:48 <jonatack> i like the idea, but we may want to provide guidance on how to set it/use it
05:48 <kallewoof> but it may also be that the group that does excessive address reuse should be encouraged to switch to a more dynamic solution (e.g. btcpay instead of a static donation address)
05:48 <Murch> so, one could simply limit such that the transaction remains below the standard limit
05:49 <kallewoof> Murch: that's true. I wonder if the complexity is worth it though.
05:50 <Murch> unlikely, but it would cover the case with 200 utxos if you're seriously concerned :)
05:50 <jonatack> kallewoof: btcpay is what i do currently use for repeat txns, but i would have liked to use the bitcoin core wallet
05:51 <Murch> I think for someone privacy sensitive enough to use avoid_reuse going over 10 utxo received to one address is the edgecase, going over 100 seems exorbitant
05:51 <jonatack> or be able to export an xpub... descriptor wallets should help here, I believe
05:51 <kallewoof> jonatack: I haven't given it a lot of thought, but it would be cool to look into what is missing to get to where you can use bitcoin core
05:52 <jonatack> yes
05:53 <kallewoof> Murch: I think gmax has a small fortune in track-"dust" from early days
05:53 <kallewoof> Murch: over 100 sounds unusual though, yeah, but someone might send a ton just to ensure at least one of their outputs is always included in your coin select
05:53 <Murch> well, we found our first user of SNICKER :p
05:54 <kallewoof> Anyway, I'm gonna call it a close as I ran out of notes and we're already closing in on an hour. Thanks a lot for coming to hang! Hopefully we can do this regularly. :)
05:55 <kallewoof> #endmeeting
05:55 <fanquake> thanks kallewoof
05:55 <Murch> thanks for prepping
05:55 <jonatack> 12 months (or maybe even 52 weeks) might be common enough for the limit bump
05:55 <jonatack> thanks kallewoof, thanks Murch
05:56 <Murch> I've not heard about any dusting attacks since Sochi
05:56 ⚡ jonatack waves to fanquake and meshcollider
05:56 <Murch> or at least not any widespread ones
05:56 <kallewoof> jonatack: thanks for joining at such an early hour :0
05:57 <jonatack> 10/10 would do it again :) will post the meeting log to https://bitcoincore.reviews/17824
05:57 ⚡ fanquake waves
05:57 <kallewoof> thanks! :D
05:58 <Murch> alright, good night
05:58 <kallewoof> night Murch!