Bitcoin Core PR Review Club

Improve coin selection for destination groups >10 (wallet)

Mar 11, 2020

https://github.com/bitcoin/bitcoin/pull/17824

Host: fjahr - PR author: fjahr

The PR branch HEAD was 87de0157 at the time of this review club meeting.

Notes

Today’s PR is a partial fix to a bug report filed last November, #17603 “partial spend avoidance makes partial spends and getbalances doesn’t notice.” In that issue, Bitcoin Core contributor dooglus describes two ways the wallet does not behave as expected when using the avoid_reuse wallet flag (sidenote: this issue is an excellent example of a good bug report, with detailed steps to reproduce the issue).

This PR addresses one of the two bugs with a fairly small fix, but there is much to dig into regarding avoid_reuse, coin selection, and whether this is the best solution to the problem.

The avoid_reuse wallet flag was introduced in #13756 by kallewoof, who subsequently followed it up with #16239 to mitigate exposure to dust attacks. The avoid_reuse feature achieves this by:

avoiding spending from destinations that the wallet has previously spent from, and
attempting, when it does spend from a destination that has multiple outputs available, to sweep as many of that destination's outputs as possible in the same transaction.

Today’s PR addresses the latter of the two. The main keyword to look for in the code is GroupOutputs.
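To make the grouping idea concrete, here is a minimal sketch of how outputs might be bucketed by destination with a cap of 10 entries per group. It is not the actual GroupOutputs code: the function name `group_outputs`, the `(destination, value)` tuples, and the addresses are hypothetical, and only the cap of 10 (OUTPUT_GROUP_MAX_ENTRIES, mentioned later in the log) comes from the discussion.

```python
from collections import defaultdict

# Cap per group; mirrors the value 10 discussed in the meeting (assumption:
# the real wallet code applies the cap differently in detail).
MAX_ENTRIES = 10

def group_outputs(outputs, max_entries=MAX_ENTRIES):
    """Bucket (destination, value) pairs by destination, max_entries per group."""
    by_dest = defaultdict(list)
    for dest, value in outputs:
        by_dest[dest].append(value)

    groups = []
    for dest, values in by_dest.items():
        # Split a destination with many outputs into chunks of max_entries.
        for start in range(0, len(values), max_entries):
            groups.append((dest, values[start:start + max_entries]))
    return groups

# Hypothetical wallet: 11 outputs to addr_A, 1 output to addr_B.
outputs = [("addr_A", 1000)] * 11 + [("addr_B", 5000)]
group_sizes = [(dest, len(values)) for dest, values in group_outputs(outputs)]
print(group_sizes)  # [('addr_A', 10), ('addr_A', 1), ('addr_B', 1)]
```

With 11 outputs at one destination, the cap leaves a full group of 10 and a leftover group of 1, which is the situation the bug report describes.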

The other error reported in issue #17603 was a misrepresentation of the wallet balances in the RPC getbalances. This was resolved by #17843 “wallet: Reset reused transactions cache”.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
This PR deals with the broad topic of Coin Selection (here is a recommended reference work on the subject by Murch). Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?
What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?
Can you describe the problem this PR tries to solve in your own words, including the necessary pre-conditions?
How does this PR currently attempt to resolve the issue?
How does the current approach compare to the previously proposed approach by the author? Can you think of other (better?) ways to solve it?
More generally, do you like the way avoid_reuse currently handles this case of coin selection? Can you think of different solutions to the problem?

Meeting Log

13:17

<jonatack> git grepping shows EXTRA_DESCENDANT_TX_SIZE_LIMIT = 10000, related?

13:26

<fjahr> yeah, that's also my understanding but I took it from kalle's suggestion here https://github.com/bitcoin/bitcoin/pull/17824#issuecomment-570548151 and did not explicitly clarify if he chose it for another reason.

13:27

<jonatack> right, thanks, it's the only place i saw it too... I think the value should be commented and maybe hoisted to a static constant

13:27

<fjahr> I think at the time the carve-out discussion for lightning was still fresh and I just accepted it as the same number in the back of my head

13:28

<fjahr> jonatack: good point

13:28

<jonatack> also, if you have to retouch, can you make this mini-edit to the avoid_reuse test?

13:28

<jonatack> self.log.info("Test fund send fund send {}".format(second_addr_type))

13:29

<fjahr> haha, yeah, I have seen that a lot, will do :)

13:29

<jonatack> :)

13:30

<jonatack> instead of self.log.info("Test fund send fund send")

13:30

<jonatack> since it's called several times

13:30

<kanzure> ah, "an hour and ten minutes from now" was misleading

13:33

<jonatack> (i don't think anyone will mind if you slip that change into your commit)

13:46

<emzy> day light saving is crazy. Time zones, too.

13:47

<fjahr> Bitcoin fixes this, let's just use blockheight :p

13:48

<pinheadmz> I have a bitcoin block clock over here :-)

14:00

<fjahr> I think it's time

14:00

<fjahr> #startmeeting

14:01

<willcl_ark> hi

14:01

<jnewbery> hi

14:01

<fjahr> Hey everyone! Welcome to this week's edition of the PR Review Club.

14:01

<lightlike> hi

14:01

<ajonas> hi

14:01

<jkczyz> hi

14:01

<platesondeck> Hello

14:01

<nothingmuch> hi

14:01

<pinheadmz> hi

14:01

<jonatack_> hi

14:01

<fjahr> We are talking about a PR I made about 2 months ago. There are not a lot of LOC to review but personally I learned a lot about avoid_reuse/avoidpartialspends and how it affects coin selection so I hope you find it interesting. Also I want to pick your brain if this is the best possible solution to the problem :)

14:01

<emzy> hi

14:02

<michaelfolkson> hi

14:02

<notmurch> hello

14:02

<pinheadmz> yeah i didnt even know this was a wallet feature

14:03

<fjahr> Let's start with the typical first questions but just ask any questions at any point and I will try to keep up: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

14:03

<fjahr> pinheadmz: it was also new to me

14:03

<jonatack> Concept ACK; studying the implementation

14:04

<fjahr> Maybe also someone who recreated the bug? The original bug issue had nice command line instructions for regtest.

14:05

<jonatack> Yes, reproduced it while reviewing https://github.com/bitcoin/bitcoin/pull/17838

14:05

<willcl_ark> yes concept ACK for me also. I'm always surprised how difficult to implement coin selection seems to be

14:05

<jonatack> "test: test the >10 UTXO case for output groups"

14:06

<jonatack> which you fixed with #17843

14:06

<fjahr> Great, maybe let's move on to my next question: This PR deals with the broad topic of Coin Selection. Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?

14:07

<pinheadmz> coin selection is responsible for TX size, fee, privacy...

14:07

<willcl_ark> I think it's because there is no "correct" answer for which tradeoffs are best for any given user situation

14:07

<willcl_ark> does a user want "more private", or "cheaper" for example...

14:07

<amiti> it's really hard to properly predict the eventual effects that small changes to coin selection can have .. since they can compound over time

14:08

<michaelfolkson> So I can't remember where this was discussed. Maybe on a podcast. But didn't Murch's changes based on his thesis get reverted?

14:08

<pinheadmz> michaelfolkson: murch mentioned that to me yeah

14:08

<jonatack> My impression from talking with achow101 is that few people understand it well, apart from notmurch and I suppose people like instagibbs, kallewoof, achow101, and few have to deal with very large wallets

14:08

<pinheadmz> i was surprised this feature is even a thing -- IIUC, we spend 10 UTXOs just to consolidate addresses?

14:09

<willcl_ark> achow101 also has some Twitch streams where he talks a lot about coin selection :)

14:09

<jonatack> and bitcoin core's wallet isn't adapted to large-scale exchange-sized applications afaik

14:09

<michaelfolkson> What was the reason for reverting Murch's changes? Some impact on the network?

14:09

<jnewbery> pinheadmz: I don't think 'consolidate addresses' is the right way of putting this. The aim is to spend all outputs to the same address in the same tx

14:10

<pinheadmz> sure couldnt think of a good term

14:10

<fjahr> jnewbery: yes!

14:10

<pinheadmz> but still, you could end up with a TX 10x in size to gain this privacy

14:10

<jnewbery> (consolidate addresses implies to me the opposite - that you're taking multiple addresses and doing something that would link them)

14:11

<pinheadmz> its a UTXO consolidation though

14:11

<notmurch> michaelfolkson: my initial dabblings with Coin Selection got reverted but it was two years before my thesis

14:11

<pinheadmz> kind of pretending that all outputs to an address are just one big single utxo

14:11

<pinheadmz> murch is here!

14:12

<jonatack> michaelfolkson: (iirc that was an livera podcast with achow101 that you are referring to)

14:12

<fjahr> pinheadmz: yepp, and even more if you spend multiple groups

14:12

<pinheadmz> fjahr: so this could be an expensive mechanism. Was it controversial?

14:12

<michaelfolkson> Ah cool. So the primary conclusions of the thesis haven't been discarded notmurch?

14:13

<achow101> michaelfolkson: murch's thesis is the BnB coin selection algo we use in core now

14:13

<notmurch> My first attempt to change Bitcoin Core's coin selection made it stop spending uneconomic inputs

14:13

<achow101> I'm kind of trying to get the second half of his conclusions into core as well (the random selection fallback part)

14:13

<michaelfolkson> Cool, thanks. And thanks jonatack yup. Transcript is here: https://stephanlivera.com/episode/99/

14:13

<fjahr> pinheadmz: I didn't go through that again in my prep but I cannot remember a big discussion

14:13

<notmurch> this caused a notable increase in the UTXO set in the following months and got it reverted

14:14

<jonatack> yes, see src/wallet/coinselection.cpp::20-

14:14

<jonatack> for the BnB algo

14:15

<fjahr> Ok, pinheadmz: I remember there were some comments on the number 10 but it was kind of accepted, can not find it now

14:15

<fjahr> strike the ok :)

14:15

<fjahr> Well, next question: What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?

14:15

<jnewbery> pinheadmz: can you explain why 'this could be an expensive mechanism'?

14:16

<pinheadmz> jnewbery: adding 10 inputs to a TX when only 1 is sufficient to cover the output value ?

14:17

<amiti> my understanding is that the `avoid_reuse` feature is mainly a way to prevent a dust attack?

14:17

<platesondeck> i'm sorry, but could someone pls explain to me what exactly you mean by coin reuse?

14:17

<fjahr> amiti: right, someone want to define dust attack?

14:17

<notmurch> platesondeck: when funds are received to the same address twice

14:17

<fjahr> address reuse, sorry :)

14:17

<jnewbery> pinheadmz: those 10 inputs need to be spent eventually! It's only more expensive if either (i) you happen to create the spend during a high-feerate period (ii) somehow using more inputs prevents BnB from finding a solution that doesn't create change

14:18

<andrewtoth_> jnewbery why can't the other 9 inputs be hodled forever?

14:18

<ecurrencyhodler> Dust attack: It's when someone sends a small amount of Bitcoin to a known Bitcoin address to track it and see if they can link it to other parts of your Bitcoin holdings.

14:18

<amiti> a dust attack is where an adversary sends dust to lots of different addresses, then waits to see where its used to link to other addresses

14:19

<platesondeck> fjahr ok the terminology threw me for a loop

14:19

<jnewbery> andrewtoth_: :)

14:19

<fjahr> amiti, ecurrencyhodler: correct

14:19

<notmurch> andrewtoth_: if you ever want to make use of the value of an unspent, it will need to be spent eventually

14:19

<fjahr> platesondeck: sorry about that

14:20

<notmurch> Regarding dusting: the main issue is that these dust utxo will be picked up by another transaction leaving the wallet creating a link between two addresses. In combination with the "single sender" heuristic

14:20

<notmurch> this allows to tie together a larger cluster of addresses
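The linking described above can be sketched as a small clustering routine. This is a toy model, not any analytics tool's real method: it assumes the "single sender" heuristic (all inputs of one transaction belong to one owner), and the function name `cluster_addresses` and the addresses A to D are hypothetical.

```python
def cluster_addresses(transactions):
    """Cluster addresses, assuming all inputs of a transaction share one owner.

    `transactions` is a list of lists; each inner list holds the addresses
    whose outputs are spent as inputs of one transaction.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addrs in transactions:
        if not input_addrs:
            continue
        first = input_addrs[0]
        find(first)  # register single-input transactions too
        for other in input_addrs[1:]:
            union(first, other)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())

# Address A was dusted. One spend combines A with B, a later spend combines
# another output at A with C. D belongs to an unrelated transaction.
clusters = cluster_addresses([["A", "B"], ["A", "C"], ["D"]])
print(clusters)  # [['A', 'B', 'C'], ['D']]
```

Because the dusted address A appears in two separate spends, B and C end up in the same cluster even though they never appear together in a transaction.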

14:21

<jkczyz> notmurch: but if it's dust, it presumably doesn't (or didn't) have much value and wasn't yours to begin with. Is the only reason not to eventually spend to avoid UTXO bloat?

14:21

<notmurch> *a link between the previous transaction and the new transaction

14:21

<fjahr> on the 10 inputs: in terms of privacy you also have only one chance, the first time you spend from a destination, so ideally you can sweep all of it at once but 10 is the arbitrary cut-off

14:21

<notmurch> jkczyz what if what's dust today eventually is worth spending?

14:21

<willcl_ark> why not always spend all from the address?

14:21

<ecurrencyhodler> Tying @notmurch's comment in with address reuse: If someone sends dust to an address and it is reused, then the wallet will include it into a spend which would tie together that cluster of addresses.

14:21

<platesondeck> could there be a wallet feature that picks a dust utxo and uses it to fill the mining fee for a regular transaction gradually instead of waiting to collect all the dust in one big pile

14:22

<pinheadmz> i wonder if this will even be an issue as bip44 wallets dominate

14:23

<fjahr> willcl_ark: because that might then result in high fees, the 10 outputs group limit is trying to strike a balance

14:23

<notmurch> platesondeck: the link is being created by a transaction referencing the UTXO in its inputs. Hence that's indistinguishable from any other way of spending it ;)

14:23

<jkczyz> notmurch: understood but it could still be considered dust relative to the rest of your coins ;)

14:24

<fjahr> Bonus question: who can give a quick definition of a "destination" as it is used in this context?

14:24

<willcl_ark> fjahr: but if you have more than 10, then using _only_ 10 seems like it has all been for nothing? Might as well choose "up to 10, if <=10 total" or else disregard this policy for this address?

14:24

<jonatack> Does anyone know how the outputs groups limit (OUTPUT_GROUP_MAX_ENTRIES = 10 in wallet.cpp::46) was decided?

14:24

<andrewtoth_> destination would be any address derived from same pubkey, so could be p2pkh, wrapped p2sh, p2wpkh

14:25

<willcl_ark> e.g.: if you have 12 UTXO at that address, and you use only 10, still another spend will link them, therefore using 10 previously, when you only needed 1, was just wasting tx fees?

14:25

<fjahr> jonatack: should be in the avoidpartialspends code, but i did not pull it up

14:27

<jnewbery> willcl_ark: using inputs early is not wasting fees. Those inputs need to be spent eventually

14:27

<fjahr> willcl_ark: you are already getting towards my later questions :) but maybe I can pull it up.

14:28

<fjahr> How about we discuss this q first which was last in my notes: More generally, do you like the way avoid_reuse/avoidpartialspends currently handles this case of coin selection? Can you think of different solutions to the problem?

14:28

<jnewbery> there may be some minor effect from giving BnB less freedom in the way it chooses inputs, but simply using up a UTXO now instead of later does not change the total fee across time that the wallet pays

14:28

<ecurrencyhodler> What is BnB short for?

14:28

<fjahr> Branch'n'Bound

14:29

<willcl_ark> jnewbery: hmmmm. Let me think about this. I agree that the second spend is not so costly, because you already merged those 9 un-needed UTXOs into a single change output, but I will need to "spend them twice" to spend them once, if you see what I mean

14:30

<jonatack> fjahr: i see it was added by kallewoof in 18f690e "wallet: shuffle coins before grouping, where warranted"

14:31

<fjahr> jonatack: ok, I think I remember promag asking about the number and kalle said it was kind of arbitrarily chosen, or am I making that up? :)

14:31

<jonatack> but he hoisted the pre-existing value of 10 to a constant, it was already previously set to 10

14:32

<jonatack> yes, seems so

14:32

<jnewbery> willcl_ark: it's useful to draw out all the inputs/outputs from multiple transactions to get your head around it. If you assume constant feerate (which is a bad assumption), then the only way you can save on tx fees is by avoiding creating change outputs. Any other strategy of spending your UTXOs will result in the same txfees.

14:32

<jonatack> "Cases where we have 11+ outputs all pointing to the same destination may result in privacy leaks as they will potentially be deterministically sorted."

14:34

<fjahr> So does anyone else have opinions on will's question? I personally have the feeling that this is kind of an edge case and while there could be a better algorithm it might not be worth the effort to implement in this case? Do you agree or disagree?

14:34

<jnewbery> here's the comment where 10 was chosen: https://github.com/bitcoin/bitcoin/pull/12257/files#r204171876

14:35

<willcl_ark> hmmm. seems to me then in response to jnewbery and fjahr's earlier question, that the easiest solution to reason about is to have the policy that "all UTXO at an address always spent together", which could then warn the user if #UTXO > 10, and of course be opted out of... otherwise the user is not sure what they are getting?

14:35

<jonatack> jnewbery: thanks!

14:35

<platesondeck> This might be a bad question but is there a way to send small change amounts to your lightning node at the time of transaction? Would that still result in some trail when the channel is closed later on?

14:36

<willcl_ark> I guess some static donation addresses exist which could have _very_ high numbers of UTXOs though

14:36

<pinheadmz> 1andreas... :-)

14:37

<willcl_ark> jnewbery: I see your fee argument now; you are just paying the fee _now_, instead of _later_ :)

14:39

<michaelfolkson> platesondeck: You mean increase the size of an existing channel (splicing) or opening a new channel? Currently (before Schnorr) you would still see funds have been sent to a 2-of-2 multisig address and then left that 2-of-2 address

14:39

<jnewbery> willcl_ark: exactly. You can think of the fee as already implicitly part of the each UTXO. In fact, BnB uses the implicit amount of UTXOs when selecting them (the value of the UTXO minus the fee required to spend it)
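A small worked example of the "value of the UTXO minus the fee required to spend it" idea. The function name `effective_value` and the 148 vbyte input size (a common estimate for a legacy P2PKH input) are assumptions for illustration, not values taken from the wallet code.

```python
# Assumed size of one legacy P2PKH input in vbytes (a common approximation).
P2PKH_INPUT_VBYTES = 148

def effective_value(value_sat, feerate_sat_per_vb, input_vbytes=P2PKH_INPUT_VBYTES):
    """Value of a UTXO after subtracting the fee needed to spend it."""
    return value_sat - input_vbytes * feerate_sat_per_vb

low_fee = effective_value(1000, 1)    # 1000 - 148 * 1
high_fee = effective_value(1000, 10)  # 1000 - 148 * 10
print(low_fee, high_fee)  # 852 -480
```

The same 1000 sat output is worth spending at 1 sat/vB but has a negative effective value at 10 sat/vB, which is what "uneconomic inputs" refers to earlier in the log.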

14:40

<fjahr> platesondeck: you could also do a submarine swap with someone but that is not something that we would implement in the core wallet

14:41

<fjahr> Aside from the more broad discussion on coin selection: Can you describe the problem this PR tries to solve in your own words, including the necessary pre-conditions?

14:41

<jnewbery> whenever you create a transaction in your wallet, it results in either one new UTXO (change) or zero new UTXOs (no change) in your wallet. You save fees in the long run if you can make transactions that result in no new UTXOs in your wallet.
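The fee argument above can be checked with rough arithmetic. This is a sketch under a constant-feerate assumption (which jnewbery himself calls a bad assumption); the size constants are common estimates for legacy P2PKH transactions, and the names `tx_vsize` and `total_fee` are hypothetical.

```python
# Rough legacy P2PKH size estimates in vbytes (assumptions for illustration).
OVERHEAD_VB = 10
INPUT_VB = 148
OUTPUT_VB = 34

def tx_vsize(n_inputs, n_outputs):
    """Approximate vsize of a transaction with the given input/output counts."""
    return OVERHEAD_VB + n_inputs * INPUT_VB + n_outputs * OUTPUT_VB

def total_fee(txs, feerate_sat_per_vb):
    """Total fee for a list of (n_inputs, n_outputs) transactions."""
    return feerate_sat_per_vb * sum(tx_vsize(i, o) for i, o in txs)

FEERATE = 5  # sat/vB, assumed constant over time

# Wallet holds 10 UTXOs and makes two payments, each with a change output.
# Early sweep: spend all 10 now, then spend only the change later.
sweep_early = [(10, 2), (1, 2)]
# Late sweep: spend 1 now, then the other 9 plus the change later.
sweep_late = [(1, 2), (10, 2)]

fee_early = total_fee(sweep_early, FEERATE)
fee_late = total_fee(sweep_late, FEERATE)
print(fee_early, fee_late)  # 8920 8920

# A change output costs one output now plus one input when it is spent later.
change_cost = FEERATE * (OUTPUT_VB + INPUT_VB)
print(change_cost)  # 910
```

Both orderings spend the same 11 inputs and create the same 4 outputs across 2 transactions, so the total is identical; only avoiding the change output actually removes bytes.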

14:41

<chanho> jnewbery: so the crux of the argument assuming constant fee rate seems to be that you essentially want to minimize the number of change outputs when you consider all the ways of spending the total of the given UTXO set?

14:42

<pinheadmz> fjahr: I think the bug is, if you have 11 utxos for 1 destination, instead of spending the group of 10, it spends the "group" of 1

14:42

<jnewbery> chanho: yes, that's true in general. Less change outputs => less txfees

14:43

<pinheadmz> but then it also labels the group of 10 as already reused

14:44

<fjahr> pinheadmz: yes, or more generally 'mod 10' could also be 21, 12 etc.

14:45

<fjahr> the case described in the issue where mod10 is 1 certainly looks the weirdest to the user because it seems like the wallet flag is not working

14:45

<jonatack> chanho: minimising tx fees and also maximising privacy by minimising addresses used together and preferring split over several utxos

14:45

<notmurch> sorry for slow reply, willcl_ark: by sweeping in two transactions, the cost increases are having a second output, paying for two transaction headers and having to spend two inputs later instead of one.

14:46

<willcl_ark> sorry to be hung up on this, I hope I'm not de-railing... but why not: if (can be done without change); else (spend all from address)?

14:48

<fjahr> all good, as I said in the beginning it's the point of this review club to learn about this stuff and since we have the experts here... :)

14:48

<notmurch> chanho: avoiding change also improves privacy and also increases the amount of balance that remains spendable in your wallet

14:50

<fjahr> On the PR specifically: How does this PR currently attempt to resolve the issue? What 'mechanism' is it using?

14:52

<pinheadmz> well theres this new vector of destinations called full_groups

14:53

<fjahr> pinheadmz: yes, but what is it doing with the group that is not full?

14:53

<pinheadmz> and i know from context its preferring it

14:53

<pinheadmz> but the it.second and it.first stuff is out of my scope

14:54

<pinheadmz> looks like it iterates through the full_groups first

14:54

<fjahr> the key are the ancestors

14:55

<pinheadmz> yeah so that comes in to play later in the coin selection algorithm?

14:55

<pinheadmz> like theres no other "preference" property?

14:55

<fjahr> yes, it is currently using the number of ancestors (max_ancestors - 1) to 'penalize' the 'mod 10 leftovers' group so it gets chosen last
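A toy model of the penalty mechanism just described. It only illustrates the idea that a group marked with a high ancestor count becomes eligible later than other groups; the real SelectCoins uses eligibility filters with more criteria. The `Group` class, `select_groups`, the limit sequence, and `MAX_ANCESTORS = 25` are all assumptions for illustration.

```python
from dataclasses import dataclass

MAX_ANCESTORS = 25  # assumed limit, for illustration only

@dataclass
class Group:
    name: str
    value: int
    ancestors: int

def select_groups(groups, target, ancestor_limits):
    """Try increasingly permissive ancestor limits; return the first selection
    of group names whose total value covers the target, or None."""
    for limit in ancestor_limits:
        eligible = sorted((g for g in groups if g.ancestors <= limit),
                          key=lambda g: g.value)
        picked, total = [], 0
        for g in eligible:
            if total >= target:
                break
            picked.append(g.name)
            total += g.value
        if total >= target:
            return picked
    return None

LIMITS = [0, 2, MAX_ANCESTORS - 1]  # strictest first (hypothetical steps)
full = Group("full_A", 10_000, 0)

# Without the penalty, the small leftover group can be picked on its own.
unpenalized = select_groups([full, Group("leftover_A", 1_000, 0)], 500, LIMITS)
# With the penalty, the leftover group only qualifies at the last step.
leftover = Group("leftover_A", 1_000, MAX_ANCESTORS - 1)
penalized = select_groups([full, leftover], 500, LIMITS)
last_resort = select_groups([full, leftover], 10_500, LIMITS)

print(unpenalized)  # ['leftover_A']
print(penalized)    # ['full_A']
print(last_resort)  # ['leftover_A', 'full_A']
```

The leftover group is still spendable, but only once the stricter passes have failed to cover the target.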

14:56

<pinheadmz> so maybe this is too low level but "for (auto& it : gmap)" -- creates an iterator `it` ?

14:57

<pinheadmz> and that has .first and .second properties?

14:57

<fjahr> You can see how it takes effect in 'SelectCoins' in the line that starts with 'bool res = value_to_select <= 0 ||'

14:57

<fjahr> only 3 minutes left, shoot if you have any questions left :)

14:58

<jnewbery> I have a question: does 10 inputs seem aggressively low to people, given the goal is to avoid transactions becoming too big?

14:58

<jonatack> I'm curious where the 9999 and 10000 max ancestor values come from, and need to study SelectCoins()

14:58

<fjahr> pinheadmz: https://github.com/bitcoin/bitcoin/blob/master/src/wallet/wallet.cpp#L2395

14:59

<Murch> jnewbery:Could you define the goal more specifically?

14:59

<platesondeck> if you already have dust would it be more privacy preserving to collect 1 piece of dust with your next transaction instead of all together?

14:59

<jnewbery> a regular P2PKH tx with 10 inputs and 2 outputs is ~1500 vbytes, which isn't enormous. We've seen much larger utxo consolidations than that
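The ~1500 vbyte figure can be reproduced with the usual rough estimates for legacy P2PKH transactions. The constants and the name `p2pkh_vsize` are assumptions for illustration; exact sizes vary slightly with signature length.

```python
def p2pkh_vsize(n_inputs, n_outputs, overhead=10, input_vb=148, output_vb=34):
    """Approximate vsize of a legacy P2PKH transaction (rough estimates)."""
    return overhead + n_inputs * input_vb + n_outputs * output_vb

size_10_in = p2pkh_vsize(10, 2)
print(size_10_in)  # 1558

# Fee for that transaction at two hypothetical feerates (sat/vB).
fee_low = size_10_in * 1
fee_high = size_10_in * 50
print(fee_low, fee_high)  # 1558 77900
```

The same 10-input transaction costs 50 times more at 50 sat/vB than at 1 sat/vB, which is the tension between a fixed group size and prevailing feerates raised in the following messages.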

14:59

<jnewbery> Murch: https://github.com/bitcoin/bitcoin/pull/12257/files#r204171876

15:00

<jnewbery> oh no. What happens when a Murch and a notmurch collide?

15:00

<Murch> jnewbery: Well, when the fees are high, ten inputs is kinda sizeable already. When the fees are low, 10 isn't all that much

15:00

<Murch> platesondeck: depends on whether they're on the same address or different ones

15:00

<nothingmuch> is thinking of n outputs with a single yet unused script as a single output with n times the overhead missing anything?

15:00

<fjahr> platesondeck: can you be more specific? dust in the same destination?

15:01

<jnewbery> well if you're going to talk about coin selection being dynamic to prevailing feerates, that's a whole larger topic

15:01

<nothingmuch> if not, i would have expected limiting the size to have happened afterwards (i.e. first select unbounded group, then only spend up to some limit from it) instead of constructing separate groups ahead of time

15:01

<jnewbery> I'm just saying that if the goal is to avoid large txs, then we can go a lot higher than 10 inputs, which would confer these privacy benefits on more users

15:02

<willcl_ark> tbh, 10 seems ok for "regular users" (from personal experience), I guess anything larger than that is just "considered edge case"?

15:02

<jonatack> fjahr: it seems to me that adding more documentation in your PR on these things while working on it and figuring it all out could be a real value-add

15:02

<Murch> jnewbery: well, at one sat per vB, I like my txes to have 50+ inputs :p

15:02

<amiti> pinheadmz: `for (auto& it: gmap)` creates an iterator `it` that just points to each element and iterates through. in this case, gmap is a `std::map<CTxDestination, OutputGroup>`, so first and second point to the tx-dest vs output-group respectively

15:02

<jnewbery> I think having a limit of 10 also changes the dust attack from "send one dust output to the address" to "send 11 dust outputs to the address", rather than removing it entirely, no?

15:03

<michaelfolkson> Agreed jonatack

15:03

<Murch> nothingmuch: you could see it like that if you always enforce avoiding partial spends

15:03

<fjahr> jonatack: do you mean on the features I am editing or the change itself?

15:04

<willcl_ark> jnewbery: does a dust attack have to send to an "empty" (previously all spent) address? if there are already UTXOs there, then does dust make any difference vs tracking a current UTXO?

15:04

<jnewbery> Murch: yes, but that's a much more general topic than for destination groups. I want my coin selection to choose few inputs when fees are high and many inputs when fees are low. That's a more general thing than just destination groups.

15:05

<Murch> jnewbery: for avoiding partial spends yeah, I would agree that 10 is kinda low. However, if you already got paid ten times to the same address, it sounds a lot more likely you will receive future payments there again…

15:05

<jonatack> fjahr: the changeset for sure. maybe an extra commit for relevant related code

15:05

<fjahr> ok, we are 5 minutes over so I will call #endmeeting but feel free to keep the conversation going :)

15:06

<jnewbery> Thanks fjahr. Great meeting!

15:06

<Murch> willcl_ark: no, a dust attack can also send to an address that has coins currently already

15:06

<jonatack> jnewbery: maybe add a config or option arg?

15:06

<michaelfolkson> Nice work fjahr on your first one :)

15:06

<fjahr> Thanks everyone for participating, especially murch!

15:06

<willcl_ark> Murch: what is the difference between just tracking an already-existing UTXO in that case?

15:07

<Murch> willcl_ark: if the user decides to spend anyway, it may increase the cost of their transaction

15:07

<willcl_ark> thanks fjahr!

15:07

<jnewbery> before you all go, I'm thinking of moving the meeting back to 17:00 UTC next week. That'd put it back to the same local time for people in the US, but would make it an hour early in local time for people in Europe, until your DST starts.

15:07

<amiti> thanks for hosting fjahr! I'm very unfamiliar with this part of the code & this was helpful exposure

15:07

<jnewbery> any agreement/objections to that?

15:07

<andrewtoth_> Thanks fjahr and everyone!

15:07

<pinheadmz> yes thanks everyone!

15:07

<docallag> Thanks fjahr

15:07

<andrewtoth_> Either time works for me

15:08

<jonatack> jnewbery: no objections here to earlier

15:08

<docallag> In Europe but +1 on the time change

15:08

<Murch> willcl_ark: A lot of wallets actually don't track address reuse, so the expected outcome in many cases would be that people spend it in two different transactions later. Sending to an address that has money still, makes it more likely that someone is still tracking it, though.

15:08

<platesondeck> good meeting fjahr

15:08

<jonatack> thanks fjahr, great meeting

15:08

<nothingmuch> hmmm, apart from identifying which output is payment and which is change, which is arguably already an issue, is there any downside to later spending marked used outputs only together with change already related to them? trying to figure out if fee considerations and privacy considerations can be considered independently

15:09

<Murch> fjahr: it was fun :)

15:09

<jnewbery> (with apologies to anyone in the southern hemisphere or who don't have DST)

15:10

<willcl_ark> Murch: I kind of understand

15:10

<jonatack> Murch: thanks for swinging by, coin selection is a topic we could definitely spend more time thinking about.

15:10

<Murch> nothingmuch: interesting question

15:11

<Murch> my pleasure

15:12

<Murch> nothingmuch: Privacy is definitely the more mind-boggling one of those

15:13

<nothingmuch> yeah it's related to my previous question... i'm very confused/ambivalent about this behaviour so i'm trying to find equivalences to simplify my mental model

15:14

<jonatack> nothingmuch: agree, seems worth thinking about

15:14

<willcl_ark> might be nice to be able to configure the 10 value at spend time with a flag, or a conf option

15:14

<Murch> I think revealing which one was change might actually be worse than connecting it with another address, still mulling though

15:15

<nothingmuch> and afaict the consolidation can be delayed indefinitely from a privacy POV so long as it ends up in an equivalent state

15:15

<nothingmuch> yes, definitely a serious issue

15:16

<nothingmuch> "arguably already an issue" - i'm assuming here (perhaps incorrectly) that the payment amount is relatively small, so unnecessary input heuristic could identify the change address

15:16

<Murch> nothingmuch: e.g. it would reveal more about what sort of wallet you're running which may allow other guesses such as whether it's possible that there was more than one change

15:17

<jnewbery> willcl_ark jonatack: There's always a temptation when faced with a difficult design decision to resort to "add a config option". I think that's usually the wrong decision, because most people never use them, and they lead to combinatorial complexity.

15:18

<Murch> Right, but let's say you're spending more in the second tx, and it uses multiple inputs. All of them being related to the address that was dusted would be pretty unlikely, so it tells us more about either your wallet composition or your wallet software

15:18

<jonatack> jnewbery: true, and same for changing the value dynamically WRT fees.

15:20

<nothingmuch> Murch: good point

15:20

<nothingmuch> fwiw my mental model for this was suppose don't enable avoid_reuse, and later realize i should have, what would i do manually

15:22

<willcl_ark> jnewbery: I see your point. It does seem a shame though to have an arbitrary and fixed group value hardcoded IMO

Meeting Log – Asia time zone

Host: kallewoof

23:58

<fanquake> kallewoof: we're the same time as last week hey?

02:17

<kallewoof> fanquake: yep!

02:24

<fanquake> kallewoof 👍

04:56

<kallewoof> other-side-of-earth meeting starts in a couple minutes

04:58

⚡ fanquake races to finish making lunch

05:00

<kallewoof> weird. my mac clock was a few minutes ahead for some reason. had to open time preferences to make it refresh.

05:00

<kallewoof> #startmeeting

05:01

<kallewoof> hi

05:01

<aj> hi

05:02

<fanquake> hi

05:02

<akionak> hi

05:02

<Murch> hi

05:02

<anditto> hi

05:02

<fanquake> Murch: glad we've got the expert here

05:03

<kallewoof> Isn't it very early/late for you, Murch?? :o

05:03

⚡ Murch hides in a shadow

05:03

<Murch> 9pm only

05:03

<kallewoof> Oh..!

05:03

<Murch> I'm in the bay area

05:03

<Murch> I should finally look at the PR, though :p

05:03

<kallewoof> I've taken the liberty of summarizing the previous meeting's talking points. I'll post that, and we can air our opinions on what's been pointed out so far, or just "that about covers it".

05:04

<kallewoof> First off, this is about PR #17824 at https://github.com/bitcoin/bitcoin/pull/17824

05:04

<kallewoof> Murch: ^ link :)

05:04

<kallewoof> Question 1: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

05:05

<fanquake> Have only glanced over the PR. Read some of the meeting logs from last night.

05:06

<Murch> I have a rudimentary understanding from the previous discussion

05:07

<kallewoof> I've reviewed an earlier version. I think the PR title could use an improvement.

05:08

<kallewoof> Concept ACK on my side, fwiw. It's a bug-fix too, IIRC.

05:08

<Murch> The commit message could be a bit more elaborate

05:09

<kallewoof> Yeah

05:10

<fanquake> kallewoof: did you want to post that summary?

05:10

<kallewoof> yeap

05:11

<kallewoof> I didn't have one for question 1

05:11

<kallewoof> Question 2: "Coin selection changes seem to have a hard time getting reviews. Why do you think that is? What are some bad things that can happen if something goes wrong in coin selection?"

05:11

<kallewoof> Summary of responses for 2: willcl_ark notes that "there is no "correct" answer for which tradeoffs are best for any given user situation, does a user want "more private", or "cheaper" for example..."; amiti notes that it's hard to predict what even a small change will result in; jonatack notes that not a lot of people know the code well enough to be confident enough to review it.

05:12

<jonatack> hi

05:12

<kallewoof> jonatack: hey :)

05:12

<jonatack> (5 am :D)

05:12

<kallewoof> damn..

05:13

<Murch> Yeah, coin selection is fairly multidimensional and impact is only measurable over a longer period of time

05:13

<Murch> I should make time to review :-/

05:14

<fanquake> > Coin selection changes seem to have a hard time getting reviews. - mainly because there are about 4? active contributors that work on/want to review this.

05:14

<Murch> achow, alex, instagibbs and?

05:14

<kallewoof> Yeah. And it's scary. A screw up could very much lead to a loss of funds.

05:14

<kallewoof> Murch: me, I think

05:15

<Murch> well, not usually

05:15

<fanquake> The other points are all correct. Very hard to propose any changes to this code that are "obviously" correct.

05:15

<Murch> kallewoof: Not clear to me how you'd lose a lot of money with coin selection

05:16

<Murch> you can make it a bit inefficient, then you'll overpay a bit over a long period of time until you fix

05:16

<Murch> or you can make an actual mistake, that'll cause you to use a lot more inputs at high fees, but you should catch that quickly

05:16

<kallewoof> Murch: coin selection specifically is pretty safe, yeah. but it's a central part of the wallet software.

05:17

<Murch> right

05:17

<Murch> and transaction building is scary :D

05:18

<kallewoof> Question 3: What is the problem with coin reuse? How can it be exploited, and what approach does the avoid_reuse feature employ to solve it? Are you aware of other ways to mitigate attacks, perhaps implemented by other privacy-focused wallets?

05:18

<kallewoof> Summary of responses for 3: "mainly a way to prevent dust attack" (amiti), later explained to be when someone sends a small amount of btc to a known address of yours, in the hopes that your wallet will pick it up when sending money next time, so they can tie multiple bitcoin addresses to you.

05:19

<kallewoof> I'm personally interested in hearing about alternatives. I don't know of any, myself.

05:20

<Murch> Only have one that is a bit of a stretch

05:20

<Murch> Receiving BCH to Segwit addresses is a lot less safe when the address was used before ;)

05:20

<Murch> but yeah, main drawback is loss of privacy

05:21

<Murch> also, people keep mentioning that the user that reuses the address loses privacy

05:21

<Murch> but also the sender that sends to it and the recipient that later gets paid from those funds lose privacy

05:22

<meshcollider> Oops! Sorry I'm late

05:22

<meshcollider> Hi

05:22

<kallewoof> meshcollider: hi! :)

05:22

<Murch> hey meshcollider!

05:22

<kallewoof> Murch: confused about BCH part. are you talking about the altcoin now?

05:22

<Murch> yeah

05:23

<fanquake> Hi meshcollider

05:23

<kallewoof> oh I think I see what you're saying now. you're talking about another reason why coin reuse is bad

05:23

<kallewoof> right?

05:23

<Murch> right

05:23

<Murch> I mean, the obvious answer to Q3 is that it's a privacy issue.

05:24

<kallewoof> i thought you were talking about alternative solutions to dealing with it

05:24

<kallewoof> right

05:24

<Murch> and you were asking whether there is anything else

05:24

<Murch> nope, sorry, should have been clearer

05:25

<jonatack> meshcollider: hey :)

05:25

<kallewoof> right, yeah, i see what you mean. i simply assumed we all skipped to the latter question, which was a bit odd of me, now that i think about it

05:25

<meshcollider> I definitely agree on the points made above about why coin selection is hard to review

05:26

<meshcollider> The fact there is no correct answer just makes it seem very directionless

05:26

<Murch> oh, I have two more

05:26

<kallewoof> two more reasons why coin reuse is bad?

05:27

<Murch> address reuse makes private key leaks more likely in the case `k` gets reused (apparently this caused the loss of 55 BTC once)

05:29

<Murch> and in more exotic constructions like anyone_can_pay transactions, if you use a UTXO that shares the address with others, you may be signing away the other utxos as well

05:29

<kallewoof> first time i heard of that. would love to read about it if you have a link :)

05:30

<jonatack> same, TIL

05:30

<Murch> right after I figure out how to paste in weechat

05:31

<Murch> bitcoin.stackexchange.com/a/42380/5406
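
On the `k` reuse point raised above: two ECDSA signatures made by the same key with the same nonce share `r`, and the private key can then be solved for algebraically. A minimal sketch of that algebra follows. It uses a toy prime modulus instead of the real secp256k1 group order, and made-up values for the key, nonce, `r`, and message hashes (all assumptions for illustration; no curve arithmetic is involved):

```python
# Toy demonstration of ECDSA nonce-reuse key recovery (modular algebra only).
# Assumption: N is a stand-in prime modulus, not the secp256k1 group order,
# and r is an arbitrary value rather than a real curve point's x-coordinate.
N = 2**127 - 1  # a Mersenne prime, used here only as a toy modulus


def sign(z, d, k, r):
    """Return the s value of an ECDSA-style signature: s = k^-1 * (z + r*d) mod N."""
    return pow(k, -1, N) * (z + r * d) % N


def recover_private_key(z1, s1, z2, s2, r):
    """Recover d from two signatures that share the same nonce (and thus r)."""
    # Subtracting the two signing equations cancels r*d: k = (z1 - z2) / (s1 - s2)
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    # Rearranging s1 = k^-1 * (z1 + r*d) gives d = (s1*k - z1) / r
    return (s1 * k - z1) * pow(r, -1, N) % N


d = 123456789        # hypothetical private key
k = 987654321        # nonce, reused for both signatures
r = 555555555        # hypothetical shared r value
z1, z2 = 1111, 2222  # hypothetical message hashes

s1, s2 = sign(z1, d, k, r), sign(z2, d, k, r)
recovered = recover_private_key(z1, s1, z2, s2, r)
```

Here `recovered` equals `d`. With address reuse the same key signs more than once, so a repeated nonce across those signatures is enough to expose it.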

05:31

<kallewoof> i may be completely off base here, but sighash_noinput would mean address reuse = replayability, no?

05:31

<kallewoof> (sighash_noinput may have a different name these days)

05:32

<aj> kallewoof: yes; you shouldn't use noinput/anyprevout if replayability is a concern

05:33

<kallewoof> aj: this has probably been discussed to death, so i'll ping you after the meeting :P

05:33

<kallewoof> Continuing: Someone raised the question of why there is a limit on the number of UTXOs at all. What do you think is the reason? Fjahr says 'because it can result in high fees', but that's only partially correct.

05:33

<kallewoof> In the same vein: Jonatack asks how the limit (OUTPUT_GROUP_MAX_ENTRIES) was decided; people sort of did find it, but I will answer it for the record: it was arbitrarily decided by me. I figured we would tweak it if it became necessary later.

05:34

<kallewoof> But yeah. Why do you think there's a (currently) 10 UTXO cap per group?

05:37

<Murch> kallewoof: the high fee argument can be sidestepped by making use of the waste metric BnB already uses. A set of 10+ inputs would simply show up as overtly costly in the selection at high fees, but would actually be preferred at low fees

05:38

<Murch> I.e. if you just treat the whole group as one UTXO with a higher cost
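
Murch's suggestion can be sketched roughly as follows. This is a simplified illustration of the input part of a waste-style metric, where each input costs its size times the difference between the current feerate and a long-term feerate. The function name, feerates, and sizes are assumptions, and this is not Bitcoin Core's implementation:

```python
def group_waste(input_vsizes, feerate, long_term_feerate):
    """Input-side waste of spending a whole group now instead of later.

    Positive: spending now costs more than it would at the long-term feerate.
    Negative: spending now is cheaper, so consolidating the group is attractive.
    Feerates are in sat/vB and sizes in vB (illustrative units).
    """
    return sum(input_vsizes) * (feerate - long_term_feerate)


group = [148] * 12  # hypothetical group of 12 P2PKH-sized inputs

waste_high_fees = group_waste(group, feerate=50, long_term_feerate=10)
waste_low_fees = group_waste(group, feerate=2, long_term_feerate=10)
```

With these numbers the group is penalized at 50 sat/vB (positive waste) and favored at 2 sat/vB (negative waste), which matches the behavior described above.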

05:38

<kallewoof> that's a good point, yeah

05:39

<kallewoof> The primary reason why there is a cap on groups is to avoid the risk of the resulting transaction being so large it breaks consensus. Imagine someone who has 10k tiny outputs all to the same address. If they tried to use this feature with no limits, they would get a single gigantic group, and the transaction spending it would probably be too large to fit in a block.

05:40

<kallewoof> maybe 10k is too low, but you get the point
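
The capped grouping described above can be illustrated with a small sketch. It assumes outputs are plain `(address, amount)` pairs and that a full group is simply followed by a new one for the same address. The names are hypothetical, and this is not the wallet's actual `GroupOutputs` code:

```python
from collections import defaultdict

MAX_ENTRIES = 10  # mirrors the cap discussed here; the value is illustrative


def group_outputs(outputs, max_entries=MAX_ENTRIES):
    """Group (address, amount) pairs by address, splitting at max_entries."""
    by_address = defaultdict(list)
    for address, amount in outputs:
        by_address[address].append(amount)

    groups = []
    for address, amounts in by_address.items():
        # Slice each address's outputs into chunks of at most max_entries.
        for start in range(0, len(amounts), max_entries):
            groups.append((address, amounts[start:start + max_entries]))
    return groups


# Hypothetical wallet: 25 outputs to one address, 3 to another.
outputs = [("addr_A", 1000)] * 25 + [("addr_B", 5000)] * 3
group_sizes = [len(amounts) for _, amounts in group_outputs(outputs)]
```

For this wallet `group_sizes` is `[10, 10, 5, 3]`: no single group can grow without bound, but one address can still span several groups.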

05:40

<Murch> yeah

05:40

<jonatack> jnewbery found the link here https://github.com/bitcoin/bitcoin/pull/12257/files#r204177652 where you introduced it

05:40

<kallewoof> ahh okay, i missed that

05:40

<Murch> yeah, but 100 would be fine

05:41

<Murch> unless your transaction also has thousands of outputs at the same time ;)

05:41

<kallewoof> yeah, i think 100 is a good number

05:41

<kallewoof> Question 4 (out of order from review notes): "More generally, do you like the way avoid_reuse/avoidpartialspends currently handles this case of coin selection? Can you think of different solutions to the problem?"

05:41

<kallewoof> I didn't see any specific responses to this question, but I'm raising it since you guys might have thoughts.

05:41

<kallewoof> There's some interesting fee related discussion in the previous meeting. I recommend reading through it.

05:41

<Murch> p2pkh input has 147 or 148 vB, so you can do more than 600 if you don't have too many outputs.

05:41

<Murch> ;)

05:42

<kallewoof> Murch: nice, yeah

05:42

<Murch> Segwit even more, of course
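
The arithmetic behind the "more than 600" figure can be checked quickly. This sketch assumes the standardness limit of 400,000 weight units (100,000 vB) and uses approximate per-input sizes. The P2WPKH figure is a commonly quoted rounded value, and the function name is hypothetical:

```python
MAX_STANDARD_TX_VSIZE = 400_000 // 4  # 400,000 weight units expressed in vB

P2PKH_INPUT_VSIZE = 148   # from the discussion above (147 or 148 vB)
P2WPKH_INPUT_VSIZE = 68   # approximate, rounded up


def rough_max_inputs(input_vsize, other_vsize=0):
    """Upper bound on inputs, ignoring or subtracting everything else in the tx."""
    return (MAX_STANDARD_TX_VSIZE - other_vsize) // input_vsize


max_p2pkh = rough_max_inputs(P2PKH_INPUT_VSIZE)
max_p2wpkh = rough_max_inputs(P2WPKH_INPUT_VSIZE)
```

This gives roughly 675 P2PKH inputs and about 1470 P2WPKH inputs before outputs and overhead are counted, consistent with the estimate above.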

05:43

<kallewoof> I think personally that the people who end up turning on the avoid reuse feature and the people who get non-adversarial payments repeatedly to the same address are different people, so raising the 10 towards a more realistic number is totally fine. Thoughts?

05:43

<Murch> Regarding Q4, I find the 10 limit a bit arbitrary, but 100 is just as arbitrary.

05:43

<jonatack> kallewoof: sgtm

05:43

<Murch> Doesn't Bitcoin Core have a check somewhere whether the transaction size limit is exceeded?

05:44

<kallewoof> 100 is less arbitrary because we've given more thought to it than was put into 10. Or is that not how "arbitrary" works? :)

05:44

<Murch> lol

05:44

<kallewoof> Murch: maybe, but the user will still not be able to spend their gigantic group easily

05:45

<Murch> well, 100 should definitely cover most adversarial dusting attack scenarios

05:45

<Murch> and people deliberately reusing addresses excessively probably don't care as you say

05:46

<jonatack> kallewoof: you mention in the original comment "Perhaps this should be an option"

05:46

<kallewoof> I guess my concern right now is, if you have 200 spends to same address and you spend one of the 100 entry groups, the 'avoid reuse' feature will mark the other one as spent, and remove it from future coin selection (by default; you can still use it by saying 'include used' when doing coin select)
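
The concern above can be modeled with a toy example. It assumes a wallet that remembers which addresses it has spent from and filters their remaining outputs unless told otherwise. The function and the `include_used` parameter are hypothetical stand-ins, not the wallet's real API:

```python
def selectable(outputs, used_addresses, include_used=False):
    """Return the outputs that coin selection may consider.

    outputs: list of (address, amount) pairs.
    used_addresses: addresses the wallet has already spent from.
    """
    return [o for o in outputs if include_used or o[0] not in used_addresses]


# Hypothetical wallet: 200 outputs to one address, grouped 100 at a time.
outputs = [("addr_X", 1000)] * 200
first_group, remaining = outputs[:100], outputs[100:]

# Spending the first group marks the address as used.
used_addresses = {"addr_X"}

default_choice = selectable(remaining, used_addresses)
explicit_choice = selectable(remaining, used_addresses, include_used=True)
```

After the first spend, `default_choice` is empty while `explicit_choice` still holds the other 100 outputs, so the second group is only reachable when used outputs are explicitly included.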

05:47

<jonatack> in the previous session willcl_ark liked that idea, and jnewbery less so (more combinatorial complexity)

05:47

<kallewoof> right. i saw jnewbery's note about preferring to have fewer options. i'm not sure i agree in this particular case. there are clearly two separate groups of users

05:48

<Murch> kallewoof: Checking the limit is not that hard: when you build a transaction, you already know the size of all recipient outputs and the transaction overhead. The only variables are whether or not there is change and how many inputs there are

05:48

<jonatack> i like the idea, but we may want to provide guidance on how to set it/use it

05:48

<kallewoof> but it may also be that the group that does excessive address reuse should be encouraged to switch to a more dynamic solution (e.g. btcpay instead of a static donation address)

05:48

<Murch> so, one could simply limit such that the transaction remains below the standard limit
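
A rough sketch of the check Murch describes: the recipient outputs and overhead are known up front, change is conservatively assumed to exist, and the remaining space determines how many inputs fit. The sizes are approximations, the 100,000 vB limit is the standardness limit expressed in vB, and the function name and defaults are assumptions:

```python
def max_group_inputs(recipient_output_vsizes, input_vsize,
                     overhead_vsize=10, change_output_vsize=34,
                     limit_vsize=100_000):
    """How many inputs of a given size fit under the standard size limit.

    Conservatively reserves room for one change output, since whether
    change exists is not known until inputs are chosen.
    """
    fixed = overhead_vsize + sum(recipient_output_vsizes) + change_output_vsize
    return max(0, (limit_vsize - fixed) // input_vsize)


# One P2PKH-sized recipient output, P2PKH-sized inputs.
typical = max_group_inputs([34], input_vsize=148)

# The "thousands of outputs" case mentioned earlier: far fewer inputs fit.
many_outputs = max_group_inputs([34] * 2000, input_vsize=148)
```

Under these assumptions a typical payment leaves room for 675 inputs, while a transaction with 2000 outputs leaves room for only 215, so a dynamic cap would adapt where a fixed constant cannot.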

05:49

<kallewoof> Murch: that's true. I wonder if the complexity is worth it though.

05:50

<Murch> unlikely, but it would cover the case with 200 utxos if you're seriously concerned :)

05:50

<jonatack> kallewoof: btcpay is what i do currently use for repeat txns, but i would have liked to use the bitcoin core wallet

05:51

<Murch> I think for someone privacy sensitive enough to use avoid_reuse, going over 10 UTXOs received to one address is the edge case; going over 100 seems exorbitant

05:51

<jonatack> or be able to export an xpub... descriptor wallets should help here, I believe

05:51

<kallewoof> jonatack: I haven't given it a lot of thought, but it would be cool to look into what is missing to get to where you can use bitcoin core

05:52

<jonatack> yes

05:53

<kallewoof> Murch: I think gmax has a small fortune in track-"dust" from early days

05:53

<kallewoof> Murch: over 100 sounds unusual though, yeah, but someone might send a ton just to ensure at least one of their outputs is always included in your coin select

05:53

<Murch> well, we found our first user of SNICKER :p

05:54

<kallewoof> Anyway, I'm gonna call it a close as I ran out of notes and we're already closing in on an hour. Thanks a lot for coming to hang! Hopefully we can do this regularly. :)

05:55

<kallewoof> #endmeeting

05:55

<fanquake> thanks kallewoof

05:55

<Murch> thanks for prepping

05:55

<jonatack> 12 months (or maybe even 52 weeks) might be common enough for the limit bump

05:55

<jonatack> thanks kallewoof, thanks Murch

05:56

<Murch> I've not heard about any dusting attacks since Sochi

05:56

⚡ jonatack waves to fanquake and meshcollider

05:56

<Murch> or at least not any widespread ones

05:56

<kallewoof> jonatack: thanks for joining at such an early hour :0

05:57

<jonatack> 10/10 would do it again :) will post the meeting log to https://bitcoincore.reviews/17824

05:57

⚡ fanquake waves

05:57

<kallewoof> thanks! :D

05:58

<Murch> alright, good night

05:58

<kallewoof> night Murch!