Use Single Random Draw in addition to knapsack as coin selection fallback (wallet)

May 5, 2021

https://github.com/bitcoin/bitcoin/pull/17526

Host: glozow - PR author: achow101

The PR branch HEAD was fac99dc at the time of this review club meeting.

Notes

Coin selection refers to the process of selecting UTXOs (or “coins”) from a wallet’s available UTXO pool to fund a transaction. Generally, the goal is to pick coins that will minimize fees for the user (now and in the long run), help the transaction reliably confirm in a timely manner, and not leak information about the wallet. This Stack Exchange post offers an overview of tradeoffs between different coin selection strategies. We covered coin selection in a previous review club, #17331.
PR #17526 implements Single Random Draw (SRD) as an additional fallback coin selection strategy. SRD is fairly straightforward: it randomly picks OutputGroups from a pool of eligible UTXOs until the total amount is sufficient to cover the payment and fees. Any extra funds are put in a change output.
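As a rough illustration of the draw described above, here is a minimal Python sketch. It is not the PR's C++ code: the `OutputGroup` class, the `select_coins_srd` function, and the sample amounts are hypothetical, and the target is assumed to already include fees.

```python
import random
from dataclasses import dataclass


@dataclass
class OutputGroup:
    """Hypothetical stand-in for a wallet OutputGroup (values in satoshis)."""
    name: str
    effective_value: int


def select_coins_srd(groups, target, rng=None):
    """Draw groups in random order until their total covers the target.

    Returns (selected_groups, excess), or None if the pool is too small.
    The excess is the amount that would go to a change output.
    """
    rng = rng or random.Random()
    pool = list(groups)
    rng.shuffle(pool)  # random order: no sorting by size or age

    selected = []
    total = 0
    for group in pool:
        selected.append(group)
        total += group.effective_value
        if total >= target:
            return selected, total - target
    return None


pool = [OutputGroup("a", 50_000), OutputGroup("b", 20_000), OutputGroup("c", 5_000)]
result = select_coins_srd(pool, target=60_000, rng=random.Random(1))
```

Because the order is random, repeated calls on the same pool can return different input sets and different change amounts.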
This means that, with this PR, our coin selection will have three different solvers: Branch and Bound (BnB), Knapsack, and Single Random Draw (SRD). Note that some randomness is used in coin selection, so we won’t always come up with the same solution.
The overall strategy within SelectCoins() (including PR #17331, on which PR #17526 is built):
- Coins manually selected by the user using CoinControl are added first.
- All available coins (excluding the pre-selected ones) are gathered. CoinEligibilityFilters are used to filter these coins within SelectCoinsMinConf(). We have a clear hierarchy of which coin selection solutions are preferred: we first try with a restriction of at least 6 confirmations on foreign UTXOs and 1 confirmation on our own UTXOs. If no solution is found, we try with at least 1 confirmation on all UTXOs. If that doesn’t work and the user allows spending unconfirmed change, we try that (still requiring at least 1 confirmation on foreign UTXOs), gradually increasing mempool ancestor limits on the unconfirmed change.
- Within SelectCoinsMinConf(), OutputGroups are created using the CoinEligibilityFilter. BnB is clearly preferred: we only try Knapsack and SRD if BnB fails to find a solution. When that happens, we run both Knapsack and SRD and pick the solution with lower fees, breaking ties in favor of the one that spends more UTXOs.
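The hierarchy in the list above can be sketched in Python as follows. This is a simplified model, not the wallet code: `Solution`, `eligible`, `pick_within_pass`, and `select` are hypothetical names, candidate solutions are taken as given instead of being computed by real solvers, and the gradual raising of mempool ancestor limits is left out. The sample data mirrors the quiz in the Questions section, under the assumption that the confirmed UTXOs in Solutions A and C are foreign.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """Hypothetical summary of one candidate input set."""
    algo: str     # "bnb", "knapsack" or "srd"
    fee: int      # satoshis
    inputs: list  # one (is_mine, confirmations) tuple per input


# (conf_mine, conf_theirs) pairs, strictest first, following the notes above.
FILTERS = [(1, 6), (1, 1), (0, 1)]


def eligible(solution, conf_mine, conf_theirs):
    """True if every input meets the confirmation requirement for its owner."""
    return all(conf >= (conf_mine if is_mine else conf_theirs)
               for is_mine, conf in solution.inputs)


def pick_within_pass(candidates):
    """BnB wins outright; otherwise lowest fee, then more inputs."""
    bnb = [s for s in candidates if s.algo == "bnb"]
    if bnb:
        return bnb[0]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (s.fee, -len(s.inputs)))


def select(solutions):
    """Try filters from strict to permissive and stop at the first pick."""
    for conf_mine, conf_theirs in FILTERS:
        passing = [s for s in solutions if eligible(s, conf_mine, conf_theirs)]
        choice = pick_within_pass(passing)
        if choice is not None:
            return choice
    return None


# Quiz solutions; the input lists are assumptions made for this sketch.
a = Solution("knapsack", 100, [(False, 4), (False, 4)])
b = Solution("bnb", 95, [(True, 0)])
c = Solution("srd", 99, [(False, 1), (False, 1)])
winner = select([a, b, c])
```

Under these assumptions nothing passes the first filter, A and C both pass the second, and the lower fee decides between them, so the BnB solution that needs unconfirmed change is never reached.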

Questions

Can you give a high-level description of the coin selection strategy including the changes proposed in this PR?
Within SelectCoinsMinConf(), if we have both a Knapsack and an SRD solution, how do we decide which one to use?
Why might we prefer to spend more inputs in the same transaction?
Quiz: Based on the coin selection scheme proposed here, let’s say that Solutions A, B, and C exist (ignore the fact that we would exit early after finding a solution we’re satisfied with). Which would we pick? (Hint: which invocation of SelectCoinsMinConf() would each of these come from?)

Solution A: picked using Knapsack. Produces a change output, pays 100 satoshis in fees, and only uses confirmed UTXOs, each with 4 confirmations.

Solution B: picked using BnB. No change output, pays 95 satoshis in fees, and uses one unconfirmed change output.

Solution C: picked using SRD. Produces a change output, pays 99 satoshis in fees, and only uses confirmed UTXOs, each with 1 confirmation.
What are OutputGroups? Why does SRD pick from output groups rather than from UTXOs?
What does calling GroupOutputs() with positive_only=true do (Hint: you may want to review what effective values are)? What could happen if SelectCoinsSRD() was called with all_groups instead of positive_groups?
What are some ways a deterministic coin selection algorithm might leak information about the wallet’s UTXO pool? Why do we shuffle vCoins before creating OutputGroups?
Bonus: We’ve listed some qualitative (e.g. presence of a change output) and quantitative (e.g. number of inputs used) ways to compare coin selection solutions. Instead of returning as soon as SelectCoinsMinConf() finds a solution, should we try multiple and then pick one? How might we design a metric to decide which one to use?
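For the hint about effective values, a small Python sketch may help. It assumes a fixed input size of 68 vbytes and made-up amounts; `Utxo`, `effective_value`, and `positive_only` are hypothetical helpers, and the filter is applied per UTXO, which is how the meeting participants read `positive_only=true`.

```python
from dataclasses import dataclass

INPUT_VBYTES = 68  # assumed size of one input; real sizes depend on script type


@dataclass
class Utxo:
    amount: int  # satoshis


def effective_value(utxo, feerate):
    """Amount minus the fee for spending this input at `feerate` (sat/vB)."""
    return utxo.amount - INPUT_VBYTES * feerate


def positive_only(utxos, feerate):
    """Keep only UTXOs that are worth more than they cost to spend."""
    return [u for u in utxos if effective_value(u, feerate) > 0]


utxos = [Utxo(10_000), Utxo(500), Utxo(500)]
feerate = 10  # sat/vB, so each input costs 680 sats to spend

all_total = sum(effective_value(u, feerate) for u in utxos)
positive_total = sum(effective_value(u, feerate)
                     for u in positive_only(utxos, feerate))
```

Here the two 500-sat UTXOs each have an effective value of -180 sats. For a target of 9,100 sats, a random draw over all three UTXOs can only reach 8,960 sats, while the positive-only pool reaches 9,320 sats with a single input.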

Meeting Log

19:00

<glozow> #startmeeting

19:00

<jnewbery> hi!

19:00

<Guest13> hi

19:00

<glozow> Welcome to PR Review Club everybody! We're doing Coin Selection, Part 2 this week!

19:00

<larryruane_> hi!

19:00

<glozow> Notes and questions are at https://bitcoincore.reviews/17526

19:00

<michaelfolkson> hi

19:00

<hernanmarino> hi glozow and everyone !

19:00

<glozow> hi hernanmarino!

19:00

<lightlike> hi

19:00

<kouloumos> hi

19:01

<glozow> PR for today is #17526: Use Single Random Draw in addition to knapsack as coin selection fallback

19:01

<murch> hi

19:01

<darius58> hi!

19:01

<emzy> hi

19:02

<Anthony85> 👋

19:02

<glozow> Note that this PR is built on top of #17331, which we covered last week in PR Review Club. It should still be follow-along-able if you weren't here, but those notes could be helpful: https://bitcoincore.reviews/17331

19:02

<glozow> did y'all get a chance to review the PR? y/n

19:03

<michaelfolkson> y ~30 mins

19:03

<emzy> n

19:03

<mixoflatsixo> n

19:03

<dkf> n

19:03

<lightlike> a bit

19:04

<darius58> a bit

19:04

<jnewbery> y

19:05

<glozow> No problem :) let's do some high-level review for starters then. Would anyone like to tell us what Single Random Draw does?

19:05

<murch> y

19:05

<larryruane_> chooses a random set of UTXOs to use as the inputs to a transaction that we're creating

19:05

<hernanmarino> it randomly picks some outputs, until there's enough

19:05

<glozow> hernanmarino: yes!

19:06

<schmidty> hi!

19:06

<glozow> Can someone give a high-level description of the coin selection strategy including the changes proposed in this PR?

19:06

<michaelfolkson> If nothing else works let randomness save the day

19:07

<lightlike> BnB first - if it gives no solution, run both Knapsack and SRD and use whatever is "better".

19:07

<glozow> lightlike: yes!

19:07

<michaelfolkson> "better" defined as lower fees

19:07

<glozow> what constitutes "better" ?

19:07

<larryruane_> and I think try BnB first because if it is able to find a solution, there's no change output, which is nice

19:07

<glozow> Hint, Code here: https://github.com/bitcoin/bitcoin/blob/fac99dcaf33f1fe77b60cb8b0a89b0d47f842d0d/src/wallet/wallet.cpp#L2419

19:07

<darius58> and then if fees are equal, whichever spends more utxos

19:08

<glozow> larryruane_: yes! why is it good to have no change output?

19:08

<glozow> darius58: yep yep!

19:09

<jonatack> hi

19:09

<murch> Because it uses less fees, reduces the overall UTXO in the system, does not put any funds into flight in a change output, and breaks the change heuristic

19:10

<glozow> murch: indeed. wanna define "change heuristic" for us?

19:11

<michaelfolkson> The assumption that there is always an output which is effectively change back to the spender

19:11

<murch> A bunch of the chainalysis techniques revolve around guessing which output is the change returning excess funds from the input selection to the sender

19:12

<murch> When guessed correctly, this can be used to cluster future transactions with this one to build a wallet profile

19:12

<glozow> so there's a privacy component too!

19:12

<murch> by not returning any funds to the sender's wallet, there are future transactions directly related to this transaction (unless addresses are reused)

19:12

<glozow> back to the tie-breaking scheme, why might we want spend more utxos?

19:13

<darius58> to reduce the utxo bloat?

19:13

<lightlike> murch: are *no* future transactions?

19:13

<michaelfolkson> Taking advantage of a low fee environment to consolidate the number of UTXOs held

19:13

<glozow> darius58: yeah! why is that good for our wallet? and why is that good for the bitcoin ecosystem in general?

19:14

<Anthony85> I'm a complete newbie but maybe it just helps ensure that the entire ecosystem doesn't have to work as hard?

19:14

<michaelfolkson> Or maybe a large UTXO in the wallet is protected for privacy reasons and so a bunch of smaller UTXOs are used instead

19:14

<b10c> hi!

19:14

<larryruane_> i've always wondered, if the HD wallet sends change to a brand new address (as it should), does that harm privacy immediately? Or only later when the change is spent? (sorry if this is off-topic by now)

19:15

<darius58> reduces the memory demand for nodes keeping track of the UTXO set. I'm not sure why it's good for the user's wallet. Reduces future fees, plus less UTXOs to keep track of?

19:15

<glozow> Anthony85: good start, work as hard to do what?

19:15

<murch> lightlike: Without a change output and under the assumption of no address reuse, all future inputs would be unrelated to this transaction's inputs.

19:15

<murch> oooh

19:15

<murch> yes "+no"

19:16

<lightlike> murch: ok, then it makes perfect sense to me :-)

19:16

<glozow> darius58: yes good answer!

19:16

<sipa> larryruane_: the HD aspect is immaterial to your question

19:16

<glozow> keeping the UTXO set as small as possible is pretty important

19:17

<murch> darius58: Yes, not creating a change output saves cost now, and then also saves the future cost of spending that UTXO at a later time!

19:17

<glozow> for user's wallet, it's a bit less significant but a similar idea, it'd be nice to have to track/spend fewer UTXOs

19:17

<darius58> @murch my only question though is whether it is cheaper than another solution with less UTXOs in the input, if they had the same fee at the current moment?

19:17

<glozow> and as murch mentioned before, a change output is an additional unconfirmed UTXO we have to track (until the tx confirms)

19:18

<murch> larryruane_: It's a bit more complicated, but e.g. if two different scripts are used for recipient and change, it would still be obvious which one would be change if there is change

19:18

<glozow> alright, Quiz time! Based on the coin selection scheme proposed here, let's say that Solutions A, B, and C exist (ignore the fact that we would exit early after finding a solution we're satisfied with). Which would we pick?

19:18

<murch> darius58: Kinda depends, at low feerates you should probably prefer more inputs, at high feerates you should probably prefer less fees

19:18

<glozow> Solution A: picked using Knapsack. Produces a change output, pays 100 satoshis in fees, and only uses confirmed UTXOs, each with 4 confirmations.

19:18

<glozow> Solution B: picked using BnB. No change output, pays 95 satoshis in fees, and uses one unconfirmed change output.

19:18

<glozow> Solution C: picked using SRD. Produces a change output, pays 99 satoshis in fees, and only uses confirmed UTXOs, each with 1 confirmation.

19:19

<Anthony47> B?

19:19

<glozow> We have 1 vote for B. What do others think?

19:19

<lightlike> I'll try C.

19:19

<darius58> Does it depend if the unconfirmed output in B is an owned change output?

19:20

<sipa> does the answer not depend on predicted future feerate?

19:20

<darius58> i'm guessing C

19:20

<glozow> they're all using the same feerates

19:20

<emzy> I guess B

19:20

<glozow> (sorry if that wasn't clear)

19:20

<michaelfolkson> C too. B's problem is it uses an unconfirmed change output?

19:21

<michaelfolkson> min confirmations by default is 1?

19:22

<glozow> darius58: would we spend any other type of unconfirmed output?

19:22

<glozow> michaelfolkson: right, the relevant code is here: https://github.com/bitcoin/bitcoin/blob/fac99dcaf33f1fe77b60cb8b0a89b0d47f842d0d/src/wallet/wallet.cpp#L2536

19:22

<glozow> The key parts to look at are the `CoinEligibilityFilter`s

19:23

<murch> larryruane_: let's chat later after the meeting

19:23

<glozow> (1, 6, 0)

19:23

<glozow> (1, 1, 0)

19:23

<glozow> (1, 2, 0)

19:23

<glozow> ...etc

19:23

<michaelfolkson> So prefer 6 confirmations but 1 is ok

19:23

<Anthony47> ahh interesting so B would be used if there was 1 confirmed?

19:23

<glozow> we'll get to the answer to the quiz shortly (everyone feel free to give your answers as we discuss this)

19:23

<kouloumos> michaelfolkson And it becomes 0 on a later invocation so according to the *Hint* I'll also say C

19:23

<darius58> @glozow yeah my question didn't make sense lol, of course a change output means of course it was created by the user

19:23

<glozow> michaelfolkson: correct, we try 6 confirmations on foreign outputs

19:24

<glozow> so the first number in `CoinEligibilityFilter` corresponds to the # of confirmations on foreign outputs

19:24

<glozow> wait no

19:24

<glozow> that's our own outputs

19:24

<glozow> the second number is foreign outputs, sorry

19:25

<glozow> so the second number is always at least 1

19:25

<larryruane_> I think B because if BnB finds a solution, that's it, we don't even run the other 2 algorithms

19:25

<glozow> which means we always use confirmed foreign outputs

19:25

<michaelfolkson> larryruane_: "(ignore the fact that we would exit early after finding a solution we're satisfied with"

19:25

<glozow> ok, so it seems like we have 3 votes for B and 3 votes for C so far

19:26

<Anthony47> I'll stick with my guns (B) but I'm starting to think it is C lol

19:26

<glozow> hehe

19:26

<lightlike> would be funny if the answer is A

19:26

<michaelfolkson> What's the importance of the foreign output? The danger it could be double spent? If it is an output we own we know we won't double spend ourselves?

19:26

<glozow> so, in this code block, which invocation of `SelectCoinsMinConf()` gets Solution B, and which one gets C?

19:26

<sipa> michaelfolkson: exactly

19:26

<glozow> this block: https://github.com/bitcoin/bitcoin/blob/fac99dcaf33f1fe77b60cb8b0a89b0d47f842d0d/src/wallet/wallet.cpp#L2536-L2567

19:27

<glozow> You can answer by just saying which `CoinEligibilityFilter` it's called with

19:27

<glozow> michaelfolkson: yep. if the foreign wallet creates another conflicting transaction and that gets confirmed, those UTXOs disappear

19:29

<glozow> Ok, so I'll answer for Solution A since that's the least popular: we'd pick that using `CoinEligibilityFilter(1, 1, 0)`.

19:29

<lightlike> imo, the second call (1, 1, 0) gets A and C, and uses C because of the lower feerate

19:29

<glozow> lightlike: bingo!

19:29

<murch> So, Solution A and Solution C should be found in the second pass of the eligibility filter but C pays less fees. B uses an unconfirmed input which happens in a later iteration of SelectCoinsMinConf, i.e. vote for C

19:30

<glozow> yes, we get B from a call that uses `CoinEligibilityFilter(0, 1, *)`

19:30

<glozow> because it has unconfirmed change outputs (that's the 0 in the filter)

19:30

<glozow> so what's the answer to the quiz? :)

19:30

<murch> C!

19:30

<glozow> ding ding ding!

19:30

<glozow> can everybody see how we got that answer? are there any questions?

19:31

<glozow> (and yes, the next best answer would be A, then B)

19:31

<darius58> great question and explanation!

19:31

<Anthony47> Is there ever a scenario that B would get selected?

19:31

<michaelfolkson> Fun question. The real winner here is glozow

19:31

<glozow> Anthony47: yes, if an invocation of `SelectCoinsMinConf()` fails, we try again with a more permissive filter

19:32

<Anthony47> gotcha, thanks!

19:32

<glozow> but I guess this question is crafted as "all these solutions exist"

19:32

<glozow> so it wouldn't fail

19:32

<glozow> er, so, no

19:32

<glozow> oops

19:33

<glozow> (sorry, i just gave a really confusing answer to your question)

19:33

<glozow> Ok let's continue with the questions

19:33

<glozow> What are OutputGroups? Why does SRD pick from output groups rather than from UTXOs?

19:34

<michaelfolkson> An unconfirmed output would be chosen if there was no alternative, I think the answer to Anthony47 question is

19:34

<glozow> link to code here: https://github.com/bitcoin/bitcoin/blob/fac99dcaf33f1fe77b60cb8b0a89b0d47f842d0d/src/wallet/coinselection.h#L120

19:34

<michaelfolkson> (assuming not foreign)

19:34

<darius58> OutputGroups are UTXOs with the same scriptPubkey. And we'd rather spend those together for better privacy and security

19:35

<glozow> michaelfolkson: yeah, good summary

19:36

<glozow> darius58: yes, I think that's correct

19:36

<larryruane_> i'm surprised an OutputGroup doesn't contain the scriptPubKey as a member

19:37

<glozow> indeed, we compute these groups on the fly during coin selection using `GroupOutputs()`

19:37

<glozow> you could pass in `separate_coins=true` which would put each UTXO in its own outputgroup

19:38

<glozow> I wonder if it would be better to keep the UTXOs pre-sorted or something

19:38

<glozow> anyway

19:39

<glozow> Next question: What does calling `GroupOutputs()` with `positive_only=true` do (Hint: you may want to review what effective values are)? What could happen if `SelectCoinsSRD()` was called with `all_groups` instead of `positive_groups`?

19:40

<glozow> does anyone who came to review club last week wanna tell us what effective values are?

19:40

<darius58> does it not include UTXOs with a negative effective value?

19:40

<glozow> darius58: jup

19:41

<murch> effective values are the value of inputs after deducting the input cost at the current selection attempt's feerate

19:41

<glozow> murch: good job attending last week's review club! could you break down for us, then, what it means for a coin to have a negative effective feerate?

19:42

<lightlike> so we'd use groups that spend more fees than their UTXOs are worth and burn money.

19:42

<glozow> lightlike: exactly. and why would this suck if we're doing a single random draw?

19:42

<michaelfolkson> (inside joke, murch hosted last week's review club :) )

19:43

<michaelfolkson> https://bitcoincore.reviews/17331

19:43

<murch> lightlike: Does it group only UTXOs with positive values or does it only allow groups that are positive in sum or does it only create groups from scriptPubKeys that only have positively valued utxos?

19:43

<glozow> big perks for returning review clubbies, u get jokes 😛

19:44

<lightlike> murch: I thought the criterion would be to be positive in sum, but I am not sure at all.

19:45

<darius58> @murch the first one you said: group only UTXOs with positive value

19:45

<michaelfolkson> So a negative effective feerate, it is effectively dust? It costs more to spend than the value of the output?

19:46

<darius58> @glozow it would suck because there would be some cases where SRD would fail because it couldn't reach a high enough amount, even if it could have by not including negative-effective value UTXOs

19:47

<murch> michaelfolkson: People use "dust" to denominate various things. Some people refer to UTXOs that are uneconomic at 3 sat/vB as dust, some refer to UTXOs that are uneconomic at the current feerate as dust, some just seem to call anything below e.g. 2000 sats dust.

19:47

<murch> darius58: Excellent thinking :)

19:47

<glozow> michaelfolkson: it's worse than dust, we'd need to select even more inputs to cover the extra cost

19:48

<glozow> darius58: yes! you could literally be increasing the value you need to select while drawing

19:48

<michaelfolkson> murch: I was thinking that. Need an authoritative definition of dust. Thanks

19:48

<lightlike> also, we might spend a LOT of dust by repeatedly choosing negative outputgroups if we have many, and pay a lot of unnecessary fees.

19:48

<murch> michaelfolkson: https://bitcoin.stackexchange.com/a/41082/5406

19:49

<michaelfolkson> murch: Cool

19:49

<glozow> let's talk a little bit about the benefits of single random draw

19:49

<michaelfolkson> Lovely 2013 comment "Considering 0.01BTC as dust is probably outdated" lol

19:50

<murch> lightlike: Or if the currentfeerate is just extremely high, a lot of the otherwise fine UTXOs are suddenly uneconomic

19:50

<glozow> What are some ways a deterministic coin selection algorithm might leak information about the wallet's UTXO pool?

19:50

<glozow> A fun read on coin selection algos and their tradeoffs: https://bitcoin.stackexchange.com/questions/32145/what-are-the-trade-offs-between-the-different-algorithms-for-deciding-which-utxo/32445#32445

19:51

<murch> haha

19:51

<murch> I was just going to link to that

19:51

<michaelfolkson> StackExchange and coin selection is the perfect mix

19:52

<glozow> sometimes when i type stackexchange it autocorrects to "Murch's blog"

19:52

<lightlike> that link kind of answers that question :)

19:53

<michaelfolkson> glozow: Without giving specifics if a coin analysis/surveillance company knows how you are coin selecting (no randomness) they get a better idea of what output(s) are yours?

19:53

<murch> So, some coin selection strategies for example reveal the oldest UTXOs or the largest UTXOs. In the combination of address reuse or clustering via change output heuristics, or other wallet fingerprints, this can allow a sufficiently motivated adversary to guess the amount of coins, payment frequency and other things

19:53

<glozow> oops, yeah. so SRD avoids any privacy leaks related to picking coins deterministically

19:53

<murch> SRD selecting randomly muddles some of these

19:54

<michaelfolkson> Some randomness kinda obfuscates things

19:55

<murch> oh, 2014 murch thought that address reuse saved input costs. Gonna have to edit that.

19:55

<michaelfolkson> Bad 2014 murch

19:55

<glozow> To be thorough... Why do we shuffle `vCoins` before creating OutputGroups?

19:55

<lightlike> if SRD encounters a large outgroup first by chance, will it just stop, use that and create a big change?

19:55

<glozow> Hint: https://github.com/bitcoin/bitcoin/blob/2b45cf0bcdb3d2c1de46899e30885c953b57b475/src/wallet/wallet.cpp#L2503

19:55

<murch> lightlike: Yes!

19:56

<murch> oh yeah, that's another thing. Due to the random selection the distribution of change output sizes is not as homogeneous

19:56

<glozow> Hint 2: Remember how we talked about the fact that SRD picks randomly from `OutputGroup`s, not singular UTXOs?

19:56

<murch> E.g. Knapsack always tries to create the same change output size which is a terrible give-away

19:56

<murch> Also, having many UTXO of the same size isn't really useful

19:57

<michaelfolkson> Unless you want to do some coinjoining?

19:57

<glozow> michaelfolkson: not with your own wallet

19:57

<lightlike> murch: that sounds less than ideal to me for some use cases: if I want to pay for a small thing, I wouldn't necessarily want to make public how rich I am (if I have some large outgroups)

19:58

<darius14> If they're not shuffled then the order of the inputs may reveal information about how they were sorted?

19:58

<murch> glozow: Actually that kinda sounds like a bug, that means that outputgroups are more likely to be picked.

19:58

<murch> It would be better to create the output groups first, then shuffle the output groups, then shuffle the UTXOs in the group

19:58

<glozow> darius14: yep! we wouldn't want to make the groups deterministically if we had more than 11 UTXOs for the same SPK

19:58

<michaelfolkson> glozow: No but maybe you are organizing your UTXOs in preparation to do some future coinjoining with other people's UTXOs. I dunno

19:59

<darius14> is there something specific about '11' in that comment, or it just means for a lot of UTXOs?

19:59

<glozow> murch: if you had 500 UTXOs for the same SPK, if you didn't shuffle them first, you'd always make the same groups

19:59

<glozow> darius14: the maximum number of UTXOs for a `OutputGroup` is 10 right now

19:59

<murch> lightlike: that's a good point, but generally you'll have a lot more smaller UTXOs than large UTXOs, so it would be somewhat uncommon. Also if you use a small input, you'll need multiple or get back tiny change, which are also suboptimal

20:00

<glozow> relevant: https://github.com/bitcoin/bitcoin/pull/18418

20:00

<murch> darius14: I think that's a reference to outputgroups being limited to 10 UTXOs currently

20:00

<murch> there is a PR open to push it to 100, though

20:01

<darius14> @murch oh i see, and why are output groups limited to 10 UTXOs?