The PR branch HEAD was e6fe1c3 at the time of this review club meeting.
Notes
The way our wallet constructs transactions over time can leak information
about its contents. The most obvious example is that observers can assume all
UTXOs sent to the same scriptPubKey are controlled by the same person. UTXOs
sent to different addresses may also be linked if they are spent together (a
common heuristic used in chain analysis). Thus, if we’re not careful, observant
attackers can link addresses to estimate our wallet balance and, if any one of
our addresses is deanonymized (e.g. we reveal it to an exchange, merchant, or
block explorer that knows our personal information or IP address), we might
accidentally reveal how much money we have!
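To make the linking heuristic above concrete, here is a minimal sketch of how an observer might cluster addresses that are spent together. It is an illustration only, not the method of any particular chain-analysis tool: the `UnionFind` helper, the `cluster_addresses` function, and the address labels are all hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Tiny disjoint-set structure used to merge linked addresses."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_addresses(transactions):
    """Cluster addresses with the "spent together" heuristic.

    transactions: one list of input addresses per transaction (hypothetical
    data format chosen for this sketch).
    """
    uf = UnionFind()
    for input_addrs in transactions:
        for addr in input_addrs:
            uf.find(addr)  # register every address, even if spent alone
        for addr in input_addrs[1:]:
            uf.union(input_addrs[0], addr)

    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return sorted(sorted(c) for c in clusters.values())


# A and B are spent together, then B and C, so all three end up linked.
print(cluster_addresses([["A", "B"], ["B", "C"], ["D"]]))
# [['A', 'B', 'C'], ['D']]
```

Once one address in a cluster is deanonymized, the balance of every address in that cluster can be attributed to the same owner.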
The Bitcoin Core wallet implements a few best-practice privacy techniques.
One is avoiding the reuse of addresses when creating an invoice or change
address. Another is grouping UTXOs into
OutputGroups
by scriptPubKey and running coin selection on the groups rather than individual
UTXOs.
However, each OutputGroup can grow quite large. It might
not make sense to fund a 0.015 BTC transaction by sweeping a group of 150 inputs
worth 10 BTC (not to mention the extra fees for all the unnecessary inputs).
The OUTPUT_GROUP_MAX_ENTRIES constant limits the number of UTXOs per
OutputGroup.
Within GroupOutputs(), if we have more than
OUTPUT_GROUP_MAX_ENTRIES with the same scriptPubKey, we batch them
into multiple OutputGroups with up to OUTPUT_GROUP_MAX_ENTRIES UTXOs each.
If we are excluding “partial groups,” we won’t use non-full
OutputGroups in coin selection.
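The batching described above can be sketched in Python as follows. This is a simplified model, not the wallet's C++ code: the `OutputGroup` dataclass, the `group_outputs` function, and its `include_partial` parameter are hypothetical stand-ins, and only `OUTPUT_GROUP_MAX_ENTRIES` is a name taken from the notes. The sketch follows the refinement discussed in the meeting log, where a non-full group counts as partial only when a full group exists for the same scriptPubKey.

```python
from collections import defaultdict
from dataclasses import dataclass, field

OUTPUT_GROUP_MAX_ENTRIES = 10  # value before the PR, per the notes


@dataclass
class OutputGroup:
    script_pub_key: str
    values: list = field(default_factory=list)  # UTXO amounts in satoshis


def group_outputs(utxos, max_entries=OUTPUT_GROUP_MAX_ENTRIES, include_partial=True):
    """Batch UTXOs by scriptPubKey into groups of at most max_entries.

    utxos: iterable of (script_pub_key, value_sats) pairs (assumed format).
    """
    by_spk = defaultdict(list)
    for spk, value in utxos:
        by_spk[spk].append(value)

    groups = []
    for spk, values in by_spk.items():
        batches = [values[i:i + max_entries] for i in range(0, len(values), max_entries)]
        has_full_group = any(len(batch) == max_entries for batch in batches)
        for batch in batches:
            # Assumption: a lone non-full group is not treated as partial.
            is_partial = len(batch) < max_entries and has_full_group
            if is_partial and not include_partial:
                continue
            groups.append(OutputGroup(spk, batch))
    return groups


fifteen = [("spk_a", 1_000_000)] * 15
print([len(g.values) for g in group_outputs(fifteen)])                         # [10, 5]
print([len(g.values) for g in group_outputs(fifteen, include_partial=False)])  # [10]
```

With 15 UTXOs on one scriptPubKey and a limit of 10, excluding partial groups leaves only the full group of 10 available to coin selection.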
PR#18418 increases
OUTPUT_GROUP_MAX_ENTRIES from 10 to 100. The number 100 was suggested
during a previous review club.
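A rough back-of-the-envelope calculation shows how the fee for spending one full group scales with the limit. The numbers are assumptions for illustration: roughly 68 vbytes per P2WPKH input and a feerate of 10 sat/vB. Other script types and feerates will give different results, and the function name is hypothetical.

```python
P2WPKH_INPUT_VSIZE = 68      # approximate vbytes per input (assumption)
SATS_PER_BTC = 100_000_000


def group_input_fee_sats(n_inputs, feerate_sat_per_vb, input_vsize=P2WPKH_INPUT_VSIZE):
    """Fee attributable to the inputs alone, ignoring outputs and tx overhead."""
    return n_inputs * input_vsize * feerate_sat_per_vb


for limit in (10, 100):
    fee = group_input_fee_sats(limit, feerate_sat_per_vb=10)
    print(f"{limit} inputs: {fee} sat ({fee / SATS_PER_BTC:.8f} BTC)")
# 10 inputs: 6800 sat (0.00006800 BTC)
# 100 inputs: 68000 sat (0.00068000 BTC)
```

Under these assumptions, spending a full group costs ten times more in input fees after the change, which is the trade-off against the privacy benefit of spending more same-address UTXOs together.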
This behavior change constitutes just one line (and some adjustments to the
tests), but it is ripe with opportunities to explore how coin selection works.
Try adding some log statements, re-compiling and then re-running the tests
(hint: you can use test/functional/combine_logs.py to see logs, and you
can assert that your logs are printed by adding
with node.assert_debug_log(expected_msgs=[your_log_statement]): to the
functional test).
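To illustrate the idea behind asserting on log output, here is a self-contained stand-in that checks only the text written to a log file while a block of code runs. It is not the functional test framework's implementation: the `assert_log_contains` helper, the file name, and the log message are hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def assert_log_contains(log_path, expected_msgs):
    """Assert that each expected message is logged inside the with-block.

    Hypothetical helper for illustration; it only inspects bytes appended
    to the file after the block was entered.
    """
    start = os.path.getsize(log_path) if os.path.exists(log_path) else 0
    yield
    with open(log_path, "rb") as f:
        f.seek(start)
        new_text = f.read().decode("utf-8", errors="replace")
    for msg in expected_msgs:
        assert msg in new_text, f"{msg!r} not found in new log output"


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "debug.log")
    with open(log_path, "w") as f:
        f.write("older log line\n")

    # The code under test would run here; this sketch writes the line directly.
    with assert_log_contains(log_path, ["Group has 10 entries"]):
        with open(log_path, "a") as f:
            f.write("Group has 10 entries\n")
```

Recording the file size on entry means messages logged before the block do not satisfy the assertion, so the check is tied to the specific action being tested.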
Some good tests to play around with are wallet_avoidreuse.py and
wallet_groups.py.
The PR author, fjahr, has written an excellent guide to
debugging Bitcoin Core with some
hints on adding logging and using debuggers.
You can also tinker with some of the constants (maybe poke around for
off-by-one errors) and see if things break!
You may find some previous review clubs helpful:
Review Club #17824 discussed the avoid_reuse flag.
Review Clubs #17331 and #17526 discussed coin
selection.
Questions
What do the avoid_reuse wallet flag and -avoidpartialspends wallet
option do? Why might we want one to automatically turn on the other?
If your wallet has 101 UTXOs of 0.01 BTC each, all sent to the same
scriptPubKey, and tries to send a payment of 0.005 BTC, avoiding partial
spends, how many inputs will the resulting transaction have? (Hint: this is
almost exactly the test_full_destination_group_is_preferred test case in
wallet_avoidreuse.py.)
In that test case, what is the fee amount paid for the 0.5 BTC transaction?
(Hint: try import pdb; pdb.set_trace() and call the
gettransaction
RPC).
Can you have multiple UTXOs under the same address if you set
avoid_reuse=true?
What are the advantages, disadvantages, and potential risks to users of
increasing OUTPUT_GROUP_MAX_ENTRIES?
What do you think of increasing OUTPUT_GROUP_MAX_ENTRIES to 100,
specifically?
Meeting Log
<glozow> Let's try a motivating example for the PR. Today (with `OUTPUT_GROUP_MAX_ENTRIES` = 10), if you have `avoid_reuse` and `avoidpartialspends` and a group of 15 UTXOs to the same scriptPubKey, what happens if you spend 10 of them but not the other 5 in a transaction?
<murch> michaelfolkson: Presumably there would be an "insufficient funds" message, I don't know whether the avoided reused addresses get mentioned. Would doubt it
<prayank> A. If custom change address is used (any address that was not created with `getrawchangeaddress` RPC in the same wallet), replacement tx will have 101 inputs
<prayank> B. If custom change address is used with label (address that was created with `getrawchangeaddress` and label was set with `setlabel` RPC), replacement tx will have 101 inputs
<lightlike> in that case we'd have two output groups, one with 100utxos and one with 1 utxos. Does the coin selection algorithm always choose the bigger output group if both output groups would be viable for the tx?
<murch> glozow: I think that a partial group refers to a group that isn't full in the presence of full groups. I.e. if you had 105 UTXOs, the group with 5 would be a partial group since a full group exists
<lukaz> a partial group is an OutputGroup with less than `OUTPUT_GROUP_MAX_ENTRIES`. A partial spend is when only some UTXOs from a spk are used to fund a tx
<glozow> murch: lukaz: ya! so in initial coin selection attempts when we're excluding partial groups, we'll only include the group with 100. if we had a group of just 2, though (not 102), we wouldn't consider that a partial group
<glozow> lightlike: right, so i assume that's why fjahr has updated the helpstring to say "Group outputs by address, selecting many (possibly all) or none"
<lightlike> murch: but the naming is certainly confusing if avoid_reuse is a strict no-go for reusing, and avoid_partialspends just means "we'll try our best"
<glozow> yeah. if you're at a high-ish feerate because you want to make a transaction now, you'll pay more in fees for those UTXOs. it might also cost more to fee-bump
<glozow> but money-wise, you might win because you won't have a situation where you're throwing away UTXOs from the combination of `avoid_reuse` and `-avoidpartialspends`?
<jnewbery> murch: am I right in saying that it's advantageous to branch and bound to have more UTXOs rather than fewer, since it'll be more likely to find a solution that results in no change?
<dariusp> yeah, if you're concerned enough about privacy to not want to use a dirty UTXO, wouldn't you rather just spend it? So by that logic it seems like you'd rather not have any limit?
<murch> The restriction of a barrel of UTXOs only being permitted to be spent as a group definitely restricts the combination space for viable input sets
<murch> lightlike: If you have a donation address, that should perhaps be a separate wallet, or then avoid_reuse simply prevents the intermingling of funds until you manually sweep the donations
<glozow> So back to dariusp's point on "why have a limit at all?" What `OUTPUT_GROUP_MAX_ENTRIES` would be too high? What do you think of 100, specifically?
<glozow> dariusp: i suppose in those cases, it's ambiguous if you're consolidating them to yourself or you're grouping them to make a payment to someone else, so it's fine to split?
<murch> dariusp: because you could do that in advance at low fees, consolidate all your UTXOs in a single group into say three pieces, and when you later want to spend at high fees, you only need to use one of the three
<glozow> michaelfolkson: idk. we saw earlier that you could maybe pay 0.0013BTC in fees on a tx. would that be acceptable to a user who has opted in to `-avoidpartialspends`?
<murch> Also, if you only have a single UTXO, when you spend from it, all of your funds are in flight and you can only make child transactions depending on this unconfirmed tx
<murch> glozow: It's a bit arbitrary. 42 might have been enough as well. Maybe 200 wouldn't be too bad. I'd firmly support 100 as being better than 10, tho
<dariusp> glozow i guess then the question around picking 100 specifically depends on who bitcoind is being built for? Someone who was super concerned with privacy or fees should probably be doing things more manually?
<lightlike> Are utxos with a negative effective feerate also included in the tx if they belong to the same output group, meaning that the absolute cost of the tx is higher compared to by simply dropping them?
<fjahr> dariusp: yeah, doing everything manually is always the last resort for people who want full control. This option gives a more convenient option that is at least helpful to most people with reused addresses and privacy concerns.
<michaelfolkson> prayank: When there are trade-offs it isn't as simple as saying anyone doesn't care. What you gain some place you lose some other place. And users have different preferences on how to manage that trade-off
<glozow> i think here it's not that black and white. a user could say `-avoidpartialspends=True, -maxtxfee=0.001BTC` if they want to hedge against a huge fee. and they can always create transactions and view them first without broadcasting ofc
<larryruane___> high-level question: the Core wallet has lots of great engineering, but is it used much in real life? If not, let me guess: we care about improving the Core wallet because many wallet implementors use it as a model? (at least we hope they do)