Add foreign_outputs metadata to support CoinJoin transactions (wallet)

https://github.com/bitcoin/bitcoin/pull/25991

Host: stickies-v  -  PR author: luke-jr

The PR branch HEAD was 10bbb0a at the time of this review club meeting.

Notes

  • CoinJoin is a trustless protocol for mixing UTXOs from multiple owners in order to make it difficult for outside parties to use the block chain’s transaction history to determine who owns which coin. It involves collaboratively creating a single transaction that spends one or multiple UTXOs from each participant into one or multiple new UTXOs for each participant, making it more difficult to trace each output’s transaction history.

  • In a very simple form: Alice, Bob and Carol each provide a 1.1 BTC UTXO and generate an address they want to receive their outputs into. Alice creates a transaction that spends those 3.3 BTC into 3 outputs of 1 BTC (and 0.3 BTC as fee), one to each of the 3 addresses generated by the participants. Alice, Bob and Carol will all need to sign the transaction, since only they can sign for their own inputs. Once all signatures have been added to the transaction, anyone can broadcast it and the CoinJoin is complete. At no point did any of the participants have access to anyone else’s coins.

  • Many different CoinJoin implementations exist (e.g. some allow non-uniform outputs, others don’t), but this is beyond the scope of this PR and Review Club.

  • Prior to #25991, gettransaction incorrectly calculates the fee because it is assumed that all of the outputs are sent by the user, as shown in the underlying issue this PR is aiming to fix.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. We already have CWallet::IsMine() - why do we need the new function CWalletTx::IsForeignOutput()? What is the difference between these functions? What is a foreign output?

  3. Does commit “Wallet: Refactor CachedTxGetAmounts fee calculation to inline value_out” introduce any behaviour change, or is it just refactoring?

  4. The developer notes indicate a preference for pre- instead of post-increment and decrement operators. In CWalletTx::SetForeignOutput, the post-decrement i-- is used. Should this be changed, or is it a valid exception to the guideline? Why?

  5. CWalletTx::m_foreign_outputs is implemented as a std::vector<bool>. Is there anything peculiar about std::vector<bool>? Which other data structure(s) would you consider using instead, if any - and why?

  6. What is the purpose of this for-loop in CWalletTx::Serialize()? For i=3, what is the value of 1 << (i % 8) - and what does it mean? (Hint: what does the [] operator on a std::string return, and what is the size of that return value?)

  7. Are the foreign outputs persisted to disk? If no, why is this not necessary? If yes, which code is responsible for doing that?

Meeting Log

  117:00 <stickies-v> #startmeeting
  217:00 <larryruane_> hi!
  317:00 <glozow> hi
  417:01 <alecc> hi
  517:01 <stickies-v> welcome everyone! Today we're looking at #25991, authored by luke-jr. The notes and questions are available on https://bitcoincore.reviews/25991
  617:01 <Amirreza> Hi
  717:02 <stickies-v> anyone joining us for the first time today? even if you're just lurking, feel free to say hi! it's nice to see who's following
  817:02 <alecc> this is my first time!
  917:02 <luke-jr> me too! <.<
 1017:02 <luke-jr> maybe
 1117:02 <brunoerg> hi
 1217:02 <asi0> hi! Second time here but still lurking ^^'
 1317:03 <stickies-v> welcome alecc, glad that you found your way here! don't hold back to raise any questions you have, it's a very welcoming environment here
 1417:03 <luke-jr> btw, in the process of fixing the minor bug, I found another less trivial-to-fix one ;)
 1517:03 <stickies-v> haha and welcome to our PR author and (maybe) first-time attendee luke-jr, thank you for joining us!
 1617:04 <stickies-v> who got the chance to review the PR or read the notes? (y/n)
 1717:04 <alecc> y
 1817:04 <brunoerg> I just read the notes
 1917:05 <stickies-v> luke-jr: well definitely do feel free to bring it up in this session too if it's interesting to discuss!
 2017:05 <Amirreza> y
 2117:05 <asi0> same as brunoerg
 2217:06 <alecc> i didn't review on github but went through the notes/questions
 2317:06 <luke-jr> stickies-v: should we wait to see if someone else notices it? :P
 2417:07 <stickies-v> haha we can keep it as a bonus until the end to keep people hooked
 2517:07 <stickies-v> what are everyone's general thoughts on the PR? and if you have been able to review in more detail, would you give it a Concept ACK, Approach ACK, Tested ACK, or NACK?
 2617:07 <glozow> luke-jr: records say you have attended before :P welcome back! https://bitcoincore.reviews/17428#l-16
 2717:08 <luke-jr> :o
 2817:09 <alecc> stickies-v: concept ACK - fixing fee calculation makes sense, i'm not super certain on implementation mostly because i'm new to the codebase
 2917:10 <Amirreza> Can someone explain to me what does `Cache` means in the context of wallet code? I don't understand for example `GetCachableAmount` or `CachedTxGetDebit`?
 3017:11 <larryruane_> concept ACK, definitely would be good to fix, still studying the approach and the code
 3117:11 <stickies-v> alecc: just to avoid confusion, the issue fixed here is not about wrong fee calculation when creating a transaction (which would cost the user money), but rather about displaying/calculating fees for existing transactions in a users wallet
 3217:12 <luke-jr> arguably the current logic is even correct - it just can't handle the data it doesn't know
 3317:12 <stickies-v> yes, sorry - that's a good nuance, thanks
 3417:12 <alecc> stickies-v: makes sense, thanks for clarifying
 3517:13 <stickies-v> Amirreza: I'm actually not sure why all these functions contain Cache, hopefully someone more familiar with the wallet can clarify?
 3617:13 <stickies-v> awaiting that, let's dive into the questions
 3717:13 <stickies-v> quick reminder - i'll be iterating over the questions sequentally, but the discussion can go async, so feel free to refer to previous questions or any other relevant topics you want to discuss
 3817:13 <michaelfolkson> hi
 3917:14 <stickies-v> We already have `CWallet::IsMine()` - why do we need the new function `CWalletTx::IsForeignOutput()`? What is the difference between these functions? What is a foreign output?
 4017:14 <stickies-v> maybe let's focus on the last bit first. conceptually, what is a foreign output?
 4117:15 <luke-jr> right now, the code assumes that if any of the inputs came from this wallet, all outputs to the transaction are sends from this wallet
 4217:15 <luke-jr> in the case of a CoinJoin, that assumption is incorrect
 4317:15 <luke-jr> but the wallet doesn't know which outputs it DID send, or didn't
 4417:15 <alecc> the PR notes specifically talk about CoinJoin - I was thinking a foreign output was an output that is 1. not being sent to a pubkey owned by our wallet and 2. not created by our wallet?
 4517:15 <luke-jr> "foreign output" is the term I'm using for outputs that contradict the assumption - that is, they AREN'T sent by this wallet
 4617:16 <luke-jr> it _could_ also still be _to_ this wallet, and if so should be treated as a receive
 4717:17 <alecc> when we say "sent by this wallet", that's like if we were participating in a coin join but didn't contribute this output? i imagine it's more general than that maybe
 4817:17 <luke-jr> right, that's the general example
 4917:17 <stickies-v> alecc: the second part of your statement is particularly relevant
 5017:17 <larryruane_> so a single transaction could have a mixture of foreign and non-foreign outputs?
 5117:17 <luke-jr> though "contribute" can be confusing
 5217:17 <luke-jr> larryruane_: yes, that's the expectation
 5317:18 <luke-jr> in fact, it could break things in other ways, if all the outputs get marked as foreign
 5417:18 <luke-jr> but it's not really clear how that _should_ work, so I'd treat that scenario as UB
 5517:18 <stickies-v> (UB is Undefined Behaviour, for anyone unfamiliar with the term)
 5617:19 <alecc> thanks, what just about to ask
 5717:19 <stickies-v> thanks for that context, luke-jr
 5817:19 <luke-jr> yeah, sorry
 5917:20 <larryruane_> oh I thought UB had to do only with the details of our C++ code, but the term also applies to how overall bitcoin works?
 6017:20 <larryruane_> (if so that's interesting!)
 6117:21 <stickies-v> just to rephrase it a bit, IsMine() tells us if a tx/script/output etc is recognized (e.g. it's watching it, or it can spend it) by our wallet
 6217:21 <luke-jr> larryruane_: UB is generic; you can use it to refer to things in the real world too :P
 6317:21 <stickies-v> however, we need to keep in mind that in a bitcoin transaction, an output doesn't specify which input(s) it spends. conceptually, it's like all the inputs are pooled together and then the output(s) just spend from that pool
 6417:21 <luke-jr> the actions of a toddler are often UB
 6517:22 <larryruane_> luke-jr: +1 TIL thanks
 6617:22 <larryruane_> haha!
 6717:22 <sipa> eh, UB is a specific term in the C/C++ standards
 6817:22 <sipa> sure, you can use it in the real world too, like many terms of art
 6917:22 <sipa> but it has a well-defined (ha!) meaning within the context of the C and C++ languages
 7017:23 <alecc> stickies-v: given that, i'm a little confused on what actually qualifies an output to be foreign? from the code i didn't see anything "detecting" an output to be foreign/don't know how it would be i guess
 7117:23 <luke-jr> alecc: it's not detectable; the user must provide the information
 7217:23 <stickies-v> so when looking at a transaction's outputs as an external observer, we can't really say "who" funds each output. we need context to be able to do that
 7317:24 <alecc> luke-jr: ah i was thinking that but wasn't sure
 7417:24 <luke-jr> this PR just makes it possible for the user to do so, saves it, and interprets using it
 7517:24 <stickies-v> alecc: foreign means, in simple terms, that you have "nothing to do" with that output. someone else is funding it, you just all happen to be in the same PR
 7617:24 <stickies-v> *same transaction, not PR
 7717:24 <sipa> luke-jr: Specifically, I'd say that foreign outputs thing matches more the "unspecified value" in C/C++ (which means the implementation could have many valid behaviors, which aren't well defined, but it can't do completely unrelated things like wipe your wallet.dat file)
 7817:24 <alecc> stickies-v: gotcha
 7917:25 <luke-jr> sipa: ok
 8017:25 <stickies-v> okay hopefully that clears up the context/purpose of the PR for everyone, I'll move on to the next question
 8117:25 <stickies-v> Does commit "Wallet: Refactor CachedTxGetAmounts fee calculation to inline value_out" introduce any behaviour change, or is it just refactoring?
 8217:25 <stickies-v> (link: https://github.com/bitcoin/bitcoin/pull/25991/commits/efa22dd36f1399c49c5a149ace5232b700c7b049)
 8317:26 <BlueMoon> Hello, sorry, I got in late, I was busy.
 8417:27 <alecc> stickies-v: from what i could tell this commit alone is just a refactor - changes fee calculation from using `tx-GetValueOut` to manually adding them to a total when iterating through outputs
 8517:28 <stickies-v> alecc: can you see any difference in the implementation of `GetValueOut()` and the 'inlined' version from this commit, though?
 8617:28 <alecc> oh yea `GetValueOut` has some constraints on the value i think
 8717:29 <alecc> that's actually probably pretty important isn't it?
 8817:29 <stickies-v> yup exactly, it checks that the output amount is <21m btc by using the `MoneyRange()` function
 8917:29 <alecc> `MoneyRange`
 9017:29 <alecc> oh dang
 9117:30 <stickies-v> so, is this behaviour change? and/or is it safe to do?
 9217:30 <alecc> yes behaviour change, not safe
 9317:32 <luke-jr> seeing as this is only called on a wallet tx, I would argue it's safe
 9417:33 <luke-jr> also note being outside the range throws an exception (or asserts), which would break things like listtransactions entirely
 9517:33 <stickies-v> luckily, it's both safe and no behaviour change. earlier in the function, when we calculate `nDebit`, `CachedTxGetDebit` (well, the functions it calls) has already checked the output values of the tx through `MoneyRange`
 9617:33 <luke-jr> but perhaps a reviewer ought to check call sites for the function
 9717:34 <luke-jr> stickies-v: ah, I didn't notice that ☺
 9817:34 <stickies-v> so effectively this inlined version just removed the duplicated (and quite possibly the ranges have been checked in other places already as well) check. all is well, another day without inflation loopholes!
 9917:35 <alecc> mm makes sense, i jumped to the conclusion too quickly
10017:36 <alecc> now that I'm actually looking at how `MoneyRange` is used in `GetValueOut` it looks like it's checking just that the txout total value is within 0-21mil which seems kinda redundant? like it's checking that the tx isn't literally spending all possible btc? maybe a question for another time
10117:38 <stickies-v> I'm not 100% all the places where MoneyRange is used or its exact purpose. Could be used as protection against inflation, but also against e.g. buffer overflow. I believe that's a bug we had quite a few years ago.
10217:39 <stickies-v> Alright, next question (changing the order a bit)
10317:39 <stickies-v> `CWalletTx::m_foreign_outputs` is implemented as a `std::vector<bool>`. Is there anything peculiar about `std::vector<bool>`? Which other data structure(s) would you consider using instead, if any - and why?
10417:39 <stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/blob/10bbb0a5252470c5afe17c38326476be4a523613/src/wallet/transaction.h#L193)
10517:41 <alecc> this one i wasn't too sure on - i'd have to look back in the code to see the context more, but if you know how many outputs you have before you create the `m_foreign_ouputs` vector, then you could maybe make it an array/don't need the resizeability
10617:42 <michaelfolkson> The bool only uses a bit of space rather than a byte https://stackoverflow.com/questions/17794569/why-isnt-vectorbool-a-stl-container
10717:43 <michaelfolkson> "doesn't offer all the capabilities and interface of a normal standard container"
10817:44 <alecc> oh woah
10917:44 <luke-jr> michaelfolkson: not guaranteed to, iirc, but hopefully
11017:44 <stickies-v> michaelfolkson: yeah exactly, even though the spec doesn't guarantee std::vector<bool> to be more space efficient, most (all?) implementations do, but at the cost of a different interface and some unexpected behaviour, e.g. you can't use pointers to the elements of a bool vector
11117:45 <stickies-v> in many cases it's not a problem, just something to be aware of and consider, which is why I included it in the questions here
11217:45 <luke-jr> so it's actually more efficient than putting a bool on CTxOut (which would kinda be a layering issue)
11317:45 <willcl_ark> seems like boost.container has a regular bool vec
11417:46 <stickies-v> willcl_ark: but at the same time we're also trying to remove boost dependencies, unfortunately :-D
11517:46 <luke-jr> willcl_ark: we prefer STL over boost
11617:46 <michaelfolkson> luke-jr: Not guaranteed to only use a bit of space, you mean?
11717:46 <luke-jr> michaelfolkson: right
11817:46 <willcl_ark> sure and I agree! Was just looking for alternatives as per the q :P
11917:47 <stickies-v> from what I've read, most common alternatives are std::vector<char> or std::bitset - but I'm not sure these would be preferable in this situation?
12017:47 <stickies-v> willcl_ark: yes sorry, thank you for the input!
12117:48 <willcl_ark> deque? IMO it's fine to use the weird bool vec here though
12217:48 <larryruane_> wow TIL ... I'd think it would have been better to have std::vector<bool> just be (basically) an array of bytes, and have a specific bitmap class... rather than having that exception
12317:49 <larryruane_> I wonder, if you wanted the each-bool-is-a-byte type of container, if you could declare a `std::vector<uint8_t>` or something, and just always store 0 or 1 in its elements
12417:49 <stickies-v> larryruane_: when perusing stackoverflow, you'll find many people share your sentiment. hard to phase out once it's in the standard library, though
12517:50 <stickies-v> larryruane_: yeah exactly that's a valid alternative
12617:51 <stickies-v> Next up: what is the purpose of the for-loop in `CWalletTx::Serialize()`? For i=3, what is the value of `1 << (i % 8)` - and what does it mean?
12717:51 <stickies-v> (Hint: what does the `[]` operator on a `std::string` return, and what is the size of that return value?)
12817:51 <stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/blob/10bbb0a5252470c5afe17c38326476be4a523613/src/wallet/transaction.h#L246-L248)
12917:52 <alecc> it looks like it packs the values into a sequence of bits - iterates over each char/byte in the string 8 times for each bit
13017:53 <alecc> for i = 3 `1 << (i % 8)` = 0001000 = 8
13117:54 <stickies-v> alecc: yeah you got it!
13217:54 <stickies-v> so, if we're iterating over the same char multiple times... aren't we just overwriting it the whole time?
13317:55 <larryruane_> why `m_foreign_outputs.at(i)` instead of `m_foreign_outputs[i]` -- in this situation would those be the same?
13417:56 <stickies-v> larryruane_: that's one of the review comments I've got lined up hah. since we've checked bounds already, I think we should just use `[]` which is slightly faster
13517:56 <alecc> stickies-v: we're bitwise or-ing the byte with a number that only has as most 1 nonzero bit, so it's only flipping that individual bit each iteration
13617:56 <luke-jr> stickies-v: is it?
13717:57 <stickies-v> alecc: yeah!
13817:57 <stickies-v> luke-jr: is it faster? I've not benchmarked, but... that's what I read? which would make sense since we're not redoing the bounds checking first?
13917:58 <luke-jr> stickies-v: it might be. other C++ stuff (maybe just maps?) generates "create a new item" and such for operator[], in which case .at() is cheaper for read access
14017:59 <larryruane_> "aren't we just overwriting it" -- I think so, an alternative might have been to write to a stack variable (in practice probably a register) instead of an element of the string `s` and only store it to `s` after the loop.. but I don't know if it's worth it
14118:00 <luke-jr> stickies-v: looks like you're right
14218:00 <stickies-v> larryruane_: sorry yeah you're right, we're overwriting the byte but keeping the individual bits which is what I meant, but you're definitely right with that nuance (and potential performance implications it may have, although probably not that important)
14318:01 <stickies-v> luke-jr: okay thanks I didn't know that, will have to read up on it a bit!
14418:01 <stickies-v> alright folks looks like we're at time, that's all for today - thank you very much for attending and participating!
14518:01 <stickies-v> #endmeeting