The PR branch HEAD was 10bbb0a at the time of this review club meeting.
Notes
CoinJoin is a trustless protocol for mixing UTXOs from multiple owners in order to make it difficult for outside parties to use the block chain’s transaction history to determine who owns which coin. It involves collaboratively creating a single transaction that spends one or multiple UTXOs from each participant into one or multiple new UTXOs for each participant, making it more difficult to trace each output’s transaction history.
In a very simple form: Alice, Bob and Carol each provide a 1.1 BTC UTXO and generate an address they want to receive their outputs into. Alice creates a transaction that spends those 3.3 BTC into 3 outputs of 1 BTC (and 0.3 BTC as fee), one to each of the 3 addresses generated by the participants. Alice, Bob and Carol will all need to sign the transaction, since only they can sign for their own inputs. Once all signatures have been added to the transaction, anyone can broadcast it and the CoinJoin is complete. At no point did any of the participants have access to anyone else’s coins.
Many different CoinJoin implementations exist (e.g. some allow non-uniform outputs, others don’t), but this is beyond the scope of this PR and Review Club.
Prior to #25991, gettransaction incorrectly calculates the fee because it is assumed that all of the outputs are sent by the user, as shown in the underlying issue this PR is aiming to fix.
CWalletTx::m_foreign_outputs is implemented as a std::vector<bool>. Is there anything peculiar about std::vector<bool>? Which other data structure(s) would you consider using instead, if any - and why?
What is the purpose of this for-loop in CWalletTx::Serialize()? For i=3, what is the value of 1 << (i % 8) - and what does it mean? (Hint: what does the [] operator on a std::string return, and what is the size of that return value?)
Are the foreign outputs persisted to disk? If no, why is this not necessary? If yes, which code is responsible for doing that?
<stickies-v> welcome everyone! Today we're looking at #25991, authored by luke-jr. The notes and questions are available on https://bitcoincore.reviews/25991
<stickies-v> what are everyone's general thoughts on the PR? and if you have been able to review in more detail, would you give it a Concept ACK, Approach ACK, Tested ACK, or NACK?
<Amirreza> Can someone explain to me what does `Cache` means in the context of wallet code? I don't understand for example `GetCachableAmount` or `CachedTxGetDebit`?
<stickies-v> alecc: just to avoid confusion, the issue fixed here is not about wrong fee calculation when creating a transaction (which would cost the user money), but rather about displaying/calculating fees for existing transactions in a users wallet
<stickies-v> quick reminder - i'll be iterating over the questions sequentally, but the discussion can go async, so feel free to refer to previous questions or any other relevant topics you want to discuss
<stickies-v> We already have `CWallet::IsMine()` - why do we need the new function `CWalletTx::IsForeignOutput()`? What is the difference between these functions? What is a foreign output?
<alecc> the PR notes specifically talk about CoinJoin - I was thinking a foreign output was an output that is 1. not being sent to a pubkey owned by our wallet and 2. not created by our wallet?
<alecc> when we say "sent by this wallet", that's like if we were participating in a coin join but didn't contribute this output? i imagine it's more general than that maybe
<stickies-v> just to rephrase it a bit, IsMine() tells us if a tx/script/output etc is recognized (e.g. it's watching it, or it can spend it) by our wallet
<stickies-v> however, we need to keep in mind that in a bitcoin transaction, an output doesn't specify which input(s) it spends. conceptually, it's like all the inputs are pooled together and then the output(s) just spend from that pool
<alecc> stickies-v: given that, i'm a little confused on what actually qualifies an output to be foreign? from the code i didn't see anything "detecting" an output to be foreign/don't know how it would be i guess
<stickies-v> so when looking at a transaction's outputs as an external observer, we can't really say "who" funds each output. we need context to be able to do that
<stickies-v> alecc: foreign means, in simple terms, that you have "nothing to do" with that output. someone else is funding it, you just all happen to be in the same PR
<sipa> luke-jr: Specifically, I'd say that foreign outputs thing matches more the "unspecified value" in C/C++ (which means the implementation could have many valid behaviors, which aren't well defined, but it can't do completely unrelated things like wipe your wallet.dat file)
<stickies-v> Does commit "Wallet: Refactor CachedTxGetAmounts fee calculation to inline value_out" introduce any behaviour change, or is it just refactoring?
<alecc> stickies-v: from what i could tell this commit alone is just a refactor - changes fee calculation from using `tx-GetValueOut` to manually adding them to a total when iterating through outputs
<stickies-v> luckily, it's both safe and no behaviour change. earlier in the function, when we calculate `nDebit`, `CachedTxGetDebit` (well, the functions it calls) has already checked the output values of the tx through `MoneyRange`
<stickies-v> so effectively this inlined version just removed the duplicated (and quite possibly the ranges have been checked in other places already as well) check. all is well, another day without inflation loopholes!
<alecc> now that I'm actually looking at how `MoneyRange` is used in `GetValueOut` it looks like it's checking just that the txout total value is within 0-21mil which seems kinda redundant? like it's checking that the tx isn't literally spending all possible btc? maybe a question for another time
<stickies-v> I'm not 100% all the places where MoneyRange is used or its exact purpose. Could be used as protection against inflation, but also against e.g. buffer overflow. I believe that's a bug we had quite a few years ago.
<stickies-v> `CWalletTx::m_foreign_outputs` is implemented as a `std::vector<bool>`. Is there anything peculiar about `std::vector<bool>`? Which other data structure(s) would you consider using instead, if any - and why?
<alecc> this one i wasn't too sure on - i'd have to look back in the code to see the context more, but if you know how many outputs you have before you create the `m_foreign_ouputs` vector, then you could maybe make it an array/don't need the resizeability
<stickies-v> michaelfolkson: yeah exactly, even though the spec doesn't guarantee std::vector<bool> to be more space efficient, most (all?) implementations do, but at the cost of a different interface and some unexpected behaviour, e.g. you can't use pointers to the elements of a bool vector
<stickies-v> from what I've read, most common alternatives are std::vector<char> or std::bitset - but I'm not sure these would be preferable in this situation?
<larryruane_> wow TIL ... I'd think it would have been better to have std::vector<bool> just be (basically) an array of bytes, and have a specific bitmap class... rather than having that exception
<larryruane_> I wonder, if you wanted the each-bool-is-a-byte type of container, if you could declare a `std::vector<uint8_t>` or something, and just always store 0 or 1 in its elements
<stickies-v> larryruane_: when perusing stackoverflow, you'll find many people share your sentiment. hard to phase out once it's in the standard library, though
<stickies-v> Next up: what is the purpose of the for-loop in `CWalletTx::Serialize()`? For i=3, what is the value of `1 << (i % 8)` - and what does it mean?
<stickies-v> larryruane_: that's one of the review comments I've got lined up hah. since we've checked bounds already, I think we should just use `[]` which is slightly faster
<alecc> stickies-v: we're bitwise or-ing the byte with a number that only has as most 1 nonzero bit, so it's only flipping that individual bit each iteration
<stickies-v> luke-jr: is it faster? I've not benchmarked, but... that's what I read? which would make sense since we're not redoing the bounds checking first?
<luke-jr> stickies-v: it might be. other C++ stuff (maybe just maps?) generates "create a new item" and such for operator[], in which case .at() is cheaper for read access
<larryruane_> "aren't we just overwriting it" -- I think so, an alternative might have been to write to a stack variable (in practice probably a register) instead of an element of the string `s` and only store it to `s` after the loop.. but I don't know if it's worth it
<stickies-v> larryruane_: sorry yeah you're right, we're overwriting the byte but keeping the individual bits which is what I meant, but you're definitely right with that nuance (and potential performance implications it may have, although probably not that important)