The PR branch HEAD was 1298f0fca7 at the time of this review club meeting.
Notes
A transaction’s fee is not set explicitly. The fee can be implicitly
calculated from the difference between total input value and the total output
value.
This PR updates the getblock RPC to include an extra “fee” field for transactions whenever:
verbosity >= 2
Undo data (which can be used for efficient fee calculation) exists for that block
Some extra changes were added to prevent fuzz test failures; we discussed
fuzz testing and how it’s used in Bitcoin Core in a previous review club
meeting.
Questions
What is CTxUndo?
How are we using it for fee calculation?
Where else is it used?
Why doesn’t a coinbase transaction have Undo data?
What’s the IO cost to read Undo data for a block?
How could we calculate the fee if Undo data didn’t exist for a block, and
what’s the IO cost?
If there’s a txindex?
If there’s no txindex?
What does CHECK_NONFATAL do?
What other options are there (e.g. asserts), and when should we use each
option?
Knowing the transaction fee is useful, but does only returning it in some
circumstances actually provide a better client experience, or is it
preferable for clients to always get exactly what they expect?
Should we also calculate the fee if Undo data is missing but a txindex
is available?
At the moment, the fact that we might return “fee” is undocumented. Is it
worth increasing the complexity of the documentation to say “we SOMETIMES
return the fee; here’s when we do so”?
We add some extra checks to prevent integer overflow in fuzz tests. But does
it make sense for fuzz tests to generate inputs that violate function
preconditions?
<felixweis> its a representation of the rev* data in your blocks/ directory. it allows to restore an UTXO in the set if we need to "rewind" the blockchain because a longer branch was found. remember, all spends from the shorter block chain removed the info (amount, bitcoin script)
<robot-dreams> sunon: Yup, I agree there's not a lot of context in UpdateCoins itself, but I found looking at where UpdateCoins itself got called to be helpful
<felixweis> BIP34 encodes the block height in the coinbase scriptSig field. so coinbase txids are still unique. previously they sometimes did override old coins.
<robot-dreams> jrawsthorne Nice, that seems like another interesting use of the undo data! Something else we can follow up on if we have time later, "what exactly is the filter index and how does that work"
<Paul_R> by override, could you equally say the over-wrote the old coins? ie. they ceased to exist anymore? seems like i would have heard of that bug by now, so i'm expecting override to not mean overwrite
<felixweis> I believe its fairly low thanks to the excellent design of UtraPrune™. we only need to read a single additional file which has around ~100 kB of data per block to get all the amount information of all the txins spend in that particular block.
<sipa> dhruvm: nope, the only protection that currently exists is just that an attacker would need to waste N bytes bandwidth to make us read N bytes from disk
<sipa> (which is one of the reasons why BIP37 was disabled by default - it enabled an attacker to ask for a block to be read from disk without pretty much any bandwidth at all)
<robot-dreams> felixweis: going back to IO cost, yup! We read a contiguous section of a `rev?????.dat` file, which contains the serialized `CTxUndo` data (a vector of UTXO info)
<robot-dreams> I got an estimate of 64kb by adding up the size of all the rev* files and dividing by the number of blocks. Same order of magnitude as your 100kb estimate.
<@jnewbery> there is some (limited) protection from -maxuploadtarget. It's more to protect bandwidth, but I think probably also protects disk reads indirectly
<emzy> ' bitcoin-cli getblock "$(bitcoin-cli getblockhash 650683)" 2 ' I will get a 'core_write.cpp:251 (TxToUniv) Internal bug detected: 'MoneyRange(fee)''
<jrawsthorne> A lot more expensive, with undo we can read it all at once but with this we'd have to get each tx individually and then find the correct output
<dhruvm> So `TxToUniv` seems to have access to `CTransaction`. `CTransaction` has `vins` and `vouts`. `vouts` have amount, but `cTxIn` does not - it points to prevout.
<robot-dreams> nehan: Interesting design question! Yeah, "if we were redesigning from scratch would we just put rev????? data right next to block data"
<stacie> when talking about txindex, is it defined as `block_height:index_of_tx_in_block` or `txid:index_of_output` (in terms of finding more details on a vin). I'm guessing it's the former?
<robot-dreams> OK, next is a "UX" question: Knowing the transaction fee is useful, but does only returning it in some circumstances actually provide a better client experience, or is it preferable for clients to always get exactly what they expect?
<robot-dreams> stacie: Yeah, great question! I agree with felixweis, I think specifically the txindex tells you the file where you can get the block data, as well as some relevant offsets into the file to find the data
<lightlike> another question: while this PR shows the fees for each tx: would it maybe be useful to also display the total amount of fees spent in a block in the getblock RPC?
<nehan> sipa: i don't know what you mean by the third option, "commitment structure". input data is already committed to in a transaction. somewhere else?
<robot-dreams> lightlike: elle: great points! I don't have enough context to know whether we want to provide the extra convenience of such a calculation, or assume RPC clients can do that on their own
<sipa> nehan: this may be a bit of a tangent, but i think it's easiest to think of blocks and transactions as abstract data structures that are defined by their commitment structure (so a transaction is defined by its txid, witness transaction by its wtxid, block by its block hash, ...), and how/what is exactly stored/sent is implementation detail
<sipa> nehan: so in that sense, sennding input data along with blocks isn't making it part of "the block" - it's just auxiliary data that happens to be sent/stored along with it
<dhruvm> robot-dreams: in response to the UX question. when representing the same data in two ways(in this case as fee, and sum(op) - sum(ip)), i worry what happens if a bug makes the two inconsistent. which representation is truth for the client? should a low-level protocol only represent the bare minimum and leave the rest to clients? should the protocol minimize computation needed from clients?
<jonatack> if anyone's interested, there was a related review club session in April, "Flush undo files after last block write (validation)" https://bitcoincore.reviews/17994
<dhruvm> sipa: I mean from the client's perspective. If the fee and sum(op)-sum(ip) ever disagree, the clients should prefer the latter. But explicitly sending them the fee makes it possible that they might not?
<felixweis> I'm curious about how people feel about Question 6. Should we just return the fee always and not introduce another "verbosity" level? even if there are speed penatlys for the rpc?
<robot-dreams> felixweis: Yeah, very interesting question. On a similar note I'm also curious if people think we should calculate the fee when a txindex (but not block undo data) is available.
<dhruvm> sipa: yeah i guess it's just my ocd when representing the same data in two ways. I've experienced bugs from that in the past. Perhaps not something to worry about.
<dhruvm> sipa: I am thinking of a future scenario where the server computes fee=sum(op)-sum(ip) incorrectly due to a bug. Now the client receives fee that is different than the formula.
<dhruvm> sipa: But I can also see how that's just paranoia. If there's a bug, we'll fix it. It's not going to be at the storage level since like you said we are not storing anything differently.
<sipa> the goal isn't that two computations end up with the same result per se - the goal is that the fee is correctly calculated, everywhere, and that implies that all implementations will come to the same conclusion
<sipa> something like bitcoin block validation is in many ways not a correctness problem - there are plenty of weird deviations from the consensus rules that could be accidentally made, as long as everyone makes the same deviations
<sipa> emzy: i hereby decree that The Fee is defined as the sum of transaction outputs' values minus the sum of the values of the outputs spent by the transaction's input.
<dhruvm> emzy: i think what sipa is saying is that it's well understood sum(op)-sum(ip) is the definition of fee. now, if there's a bug on rpc server, or rpc client, that doesn't matter as much.