The PR branch HEAD was f30f007c8 at the time of this review club meeting.
Notes
Bitcoin transactions consist of
transaction inputs,
which each refer to an output that is being spent by txid and index, and
transaction outputs,
which each contain an amount and a spending condition (encoded as a
scriptPubKey).
The transaction fee is the total amount from the transaction inputs minus
the total amount spent by the transaction outputs.
Since the transaction inputs do not explicitly include the amounts, the
only way to determine the transaction fee is to look up the UTXOs (unspent
transaction outputs) that are being spent in this transaction. In other
words, the transaction fee is implicit.
For transactions in the mempool, the UTXOs spent by the transaction will
either be in the UTXO set (the set of all UTXOs implied by the block chain
at the current height), or will be the outputs from other transactions in
the mempool
For transactions that are already spent in the block chain, the UTXO data
is saved in
undo data.
The
getrawtransaction
RPC method can be used to retrieve a transaction. The transaction can
either be an unconfirmed transaction in the mempool or a confirmed
transaction which has been saved in a block file on disk.
This PR updates the getrawtransaction RPC method to allow the user to
retrieve fee data for a transaction.
This PR changes the second argument
of getrawtransaction from verbose (a bool) to verbosity (an int).
Is this ok? Does it break backwards compatibility?
The verbosity argument accepts values of 0 (returns the hex-encoded
data for the transaction), 1 (returns an object with information about
the transaction) or 2 (returns an object with information about the
transaction, including fees and prevout information for the inputs). What
happens if 3 is passed as the argument? How about -1?
The commit introduces
local variablesblockUndo and block in the getrawtransaction() function. What are
they used for? How/where do they get set?
<pg156> I read the notes and questions and the PR, but I don't have enough information to give a concept ACK, because I don't understand how the fee data will exactly help the user. What are the use cases?
<jnewbery> pg156: it looks like there's not much motivation for this change given in the PR or associated issue. Perhaps someone else here can give some use-cases?
<maxe> I noticed this in the notes: "Initial concept ACK. This is really helpful for wallets trying to sync with core rpc out there, and it will reduce get calls by half for input fetching. Thanks for working on this."
<raj_> jnewbery, I find the input details reporting very helpful. I am working on a wallet that syncs from bitcoin core with RPC, to get the input details we had to make many rpc get queries. This will reduce those get calls by a lot..
<raj_> michaelfolkson, yes, but not without getting the input details. And yes, once we get the inputs, getting fee is mostly trivial. So not very sure on the fee part.
<jnewbery> Next question: This PR changes the second argument of getrawtransaction from verbose (a bool) to verbosity (an int). Is this ok? Does it break backwards compatibility?
<lsilva_> Does not break backwards compatibility. If the user passes a boolean parameter, the code will still handle it (src / rpc / rawtransaction.cpp: 201-205).
<larryruane> maybe very slightly non-backward compatible, because before this PR, specifying a 2 gives the old verbose but now with this PR, a more verbose output ... so that's a change in behavior (the the same input), but that's undocumented usage, so it's okay
<larryruane> oh but wait, correcting myself, all this does is add *new* fields to the output, so anyone looking at the existing fields will find them (with the same meanings)
<neha> thanks for that clarification! so iiuc clients who might have been passing a 2 or 3 for a bool (which is weird, but ok) should be fine? does it break anything to have extra fields?
<michaelfolkson> Nice comment by stickies-v: "Did some testing and it looks like the RPC is failing for coinbase transactions when verbosity==2. In this case we should probably fallback to verbosity==1 behaviour instead?"
<jnewbery> sipa larryruane: right, I think it just adds new fields to the object being returned. That should be fine - clients should tolerate receiving fields that they don't know the meaning off, since we often upgrade RPCs by adding new fields
<michaelfolkson> That's a backward compatibility issue (at least with the current state of the PR). I'm assuming it should never fail (once PR issue is sorted)
<stickies-v> would we be breaching any API guarantees by simply adding the new `fee` and `prevout` in the verbose output, without adding the new verbosity==2 option? Since we're just adding new fields?
<glozow> neha: any software that's passing getrawtransaction(2)["somefield"] will still work. if there is software asserting (getrawtransaction(2) == someobject) then it fails, but that just means they need to change the 2 to a true
<jnewbery> larryruane: that *could* be a break in user behaviour, *if* the users is calling the RPC with verbose={some value greater than 1} *and* they were expecting that field to always be there. However, that's a pretty pathological edgecase, since users shouldn't really have been setting verbose={some value greater than 1}
<neha> glozow: thanks! my question is does anyone do the latter, or I guess more like what is the API contract around that? Is it considered backwards-incompatible for that to change?
<neha> based on jnewbery's "clients should tolerate receiving fields that they don't know the meaning off, since we often upgrade RPCs by adding new fields" i agree
<lsilva_> vRPCConvertParams is a array of CRPCConvertParam. Its purpose is to convert non-string RPC argument to JSON. It is used by bitcoin-cli when preparing the request in DefaultRequestHandler::PrepareRequest. The methods used to convert values are RPCConvertNamedValues (for named parameters) and RPCConvertValues.
<jnewbery> Next question: The verbosity argument accepts values of 0 (returns the hex-encoded data for the transaction), 1 (returns an object with information about the transaction) or 2 (returns an object with information about the transaction, including fees and prevout information for the inputs). What happens if 3 is passed as the argument? How about -1?
<pg156> What's the C++ idiomatic way to express a type could be either a bool, or an integer of value 0, 1, 2? An abstract class? I kno in other languages this could be easier. (subtyping?)
<neha> depends on the API contract with users. it sounds like you are not allowed to remove fields, only add them, which means that any update to use other verbosity levels would need to abide by that
<stickies-v> I generally prefer things to fail explicitly. I don't see any benefit of accepting these other values, except slightly reducing the complexity of the code by avoiding error checking
<jnewbery> neha: I think if in some version of the code, calling the RPC with verbosity=n fails, and in the next version of the code, calling the RPC with verbosity=n succeeds, but has some of the fields missing that you'd get with verbosity=n-1, that's fine
<jnewbery> ok, next question. The commit introduces local variables blockUndo and block in the getrawtransaction() function. What are they used for? How/where do they get set?
<stickies-v> They contain the unserialized CBlockUndo and CBlock data, which get set by calling the UndoReadFromDisk and ReadBlockFromDisk functions (using out parameters). `blockUndo` is used to calculate the fee more quickly since it already contains the prevouts. `block` is used to find the index of the transaction in the block, since that is necessary to find the correct undo data in `blockUndo`.
<stickies-v> lsilva_ it seems like no fee or prevout data will be included in the RPC response then, because the blockindex would be null so it would fail the !blockindex test. Just thinking out loud here, anyone has any other views?
<pg156> Why do we need both `CBlock` and `CBlockUndo`? Is it possible to have one class `CBlock` (and only block*.dat files without rev*.dat files) handle the block reverting as well?
<stickies-v> inputs being in different blocks is no issue, that's why the have the undo data - it stores all the prevouts of each transaction in the block in its own data structure so you don't need to look them all up
<larryruane> question, `blockUndo` and `block` get default-constructed even if `verbosity` is 1, is that a performance concern? (I didn't look at those constructors)
<sipa> pg156: the best argument i think is that they are constructed at a different time, in different order; blocks are received directly from peers in the network; undo data is a side effect of validation, which happens whenever blocks become part of the new longest active chain
<stickies-v> lsilva_ yep, in this function we're looking at a single transaction, that belongs to a single block. For each block, an equivalent set of undo data is stored on disk. That undo data, for each block, contains all the inputs that were used for each of the transactions in the block. That makes it much easier to roll back a block, because all the prevouts are already stored (duplicated) there.
<jnewbery> I think it's slightly bad style that block and blockUndo get updated as side-effects of function calls in the if statement, and then used later
<jnewbery> if we were writing the ReadBlockFromDisk() and UndoReadFromDisk() interfaces from scratch, we might have them return std::optional<CBlock|CBlockUndo> (as long as that didn't impact performance)
<larryruane> side question, does `opt_tx_position` need to be initialized to `std::nullopt`? Isn't that the default? Or is that just considered good practice?
<stickies-v> larryruane I think if you don't initialize it depends on the compiler what the default value is going to be, so hence best to be explicit?
<jnewbery> larryruane: if you're concerned about performance, the thing that you should be worried about is deserializing the entire block from disk into CTransaction objects
<larryruane> jnewbery: "... deserializing the entire block ..." -- shameless plug, I have an old PR that avoids that most of the time during IBD, speeding it up considerably https://github.com/bitcoin/bitcoin/pull/16981
<stickies-v> jnewbery oh okay makes sense, so in this case it would be better to omit the initialization and just have `std::optional<size_t> opt_tx_position;` for improved readability without any downsides?