Return fee and prevout (utxos) to getrawtransaction (rpc/rest/zmq)

https://github.com/bitcoin/bitcoin/pull/23319

Host: jnewbery  -  PR author: dougEfresh

The PR branch HEAD was f30f007c8 at the time of this review club meeting.

Notes

  • Bitcoin transactions consist of transaction inputs, which each refer to an output that is being spent by txid and index, and transaction outputs, which each contain an amount and a spending condition (encoded as a scriptPubKey).

  • The transaction fee is the total amount from the transaction inputs minus the total amount spent by the transaction outputs.

  • Since the transaction inputs do not explicitly include the amounts, the only way to determine the transaction fee is to look up the UTXOs (unspent transaction outputs) that are being spent in this transaction. In other words, the transaction fee is implicit.

  • For transactions in the mempool, the UTXOs spent by the transaction will either be in the UTXO set (the set of all UTXOs implied by the block chain at the current height), or will be the outputs from other transactions in the mempool

  • For transactions that are already spent in the block chain, the UTXO data is saved in undo data.

  • The getrawtransaction RPC method can be used to retrieve a transaction. The transaction can either be an unconfirmed transaction in the mempool or a confirmed transaction which has been saved in a block file on disk.

  • This PR updates the getrawtransaction RPC method to allow the user to retrieve fee data for a transaction.

  • This PR is similar to #18772, which added fee data to the getblock RPC method. We covered that PR in a previous PR Review Club meeting.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. This PR changes the second argument of getrawtransaction from verbose (a bool) to verbosity (an int). Is this ok? Does it break backwards compatibility?

  3. Why is there a new entry added to vRPCConvertParams? What is vRPCConvertParams used for?

  4. The verbosity argument accepts values of 0 (returns the hex-encoded data for the transaction), 1 (returns an object with information about the transaction) or 2 (returns an object with information about the transaction, including fees and prevout information for the inputs). What happens if 3 is passed as the argument? How about -1?

  5. The commit introduces local variables blockUndo and block in the getrawtransaction() function. What are they used for? How/where do they get set?

  6. What does this new for loop do?

  7. Are there any performance implications of that loop? Do they matter?

  8. TxToJSON() now always calls TxToUniv() with verbosity set to TxVerbosity::SHOW_DETAILS_AND_PREVOUT, even if the RPC user called the function with verbosity set to 1. Is that ok? Why/why not?

Meeting Log

  117:00 <jnewbery> #startmeeting
  217:00 <stickies-v> hi everyone
  317:00 <raj_> hello..
  417:00 <glozow> hi
  517:00 <b10c> hi
  617:00 <svav> Hi All
  717:00 <neha> hi
  817:00 <jnewbery> hi folks. Welcome to PR Review Club. Feel free to say hi to let people know you're here
  917:00 <maxe> hi
 1017:01 <jnewbery> Is anyone here for the first time?
 1117:01 <pg156> hi
 1217:01 <michaelfolkson> hi
 1317:01 <maxe> I'm first time
 1417:01 <gene> hi
 1517:01 <schmidty> hi
 1617:01 <jnewbery> maxe: you're very welcome! Feel free to ask questions at any point if anything is unclear. We're all here to learn together
 1717:01 <effexzi> Hi
 1817:02 <jnewbery> Notes and questions are in the normal place: https://bitcoincore.reviews/23319
 1917:02 <maxe> jnewbery: thanks for the warm welcome
 2017:02 <jnewbery> I’ll use those questions to guide the conversation, but feel free to jump in at any point if you have questions or comments.
 2117:02 <jnewbery> Who had a chance to review the PR and notes/questions this week? (y/n)
 2217:02 <stickies-v> y
 2317:02 <pg156> y
 2417:02 <maxe> y
 2517:03 <larryruane> hi y (mostly)
 2617:03 <raj_> mostly y
 2717:03 <glozow> n
 2817:03 <neha> n
 2917:03 <jnewbery> Let's get into the questions. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
 3017:03 <pg156> I read the notes and questions and the PR, but I don't have enough information to give a concept ACK, because I don't understand how the fee data will exactly help the user. What are the use cases?
 3117:04 <stickies-v> Approach ACK, waiting for a bug to be fixed before tACK
 3217:04 <raj_> concept ACK, currently theres a test failure..
 3317:05 <maxe> I will say Concept ACK but experience with codebase none existent
 3417:05 <larryruane> concept, approach, light-tested ACK
 3517:05 <jnewbery> pg156: it looks like there's not much motivation for this change given in the PR or associated issue. Perhaps someone else here can give some use-cases?
 3617:06 <maxe> I noticed this in the notes: "Initial concept ACK. This is really helpful for wallets trying to sync with core rpc out there, and it will reduce get calls by half for input fetching. Thanks for working on this."
 3717:06 <larryruane> except for me, `test/functional/rpc_rawtransaction.py` fails
 3817:07 <raj_> jnewbery, I find the input details reporting very helpful. I am working on a wallet that syncs from bitcoin core with RPC, to get the input details we had to make many rpc get queries. This will reduce those get calls by a lot..
 3917:07 <jnewbery> raj_: nice
 4017:07 <michaelfolkson> raj_: But you could process the fee calculation externally to the Core RPC right?
 4117:08 <jnewbery> I think fee information is probably generally useful for people building block explorers or other similar tools to read historic data
 4217:08 <michaelfolkson> I agree it seems slightly more convenient (at least)
 4317:08 <raj_> michaelfolkson, yes, but not without getting the input details. And yes, once we get the inputs, getting fee is mostly trivial. So not very sure on the fee part.
 4417:09 <michaelfolkson> Right
 4517:09 <jnewbery> Next question: This PR changes the second argument of getrawtransaction from verbose (a bool) to verbosity (an int). Is this ok? Does it break backwards compatibility?
 4617:09 <pg156> It maintains backwards compatibility by this line:
 4717:09 <pg156> `verbosity = request.params[1].get_bool() ? 1 : 0;`
 4817:09 <pg156> So if the argument is bool, it will be converted to 0 or 1.
 4917:09 <lsilva_> Does not break backwards compatibility. If the user passes a boolean parameter, the code will still handle it (src / rpc / rawtransaction.cpp: 201-205).
 5017:10 <pg156> But where is "request" created? Is it in this line? https://github.com/bitcoin-core-review-club/bitcoin/blob/f30f007c829547e44cac4214c04a0fa7d0ddc3c2/src/rpc/rawtransaction.cpp#L158
 5117:10 <maxe> a comment in the release notes suggests the author considered backwards compatibility
 5217:10 <larryruane> maybe very slightly non-backward compatible, because before this PR, specifying a 2 gives the old verbose but now with this PR, a more verbose output ... so that's a change in behavior (the the same input), but that's undocumented usage, so it's okay
 5317:11 <jnewbery> larryruane: very nice observation!
 5417:11 <sipa> larryruane: (i haven't reviewed the PR in detail) even if you do that, does it actually change any output fields, or just add more?
 5517:12 <larryruane> oh but wait, correcting myself, all this does is add *new* fields to the output, so anyone looking at the existing fields will find them (with the same meanings)
 5617:12 <larryruane> sipa: yes, you beat me to that comment :)
 5717:13 <neha> thanks for that clarification! so iiuc clients who might have been passing a 2 or 3 for a bool (which is weird, but ok) should be fine? does it break anything to have extra fields?
 5817:13 <michaelfolkson> Nice comment by stickies-v: "Did some testing and it looks like the RPC is failing for coinbase transactions when verbosity==2. In this case we should probably fallback to verbosity==1 behaviour instead?"
 5917:13 <larryruane> question: would it be okay to add a new verbosity level (2) and have it cause the removal of some fields? (or changing their meaning?)
 6017:13 <jnewbery> sipa larryruane: right, I think it just adds new fields to the object being returned. That should be fine - clients should tolerate receiving fields that they don't know the meaning off, since we often upgrade RPCs by adding new fields
 6117:13 <larryruane> (not that we would ever want to do that, but just curious)
 6217:14 <michaelfolkson> That's a backward compatibility issue (at least with the current state of the PR). I'm assuming it should never fail (once PR issue is sorted)
 6317:14 <larryruane> michaelfolkson: can you explain what "PR issue is sorted" means for us newbies?
 6417:14 <sipa> larryruane: that'd be an incomaptible change, and we usually follow RPC deprecation guidelines for that
 6517:15 <larryruane> sipa: +1 thanks
 6617:15 <sipa> (first add a -deprecatedrpc=bla argument to have it retain the field, later drop it)
 6717:15 <stickies-v> would we be breaching any API guarantees by simply adding the new `fee` and `prevout` in the verbose output, without adding the new verbosity==2 option? Since we're just adding new fields?
 6817:15 <glozow> neha: any software that's passing getrawtransaction(2)["somefield"] will still work. if there is software asserting (getrawtransaction(2) == someobject) then it fails, but that just means they need to change the 2 to a true
 6917:15 <glozow> (afaiu)
 7017:15 <larryruane> stickies-v: I think that would be okay, BUT, performance would be worse (and possibly of no benefit)?
 7117:16 <jnewbery> larryruane: that *could* be a break in user behaviour, *if* the users is calling the RPC with verbose={some value greater than 1} *and* they were expecting that field to always be there. However, that's a pretty pathological edgecase, since users shouldn't really have been setting verbose={some value greater than 1}
 7217:16 <neha> glozow: thanks! my question is does anyone do the latter, or I guess more like what is the API contract around that? Is it considered backwards-incompatible for that to change?
 7317:16 <michaelfolkson> larryruane: Ha, I'm just referring to the RPC failing (stickies-v comment)
 7417:16 <merkle_noob[m]> Hi everyone(Late again :( )
 7517:16 <michaelfolkson> https://github.com/bitcoin/bitcoin/pull/23319#pullrequestreview-806345918
 7617:17 <glozow> neha: i think i'd consider that backwards compatible. i can imagine the latter being used in a test maybe?
 7717:17 <neha> based on jnewbery's "clients should tolerate receiving fields that they don't know the meaning off, since we often upgrade RPCs by adding new fields" i agree
 7817:18 <jnewbery> *meaning of. oops
 7917:18 <stickies-v> larryruane ah yes true, there is a performance impact, thanks
 8017:18 <neha> just curious what the API contract is or if it's documented anywhere
 8117:18 <jnewbery> I think really this RPC shouldn't have tolerated receiving a verbose={value greater than 1} previously
 8217:19 <larryruane> jnewbery: ah but it does currently tolerate verbose=3
 8317:19 <larryruane> maybe should not?
 8417:19 <pg156> larryruane: +1. I saw that in code too.
 8517:19 <glozow> does our RPC argument handling, in general, allow conversions of non-bools to bools?
 8617:19 <larryruane> it acts like verbose=2
 8717:20 <lsilva_> I agree. If verbosity is not between 0 and 2, it should return an error.
 8817:20 <jnewbery> larryruane: yes it does, but I think it shouldn't!
 8917:20 <jnewbery> Next question: Why is there a new entry added to vRPCConvertParams? What is vRPCConvertParams used for?
 9017:21 <pg156> So the second argument can be called with the new named argument "verbosity".
 9117:21 <lsilva_> vRPCConvertParams is a array of CRPCConvertParam. Its purpose is to convert non-string RPC argument to JSON. It is used by bitcoin-cli when preparing the request in DefaultRequestHandler::PrepareRequest. The methods used to convert values are RPCConvertNamedValues (for named parameters) and RPCConvertValues.
 9217:22 <larryruane> glozow: I think there needs to be special case code to handle that https://github.com/bitcoin-core-review-club/bitcoin/commit/f30f007c829547e44cac4214c04a0fa7d0ddc3c2#diff-a58e7bb9d9a8a0287c0b7281d99da4e79b6f8c2a5780c24c6d76c14212c48640R172
 9317:22 <jnewbery> lsilva_: very good!
 9417:22 <jnewbery> perfect answer
 9517:23 <larryruane> note that you need to specify `-named` on the `bitcoin-cli` command line, and then ALL arguments need to be named
 9617:23 <larryruane> but what's nice is then they can be in any order!
 9717:24 <jnewbery> larryruane: yeah, and you can omit arguments that you don't want to specify (unlike if you're using positional arguments)
 9817:24 <larryruane> example `src/bitcoin-cli -named getrawtransaction txid=f420bf49b355894783ed5c62bd7dfb23c48aa3eb3b206e094f57bc604506e327 verbosity=2`
 9917:24 <jnewbery> Next question: The verbosity argument accepts values of 0 (returns the hex-encoded data for the transaction), 1 (returns an object with information about the transaction) or 2 (returns an object with information about the transaction, including fees and prevout information for the inputs). What happens if 3 is passed as the argument? How about -1?
10017:25 <stickies-v> Any verbosity value that is a valid integer and not `0` or `1` is treated as if it were `2`.
10117:25 <lsilva_> If verbosity is not between 0 and 2, it will be considered as 2.
10217:26 <jnewbery> I think so too. Does that seem desirable?
10317:27 <pg156> What's the C++ idiomatic way to express a type could be either a bool, or an integer of value 0, 1, 2? An abstract class? I kno in other languages this could be easier. (subtyping?)
10417:27 <schmidty> Not great for backward compatibility
10517:27 <jnewbery> schmidty: I agree!
10617:27 <gene> pg156: std::variant or a union
10717:27 <neha> depends on the API contract with users. it sounds like you are not allowed to remove fields, only add them, which means that any update to use other verbosity levels would need to abide by that
10817:28 <pg156> gene: thanks
10917:28 <glozow> should return an rpc error for a bad argument like -1
11017:28 <neha> that seems limiting for values of verbosity < 0
11117:28 <lsilva_> I don't think it is a desirable behavior.
11217:28 <stickies-v> I generally prefer things to fail explicitly. I don't see any benefit of accepting these other values, except slightly reducing the complexity of the code by avoiding error checking
11317:28 <michaelfolkson> stickies-v: Agree
11417:29 <sipa> conceptually i agree, but i don't think it actually matters here
11517:29 <jnewbery> neha: I think if in some version of the code, calling the RPC with verbosity=n fails, and in the next version of the code, calling the RPC with verbosity=n succeeds, but has some of the fields missing that you'd get with verbosity=n-1, that's fine
11617:30 <jnewbery> it's only if calling the RPC in the same way removes fields from one version to the next, we need to be careful
11717:31 <jnewbery> ok, next question. The commit introduces local variables blockUndo and block in the getrawtransaction() function. What are they used for? How/where do they get set?
11817:32 <lsilva_> They are used to retrieve confirmed transaction from block files.
11917:32 <stickies-v> They contain the unserialized CBlockUndo and CBlock data, which get set by calling the UndoReadFromDisk and ReadBlockFromDisk functions (using out parameters). `blockUndo` is used to calculate the fee more quickly since it already contains the prevouts. `block` is used to find the index of the transaction in the block, since that is necessary to find the correct undo data in `blockUndo`.
12017:33 <lsilva_> What if the inputs are in the mempool?
12117:34 <lsilva_> Or are the inputs in several different blocks?
12217:34 <lsilva_> Or *if the inputs are in several different blocks?
12317:35 <jnewbery> stickies-v: yes, I agree
12417:36 <stickies-v> lsilva_ it seems like no fee or prevout data will be included in the RPC response then, because the blockindex would be null so it would fail the !blockindex test. Just thinking out loud here, anyone has any other views?
12517:37 <jnewbery> lsilva_: good questions. Did you test either of those?
12617:37 <stickies-v> ^well, pass the `!blockindex` test, so behaviour of verbosity==1 is used
12717:38 <pg156> Why do we need both `CBlock` and `CBlockUndo`? Is it possible to have one class `CBlock` (and only block*.dat files without rev*.dat files) handle the block reverting as well?
12817:38 <stickies-v> inputs being in different blocks is no issue, that's why the have the undo data - it stores all the prevouts of each transaction in the block in its own data structure so you don't need to look them all up
12917:39 <larryruane> question, `blockUndo` and `block` get default-constructed even if `verbosity` is 1, is that a performance concern? (I didn't look at those constructors)
13017:39 <jnewbery> I haven't tested the new behaviour for a mempool tx, but I agree with stickies-v that it wouldn't be able to return the fee or prevout data
13117:39 <jnewbery> I also agree that it's not a problem for the inputs to be in different blocks for the reason that stickies-v gave
13217:39 <lsilva_> But there is only one CBlockUndo blockUndo and one CBlock block.
13317:39 <sipa> pg156: the best argument i think is that they are constructed at a different time, in different order; blocks are received directly from peers in the network; undo data is a side effect of validation, which happens whenever blocks become part of the new longest active chain
13417:40 <pg156> sipa: Thanks.
13517:40 <sipa> so you'd need to go update files on disk when undo data is produced, reordering/shuffling other data
13617:41 <jnewbery> larryruane: The default constructors are very cheap
13717:41 <sipa> having it sepatate means both can be essentially append-only data structures
13817:41 <larryruane> sipa yes I notice
13917:42 <larryruane> sorry, I noticed that the blks files can contain blocks that aren't in the best chain, we don't try to delete them (reuse that space)
14017:42 <sipa> if you care about space, prune
14117:42 <stickies-v> lsilva_ yep, in this function we're looking at a single transaction, that belongs to a single block. For each block, an equivalent set of undo data is stored on disk. That undo data, for each block, contains all the inputs that were used for each of the transactions in the block. That makes it much easier to roll back a block, because all the prevouts are already stored (duplicated) there.
14217:42 <sipa> if you don't, the <1% of stale blocks in the chain aren't going to matter
14317:43 <sipa> stickies-v: not for each block!~
14417:43 <jnewbery> I think it's slightly bad style that block and blockUndo get updated as side-effects of function calls in the if statement, and then used later
14517:44 <lsilva_> stickies-v got it. Thanks.
14617:44 <jnewbery> it just makes it a bit more difficult to read
14717:44 <stickies-v> sipa oh, could you elaborate on that? when do we not have undo data?
14817:44 <larryruane> jnewbery: yes that if statement is somewhat hard to human-parse
14917:44 <sipa> stickies-v: for blocks that were never validated
15017:44 <larryruane> (like i'm applying demorgan's rule in my head :) )
15117:45 <jnewbery> larryruane: 😂
15217:45 <stickies-v> oh okay wasn't aware, thanks! I think that's not a problem in this case since we only use the active chain?
15317:45 <jnewbery> stickies-v: the undo data is generated when we connect the block to the tip of the chain (and remove the spent UTXOs from the UTXO set)
15417:46 <sipa> stickies-v: that's right, blocks in the active chain are by definition validated
15517:46 <sipa> (because the active chain is defined as the most-work valid chain...)
15617:47 <larryruane> jnewbery: ah so it's sort of like *moving* the spent TXOs from the UTXO set to the undo "set" (data)
15717:47 <jnewbery> if we were writing the ReadBlockFromDisk() and UndoReadFromDisk() interfaces from scratch, we might have them return std::optional<CBlock|CBlockUndo> (as long as that didn't impact performance)
15817:48 <jnewbery> Next question: what does this new for loop do? (https://github.com/bitcoin-core-review-club/bitcoin/commit/f30f007c829547e44cac4214c04a0fa7d0ddc3c2#)
15917:49 <stickies-v> It finds the position of the transaction in the block, which is needed to look up the correct undo data
16017:49 <larryruane> side question, does `opt_tx_position` need to be initialized to `std::nullopt`? Isn't that the default? Or is that just considered good practice?
16117:50 <larryruane> stickies-v: +1 and that's a linear search, could be roughly 2k transactions in the block
16217:51 <larryruane> could someone explain why, in that loop, `block.vtx.at(i)` instead of `block.vtx[i]`?
16317:51 <stickies-v> larryruane I think if you don't initialize it depends on the compiler what the default value is going to be, so hence best to be explicit?
16417:51 <gene> larryruane: bounds checking
16517:52 <jnewbery> larryruane: if you're concerned about performance, the thing that you should be worried about is deserializing the entire block from disk into CTransaction objects
16617:52 <jnewbery> (which happens in ReadBlockFromDisk())
16717:52 <larryruane> jnewbery: +1 thanks
16817:53 <jnewbery> stickies-v: the default ctor for a std::optional<> is defined by the spec. It creates a std::nullopt : https://en.cppreference.com/w/cpp/utility/optional/optional
16917:54 <larryruane> jnewbery: "... deserializing the entire block ..." -- shameless plug, I have an old PR that avoids that most of the time during IBD, speeding it up considerably https://github.com/bitcoin/bitcoin/pull/16981
17017:54 <jnewbery> larryruane: shameless plugs are always encouraged :)
17117:54 <larryruane> (well, i should say, avoids doing it TWICE)
17217:55 <larryruane> (we should really get that merged :) )
17317:55 <jnewbery> gene: right, the [] operator on a std::vector doesn't do any bounds checking
17417:56 <larryruane> jnewbery: gene: but would you say that since `i` is limited by `size()` it can't be out of bounds?
17517:56 <sipa> jnewbery: unless debug mode
17617:57 <jnewbery> larryruane: yes, I agree that bounds checking is not required here since it's done by the for loop
17717:57 <stickies-v> larryruane that looks interesting, I'll take a loot at #16981 as my next review
17817:57 <gene> larryruane: was thinking the same, unnecessary overhead
17917:59 <jnewbery> I'd really like it if there was a C++ equivalent to Python's enumerate: https://docs.python.org/3/library/functions.html#enumerate
18017:59 <stickies-v> jnewbery oh okay makes sense, so in this case it would be better to omit the initialization and just have `std::optional<size_t> opt_tx_position;` for improved readability without any downsides?
18117:59 <jnewbery> which would return a std::pair of the counter and object
18217:59 <jnewbery> unfortuantely, that's not something in the c++ language or standard library
18318:00 <jnewbery> stickies-v: yes, I think it's fine to use the default initialization there
18418:00 <jnewbery> ok, that's time
18518:00 <jnewbery> #endmeeting