Since the activation of SegWit, transactions are identifiable by two types of identifiers: txid and wtxid. Both identifiers are a uint256 resulting from a double sha256 hash of serialized transaction data. The core difference is that wtxid includes the witness data, while txid does not.
Type-safety is a concept in programming to help avoid type errors, enforcing strict correspondence between a variable and its type. It ensures that a program performs only legal operations on data, with validation occurring during compile-time. This proactive error detection and prevention lead to more reliable and maintainable software, minimizing unintended behavior and vulnerabilities. An example of an earlier type-safety improvement in Bitcoin Core is the incorporation of std::chrono, introduced in #16908.
In the context, of transaction identifiers, type-safety is important because it is impossible to deduce the type difference from just the uint256 value. With GenTxid type checking can be done, but it is more verbose and done at run-time instead of at compile-time. Additionally, transaction identifier types improve readability of code by being explicit about expected types instead of solely relying on clear variable naming.
This PR aims to improve type safety by introducing the new transaction_identifier type, with Txid and Wtxid typedefs for brevity.
There are instances where wtxid and txid are treated identically, such as for example in most_recent_block_txs, where the uint256 type is used to handle this. In other contexts such as CTxMemPool::info where wtxid and txid are treated differently but either is accepted, the GenTxid type is adopted.
What does it mean for a transaction identifier to be type-safe? Why is that important or helpful? Are there any downsides?
Why is it better to enforce types at compile-time instead of at run-time?
Conceptually, when writing new code that requires referring to transactions, when should you use txid and when should you use wtxid? Can you point to any examples in the code where using one instead of the other could be very bad? (1 point per example)
In which concrete way(s) could using transaction_identifier instead of uint256 help find existing bugs or prevent the introduction of new ones? On the other hand, could this change introduce new bugs?
SegWit activated at block height 481,824. Do transactions in blocks < 481,824 also have a wtxid? For example, what about transaction with txiddc01120a639283c11eeada6ce540178adfbbca7fe0d2f41076367cbc9e901822?
Are there any places where a (w)txid is returned as a uint256 where it would not make sense to return a transaction_identifier instead?
The GenTxid class already exists. How does it already enforce type correctness, and how is it different from the approach in this PR?
Are there any places where GenTxid is used but using transaction_identifier would be better?
How is transaction_identifier able to subclass uint256, given that, in C++, integers are types and not classes?
Why does transaction_identifier subclass uint256 instead of being a completely new type?
Txid and Wtxid are typedefs for transaction_identifier<has_witness>. Do you see any reason for code to directly use the transaction_identifier type? Should that be forbidden, and if so, how?
transaction_identifier::Compare is defined to return a int, yet reinterpret_cast<const uint256&> is used in the return statement. Is this correct, and if so, why? Why reinterpret_cast instead of another cast, such as static_cast?
Is the has_witness template parameter used for purposes other than type differentiation in transaction_identifier? What are the limitations of this approach? Do you see alternatives?
Why is TxOrphanage a good place to start introducing type-safe transaction ids?
<abubakarsadiq> Concept ACK, did not do code review but I think this is very useful improvement, not only that uint256 can be block hask, header, txid and wtxid. It will be nice to distinguish between wtxid and txid with type. Because there is ambiguity in the code whereby hash is referred to txid, wtxid and blockhash. Using type specifier for wtxid and txid will help distinguish even if an ambiguous variable name is used
<larryruane_> before I approach ACK, I'm trying to figure out if it would be better for a `Txid` and a `Wtxid` to *inheret* from `uint256`, or for them to simply *include* a `uint256` (and nothing more, you could call this wrapping)
<BrandonOdiwuor> A 'transaction identifier' is considered 'type-safe' when it undergoes rigorous type-checking during compile-time, guaranteeing that operations involving these identifiers are thoroughly validated and constrained to their intended data types before the program is executed, thus preventing type-related errors and ensuring program reliability.
<larryruane_> type safe means the two kinds of transaction IDs have distinct types (instead of both being `uint256`), so trying to assign one type to a variable of another type (for example) is like trying to assign a `float` to an `int`
<larryruane_> a C codebase i worked on years ago had a really cool concept of "wrapped" integers, so you couldn't assign a block count (this was a storage system) to a byte count, etc. .... so similar goal as here, but it used the wrapped approach (since there's no inheretence in C)
<larryruane_> no, a block count type was a C struct with just one member, conventionally called `i` (for `integer`), and the byte count was a different C struct (also with just `i`) ... so you couldn't assign one to the other for example
<stickies-v> BrandonOdiwuor94: yeah, catching stuff at compile time is great - makes it quick for the developer to spot their error as they are developing, as opposed to relying on writing extensive test suites etc (that still may miss things) to catch things in runtime
<stickies-v> larryruane_: i don't just mean typedefs: in a naive implementation, just having Txid and Wtxid each subclass uint256 does not provide type safety on its own, as e.g. we can still do comparisons between them
<stickies-v> maxedwards: so the complexity that I was talking about is that currently we have to specify which kinds of comparisons etc we allow, because currently both `Txid` and `Wtxid` inherit from `uint256`
<larryruane_> to me, with my not-great understanding of subclassing, the wrapping approach seems simpler and safer -- two classes both with just one member, a `uint256` (like we did with the wrapped integers in C) ... but never mind, we can move on, sorry to distrupt
<BrandonOdiwuor> Early Error Detection: Compile-time checking allows for the identification of type-related errors and mismatches at compile time before the program is even executed. This means that issues are caught during the development and testing phase, reducing the chances of runtime errors and unexpected program behavior.
<stickies-v> 4. Conceptually, when writing new code that requires referring to transactions, when should you use `txid` and when should you use `wtxid`? Can you point to any examples in the code where using one instead of the other could be very bad? (1 point per example)
<larryruane_> stickies-v: not sure if this is what you're asking, but two pure segwit transactions with the same txid (even if different wtxids) have the same *effect* (modify the UTXO set the same way)
<larryruane_> stickies-v: in the mempool, we definitely want to use txids as keys, because if we used wtxid, two tx with the same txid could be there simultaneously, and if that happened, an output from both of them could be spent separately, leading to double-spend bug?
<maxedwards> am I right in thinking that in theory this same/different effect shouldn't be a concern for this PR as the existing code should already be using the correct id, it's just that it's represented by the same type?
<stickies-v> 5. In which concrete way(s) could using `transaction_identifier` instead of `uint256` help find existing bugs or prevent the introduction of new ones? On the other hand, could this change introduce new bugs?
<stickies-v> larryruane_: yup. a function that takes a `uint256 hash` as an argument can specify in the docs that `hash` is supposed to be a wtxid, but if a developer (and the reviewers) don't read that properly then it's trivial to pass a txid and potentially (probably) break things
<stickies-v> dberkelmans: well one important use case that GenTxid solves is that in quite a few places we want to be able to take input that can be either a txid or a wtxid, as opposed to just one or the other