Type-safe transaction identifiers (utils/log/libs)

https://github.com/bitcoin/bitcoin/pull/28107

Host: stickies-v  -  PR author: dergoegge

The PR branch HEAD was e74152d5b48bdca8dfc2217f35842dbb24b2746c at the time of this review club meeting.

Notes

  • Since the activation of SegWit, transactions are identifiable by two types of identifiers: txid and wtxid. Both identifiers are a uint256 resulting from a double sha256 hash of serialized transaction data. The core difference is that wtxid includes the witness data, while txid does not.

  • Type-safety is a concept in programming to help avoid type errors, enforcing strict correspondence between a variable and its type. It ensures that a program performs only legal operations on data, with validation occurring during compile-time. This proactive error detection and prevention lead to more reliable and maintainable software, minimizing unintended behavior and vulnerabilities. An example of an earlier type-safety improvement in Bitcoin Core is the incorporation of std::chrono, introduced in #16908.

  • In the context, of transaction identifiers, type-safety is important because it is impossible to deduce the type difference from just the uint256 value. With GenTxid type checking can be done, but it is more verbose and done at run-time instead of at compile-time. Additionally, transaction identifier types improve readability of code by being explicit about expected types instead of solely relying on clear variable naming.

  • This PR aims to improve type safety by introducing the new transaction_identifier type, with Txid and Wtxid typedefs for brevity.

  • There are instances where wtxid and txid are treated identically, such as for example in most_recent_block_txs, where the uint256 type is used to handle this. In other contexts such as CTxMemPool::info where wtxid and txid are treated differently but either is accepted, the GenTxid type is adopted.

Questions

Concept

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. What does it mean for a transaction identifier to be type-safe? Why is that important or helpful? Are there any downsides?

  3. Why is it better to enforce types at compile-time instead of at run-time?

  4. Conceptually, when writing new code that requires referring to transactions, when should you use txid and when should you use wtxid? Can you point to any examples in the code where using one instead of the other could be very bad? (1 point per example)

  5. In which concrete way(s) could using transaction_identifier instead of uint256 help find existing bugs or prevent the introduction of new ones? On the other hand, could this change introduce new bugs?

  6. SegWit activated at block height 481,824. Do transactions in blocks < 481,824 also have a wtxid? For example, what about transaction with txid dc01120a639283c11eeada6ce540178adfbbca7fe0d2f41076367cbc9e901822?

  7. Are there any places where a (w)txid is returned as a uint256 where it would not make sense to return a transaction_identifier instead?

  8. The GenTxid class already exists. How does it already enforce type correctness, and how is it different from the approach in this PR?

  9. Are there any places where GenTxid is used but using transaction_identifier would be better?

Code Review

  1. How is transaction_identifier able to subclass uint256, given that, in C++, integers are types and not classes?

  2. Why does transaction_identifier subclass uint256 instead of being a completely new type?

  3. Txid and Wtxid are typedefs for transaction_identifier<has_witness>. Do you see any reason for code to directly use the transaction_identifier type? Should that be forbidden, and if so, how?

  4. transaction_identifier::Compare is defined to return a int, yet reinterpret_cast<const uint256&> is used in the return statement. Is this correct, and if so, why? Why reinterpret_cast instead of another cast, such as static_cast?

  5. Is the has_witness template parameter used for purposes other than type differentiation in transaction_identifier? What are the limitations of this approach? Do you see alternatives?

  6. Why is TxOrphanage a good place to start introducing type-safe transaction ids?

  7. Consider return m_wtxid_to_orphan_it.count(Wtxid::FromUint256(gtxid.GetHash()));. Why is it not type-safe to allow Wtxid to be constructed by just passing a uint256 to its constructor? Specifically, why doesn’t Wtxid{gtxid.GetHash()} compile?

Meeting Log

Meeting 1

  117:00 <stickies-v> #startmeeting
  217:00 <dberkelmans> hi
  317:00 <dergoegge> hi
  417:00 <BrandonOdiwuor> Hello
  517:01 <lightlike> hi
  617:01 <larryruane_> hi
  717:01 <vmammal> first timer here. im here to learn
  817:01 <stickies-v> welcome everyone! Today we're looking at #28107, authored by dergoegge . The notes and questions are available on https://bitcoincore.reviews/28107
  917:02 <stickies-v> great, glad you're joining us vmammal! any other first timers today?
 1017:02 <Murch[m]> Hey
 1117:02 <stickies-v> even if you're just lurking, feel free to say hi!
 1217:02 <sipa> hi
 1317:02 <larryruane_> will we be back here tomorrow? are we covering the first set of questions today (concept)?
 1417:03 <stickies-v> correct larryruane_ ! same time tomorrow for the second batch of questions
 1517:03 <glozow> hi
 1617:03 <effexzi> Hi every1
 1717:03 <maxedwards> hi
 1817:03 <stickies-v> who got the chance to review the PR or read the notes? (y/n)
 1917:03 <Murch[m]> n
 2017:03 <maxedwards> y
 2117:03 <BrandonOdiwuor> y
 2217:03 <lightlike> y
 2317:03 <dberkelmans62> y
 2417:03 <abubakarsadiq> hi
 2517:03 <glozow> woohooo welcome newcomers!
 2617:03 <glozow> y
 2717:04 <stickies-v> alright wonderful, let's dive into the wonderfully wholesome world of type-safety
 2817:05 <larryruane_> i heard someone say years ago, "strong types are for weak minds" (haha I don't agree!)
 2917:05 <stickies-v> for those of you who were able to review, would you give it a Concept ACK, Approach ACK, Tested ACK, or NACK? what was your review approach?
 3017:05 <vmammal> concept ACK
 3117:06 <larryruane_> i'm still reviewing
 3217:06 <maxedwards> Concept ACK
 3317:06 <BrandonOdiwuor> Concept ACK
 3417:06 <abubakarsadiq> Concept ACK, did not do code review but I think this is very useful improvement, not only that uint256 can be block hask, header, txid and wtxid. It will be nice to distinguish between wtxid and txid with type. Because there is ambiguity in the code whereby hash is referred to txid, wtxid and blockhash. Using type specifier for wtxid and txid will help distinguish even if an ambiguous variable name is used
 3517:07 <maxedwards> built and ran tests, read through the code, looked at the super class of uint256
 3617:07 <stickies-v> abubakarsadiq: I agree! compile-time checks (which we'll get to in a bit) are a great improvement on their own, but just improving code readability is quite nice too imo
 3717:08 <stickies-v> so, diving into the conceptual side of things a bit
 3817:08 <stickies-v> 2. What does it mean for a transaction identifier to be type-safe? Why is that important or helpful? Are there any downsides?
 3917:08 <larryruane_> before I approach ACK, I'm trying to figure out if it would be better for a `Txid` and a `Wtxid` to *inheret* from `uint256`, or for them to simply *include* a `uint256` (and nothing more, you could call this wrapping)
 4017:09 <BrandonOdiwuor> A 'transaction identifier' is considered 'type-safe' when it undergoes rigorous type-checking during compile-time, guaranteeing that operations involving these identifiers are thoroughly validated and constrained to their intended data types before the program is executed, thus preventing type-related errors and ensuring program reliability.
 4117:09 <BrandonOdiwuor> Some advantages include: Early detection of type-errors at compile time and preventing runtime errors
 4217:10 <larryruane_> type safe means the two kinds of transaction IDs have distinct types (instead of both being `uint256`), so trying to assign one type to a variable of another type (for example) is like trying to assign a `float` to an `int`
 4317:10 <stickies-v> larryruane_: I have had similar thoughts. I think inheritance makes the transition easier, i.e. if we don't want to update all places that use a transaction identifier
 4417:10 <stickies-v> but ignoring that it seems like just having a `uint256` member seems like it would simplify the implementation quite a bit
 4517:11 <larryruane_> @stickies-v thanks, that's my suspicion too, but wasn't sure, want to investigate further
 4617:12 <abubakarsadiq> there are two types of transaction identifiers in post segwit bitcoin, txid and wtxid.
 4717:12 <abubakarsadiq> To avoid ambiguity and logic error where both are interchanged, it will be better if they are distinct types
 4817:12 <abubakarsadiq> The code won't compile if one is used in the place of another.
 4917:13 <larryruane_> a C codebase i worked on years ago had a really cool concept of "wrapped" integers, so you couldn't assign a block count (this was a storage system) to a byte count, etc. .... so similar goal as here, but it used the wrapped approach (since there's no inheretence in C)
 5017:13 <stickies-v> BrandonOdiwuor: would you say the error catching happens mostly at compile time or at runtime?
 5117:13 <maxedwards> can you elaborate on the simplicity enabled by having an internal uint256 vs subclassing?
 5217:13 <stickies-v> larryruane_: is type safety guaranteed by just having two distinct types, though?
 5317:14 <BrandonOdiwuor94> stickies-v it would be better to catch the errors at compile-time rather than do runtime assertions to catch the errors
 5417:14 <larryruane_> no, a block count type was a C struct with just one member, conventionally called `i` (for `integer`), and the byte count was a different C struct (also with just `i`) ... so you couldn't assign one to the other for example
 5517:15 <maxedwards> simply providing the types is not enough, they have to be consumed
 5617:15 <Murch[m]> I assume that with inheritance assignments might permit to autoconvert between the types without complaining
 5717:15 <stickies-v> BrandonOdiwuor94: yeah, catching stuff at compile time is great - makes it quick for the developer to spot their error as they are developing, as opposed to relying on writing extensive test suites etc (that still may miss things) to catch things in runtime
 5817:15 <larryruane_> i.e. so it wasn't just two typedefs (which does NOT give type safety)
 5917:16 <stickies-v> larryruane_: i don't just mean typedefs: in a naive implementation, just having Txid and Wtxid each subclass uint256 does not provide type safety on its own, as e.g. we can still do comparisons between them
 6017:16 <larryruane_> yeah trying to catch things are runtime is tricky, you might never happen to go through the buggy code path! (but for sure one of our users will!)
 6117:17 <stickies-v> well, it does provide some type safety, just not entirely
 6217:17 <stickies-v> Murch[m]: yes exactly@
 6317:17 <maxedwards> Murch[m]: I believe there is a comment about doing just that in the PR
 6417:18 <stickies-v> maxedwards: so the complexity that I was talking about is that currently we have to specify which kinds of comparisons etc we allow, because currently both `Txid` and `Wtxid` inherit from `uint256`
 6517:19 <stickies-v> I guess we already largely covered the next question but I'll throw it in here regardless in case anyone has something to add:
 6617:19 <larryruane_> to me, with my not-great understanding of subclassing, the wrapping approach seems simpler and safer -- two classes both with just one member, a `uint256` (like we did with the wrapped integers in C) ... but never mind, we can move on, sorry to distrupt
 6717:19 <stickies-v> 3. Why is it better to enforce types at compile-time instead of at run-time?
 6817:19 <glozow> find bug moar fast.
 6917:19 <BrandonOdiwuor> Early Error Detection: Compile-time checking allows for the identification of type-related errors and mismatches at compile time before the program is even executed. This means that issues are caught during the development and testing phase, reducing the chances of runtime errors and unexpected program behavior.
 7017:20 <larryruane_> stickies-v: kind of answered already, but it prevents coding mistakes where the wrong kind of ID is used (txid instead of wtxid or the reverse)
 7117:20 <stickies-v> larryruane_: but with wrapping, would you still be able to not update all callsites of functions that return a Txid/Wtxid right away?
 7217:20 <stickies-v> I think there's merit to allowing this to be a gradual improvement
 7317:20 <stickies-v> glozow: much accurate
 7417:20 <larryruane_> @stickies-v yes very true! that's an important consideration
 7517:21 <stickies-v> BrandonOdiwuor: yeah it's generally much quicker
 7617:22 <BrandonOdiwuor> I saw this comment on the notes (https://github.com/bitcoin/bitcoin/pull/28031#discussion_r1261648754) I think it would help with this
 7717:23 <glozow> yeah I write this bug approximately once every 2 weeks
 7817:23 <maxedwards> assuming less than perfect test coverage and a bug, a runtime check could end up throwing on production
 7917:23 <maxedwards> worst case and I'm sure with the scrutiny of bitcoin it would be caught in an earlier stage
 8017:24 <glozow> Usually I catch it before I push a PR tho. always need to have a "try 2 transactions with same txid but diff witness" test
 8117:24 <glozow> catching at compile time would be nice
 8217:25 <stickies-v> maxedwards: even if it can be tested, being able to remove tests because it's checked at compile time is a big win
 8317:25 <abubakarsadiq> glozow: with this you dont have to do that right \o/?
 8417:25 <stickies-v> launching the next question already - but as always, feel free to continue the conversation on previous questions too!
 8517:25 <stickies-v> 4. Conceptually, when writing new code that requires referring to transactions, when should you use `txid` and when should you use `wtxid`? Can you point to any examples in the code where using one instead of the other could be very bad? (1 point per example)
 8617:26 <lightlike> seems like testing this would still be useful. having type safety won't prevent us from using the wrong type of id (but consistently) in the first place.
 8717:27 <larryruane_> stickies-v: not sure if this is what you're asking, but two pure segwit transactions with the same txid (even if different wtxids) have the same *effect* (modify the UTXO set the same way)
 8817:27 <larryruane_> lightlike: +1
 8917:27 <glozow> lightlike +1
 9017:28 <larryruane_> (pure meaning *all* the inputs are segwit)
 9117:28 <glozow> this is a good example imo: https://github.com/bitcoin/bitcoin/blob/3cd02806ecd2edd08236ede554f1685866625757/src/net_processing.cpp#L4334
 9217:28 <sipa> larryruane_: two non-segwit transactions with the same txid also have the same effect :p
 9317:28 <glozow> if you use txid, that means I can censor any tx I want to you by sending you a version with a mutated witness first
 9417:29 <stickies-v> glozow: 1 point!
 9517:30 <larryruane_> sipa: oh right... two tx with different txids could have the same effect (i think??)
 9617:30 <larryruane_> (that's malleability)
 9717:30 <sipa> larryruane_: yes
 9817:30 <sipa> the same txid implies the same effect, but different txid does not imply different effect
 9917:31 <lightlike> so "when in doubt use wtxid" is a good role of thumb i guess?
10017:31 <stickies-v> so even if you haven't found any examples, conceptually - what do you think is a good heuristic for when to use wtxid instead of txid, or txid instead of wtxid?
10117:32 <stickies-v> lightlike: that seems a reasonable starting point :-D
10217:33 <glozow> pretty much always wtxid unless it's prevout-related
10317:33 <sipa> or tx relay for pre-BIP339 peers
10417:33 <larryruane_> stickies-v: in the mempool, we definitely want to use txids as keys, because if we used wtxid, two tx with the same txid could be there simultaneously, and if that happened, an output from both of them could be spent separately, leading to double-spend bug?
10517:34 <stickies-v> glozow: yeah, and also almost always wtxid when it's about unconfirmed txs?
10617:35 <larryruane_> stickies-v: that sounds opposite to what i said? (but i may be wrong)
10717:36 <abubakarsadiq> we could have identical transaction with multiple wtxid's using wtxid almost everytime, will that not bring duplication especially for unconfirmed txs?
10817:36 <stickies-v> larryruane_: well i think gloria covered that already by it being prevout related?
10917:36 <sipa> the mempool is indexed by both txid and wtxid, because we need to be able to look up by both
11017:36 <glozow> unsure if unconfirmed txs are special. But one distinction that springs to mind is you should never try to cache feerate to a txid
11117:37 <glozow> I think larryruane's point might be that having an index by txid can be a useful way to bake in the assumption that no 2 transactions in the data structure should have the same txid
11217:37 <larryruane_> sipa: gotcha, but we definitely don't want to allow > 1 tx with the same txid
11317:38 <maxedwards> am I right in thinking that in theory this same/different effect shouldn't be a concern for this PR as the existing code should already be using the correct id, it's just that it's represented by the same type?
11417:39 <lightlike> larryruane: we don't want conflicting transactions in general, even if they don't share the txid.
11517:39 <sipa> my guess is actually that if we dropped the unicity of the txid index in the mempool, nothing would break, as different-wtxid-same-txid transactions would still just conflict with each other
11617:39 <sipa> the unicity just allows enforcing that constraint independently
11717:39 <sipa> uniqueness? unicity?
11817:40 <stickies-v> maxedwards: this is more conceptual discussion about txid/wtxid indeed and not directly critical for the PR
11917:40 <larryruane_> lightlike: oh yes that's right, as glozow said, the prevout is txid-based so would be the same
12017:40 <sipa> (tangent, english stackexchange says "I suspect most people would take "Unicity" to be a town where everyone rides unicycles.")
12117:40 <maxedwards> stickies-v: very interesting still
12217:41 <larryruane_> i took bicycle riding lessons but ran out of money halfway through, so now i can only ride a unicycle
12317:41 <glozow> sipa: tbh I would have thought of it as the british equivalent of "college town"
12417:41 <glozow> larryruane: are there 100 seats on 100 unicycles...
12517:42 <sipa> :D
12617:42 <stickies-v> alright time to move on πŸ˜…
12717:42 <sipa> glozow: and they ride one by one into a room with 100 boxes, ...
12817:42 <stickies-v> 5. In which concrete way(s) could using `transaction_identifier` instead of `uint256` help find existing bugs or prevent the introduction of new ones? On the other hand, could this change introduce new bugs?
12917:42 <glozow> πŸ˜‚
13017:43 <larryruane_> stickies-v: is it because lots of other things besides txids and wtxids can be uint256 type? so using `transaction_identifier` would exclude those accidental uses?
13117:45 <BrandonOdiwuor88> larryruane_ I think it would help prevent mixing up using Txid and Wtxid unintentionally such as the comment that was linked in the notes(https://github.com/bitcoin/bitcoin/pull/28031#discussion_r1261648754)
13217:45 <stickies-v> larryruane_: yup. a function that takes a `uint256 hash` as an argument can specify in the docs that `hash` is supposed to be a wtxid, but if a developer (and the reviewers) don't read that properly then it's trivial to pass a txid and potentially (probably) break things
13317:46 <glozow> fun anecdote related to this: one time while working in wallet code, I saw a variable named `wtxid` and assumed it was a witness ID. nope, it stood for "wallet transaction id"
13417:47 <stickies-v> on the other hand - does having this explicit type potentially introduce other, new bugs? can you think of any?
13517:47 <stickies-v> glozow: oooh yeah those are sneaky
13617:47 <maxedwards> can you pass a uint256 to a function that asks for a txid?
13717:47 <maxedwards> or do you have to cast?
13817:47 <larryruane_> i just noticed that there are 1687 occurrences of `uint256` in our code base -- many (maybe most) of which are not transaction ID related!
13917:49 <dergoegge> maxedwards: you would have to use `Txid::FromUint256`
14017:49 <glozow> maxedwards: try and see if it compiles!
14117:50 <stickies-v> maxedwards: see also question 16 that we'll be dealing with tomorrow!
14217:50 <lightlike> dergoegge: did you try to change it everywhere, just to see how big the effort is?
14317:50 <maxedwards> even with this, would it not be possible to change a parameter to a txid when it should be a wtxid and everything would still work?
14417:50 <maxedwards> the uint256 was representing a wtxid
14517:51 <maxedwards> but the type isn't actually enforcing that it is
14617:51 <maxedwards> and could lead to much confusion
14717:51 <maxedwards> even more confusion than had it been left as a uint256
14817:51 <BrandonOdiwuor88> maxedwards I think the compiler would throw an error you have to cast explicitly with reinterpret_cast
14917:51 <maxedwards> yes so I'm assuming you do cast
15017:52 <stickies-v> maxedwards: what do you mean with change a parameter? the function signature?
15117:52 <maxedwards> but there's nothing stopping you from casting a uint256 that represented a wtxid to a txid is there?
15217:52 <maxedwards> yes function signature
15317:52 <stickies-v> well obviously the function signature has to be correct, if you need a wtxid and have your function signature take a `Txid txid` then that's going to lead to problems
15417:53 <maxedwards> not compile time problems though is my question
15517:53 <dergoegge> lightlike: no haven't tried, my guess would be that it is a bunch of effort but also not to crazy
15617:54 <stickies-v> it depends on the callsites of course. sorry i'm not sure i fully understand your question - type-safety doesn't prevent a developer from writing wrong code, indeed
15717:55 <vmammal> dergoegge do you plan to continue the refactor in a future PR?
15817:55 <dergoegge> `Txid::FromUint256(Wtxid{})` might actually work because `uint256 a = Wtxid{}` is allowed.
15917:55 <stickies-v> gonna skip a few questions as we're running out of time
16017:55 <dergoegge> vmammal: yes but I'd also be happy to see other people going for it
16117:55 <stickies-v> 8. The `GenTxid` class already exists. How does it already enforce type correctness, and how is it different from the approach in this PR?
16217:55 <stickies-v> (link: https://github.com/bitcoin/bitcoin/blob/dcfbf3c2107c3cb9d343ebfa0eee78278dea8d66/src/primitives/transaction.h#L425)
16317:58 <dberkelmans> It also contains a bool saying if the witness is included
16417:58 <dberkelmans> It's one type instead of 2
16517:58 <dberkelmans> So it doesn't make the code more readable
16617:59 <larryruane_> for one thing, an instance of a txid and a wtxid (contained within a GenTxid) compare as false, even if the hashes are true ... but there's more I'm sure
16717:59 <stickies-v> dberkelmans: if it doesn't make the code more readable, should we then replace it entirely with transaction_identifier?
16818:00 <BrandonOdiwuor88> I think GenTxid uses uint256 to store the hash and the bool m_is_wtxid to indicate whether its Wtxid or Txid
16918:00 <dberkelmans> I haven't seen enough code to say anything sensible about that
17018:01 <stickies-v> I think an important difference is that GenTxid allows type checking, by exposing the `IsWtxid()` function, but it's still on the user to do that checking
17118:01 <stickies-v> and more importantly, it happens at runtime
17218:01 <dberkelmans> I suppose if you only need the uint256 then having it in one type makes sense
17318:02 <larryruane_> stickies-v: so it's runtime checking rather than compile-time
17418:02 <stickies-v> with transaction_identifer, the type checking is enforced at compile time
17518:02 <stickies-v> larryruane_: yeah
17618:02 <stickies-v> dberkelmans: well one important use case that GenTxid solves is that in quite a few places we want to be able to take input that can be either a txid or a wtxid, as opposed to just one or the other
17718:03 <dergoegge> If we go for the type safe ids, my thinking was that we would replace `GenTxid` with `std::variant<Txid, Wtxid>`
17818:03 <stickies-v> actually, from my quick skimming it seems like *all* the places where we use GenTxid we want to be able to accept both types (which surprised me a little)
17918:03 <stickies-v> so, we're at time for this meeting
18018:03 <stickies-v> #endmeeting

Meeting 2

18117:00 <stickies-v> #startmeeting
18217:00 <dberkelmans> hi
18317:00 <maxedwards> hello!
18417:00 <vmammal> hi
18517:01 <glozow> hi
18617:01 <stickies-v> hello everyone! welcome back to the second meeting about https://bitcoincore.reviews/28107
18717:01 <maxedwards> looking forward to learning the answers to these questions from you experienced people!
18817:02 <stickies-v> yesterday we focused on questions 1-9 which dealt with the conceptual aspects of type safety and wtxid/txid
18917:02 <stickies-v> today we'll be diving more into the code of the PR, so expect a more technical discussion
19017:02 <stickies-v> maxedwards: ooh we're definitely all here to learn though! the discussion is the most important bit
19117:03 <lightlike> hi
19217:03 <stickies-v> if anyone's here lurking - feel free to say hi so we know you're here!
19317:04 <stickies-v> alright let's kick it off with the first question
19417:04 <stickies-v> 10. How is `transaction_identifier` able to subclass `uint256`, given that, in C++, integers are types and not classes?
19517:04 <maxedwards> I believe that uint256 is not a built in type
19617:05 <maxedwards> and just a normal class
19717:06 <vmammal> maxedwards +1
19817:06 <stickies-v> yes correct maxedwards! we have quite a few places that rely on 256bit hash digests so having a dedicated type for it seems helpful
19917:06 <maxedwards> `class uint160 : public base_blob<160>`
20017:06 <maxedwards> ignore sorry, wrong code!
20117:06 <lightlike> i think c++ doesn't have a 256 bit built-in type we could use - even if we wanted to.
20217:07 <maxedwards> `class uint256 : public base_blob<256>`
20317:07 <stickies-v> so can we assume that a uint256 behaves otherwise the same as, for example a uint64_t?
20417:07 <stickies-v> maxedwards: correct!
20517:07 <stickies-v> lightlike: i don't think so either
20617:08 <larryruane_> I don't think it supports addition (for example)
20717:09 <sipa> it used to, until arith_uint256 got split from uint256
20817:09 <stickies-v> larryruane_: yeah correct, it doesn't implement a lot of the integer operations. Actually, the only defined operators are `operator==`, `operator!=` and `operator<`
20917:09 <stickies-v> (which... may seem familiar if you've seen the `transaction_identifier` code
21017:10 <stickies-v> sipa: oh - any background on when/why that got split?
21117:10 <maxedwards> it looks like now it supports GetHex, SetHex and ToString
21217:11 <glozow> we don't want to do arithmetic operations on hash identifiers?
21317:11 <maxedwards> Can someone point me to where the == operator is set?
21417:12 <larryruane_> ah I just noticed `GetHex()` and `ToString()` are the same, I always wondered about that (should have just looked it up)
21517:12 <stickies-v> glozow: having a `hash256` type or something would be a bit more intuitive than making `uint256` not an int, too :-D but it's probably way less code churn that way
21617:12 <stickies-v> maxedwards: https://github.com/bitcoin/bitcoin/blob/6e5cf8e95391cd9a8bfc0c357e8a6f3963bad4b4/src/uint256.h#L56
21717:13 <stickies-v> launching the next one already but we can keep conversation on previous questions going
21817:13 <maxedwards> ah yes I see them, thanks
21917:13 <stickies-v> 11. Why does `transaction_identifier` subclass `uint256` instead of being a completely new type?
22017:13 <sipa> stickies-v: https://github.com/bitcoin/bitcoin/pull/5490
22117:14 <sipa> in short, because arithmetic operations on hashes make no sense - they only matter in the context of PoW, but hashes (which is what uint256 is effectively used for) are used for many more things than that
22217:14 <maxedwards> I think this was touched on yesterday and the suggestion was that you could migrate in stages
22317:14 <sipa> going back even further, all of uint256 and arith_uint256 was implemented using OpenSSL's bignum
22417:15 <stickies-v> ooh, and then it became its own type when moving to libsecp256k1?
22517:15 <sipa> no, that's all unrelated
22617:15 <larryruane_> we don't depend on OpenSSL at all any more, do we?
22717:15 <sipa> oh right, it was one of the things we depended on OpenSSL for, but there were many others
22817:16 <sipa> CScriptNum too was introduced to avoid bignum dependency in the script interpreter
22917:16 <stickies-v> maxedwards: and specifically, what is it about subclassing that allows us to migrate in stages?
23017:16 <sipa> the BIP70 payment protocol was another openssl dependency
23117:16 <sipa> and SSL-encrypted RPC is also something we used to have
23217:17 <maxedwards> I think that you could pass a wtxid to a function that takes a uint256
23317:18 <maxedwards> so you can leave a lot of code as is?
23417:19 <stickies-v> yeah, we can use implicit conversions to leave code that is expecting a `uint256` unchanged until it's an appropriate time to move it to a more strict `Txid` or `Wtxid` type
23517:19 <larryruane_> stickies-v: if B is a subclass of A, then you can pass a B to a function that takes an A and it works... you'd think that would be a type mismatch but it's not!
23617:19 <stickies-v> i've learned to stop thinking and rely on cppreference instead :-D
23717:20 <sipa> cppreference is great
23817:20 <larryruane_> but usually aren't some of A's methods virtual? (so B can have its own implementation?) Not true of Txid and uint256
23917:21 <stickies-v> `base_blob` (which is `uint256`'s parent) and `transaction_identifier` are templated classes
24017:21 <stickies-v> templated methods cannot be virtual
24117:21 <sipa> this is not a traditional situation where type distinction matters
24217:21 <sipa> because the operations on a Txid and WTxid are the same
24317:21 <larryruane_> stickies-v: +1 thanks didn't know that
24417:21 <larryruane_> sipa: +1 thanks
24517:21 <sipa> the goal isn't to catch runtime mismatches for operations that don't apply in a way we can figure out at compile time
24617:22 <maxedwards> even if there were overridden methods, you could still pass it couldn't you?
24717:22 <sipa> rather, it's us deliberately making two types that are operationally compatible into separate ones, to make sure one doesn't get used in the wrong place
24817:23 <stickies-v> sipa: it's not really the point larryruane_ was trying to make I think, but have templated virtual methods could be helpful here though, I think
24917:23 <stickies-v> atm we have to redefine all the `operator{==, !=, <}` methods on `transaction_identifier`
25017:23 <stickies-v> but with a `virtual Compare` method that could be avoided
25117:24 <sipa> i don't see how anything virtual is useful here
25217:25 <stickies-v> maxedwards: could you elaborate a bit on that?
25317:25 <sipa> you could use the curious self-referencial template trick to do the same thing without needing the runtime overhead of a virtual function call
25417:26 <sipa> template<T> class transaction_identifier { bool operator==(const T& other) { ... } }
25517:26 <sipa> class Txid : public transaction_identifier<Txid> { ... }
25617:26 <maxedwards> It wasn't specific to this codebase and more of a C++ question about passing a subclass to a function that takes the superclass but I can look it up myself to save time.
25717:27 <stickies-v> well but what I'm trying to avoid is having to redefine the `operator` methods on `transaction_identifier` because i think the ones on `uint256` work just fine, except for the fact that they call `a.Compare...` and we need to use the `transaction_identifier::Compare` method instead of the `uint256` one?
25817:28 <stickies-v> maxedwards: yeah my understanding is that would get implicitly converted to the superclass
25917:28 <sipa> stickies-v: you don't need virtual functions for that (and generally don't want to, as they have a performance cost)
26017:29 <sipa> the point is that these "distinct" Compare functions aren't actually operationally distinct, they just operate on different types
26117:31 <sipa> but i'll need to look at the actual code to be more concrete
26217:31 <stickies-v> maybe i'm confused by "operationally", but they are distinct though?
26317:31 <sipa> they're just comparing the bytes, no?
26417:31 <sipa> the distinction is in which types they accept
26517:31 <stickies-v> yes, beyond the static_assert they are identical
26617:31 <sipa> right, exactly
26717:32 <sipa> the static_assert doesn't exist at runtime
26817:32 <stickies-v> right okay, I see
26917:33 <larryruane_> i never thought a `static_assert` could be inside an `if`, that's cool
27017:33 <stickies-v> thanks, i'll look into the self referential template trick after the meeting, that looks curious!
27117:33 <sipa> larryruane_: in a normal if, it doesn't matter whether it's inside or outside
27217:33 <sipa> in a constexpr if, where one branch can be discarded at compile-time, things are different
27317:34 <larryruane_> yes exactly so i'm used to seeing it outside ... yes i wasn't aware of `constexpr if`
27417:34 <sipa> larryruane_: cppreference to the rescue :p
27517:35 <larryruane_> on my next extended vacation i'm going to curl up with cppreference
27617:36 <stickies-v> very interesting rabbit hole, but regardless i'll pop the next question already for folks dying to answer:
27717:36 <stickies-v> 13. [`transaction_identifier::Compare`] is defined to return a `int`, yet `reinterpret_cast<const uint256&>` is used in the return statement. Is this correct, and if so, why? Why `reinterpret_cast` instead of another cast, such as `static_cast`?
27817:36 <stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/commit/28f9ff3d5416407d3c6d5374f6fc3c2767f4a7f3#diff-f3d91988bf7d98529d638fa829dc3fab695be002305a7614eaaeb283d4fb8101R39)
27917:36 <maxedwards> I have no idea
28017:37 <stickies-v> hint: think of your order of operations
28117:38 <sipa> stickies-v: it's not converting the `int`, though, it's converting the `this` object on which the operation is invoked
28217:39 <stickies-v> sipa: bingo!
28317:39 <larryruane_> can you go over that again? I'm not following
28417:40 <stickies-v> we're just casting our `transaction_identifier` into a `uint256`, and then using the `uint256::Compare` method, which does indeed return an `int`
28517:40 <larryruane_> i think i see, if we didn't do the `reinterpret_cast` it would be a recursive call?
28617:41 <sipa> did you mean a recursive call?
28717:41 <stickies-v> larryruane_: in `return reinterpret_cast<const uint256&>(*this).Compare(other);`, I was at first confused as to why we were casting the return value into a `uint256` when the function signature specifies an `int` return type
28817:41 <stickies-v> until I realized that the `reinterpret_cast` is not operating on the return value of `Compare`, but on `(*this)`
28917:42 <larryruane_> well i saw the `.Compare()` which returns an int, so that didn't confuse me
29017:42 <stickies-v> but yes, without the cast it would be a recursive call
29117:42 <larryruane_> interesting, i just removed the `reinterpret_cast<const uint256&>` (from before (*this)) and it compiles
29217:42 <stickies-v> i guess i'm just easily bamboozled :(
29317:43 <larryruane_> well never mind, it probably compiles but will infinite-loop (until stack overflow) maybe?
29417:44 <sipa> larryruane_: indeed
29517:44 <larryruane_> (i was expecting it wouldn't compile)
29617:44 <sipa> recursive calls are not illegal
29717:44 <sipa> infinite loops however are, but the compiler may not be smart enough to figure out that this is one
29817:45 <stickies-v> so, why `reinterpret_cast` instead of another cast, such as `static_cast`?
29917:45 <sipa> or, if it smart enough to figure out this gives rise to an infinite loop, it could assume it'd never be invoked
30017:45 <larryruane_> stickies-v: more efficient? `reinterpret_cast` has no runtime overhead (but `static_cast` could)?
30117:46 <sipa> a static cast would construct a new uint256 object, maybe (not actually sure if that applies to reference types)
30217:47 <larryruane_> hmm changing that to `static_cast` compiles and the binary is exactly the same size (not sure if that means the code is the same tho)
30317:47 <vmammal> stickies-v stackoverflow says static cast is unnecessary when casting up an inheritance hierarchy toward the base class
30417:48 <maxedwards> larryruane_ interesting that it's the same size
30517:48 <stickies-v> larryruane_: interestingly, my laptop just crashed while trying to compile without the reinterpret_cast πŸ˜…
30617:49 <larryruane_> crashed, wow
30717:49 <stickies-v> Ran out of memory and couldn't recover
30817:49 <larryruane_> oh because it's constexpr, so the infinite recursion is at compile time!!
30917:49 <maxedwards> I wonder if the compiler is clever enough to know in this direction a static_cast could be implemented as a reinterpret_cast?
31017:49 <larryruane_> my system (ubuntu) doesn't
31117:50 <larryruane_> seems really bad that can crash your entire laptop (not just OOM the compiler)
31217:51 <stickies-v> i'm on macos. not 100% it's related but this is the first time it happened. will reproduce when the meeting's over
31317:52 <stickies-v> 14. Is the `has_witness` template parameter used for purposes other than type differentiation in `transaction_identifier`?
31417:52 <stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/commit/28f9ff3d5416407d3c6d5374f6fc3c2767f4a7f3#diff-f3d91988bf7d98529d638fa829dc3fab695be002305a7614eaaeb283d4fb8101R17)
31517:53 <vmammal> stickies-v no
31617:54 <sipa> larryruane_: the constexpr will not cause recursion at compile time
31717:54 <sipa> i think
31817:54 <stickies-v> vmammal: correct!
31917:54 <stickies-v> What are the limitations of this approach? Do you see alternatives?
32017:54 <vmammal> nit: does anyone else prefer the `is_witness` naming style over `has_witness` ?
32117:54 <larryruane_> by this approach, what do you mean, the entire PR? or something more specific?
32217:55 <stickies-v> I mean using `template <bool has_witness>`
32317:55 <maxedwards> vmammal: no because it's more than just the witness
32417:56 <larryruane_> i guess an alternative would be to have `Txid` and `Wtxid` both subclass `uint256` directly ... but that would duplicate code
32517:56 <stickies-v> larryruane_: yeah that would probably not pass review :-D
32617:56 <vmammal> maxedwards fair
32717:57 <vmammal> is there a performance cost to templating?
32817:57 <larryruane_> vmammal: I think i prefer the existing `has_witness` because a transaction ID doesn't include a witness at all ever
32917:57 <stickies-v> vmammal: compile time yes, runtime i don't think so?
33017:58 <sipa> templating is effectively equivalent to the compiling generating the code for you
33117:58 <larryruane_> vmammal: i guess very indirectly, because templating can lead to executable bloat (if the alternative isn't duplicating code), and that could lead to more code cache misses
33217:59 <stickies-v> one thought I had (but it's almost definitely premature optimization) is that it's not guaranteed that we'll never have more than 2 types of tx identifiers, so then templating based on e.g. an enum instead of a bool could make sense
33318:00 <larryruane_> the C codebase i used to work on heavily used c-preprocessor templating, which was actually pretty cool! (so c++ isn't needed to implement this concept) ... it confused IDEs however
33418:00 <stickies-v> and i think when we don't have `transaction_identifier` subclass `uint256` (but wrap it instead) it'll be feasible to avoid templating too, but i didn't fully work that out either so could be wrong
33518:01 <stickies-v> alright that's all the time we have for today folks
33618:01 <stickies-v> #endmeeting