libs)

Oct 4, 2023

https://github.com/bitcoin/bitcoin/pull/28107

Host: stickies-v - PR author: dergoegge

The PR branch HEAD was e74152d5b48bdca8dfc2217f35842dbb24b2746c at the time of this review club meeting.

Notes

Since the activation of SegWit, transactions are identifiable by two types of identifiers: txid and wtxid. Both identifiers are a uint256 resulting from a double sha256 hash of serialized transaction data. The core difference is that wtxid includes the witness data, while txid does not.
Type-safety is a concept in programming to help avoid type errors, enforcing strict correspondence between a variable and its type. It ensures that a program performs only legal operations on data, with validation occurring during compile-time. This proactive error detection and prevention lead to more reliable and maintainable software, minimizing unintended behavior and vulnerabilities. An example of an earlier type-safety improvement in Bitcoin Core is the incorporation of std::chrono, introduced in #16908.
In the context, of transaction identifiers, type-safety is important because it is impossible to deduce the type difference from just the uint256 value. With GenTxid type checking can be done, but it is more verbose and done at run-time instead of at compile-time. Additionally, transaction identifier types improve readability of code by being explicit about expected types instead of solely relying on clear variable naming.
This PR aims to improve type safety by introducing the new transaction_identifier type, with Txid and Wtxid typedefs for brevity.
There are instances where wtxid and txid are treated identically, such as for example in most_recent_block_txs, where the uint256 type is used to handle this. In other contexts such as CTxMemPool::info where wtxid and txid are treated differently but either is accepted, the GenTxid type is adopted.

Questions

Concept

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
What does it mean for a transaction identifier to be type-safe? Why is that important or helpful? Are there any downsides?
Why is it better to enforce types at compile-time instead of at run-time?
Conceptually, when writing new code that requires referring to transactions, when should you use txid and when should you use wtxid? Can you point to any examples in the code where using one instead of the other could be very bad? (1 point per example)
In which concrete way(s) could using transaction_identifier instead of uint256 help find existing bugs or prevent the introduction of new ones? On the other hand, could this change introduce new bugs?
SegWit activated at block height 481,824. Do transactions in blocks < 481,824 also have a wtxid? For example, what about transaction with txid dc01120a639283c11eeada6ce540178adfbbca7fe0d2f41076367cbc9e901822?
Are there any places where a (w)txid is returned as a uint256 where it would not make sense to return a transaction_identifier instead?
The GenTxid class already exists. How does it already enforce type correctness, and how is it different from the approach in this PR?
Are there any places where GenTxid is used but using transaction_identifier would be better?

Code Review

How is transaction_identifier able to subclass uint256, given that, in C++, integers are types and not classes?
Why does transaction_identifier subclass uint256 instead of being a completely new type?
Txid and Wtxid are typedefs for transaction_identifier<has_witness>. Do you see any reason for code to directly use the transaction_identifier type? Should that be forbidden, and if so, how?
transaction_identifier::Compare is defined to return a int, yet reinterpret_cast<const uint256&> is used in the return statement. Is this correct, and if so, why? Why reinterpret_cast instead of another cast, such as static_cast?
Is the has_witness template parameter used for purposes other than type differentiation in transaction_identifier? What are the limitations of this approach? Do you see alternatives?
Why is TxOrphanage a good place to start introducing type-safe transaction ids?
Consider return m_wtxid_to_orphan_it.count(Wtxid::FromUint256(gtxid.GetHash()));. Why is it not type-safe to allow Wtxid to be constructed by just passing a uint256 to its constructor? Specifically, why doesn’t Wtxid{gtxid.GetHash()} compile?

Meeting Log

Meeting 1

17:00

<stickies-v> #startmeeting

17:00

<dberkelmans> hi

17:00

<dergoegge> hi

17:00

<BrandonOdiwuor> Hello

17:01

<lightlike> hi

17:01

<larryruane_> hi

17:01

<vmammal> first timer here. im here to learn

17:01

<stickies-v> welcome everyone! Today we're looking at #28107, authored by dergoegge . The notes and questions are available on https://bitcoincore.reviews/28107

17:02

<stickies-v> great, glad you're joining us vmammal! any other first timers today?

17:02

<Murch[m]> Hey

17:02

<stickies-v> even if you're just lurking, feel free to say hi!

17:02

<sipa> hi

17:02

<larryruane_> will we be back here tomorrow? are we covering the first set of questions today (concept)?

17:03

<stickies-v> correct larryruane_ ! same time tomorrow for the second batch of questions

17:03

<glozow> hi

17:03

<effexzi> Hi every1

17:03

<maxedwards> hi

17:03

<stickies-v> who got the chance to review the PR or read the notes? (y/n)

17:03

<Murch[m]> n

17:03

<maxedwards> y

17:03

<BrandonOdiwuor> y

17:03

<lightlike> y

17:03

<dberkelmans62> y

17:03

<abubakarsadiq> hi

17:03

<glozow> woohooo welcome newcomers!

17:03

<glozow> y

17:04

<stickies-v> alright wonderful, let's dive into the wonderfully wholesome world of type-safety

17:05

<larryruane_> i heard someone say years ago, "strong types are for weak minds" (haha I don't agree!)

17:05

<stickies-v> for those of you who were able to review, would you give it a Concept ACK, Approach ACK, Tested ACK, or NACK? what was your review approach?

17:05

<vmammal> concept ACK

17:06

<larryruane_> i'm still reviewing

17:06

<maxedwards> Concept ACK

17:06

<BrandonOdiwuor> Concept ACK

17:06

<abubakarsadiq> Concept ACK, did not do code review but I think this is very useful improvement, not only that uint256 can be block hask, header, txid and wtxid. It will be nice to distinguish between wtxid and txid with type. Because there is ambiguity in the code whereby hash is referred to txid, wtxid and blockhash. Using type specifier for wtxid and txid will help distinguish even if an ambiguous variable name is used

17:07

<maxedwards> built and ran tests, read through the code, looked at the super class of uint256

17:07

<stickies-v> abubakarsadiq: I agree! compile-time checks (which we'll get to in a bit) are a great improvement on their own, but just improving code readability is quite nice too imo

17:08

<stickies-v> so, diving into the conceptual side of things a bit

17:08

<stickies-v> 2. What does it mean for a transaction identifier to be type-safe? Why is that important or helpful? Are there any downsides?

17:08

<larryruane_> before I approach ACK, I'm trying to figure out if it would be better for a `Txid` and a `Wtxid` to *inheret* from `uint256`, or for them to simply *include* a `uint256` (and nothing more, you could call this wrapping)

17:09

<BrandonOdiwuor> A 'transaction identifier' is considered 'type-safe' when it undergoes rigorous type-checking during compile-time, guaranteeing that operations involving these identifiers are thoroughly validated and constrained to their intended data types before the program is executed, thus preventing type-related errors and ensuring program reliability.

17:09

<BrandonOdiwuor> Some advantages include: Early detection of type-errors at compile time and preventing runtime errors

17:10

<larryruane_> type safe means the two kinds of transaction IDs have distinct types (instead of both being `uint256`), so trying to assign one type to a variable of another type (for example) is like trying to assign a `float` to an `int`

17:10

<stickies-v> larryruane_: I have had similar thoughts. I think inheritance makes the transition easier, i.e. if we don't want to update all places that use a transaction identifier

17:10

<stickies-v> but ignoring that it seems like just having a `uint256` member seems like it would simplify the implementation quite a bit

17:11

<larryruane_> @stickies-v thanks, that's my suspicion too, but wasn't sure, want to investigate further

17:12

<abubakarsadiq> there are two types of transaction identifiers in post segwit bitcoin, txid and wtxid.

17:12

<abubakarsadiq> To avoid ambiguity and logic error where both are interchanged, it will be better if they are distinct types

17:12

<abubakarsadiq> The code won't compile if one is used in the place of another.

17:13

<larryruane_> a C codebase i worked on years ago had a really cool concept of "wrapped" integers, so you couldn't assign a block count (this was a storage system) to a byte count, etc. .... so similar goal as here, but it used the wrapped approach (since there's no inheretence in C)

17:13

<stickies-v> BrandonOdiwuor: would you say the error catching happens mostly at compile time or at runtime?

17:13

<maxedwards> can you elaborate on the simplicity enabled by having an internal uint256 vs subclassing?

17:13

<stickies-v> larryruane_: is type safety guaranteed by just having two distinct types, though?

17:14

<BrandonOdiwuor94> stickies-v it would be better to catch the errors at compile-time rather than do runtime assertions to catch the errors

17:14

<larryruane_> no, a block count type was a C struct with just one member, conventionally called `i` (for `integer`), and the byte count was a different C struct (also with just `i`) ... so you couldn't assign one to the other for example

17:15

<maxedwards> simply providing the types is not enough, they have to be consumed

17:15

<Murch[m]> I assume that with inheritance assignments might permit to autoconvert between the types without complaining

17:15

<stickies-v> BrandonOdiwuor94: yeah, catching stuff at compile time is great - makes it quick for the developer to spot their error as they are developing, as opposed to relying on writing extensive test suites etc (that still may miss things) to catch things in runtime

17:15

<larryruane_> i.e. so it wasn't just two typedefs (which does NOT give type safety)

17:16

<stickies-v> larryruane_: i don't just mean typedefs: in a naive implementation, just having Txid and Wtxid each subclass uint256 does not provide type safety on its own, as e.g. we can still do comparisons between them

17:16

<larryruane_> yeah trying to catch things are runtime is tricky, you might never happen to go through the buggy code path! (but for sure one of our users will!)

17:17

<stickies-v> well, it does provide some type safety, just not entirely

17:17

<stickies-v> Murch[m]: yes exactly@

17:17

<maxedwards> Murch[m]: I believe there is a comment about doing just that in the PR

17:18

<stickies-v> maxedwards: so the complexity that I was talking about is that currently we have to specify which kinds of comparisons etc we allow, because currently both `Txid` and `Wtxid` inherit from `uint256`

17:19

<stickies-v> I guess we already largely covered the next question but I'll throw it in here regardless in case anyone has something to add:

17:19

<larryruane_> to me, with my not-great understanding of subclassing, the wrapping approach seems simpler and safer -- two classes both with just one member, a `uint256` (like we did with the wrapped integers in C) ... but never mind, we can move on, sorry to distrupt

17:19

<stickies-v> 3. Why is it better to enforce types at compile-time instead of at run-time?

17:19

<glozow> find bug moar fast.

17:19

<BrandonOdiwuor> Early Error Detection: Compile-time checking allows for the identification of type-related errors and mismatches at compile time before the program is even executed. This means that issues are caught during the development and testing phase, reducing the chances of runtime errors and unexpected program behavior.

17:20

<larryruane_> stickies-v: kind of answered already, but it prevents coding mistakes where the wrong kind of ID is used (txid instead of wtxid or the reverse)

17:20

<stickies-v> larryruane_: but with wrapping, would you still be able to not update all callsites of functions that return a Txid/Wtxid right away?

17:20

<stickies-v> I think there's merit to allowing this to be a gradual improvement

17:20

<stickies-v> glozow: much accurate

17:20

<larryruane_> @stickies-v yes very true! that's an important consideration

17:21

<stickies-v> BrandonOdiwuor: yeah it's generally much quicker

17:22

<BrandonOdiwuor> I saw this comment on the notes (https://github.com/bitcoin/bitcoin/pull/28031#discussion_r1261648754) I think it would help with this

17:23

<glozow> yeah I write this bug approximately once every 2 weeks

17:23

<maxedwards> assuming less than perfect test coverage and a bug, a runtime check could end up throwing on production

17:23

<maxedwards> worst case and I'm sure with the scrutiny of bitcoin it would be caught in an earlier stage

17:24

<glozow> Usually I catch it before I push a PR tho. always need to have a "try 2 transactions with same txid but diff witness" test

17:24

<glozow> catching at compile time would be nice

17:25

<stickies-v> maxedwards: even if it can be tested, being able to remove tests because it's checked at compile time is a big win

17:25

<abubakarsadiq> glozow: with this you dont have to do that right \o/?

17:25

<stickies-v> launching the next question already - but as always, feel free to continue the conversation on previous questions too!

17:25

<stickies-v> 4. Conceptually, when writing new code that requires referring to transactions, when should you use `txid` and when should you use `wtxid`? Can you point to any examples in the code where using one instead of the other could be very bad? (1 point per example)

17:26

<lightlike> seems like testing this would still be useful. having type safety won't prevent us from using the wrong type of id (but consistently) in the first place.

17:27

<larryruane_> stickies-v: not sure if this is what you're asking, but two pure segwit transactions with the same txid (even if different wtxids) have the same *effect* (modify the UTXO set the same way)

17:27

<larryruane_> lightlike: +1

17:27

<glozow> lightlike +1

17:28

<larryruane_> (pure meaning *all* the inputs are segwit)

17:28

<glozow> this is a good example imo: https://github.com/bitcoin/bitcoin/blob/3cd02806ecd2edd08236ede554f1685866625757/src/net_processing.cpp#L4334

17:28

<sipa> larryruane_: two non-segwit transactions with the same txid also have the same effect :p

17:28

<glozow> if you use txid, that means I can censor any tx I want to you by sending you a version with a mutated witness first

17:29

<stickies-v> glozow: 1 point!

17:30

<larryruane_> sipa: oh right... two tx with different txids could have the same effect (i think??)

17:30

<larryruane_> (that's malleability)

17:30

<sipa> larryruane_: yes

17:30

<sipa> the same txid implies the same effect, but different txid does not imply different effect

17:31

<lightlike> so "when in doubt use wtxid" is a good role of thumb i guess?

100

17:31

<stickies-v> so even if you haven't found any examples, conceptually - what do you think is a good heuristic for when to use wtxid instead of txid, or txid instead of wtxid?

101

17:32

<stickies-v> lightlike: that seems a reasonable starting point :-D

102

17:33

<glozow> pretty much always wtxid unless it's prevout-related

103

17:33

<sipa> or tx relay for pre-BIP339 peers

104

17:33

<larryruane_> stickies-v: in the mempool, we definitely want to use txids as keys, because if we used wtxid, two tx with the same txid could be there simultaneously, and if that happened, an output from both of them could be spent separately, leading to double-spend bug?

105

17:34

<stickies-v> glozow: yeah, and also almost always wtxid when it's about unconfirmed txs?

106

17:35

<larryruane_> stickies-v: that sounds opposite to what i said? (but i may be wrong)

107

17:36

<abubakarsadiq> we could have identical transaction with multiple wtxid's using wtxid almost everytime, will that not bring duplication especially for unconfirmed txs?

108

17:36

<stickies-v> larryruane_: well i think gloria covered that already by it being prevout related?

109

17:36

<sipa> the mempool is indexed by both txid and wtxid, because we need to be able to look up by both

110

17:36

<glozow> unsure if unconfirmed txs are special. But one distinction that springs to mind is you should never try to cache feerate to a txid

111

17:37

<glozow> I think larryruane's point might be that having an index by txid can be a useful way to bake in the assumption that no 2 transactions in the data structure should have the same txid

112

17:37

<larryruane_> sipa: gotcha, but we definitely don't want to allow > 1 tx with the same txid

113

17:38

<maxedwards> am I right in thinking that in theory this same/different effect shouldn't be a concern for this PR as the existing code should already be using the correct id, it's just that it's represented by the same type?

114

17:39

<lightlike> larryruane: we don't want conflicting transactions in general, even if they don't share the txid.

115

17:39

<sipa> my guess is actually that if we dropped the unicity of the txid index in the mempool, nothing would break, as different-wtxid-same-txid transactions would still just conflict with each other

116

17:39

<sipa> the unicity just allows enforcing that constraint independently

117

17:39

<sipa> uniqueness? unicity?

118

17:40

<stickies-v> maxedwards: this is more conceptual discussion about txid/wtxid indeed and not directly critical for the PR

119

17:40

<larryruane_> lightlike: oh yes that's right, as glozow said, the prevout is txid-based so would be the same

120

17:40

<sipa> (tangent, english stackexchange says "I suspect most people would take "Unicity" to be a town where everyone rides unicycles.")

121

17:40

<maxedwards> stickies-v: very interesting still

122

17:41

<larryruane_> i took bicycle riding lessons but ran out of money halfway through, so now i can only ride a unicycle

123

17:41

<glozow> sipa: tbh I would have thought of it as the british equivalent of "college town"

124

17:41

<glozow> larryruane: are there 100 seats on 100 unicycles...

125

17:42

<sipa> :D

126

17:42

<stickies-v> alright time to move on 😅

127

17:42

<sipa> glozow: and they ride one by one into a room with 100 boxes, ...

128

17:42

<stickies-v> 5. In which concrete way(s) could using `transaction_identifier` instead of `uint256` help find existing bugs or prevent the introduction of new ones? On the other hand, could this change introduce new bugs?

129

17:42

<glozow> 😂

130

17:43

<larryruane_> stickies-v: is it because lots of other things besides txids and wtxids can be uint256 type? so using `transaction_identifier` would exclude those accidental uses?

131

17:45

<BrandonOdiwuor88> larryruane_ I think it would help prevent mixing up using Txid and Wtxid unintentionally such as the comment that was linked in the notes(https://github.com/bitcoin/bitcoin/pull/28031#discussion_r1261648754)

132

17:45

<stickies-v> larryruane_: yup. a function that takes a `uint256 hash` as an argument can specify in the docs that `hash` is supposed to be a wtxid, but if a developer (and the reviewers) don't read that properly then it's trivial to pass a txid and potentially (probably) break things

133

17:46

<glozow> fun anecdote related to this: one time while working in wallet code, I saw a variable named `wtxid` and assumed it was a witness ID. nope, it stood for "wallet transaction id"

134

17:47

<stickies-v> on the other hand - does having this explicit type potentially introduce other, new bugs? can you think of any?

135

17:47

<stickies-v> glozow: oooh yeah those are sneaky

136

17:47

<maxedwards> can you pass a uint256 to a function that asks for a txid?

137

17:47

<maxedwards> or do you have to cast?

138

17:47

<larryruane_> i just noticed that there are 1687 occurrences of `uint256` in our code base -- many (maybe most) of which are not transaction ID related!

139

17:49

<dergoegge> maxedwards: you would have to use `Txid::FromUint256`

140

17:49

<glozow> maxedwards: try and see if it compiles!

141

17:50

<stickies-v> maxedwards: see also question 16 that we'll be dealing with tomorrow!

142

17:50

<lightlike> dergoegge: did you try to change it everywhere, just to see how big the effort is?

143

17:50

<maxedwards> even with this, would it not be possible to change a parameter to a txid when it should be a wtxid and everything would still work?

144

17:50

<maxedwards> the uint256 was representing a wtxid

145

17:51

<maxedwards> but the type isn't actually enforcing that it is

146

17:51

<maxedwards> and could lead to much confusion

147

17:51

<maxedwards> even more confusion than had it been left as a uint256

148

17:51

<BrandonOdiwuor88> maxedwards I think the compiler would throw an error you have to cast explicitly with reinterpret_cast

149

17:51

<maxedwards> yes so I'm assuming you do cast

150

17:52

<stickies-v> maxedwards: what do you mean with change a parameter? the function signature?

151

17:52

<maxedwards> but there's nothing stopping you from casting a uint256 that represented a wtxid to a txid is there?

152

17:52

<maxedwards> yes function signature

153

17:52

<stickies-v> well obviously the function signature has to be correct, if you need a wtxid and have your function signature take a `Txid txid` then that's going to lead to problems

154

17:53

<maxedwards> not compile time problems though is my question

155

17:53

<dergoegge> lightlike: no haven't tried, my guess would be that it is a bunch of effort but also not to crazy

156

17:54

<stickies-v> it depends on the callsites of course. sorry i'm not sure i fully understand your question - type-safety doesn't prevent a developer from writing wrong code, indeed

157

17:55

<vmammal> dergoegge do you plan to continue the refactor in a future PR?

158

17:55

<dergoegge> `Txid::FromUint256(Wtxid{})` might actually work because `uint256 a = Wtxid{}` is allowed.

159

17:55

<stickies-v> gonna skip a few questions as we're running out of time

160

17:55

<dergoegge> vmammal: yes but I'd also be happy to see other people going for it

161

17:55

<stickies-v> 8. The `GenTxid` class already exists. How does it already enforce type correctness, and how is it different from the approach in this PR?

162

17:55

<stickies-v> (link: https://github.com/bitcoin/bitcoin/blob/dcfbf3c2107c3cb9d343ebfa0eee78278dea8d66/src/primitives/transaction.h#L425)

163

17:58

<dberkelmans> It also contains a bool saying if the witness is included

164

17:58

<dberkelmans> It's one type instead of 2

165

17:58

<dberkelmans> So it doesn't make the code more readable

166

17:59

<larryruane_> for one thing, an instance of a txid and a wtxid (contained within a GenTxid) compare as false, even if the hashes are true ... but there's more I'm sure

167

17:59

<stickies-v> dberkelmans: if it doesn't make the code more readable, should we then replace it entirely with transaction_identifier?

168

18:00

<BrandonOdiwuor88> I think GenTxid uses uint256 to store the hash and the bool m_is_wtxid to indicate whether its Wtxid or Txid

169

18:00

<dberkelmans> I haven't seen enough code to say anything sensible about that

170

18:01

<stickies-v> I think an important difference is that GenTxid allows type checking, by exposing the `IsWtxid()` function, but it's still on the user to do that checking

171

18:01

<stickies-v> and more importantly, it happens at runtime

172

18:01

<dberkelmans> I suppose if you only need the uint256 then having it in one type makes sense

173

18:02

<larryruane_> stickies-v: so it's runtime checking rather than compile-time

174

18:02

<stickies-v> with transaction_identifer, the type checking is enforced at compile time

175

18:02

<stickies-v> larryruane_: yeah

176

18:02

<stickies-v> dberkelmans: well one important use case that GenTxid solves is that in quite a few places we want to be able to take input that can be either a txid or a wtxid, as opposed to just one or the other

177

18:03

<dergoegge> If we go for the type safe ids, my thinking was that we would replace `GenTxid` with `std::variant<Txid, Wtxid>`

178

18:03

<stickies-v> actually, from my quick skimming it seems like *all* the places where we use GenTxid we want to be able to accept both types (which surprised me a little)

179

18:03

<stickies-v> so, we're at time for this meeting

180

18:03

<stickies-v> #endmeeting

Meeting 2

181

17:00

<stickies-v> #startmeeting

182

17:00

<dberkelmans> hi

183

17:00

<maxedwards> hello!

184

17:00

<vmammal> hi

185

17:01

<glozow> hi

186

17:01

<stickies-v> hello everyone! welcome back to the second meeting about https://bitcoincore.reviews/28107

187

17:01

<maxedwards> looking forward to learning the answers to these questions from you experienced people!

188

17:02

<stickies-v> yesterday we focused on questions 1-9 which dealt with the conceptual aspects of type safety and wtxid/txid

189

17:02

<stickies-v> today we'll be diving more into the code of the PR, so expect a more technical discussion

190

17:02

<stickies-v> maxedwards: ooh we're definitely all here to learn though! the discussion is the most important bit

191

17:03

<lightlike> hi

192

17:03

<stickies-v> if anyone's here lurking - feel free to say hi so we know you're here!

193

17:04

<stickies-v> alright let's kick it off with the first question

194

17:04

<stickies-v> 10. How is `transaction_identifier` able to subclass `uint256`, given that, in C++, integers are types and not classes?

195

17:04

<maxedwards> I believe that uint256 is not a built in type

196

17:05

<maxedwards> and just a normal class

197

17:06

<vmammal> maxedwards +1

198

17:06

<stickies-v> yes correct maxedwards! we have quite a few places that rely on 256bit hash digests so having a dedicated type for it seems helpful

199

17:06

<maxedwards> `class uint160 : public base_blob<160>`

200

17:06

<maxedwards> ignore sorry, wrong code!

201

17:06

<lightlike> i think c++ doesn't have a 256 bit built-in type we could use - even if we wanted to.

202

17:07

<maxedwards> `class uint256 : public base_blob<256>`

203

17:07

<stickies-v> so can we assume that a uint256 behaves otherwise the same as, for example a uint64_t?

204

17:07

<stickies-v> maxedwards: correct!

205

17:07

<stickies-v> lightlike: i don't think so either

206

17:08

<larryruane_> I don't think it supports addition (for example)

207

17:09

<sipa> it used to, until arith_uint256 got split from uint256

208

17:09

<stickies-v> larryruane_: yeah correct, it doesn't implement a lot of the integer operations. Actually, the only defined operators are `operator==`, `operator!=` and `operator<`

209

17:09

<stickies-v> (which... may seem familiar if you've seen the `transaction_identifier` code

210

17:10

<stickies-v> sipa: oh - any background on when/why that got split?

211

17:10

<maxedwards> it looks like now it supports GetHex, SetHex and ToString

212

17:11

<glozow> we don't want to do arithmetic operations on hash identifiers?

213

17:11

<maxedwards> Can someone point me to where the == operator is set?

214

17:12

<larryruane_> ah I just noticed `GetHex()` and `ToString()` are the same, I always wondered about that (should have just looked it up)

215

17:12

<stickies-v> glozow: having a `hash256` type or something would be a bit more intuitive than making `uint256` not an int, too :-D but it's probably way less code churn that way

216

17:12

<stickies-v> maxedwards: https://github.com/bitcoin/bitcoin/blob/6e5cf8e95391cd9a8bfc0c357e8a6f3963bad4b4/src/uint256.h#L56

217

17:13

<stickies-v> launching the next one already but we can keep conversation on previous questions going

218

17:13

<maxedwards> ah yes I see them, thanks

219

17:13

<stickies-v> 11. Why does `transaction_identifier` subclass `uint256` instead of being a completely new type?

220

17:13

<sipa> stickies-v: https://github.com/bitcoin/bitcoin/pull/5490

221

17:14

<sipa> in short, because arithmetic operations on hashes make no sense - they only matter in the context of PoW, but hashes (which is what uint256 is effectively used for) are used for many more things than that

222

17:14

<maxedwards> I think this was touched on yesterday and the suggestion was that you could migrate in stages

223

17:14

<sipa> going back even further, all of uint256 and arith_uint256 was implemented using OpenSSL's bignum

224

17:15

<stickies-v> ooh, and then it became its own type when moving to libsecp256k1?

225

17:15

<sipa> no, that's all unrelated

226

17:15

<larryruane_> we don't depend on OpenSSL at all any more, do we?

227

17:15

<sipa> oh right, it was one of the things we depended on OpenSSL for, but there were many others

228

17:16

<sipa> CScriptNum too was introduced to avoid bignum dependency in the script interpreter

229

17:16

<stickies-v> maxedwards: and specifically, what is it about subclassing that allows us to migrate in stages?

230

17:16

<sipa> the BIP70 payment protocol was another openssl dependency

231

17:16

<sipa> and SSL-encrypted RPC is also something we used to have

232

17:17

<maxedwards> I think that you could pass a wtxid to a function that takes a uint256

233

17:18

<maxedwards> so you can leave a lot of code as is?

234

17:19

<stickies-v> yeah, we can use implicit conversions to leave code that is expecting a `uint256` unchanged until it's an appropriate time to move it to a more strict `Txid` or `Wtxid` type

235

17:19

<larryruane_> stickies-v: if B is a subclass of A, then you can pass a B to a function that takes an A and it works... you'd think that would be a type mismatch but it's not!

236

17:19

<stickies-v> i've learned to stop thinking and rely on cppreference instead :-D

237

17:20

<sipa> cppreference is great

238

17:20

<larryruane_> but usually aren't some of A's methods virtual? (so B can have its own implementation?) Not true of Txid and uint256

239

17:21

<stickies-v> `base_blob` (which is `uint256`'s parent) and `transaction_identifier` are templated classes

240

17:21

<stickies-v> templated methods cannot be virtual

241

17:21

<sipa> this is not a traditional situation where type distinction matters

242

17:21

<sipa> because the operations on a Txid and WTxid are the same

243

17:21

<larryruane_> stickies-v: +1 thanks didn't know that

244

17:21

<larryruane_> sipa: +1 thanks

245

17:21

<sipa> the goal isn't to catch runtime mismatches for operations that don't apply in a way we can figure out at compile time

246

17:22

<maxedwards> even if there were overridden methods, you could still pass it couldn't you?

247

17:22

<sipa> rather, it's us deliberately making two types that are operationally compatible into separate ones, to make sure one doesn't get used in the wrong place

248

17:23

<stickies-v> sipa: it's not really the point larryruane_ was trying to make I think, but have templated virtual methods could be helpful here though, I think

249

17:23

<stickies-v> atm we have to redefine all the `operator{==, !=, <}` methods on `transaction_identifier`

250

17:23

<stickies-v> but with a `virtual Compare` method that could be avoided

251

17:24

<sipa> i don't see how anything virtual is useful here

252

17:25

<stickies-v> maxedwards: could you elaborate a bit on that?

253

17:25

<sipa> you could use the curious self-referencial template trick to do the same thing without needing the runtime overhead of a virtual function call

254

17:26

<sipa> template<T> class transaction_identifier { bool operator==(const T& other) { ... } }

255

17:26

<sipa> class Txid : public transaction_identifier<Txid> { ... }

256

17:26

<maxedwards> It wasn't specific to this codebase and more of a C++ question about passing a subclass to a function that takes the superclass but I can look it up myself to save time.

257

17:27

<stickies-v> well but what I'm trying to avoid is having to redefine the `operator` methods on `transaction_identifier` because i think the ones on `uint256` work just fine, except for the fact that they call `a.Compare...` and we need to use the `transaction_identifier::Compare` method instead of the `uint256` one?

258

17:28

<stickies-v> maxedwards: yeah my understanding is that would get implicitly converted to the superclass

259

17:28

<sipa> stickies-v: you don't need virtual functions for that (and generally don't want to, as they have a performance cost)

260

17:29

<sipa> the point is that these "distinct" Compare functions aren't actually operationally distinct, they just operate on different types

261

17:31

<sipa> but i'll need to look at the actual code to be more concrete

262

17:31

<stickies-v> maybe i'm confused by "operationally", but they are distinct though?

263

17:31

<sipa> they're just comparing the bytes, no?

264

17:31

<sipa> the distinction is in which types they accept

265

17:31

<stickies-v> yes, beyond the static_assert they are identical

266

17:31

<sipa> right, exactly

267

17:32

<sipa> the static_assert doesn't exist at runtime

268

17:32

<stickies-v> right okay, I see

269

17:33

<larryruane_> i never thought a `static_assert` could be inside an `if`, that's cool

270

17:33

<stickies-v> thanks, i'll look into the self referential template trick after the meeting, that looks curious!

271

17:33

<sipa> larryruane_: in a normal if, it doesn't matter whether it's inside or outside

272

17:33

<sipa> in a constexpr if, where one branch can be discarded at compile-time, things are different

273

17:34

<larryruane_> yes exactly so i'm used to seeing it outside ... yes i wasn't aware of `constexpr if`

274

17:34

<sipa> larryruane_: cppreference to the rescue :p

275

17:35

<larryruane_> on my next extended vacation i'm going to curl up with cppreference

276

17:36

<stickies-v> very interesting rabbit hole, but regardless i'll pop the next question already for folks dying to answer:

277

17:36

<stickies-v> 13. [`transaction_identifier::Compare`] is defined to return a `int`, yet `reinterpret_cast<const uint256&>` is used in the return statement. Is this correct, and if so, why? Why `reinterpret_cast` instead of another cast, such as `static_cast`?

278

17:36

<stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/commit/28f9ff3d5416407d3c6d5374f6fc3c2767f4a7f3#diff-f3d91988bf7d98529d638fa829dc3fab695be002305a7614eaaeb283d4fb8101R39)

279

17:36

<maxedwards> I have no idea

280

17:37

<stickies-v> hint: think of your order of operations

281

17:38

<sipa> stickies-v: it's not converting the `int`, though, it's converting the `this` object on which the operation is invoked

282

17:39

<stickies-v> sipa: bingo!

283

17:39

<larryruane_> can you go over that again? I'm not following

284

17:40

<stickies-v> we're just casting our `transaction_identifier` into a `uint256`, and then using the `uint256::Compare` method, which does indeed return an `int`

285

17:40

<larryruane_> i think i see, if we didn't do the `reinterpret_cast` it would be a recursive call?

286

17:41

<sipa> did you mean a recursive call?

287

17:41

<stickies-v> larryruane_: in `return reinterpret_cast<const uint256&>(*this).Compare(other);`, I was at first confused as to why we were casting the return value into a `uint256` when the function signature specifies an `int` return type

288

17:41

<stickies-v> until I realized that the `reinterpret_cast` is not operating on the return value of `Compare`, but on `(*this)`

289

17:42

<larryruane_> well i saw the `.Compare()` which returns an int, so that didn't confuse me

290

17:42

<stickies-v> but yes, without the cast it would be a recursive call

291

17:42

<larryruane_> interesting, i just removed the `reinterpret_cast<const uint256&>` (from before (*this)) and it compiles

292

17:42

<stickies-v> i guess i'm just easily bamboozled :(

293

17:43

<larryruane_> well never mind, it probably compiles but will infinite-loop (until stack overflow) maybe?

294

17:44

<sipa> larryruane_: indeed

295

17:44

<larryruane_> (i was expecting it wouldn't compile)

296

17:44

<sipa> recursive calls are not illegal

297

17:44

<sipa> infinite loops however are, but the compiler may not be smart enough to figure out that this is one

298

17:45

<stickies-v> so, why `reinterpret_cast` instead of another cast, such as `static_cast`?

299

17:45

<sipa> or, if it smart enough to figure out this gives rise to an infinite loop, it could assume it'd never be invoked

300

17:45

<larryruane_> stickies-v: more efficient? `reinterpret_cast` has no runtime overhead (but `static_cast` could)?

301

17:46

<sipa> a static cast would construct a new uint256 object, maybe (not actually sure if that applies to reference types)

302

17:47

<larryruane_> hmm changing that to `static_cast` compiles and the binary is exactly the same size (not sure if that means the code is the same tho)

303

17:47

<vmammal> stickies-v stackoverflow says static cast is unnecessary when casting up an inheritance hierarchy toward the base class

304

17:48

<maxedwards> larryruane_ interesting that it's the same size

305

17:48

<stickies-v> larryruane_: interestingly, my laptop just crashed while trying to compile without the reinterpret_cast 😅

306

17:49

<larryruane_> crashed, wow

307

17:49

<stickies-v> Ran out of memory and couldn't recover

308

17:49

<larryruane_> oh because it's constexpr, so the infinite recursion is at compile time!!

309

17:49

<maxedwards> I wonder if the compiler is clever enough to know in this direction a static_cast could be implemented as a reinterpret_cast?

310

17:49

<larryruane_> my system (ubuntu) doesn't

311

17:50

<larryruane_> seems really bad that can crash your entire laptop (not just OOM the compiler)

312

17:51

<stickies-v> i'm on macos. not 100% it's related but this is the first time it happened. will reproduce when the meeting's over

313

17:52

<stickies-v> 14. Is the `has_witness` template parameter used for purposes other than type differentiation in `transaction_identifier`?

314

17:52

<stickies-v> (link: https://github.com/bitcoin-core-review-club/bitcoin/commit/28f9ff3d5416407d3c6d5374f6fc3c2767f4a7f3#diff-f3d91988bf7d98529d638fa829dc3fab695be002305a7614eaaeb283d4fb8101R17)

315

17:53

<vmammal> stickies-v no

316

17:54

<sipa> larryruane_: the constexpr will not cause recursion at compile time

317

17:54

<sipa> i think

318

17:54

<stickies-v> vmammal: correct!

319

17:54

<stickies-v> What are the limitations of this approach? Do you see alternatives?

320

17:54

<vmammal> nit: does anyone else prefer the `is_witness` naming style over `has_witness` ?

321

17:54

<larryruane_> by this approach, what do you mean, the entire PR? or something more specific?

322

17:55

<stickies-v> I mean using `template <bool has_witness>`

323

17:55

<maxedwards> vmammal: no because it's more than just the witness

324

17:56

<larryruane_> i guess an alternative would be to have `Txid` and `Wtxid` both subclass `uint256` directly ... but that would duplicate code

325

17:56

<stickies-v> larryruane_: yeah that would probably not pass review :-D

326

17:56

<vmammal> maxedwards fair

327

17:57

<vmammal> is there a performance cost to templating?

328

17:57

<larryruane_> vmammal: I think i prefer the existing `has_witness` because a transaction ID doesn't include a witness at all ever

329

17:57

<stickies-v> vmammal: compile time yes, runtime i don't think so?

330

17:58

<sipa> templating is effectively equivalent to the compiling generating the code for you

331

17:58

<larryruane_> vmammal: i guess very indirectly, because templating can lead to executable bloat (if the alternative isn't duplicating code), and that could lead to more code cache misses

332

17:59

<stickies-v> one thought I had (but it's almost definitely premature optimization) is that it's not guaranteed that we'll never have more than 2 types of tx identifiers, so then templating based on e.g. an enum instead of a bool could make sense

333

18:00

<larryruane_> the C codebase i used to work on heavily used c-preprocessor templating, which was actually pretty cool! (so c++ isn't needed to implement this concept) ... it confused IDEs however

334

18:00

<stickies-v> and i think when we don't have `transaction_identifier` subclass `uint256` (but wrap it instead) it'll be feasible to avoid templating too, but i didn't fully work that out either so could be wrong

335

18:01

<stickies-v> alright that's all the time we have for today folks

336

18:01

<stickies-v> #endmeeting