Two small fixes to node broadcast logic (mempool, p2p)

Jun 23, 2021

https://github.com/bitcoin/bitcoin/pull/22261

Host: glozow - PR author: jnewbery

The PR branch HEAD was ba99f37a at the time of this review club meeting.

Notes

The mempool keeps a set of unbroadcast transactions containing the txids of transactions that have not passed initial broadcast yet (determined based on a heuristic, whether or not the node has received a getdata for it). The unbroadcast set was introduced in PR #18038, which we covered in a previous review club.
There are two (related) mechanisms for rebroadcasting transactions:
- The wallet re-submits its own transactions to the node in CWallet::ResendWalletTransactions() (which calls into the node’s BroadcastTransaction() function).
- The node broadcasts the unbroadcast transactions to its peers in PeerManagerImpl::ReattemptInitialBroadcast() (which calls PeerManagerImpl::_RelayTransaction()).
Both of these mechanisms are executed on the scheduler thread.
Transactions can be referred to by txid (without witness) or by wtxid (with witness, defined in BIP141). Multiple valid transactions can have the same non-witness data (same txid) but different witnesses (different wtxid).
Transactions are announced to peers either by txid or by wtxid (since PR #18044, which we also covered in a previous review club). Whether a peer wishes to receive announcements using txid or wtxid is negotiated during connection. We refer to peers that prefer to receive wtxid announcements as wtxid-relay peers.
There are two unexpected behaviors in BroadcastTransaction(): one is related to unbroadcast and the other is in wtxid-based transaction relay.
- Unbroadcast: If BroadcastTransaction() is called with a transaction that has the same txid as a transaction in the mempool (can be same witness, different witness or even invalid witness), it causes the transaction to be re-added to the unbroadcast set.
- Relay: If BroadcastTransaction() is called with a same-txid-different-wtxid transaction as something already in the mempool, it will call RelayTransaction() with the wtxid of the argument tx’s wtxid rather than the one in the mempool. This causes the relay to fail (INVs are not sent) for wtxid-relay peers because SendMessages() queries the mempool by wtxid, doesn’t find it, and drops the announcement.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
How is BroadcastTransaction() invoked (i.e. what code paths)?
What happens when PeerManager::RelayTransaction() is called with the wtxid of a transaction that isn’t in the mempool?
What does the unbroadcast set represent, conceptually? When should a transaction be added and removed from the unbroadcast set?
In what scenario would the mempool have a transaction with the same txid but different wtxid as a wallet transaction (feel free to give creative answers)?
What bugs are present prior to this PR? Can you think of a scenario in which they cause a privacy leak?
How does PR #22261 fix these bugs?
Bonus: Why does the unbroadcast set contain txids instead of wtxids?

Meeting Log

17:00

<glozow> #startmeeting

17:00

<LarryRuane> hi

17:01

<schmidty> howdy

17:01

<svav> hi

17:01

<dopedsilicon> Hey

17:01

<michaelfolkson> hola

17:01

<Azorcode> Hi

17:01

<glozow> hi everyone! welcome to PR Review Club :) we're looking at Two small fixes to node broadcast logic today: https://bitcoincore.reviews/22261

17:01

<lightlike> hi

17:01

<glozow> did anyone get a chance to look at the notes or review the PR?

17:02

<svav> Looked at the notes

17:02

<michaelfolkson> Yup, maybe can't answer all the questions though

17:03

<michaelfolkson> Took a while to get my head around why the fixes were needed

17:03

<glozow> okie dokie

17:03

<glozow> so this PR is making changes to the `BroadcastTransaction()` function - can anyone tell me what this function does?

17:03

<glozow> we're here: https://github.com/bitcoin/bitcoin/blob/7317e14a44c6efc545e6fb9bcedee7174e93a8fa/src/node/transaction.cpp#L29

17:04

<glozow> (guesses are fine too)

17:05

<michaelfolkson> So broadcasts a transaction :) Called by RPCs

17:06

<michaelfolkson> Too obvious?

17:06

<lightlike> adds transactions to the mempool and/or unbroadcast set and also relays them immediately

17:06

<LarryRuane> This is to send a transaction that originated in the current node (as opposed to being relayed, one that we recieved from a different node) ... (i'm not too confident in this answer)

17:06

<glozow> lightlike: wonderful, yes. invokes the mempool validation logic + relays

17:07

<glozow> LarryRuane: aha yes, it's interesting that it's in the src/node folder

17:07

<glozow> so these transactions would come from clients (e.g. RPC and wallet)

17:08

<glozow> where is `BroadcastTransaction()` called? is anyone able to link us to some call sites?

17:09

<svav> src/node/transaction.cpp

17:09

<LarryRuane> could we get a little more background on "RPC" versus "wallet"? If I had to guess, I'd say `sendrawtransaction` is RPC, and generally has NOTHING to do with the local wallet ... where as "wallet RPC" would be something like `sendmany`

17:09

<glozow> svav: that's where `BroadcastTransaction()` is defined yes

17:09

<sipa> LarryRuane: that sounds correct

17:10

<sipa> sendmany is instructing your local wallet to construct & broadcast a transaction

17:10

<svav> can be called by either sendrawtransaction RPC or wallet RPCs

17:10

<glozow> LarryRuane: correct - `sendrawtransaction` lets you submit a raw transaction directly to the node, you could call that without having the wallet ocmpiled

17:10

<sipa> sendrawtransaction is sending it yourself, no wallet involved

17:11

<glozow> right

17:11

<LarryRuane> thanks sipa and glozow, very helpful

17:11

<michaelfolkson> So is the BroadcastTransaction() function used for both transactions originated in user's wallet and transactions received from other peers?

17:11

<glozow> ok, does anyone know exactly what bugs are being fixed in #22261 or should we walk through it together?

17:12

<LarryRuane> michaelfolkson I think not, not used for tx received from other peers

17:12

<glozow> michaelfolkson: it wouldn't come from other peers

17:12

<glozow> that invokes `AcceptToMemoryPool()` from the net processing layer

17:13

<michaelfolkson> LarryRuane glozow: Ok thanks

17:13

<michaelfolkson> I think I understand what bugs are being fixed but I had to get my head round some terminology

17:13

<glozow> michaelfolkson: feel free to give a guess of what the bugs are?

17:14

<michaelfolkson> Ok so summarizing John's description...

17:15

<michaelfolkson> Rebroadcasting a transaction that peers aren't interested in

17:15

<michaelfolkson> (as it has already been broadcast to them)

17:15

<michaelfolkson> Need to define what the unbroadcast set is here

17:15

<glozow> mm okay, i think this probably touches on the unbroadcast set

17:15

<glozow> that's 1 part of the PR

17:16

<glozow> good idea, anybody want to define unbroadcast set?

17:16

<glozow> hint: https://github.com/bitcoin/bitcoin/blob/0844084c/src/txmempool.h#L586

17:16

<glozow> https://bitcoincore.reviews/18038

17:16

<michaelfolkson> I looked it up there, yeah

17:16

<michaelfolkson> "when the initial broadcast of a transaction hasn’t been deemed successful"

17:16

<michaelfolkson> And then you need to define what successful means :)

17:17

<michaelfolkson> receiving a single GETDATA

17:17

<LarryRuane> The way I understand unbroadcast set is, our node originates a tx, sends it out, but isn't sure that any other node has it (maybe the P2P message got dropped?)

17:17

<glozow> LarryRuane: right, the goal is to have an idea of whether our initial broadcast of transactions has succeeded.

17:18

<LarryRuane> other nodes *may* have it, but we're not sure

17:18

<michaelfolkson> LarryRuane: Right the message could have got dropped, the node might already know of the transaction or the node might not be interested in that transaction despite not having it?

17:18

<michaelfolkson> Is that right glozow?

17:19

<glozow> michaelfolkson: right, so we're naming scenarios for why a peer might not send us a GETDATA for a transaction (which would cause the tx to remain in the unbroadcast set)

17:19

<glozow> and you're correct - it's possible there's some connection issue, they might already know the transaction, and "might not be interested" could be that they already rejected the tx or heard it from someone else

17:20

<LarryRuane> so IIUC what makes the unbroadcast set special is that we don't retry those tx sends as quickly as we used to, we wait about up to one day

17:20

<glozow> i don't think we reject tx invs for no reason 🤔

17:20

<michaelfolkson> So it could be "unsuccessful" from your perspective even though all nodes already know and have that transaction

17:20

<glozow> LarryRuane: right

17:20

<glozow> michaelfolkson: yes. that's unlikely if this is the first time broadcasting that transaction, though

17:20

<LarryRuane> for better privacy (sorry if i'm being obvious :) )

17:21

<glozow> this is only intended for the initial broadcast

17:21

<michaelfolkson> glozow: Right if you are originating the transaction, gotcha

17:21

<glozow> LarryRuane: not at all, i'm sure this is new information for lots of people! or good to review

17:22

<lightlike> if that tx originates from your node, how could it happen that they know it already? (if they learnt about it from some other peer then *that peer or another* must have sent you a GETDATA at some time, so it would be cleared from the unbroadcast set already?)

17:23

<glozow> lightlike: right, it shouldn't be the case that your unbroadcast set has a transaction that has already gone through initial broadcast

17:23

<glozow> but one of the bugs here is you could re-add a transaction to your unbroadcast set

17:23

<LarryRuane> ok but if it's a *set* ... sets can't have duplicates, right?

17:24

<glozow> yes, but once you've removed it, the set won't remember that it's already seen it before

17:24

<LarryRuane> ah, ok

17:25

<glozow> so our issue is: after we've already broadcast the tx, we don't want to re-add it to our unbroadcast set, because that will cause us to keep rebroadcasting it (and we won't remove it because our peer won't re-request it from us)

17:25

<glozow> does that make sense?

17:25

<glozow> peers*

17:25

<lightlike> would we remove it though after it makes it into a block in that situation?

17:26

<michaelfolkson> What was the rationale for adding it to the unbroadcast set before this PR?

17:26

<michaelfolkson> Just an oversight?

17:26

<LarryRuane> is this caused by what may be considered a user error? Do I have this flow right?: User initiates a tx (on our local node) ... gets broadcast (and also added to unbroadcast set) ... GETDATA happens, so we remove from unbroadcast set .. then user re-submits the same tx?

17:26

<glozow> lightlike: i thiiiink so

17:27

<glozow> LarryRuane: correct, it would be because the user called it again

17:28

<LarryRuane> so all what we've discussed so far here, is independed of the txid / wtxid distinction? I'm unclear how that relates to all this

17:28

<michaelfolkson> Yeah that is coming now

17:28

<glozow> LarryRuane: that's relevant here too

17:29

<LarryRuane> is the map key for the unbroadcast set the txid or the wtxid? (guess i could just look it up!)

17:29

<michaelfolkson> glozow: Relevant to the first fix?

17:29

<svav> Are there any diagrams for all this?

17:29

<LarryRuane> yes i was asking about the first fix

17:30

<michaelfolkson> That's the second fix? There's no overlap right?

17:30

<glozow> well, they're intertwined

17:30

<michaelfolkson> Ohh

17:30

<glozow> so what happens if you call `BroadcastTransaction()` with a transaction that has same-txid-different-wtxid as a tx in the mempool?

17:30

<glozow> (on master, before this PR)

100

17:30

<glozow> svav: i'm not sure, i think the review club for 18038 would be best place for info

101

17:31

<glozow> (do we know what I mean when i say same-txid-different-wtxid?)

102

17:31

<svav> it will call RelayTransaction() with the wtxid of the argument tx’s wtxid rather than the one in the mempool. This causes the relay to fail (INVs are not sent) for wtxid-relay peers because SendMessages() queries the mempool by wtxid, doesn’t find it, and drops the announcement.

103

17:31

<LarryRuane> glozow yes I do get that distinction

104

17:31

<glozow> svav: yes! well said

105

17:32

<svav> I copied that lol

106

17:32

<glozow> i know

107

17:32

<glozow> :P

108

17:32

<glozow> so before this PR, if you called `BroadcastTransaction()` with a transaction that has same-txid-different-wtxid as a tx in the mempool, it would get re-added to the unbroadcast set without even going through validation

109

17:33

<glozow> not a huge issue, but not exactly the behavior we want

110

17:33

<glozow> so the first fix in the PR is to move adding to unbroadcast set after a successful submission to mempool

111

17:33

<svav> What is the fix then?

112

17:34

<glozow> the other problem is that we would call `RelayTransaction()` with the wtxid of this transaction, rather than the one in the mempool

113

17:35

<glozow> ("this" transaction = the argument to the function)

114

17:35

<svav> For beginners can someone explain how two transaction can have the same txid but different wtxids?

115

17:35

<michaelfolkson> I do agree with svav that a diagram would be nice here :)

116

17:35

<svav> I love diagrams :)

117

17:35

<michaelfolkson> The interaction between the mempool, the unbroadcast set etc

118

17:35

<LarryRuane> I'm trying to understand why there would be two tx with the same txid but different witnesses (different wtxid) ... is it because there's some random element in the witness data? so if I submit the same tx twice (but signing each separately), then this can happen?

119

17:35

<LarryRuane> svav yes you beat me to that question!

120

17:36

<michaelfolkson> It was discussed yesterday in the L2 onchain workshop right? You find a cheaper witness

121

17:36

<glozow> mm so the txid of a transaction doesn't commit to the witness data

122

17:37

<glozow> i'm trying to find a good diagram

123

17:37

<LarryRuane> right so IIUC the txid commits to the *effects* of a tx only, the wtxid commits also to the authorization

124

17:37

<glozow> LarryRuane: correct

125

17:37

<michaelfolkson> https://bitcoin.stackexchange.com/questions/99409/what-does-segwit-remove

126

17:38

<michaelfolkson> At the bottom

127

17:38

<michaelfolkson> The wtxids get collected up to the coinbase transaction and are independent of the txids

128

17:38

<glozow> nono, that's the coinbase witness commitment

129

17:38

<glozow> we're talking about what transaction data gets serialized to get the wtxid vs the txid

130

17:39

<glozow> https://usercontent.irccloud-cdn.com/file/1XFdPlGD/image.png

131

17:39

<glozow> this is the simplest diagram i can find

132

17:40

<glozow> basically, you don't include witness data in txid haha

133

17:40

<michaelfolkson> Right the coinbase tx commits to the witnesses but not the transaction attached to that witness

134

17:40

<sipa> yes they do

135

17:40

<glozow> er, we're not talking about the coinbase commitment

136

17:41

<sipa> the coinbase tx commitments contains the merkle root of the wtxid tree

137

17:41

<glozow> and ^

138

17:41

<sipa> wtxids commit to the full transaction, witness and non-witness data

139

17:41

<michaelfolkson> But you can change the wtxid without changing the txid of that transaction

140

17:41

<sipa> yes, by changing the witness

141

17:42

<michaelfolkson> The witness data is not committed to in the txid. Otherwise that would reintroduce malleability

142

17:42

<sipa> yes

143

17:42

<LarryRuane> sipa so was I correct in my earlier question, you can create an unlimited number of valid witnesses for a given tx (assuming you can create one)?

144

17:42

<michaelfolkson> Ok thanks. I thought I understood that lol

145

17:43

<sipa> LarryRuane: yes

146

17:43

<sipa> LarryRuane: well, technically speaking there is only a finite number of valid witnesses, due to block weight being finite :)

147

17:43

<glozow> teehee

148

17:44

<glozow> hok

149

17:44

<michaelfolkson> Ok sorry glozow let's get back to your question

150

17:44

<glozow> so our peermanager's `RelayTransaction()` function takes a txid and wtxid parameter

151

17:44

<glozow> (can anyone tell me why?)

152

17:45

<glozow> what happens if we're trying to relay a transaction to our peer and can't find it in our mempool?

153

17:46

<michaelfolkson> We can't find the right wtxid or we literally can't find the txid?

154

17:46

<glozow> we can't find the transaction

155

17:46

<glozow> we know the id

156

17:47

<michaelfolkson> We can't relay it? We don't have it?

157

17:47

<glozow> yup

158

17:47

<michaelfolkson> You mean like it returns an error?

159

17:47

<glozow> no, we just skip it

160

17:47

<svav> wtxid parameter also used because some peers that prefer to receive wtxid announcements, i.e. wtxid-relay peers

161

17:48

<glozow> it's fully possible that we want to announce a transaction but it gets evicted from mempool by the time we go to announce it

162

17:48

<svav> ?

163

17:48

<glozow> but in this case, it could be because we called `RelayTransaction()` with a wtxid of a transaction that isn't in our mempool

164

17:48

<glozow> svav: exactly

165

17:48

<michaelfolkson> What's the scenario here. The wallet doesn't know what the node is doing? That causes the confusion?

166

17:49

<michaelfolkson> If they were perfectly in sync this wouldn't happen

167

17:50

<glozow> ok the scenario is this: we have a transaction. the mempool has one with witness A and we call `BroadcastTransaction()` with the exact same tx, but different witness

168

17:50

<lightlike> why would you malleate your own transactions?

169

17:50

<glozow> idk 🤷 maybe there are other parties in it

170

17:50

<glozow> it's a multisig, idk

171

17:51

<michaelfolkson> lightlike: It is finding a cheaper witness after previously broadcasting a more expensive witness I think

172

17:51

<glozow> maybe it has multiple spending paths and you/a counterparty both made transactions

173

17:51

<LarryRuane> michaelfolkson cheaper meaning smaller (so lower fee)?

174

17:51

<michaelfolkson> It isn't deliberate malleation. And obviously the txid isn't malleated post SegWit

175

17:52

<michaelfolkson> LarryRuane: Smaller, so yeah lower fee

176

17:52

<glozow> this isn't a show-stopping bug, it's just not the behavior we'd expect

177

17:53

<glozow> either way, we should try to relay the transaction in our mempool

178

17:53

<michaelfolkson> "Prior to this commit, if BroadcastTransaction() is called with

179

17:53

<michaelfolkson> relay=true, then it'll call RelayTransaction() using the txid/wtxid of

180

17:53

<michaelfolkson> the new tx, not the txid/wtxid of the mempool tx."

181

17:53

<michaelfolkson> And the mempool doesn't have the new transaction?

182

17:53

<glozow> yes

183

17:54

<michaelfolkson> Ok

184

17:54

— michaelfolkson sweats

185

17:54

<LarryRuane> what's the worst effect (at a high level) of not fixing these bugs? transactions not being relayed around the network?

186

17:55

<glozow> good question. anybody have ideas?

187

17:55

<michaelfolkson> 2) just means less effective relaying right? More failures

188

17:55

<michaelfolkson> 1) is wasteful. Unnecessary relaying

189

17:55

<lightlike> for the first one, it would seem the opposite: tx beinged inv'ed too much (although the network already knows about it)

190

17:57

<glozow> right

191

17:58

<glozow> no consensus failures. but worth fixing, ya?

192

17:58

<lightlike> If i understand it correctly, "transactions not being relayed" would rather be a downside of the first fix: if we happen to have bad peers, broadcast to them, get a GETDATA but they don't relay any further, then we'd currently try again after ~24hours (when we have better peers), after the fix we wouldn't do this anymore.

193

17:58

<LarryRuane> we're about out of time, so this would be too long of a discussion, but I notice there's no new or changed test code with this PR .. would be interesting to get into why, and generally, what kind of PRs need tests and what kind don't

194

17:58

<michaelfolkson> LarryRuane: If you read the review comments tests were discussed

195

17:59

<LarryRuane> ok thanks (hadn't done that, will do)

196

17:59

<michaelfolkson> duncandean is working on tests it appears

197

17:59

<michaelfolkson> I think everyone agrees it should have tests