Two small fixes to node broadcast logic (mempool, p2p)

https://github.com/bitcoin/bitcoin/pull/22261

Host: glozow  -  PR author: jnewbery

The PR branch HEAD was ba99f37a at the time of this review club meeting.

Notes

  • The mempool keeps a set of unbroadcast transactions containing the txids of transactions that have not passed initial broadcast yet (determined based on a heuristic, whether or not the node has received a getdata for it). The unbroadcast set was introduced in PR #18038, which we covered in a previous review club.

  • There are two (related) mechanisms for rebroadcasting transactions:

    Both of these mechanisms are executed on the scheduler thread.

  • Transactions can be referred to by txid (without witness) or by wtxid (with witness, defined in BIP141). Multiple valid transactions can have the same non-witness data (same txid) but different witnesses (different wtxid).

  • Transactions are announced to peers either by txid or by wtxid (since PR #18044, which we also covered in a previous review club). Whether a peer wishes to receive announcements using txid or wtxid is negotiated during connection. We refer to peers that prefer to receive wtxid announcements as wtxid-relay peers.

  • There are two unexpected behaviors in BroadcastTransaction(): one is related to unbroadcast and the other is in wtxid-based transaction relay.

    • Unbroadcast: If BroadcastTransaction() is called with a transaction that has the same txid as a transaction in the mempool (can be same witness, different witness or even invalid witness), it causes the transaction to be re-added to the unbroadcast set.

    • Relay: If BroadcastTransaction() is called with a same-txid-different-wtxid transaction as something already in the mempool, it will call RelayTransaction() with the wtxid of the argument tx’s wtxid rather than the one in the mempool. This causes the relay to fail (INVs are not sent) for wtxid-relay peers because SendMessages() queries the mempool by wtxid, doesn’t find it, and drops the announcement.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. How is BroadcastTransaction() invoked (i.e. what code paths)?

  3. What happens when PeerManager::RelayTransaction() is called with the wtxid of a transaction that isn’t in the mempool?

  4. What does the unbroadcast set represent, conceptually? When should a transaction be added and removed from the unbroadcast set?

  5. In what scenario would the mempool have a transaction with the same txid but different wtxid as a wallet transaction (feel free to give creative answers)?

  6. What bugs are present prior to this PR? Can you think of a scenario in which they cause a privacy leak?

  7. How does PR #22261 fix these bugs?

  8. Bonus: Why does the unbroadcast set contain txids instead of wtxids?

Meeting Log

  117:00 <glozow> #startmeeting
  217:00 <LarryRuane> hi
  317:01 <schmidty> howdy
  417:01 <svav> hi
  517:01 <dopedsilicon> Hey
  617:01 <michaelfolkson> hola
  717:01 <Azorcode> Hi
  817:01 <glozow> hi everyone! welcome to PR Review Club :) we're looking at Two small fixes to node broadcast logic today: https://bitcoincore.reviews/22261
  917:01 <lightlike> hi
 1017:01 <glozow> did anyone get a chance to look at the notes or review the PR?
 1117:02 <svav> Looked at the notes
 1217:02 <michaelfolkson> Yup, maybe can't answer all the questions though
 1317:03 <michaelfolkson> Took a while to get my head around why the fixes were needed
 1417:03 <glozow> okie dokie
 1517:03 <glozow> so this PR is making changes to the `BroadcastTransaction()` function - can anyone tell me what this function does?
 1617:03 <glozow> we're here: https://github.com/bitcoin/bitcoin/blob/7317e14a44c6efc545e6fb9bcedee7174e93a8fa/src/node/transaction.cpp#L29
 1717:04 <glozow> (guesses are fine too)
 1817:05 <michaelfolkson> So broadcasts a transaction :) Called by RPCs
 1917:06 <michaelfolkson> Too obvious?
 2017:06 <lightlike> adds transactions to the mempool and/or unbroadcast set and also relays them immediately
 2117:06 <LarryRuane> This is to send a transaction that originated in the current node (as opposed to being relayed, one that we recieved from a different node) ... (i'm not too confident in this answer)
 2217:06 <glozow> lightlike: wonderful, yes. invokes the mempool validation logic + relays
 2317:07 <glozow> LarryRuane: aha yes, it's interesting that it's in the src/node folder
 2417:07 <glozow> so these transactions would come from clients (e.g. RPC and wallet)
 2517:08 <glozow> where is `BroadcastTransaction()` called? is anyone able to link us to some call sites?
 2617:09 <svav> src/node/transaction.cpp
 2717:09 <LarryRuane> could we get a little more background on "RPC" versus "wallet"? If I had to guess, I'd say `sendrawtransaction` is RPC, and generally has NOTHING to do with the local wallet ... where as "wallet RPC" would be something like `sendmany`
 2817:09 <glozow> svav: that's where `BroadcastTransaction()` is defined yes
 2917:09 <sipa> LarryRuane: that sounds correct
 3017:10 <sipa> sendmany is instructing your local wallet to construct & broadcast a transaction
 3117:10 <svav> can be called by either sendrawtransaction RPC or wallet RPCs
 3217:10 <glozow> LarryRuane: correct - `sendrawtransaction` lets you submit a raw transaction directly to the node, you could call that without having the wallet ocmpiled
 3317:10 <sipa> sendrawtransaction is sending it yourself, no wallet involved
 3417:11 <glozow> right
 3517:11 <LarryRuane> thanks sipa and glozow, very helpful
 3617:11 <michaelfolkson> So is the BroadcastTransaction() function used for both transactions originated in user's wallet and transactions received from other peers?
 3717:11 <glozow> ok, does anyone know exactly what bugs are being fixed in #22261 or should we walk through it together?
 3817:12 <LarryRuane> michaelfolkson I think not, not used for tx received from other peers
 3917:12 <glozow> michaelfolkson: it wouldn't come from other peers
 4017:12 <glozow> that invokes `AcceptToMemoryPool()` from the net processing layer
 4117:13 <michaelfolkson> LarryRuane glozow: Ok thanks
 4217:13 <michaelfolkson> I think I understand what bugs are being fixed but I had to get my head round some terminology
 4317:13 <glozow> michaelfolkson: feel free to give a guess of what the bugs are?
 4417:14 <michaelfolkson> Ok so summarizing John's description...
 4517:15 <michaelfolkson> Rebroadcasting a transaction that peers aren't interested in
 4617:15 <michaelfolkson> (as it has already been broadcast to them)
 4717:15 <michaelfolkson> Need to define what the unbroadcast set is here
 4817:15 <glozow> mm okay, i think this probably touches on the unbroadcast set
 4917:15 <glozow> that's 1 part of the PR
 5017:16 <glozow> good idea, anybody want to define unbroadcast set?
 5117:16 <glozow> hint: https://github.com/bitcoin/bitcoin/blob/0844084c/src/txmempool.h#L586
 5217:16 <glozow> https://bitcoincore.reviews/18038
 5317:16 <michaelfolkson> I looked it up there, yeah
 5417:16 <michaelfolkson> "when the initial broadcast of a transaction hasn’t been deemed successful"
 5517:16 <michaelfolkson> And then you need to define what successful means :)
 5617:17 <michaelfolkson> receiving a single GETDATA
 5717:17 <LarryRuane> The way I understand unbroadcast set is, our node originates a tx, sends it out, but isn't sure that any other node has it (maybe the P2P message got dropped?)
 5817:17 <glozow> LarryRuane: right, the goal is to have an idea of whether our initial broadcast of transactions has succeeded.
 5917:18 <LarryRuane> other nodes *may* have it, but we're not sure
 6017:18 <michaelfolkson> LarryRuane: Right the message could have got dropped, the node might already know of the transaction or the node might not be interested in that transaction despite not having it?
 6117:18 <michaelfolkson> Is that right glozow?
 6217:19 <glozow> michaelfolkson: right, so we're naming scenarios for why a peer might not send us a GETDATA for a transaction (which would cause the tx to remain in the unbroadcast set)
 6317:19 <glozow> and you're correct - it's possible there's some connection issue, they might already know the transaction, and "might not be interested" could be that they already rejected the tx or heard it from someone else
 6417:20 <LarryRuane> so IIUC what makes the unbroadcast set special is that we don't retry those tx sends as quickly as we used to, we wait about up to one day
 6517:20 <glozow> i don't think we reject tx invs for no reason 🤔
 6617:20 <michaelfolkson> So it could be "unsuccessful" from your perspective even though all nodes already know and have that transaction
 6717:20 <glozow> LarryRuane: right
 6817:20 <glozow> michaelfolkson: yes. that's unlikely if this is the first time broadcasting that transaction, though
 6917:20 <LarryRuane> for better privacy (sorry if i'm being obvious :) )
 7017:21 <glozow> this is only intended for the initial broadcast
 7117:21 <michaelfolkson> glozow: Right if you are originating the transaction, gotcha
 7217:21 <glozow> LarryRuane: not at all, i'm sure this is new information for lots of people! or good to review
 7317:22 <lightlike> if that tx originates from your node, how could it happen that they know it already? (if they learnt about it from some other peer then *that peer or another* must have sent you a GETDATA at some time, so it would be cleared from the unbroadcast set already?)
 7417:23 <glozow> lightlike: right, it shouldn't be the case that your unbroadcast set has a transaction that has already gone through initial broadcast
 7517:23 <glozow> but one of the bugs here is you could re-add a transaction to your unbroadcast set
 7617:23 <LarryRuane> ok but if it's a *set* ... sets can't have duplicates, right?
 7717:24 <glozow> yes, but once you've removed it, the set won't remember that it's already seen it before
 7817:24 <LarryRuane> ah, ok
 7917:25 <glozow> so our issue is: after we've already broadcast the tx, we don't want to re-add it to our unbroadcast set, because that will cause us to keep rebroadcasting it (and we won't remove it because our peer won't re-request it from us)
 8017:25 <glozow> does that make sense?
 8117:25 <glozow> peers*
 8217:25 <lightlike> would we remove it though after it makes it into a block in that situation?
 8317:26 <michaelfolkson> What was the rationale for adding it to the unbroadcast set before this PR?
 8417:26 <michaelfolkson> Just an oversight?
 8517:26 <LarryRuane> is this caused by what may be considered a user error? Do I have this flow right?: User initiates a tx (on our local node) ... gets broadcast (and also added to unbroadcast set) ... GETDATA happens, so we remove from unbroadcast set .. then user re-submits the same tx?
 8617:26 <glozow> lightlike: i thiiiink so
 8717:27 <glozow> LarryRuane: correct, it would be because the user called it again
 8817:28 <LarryRuane> so all what we've discussed so far here, is independed of the txid / wtxid distinction? I'm unclear how that relates to all this
 8917:28 <michaelfolkson> Yeah that is coming now
 9017:28 <glozow> LarryRuane: that's relevant here too
 9117:29 <LarryRuane> is the map key for the unbroadcast set the txid or the wtxid? (guess i could just look it up!)
 9217:29 <michaelfolkson> glozow: Relevant to the first fix?
 9317:29 <svav> Are there any diagrams for all this?
 9417:29 <LarryRuane> yes i was asking about the first fix
 9517:30 <michaelfolkson> That's the second fix? There's no overlap right?
 9617:30 <glozow> well, they're intertwined
 9717:30 <michaelfolkson> Ohh
 9817:30 <glozow> so what happens if you call `BroadcastTransaction()` with a transaction that has same-txid-different-wtxid as a tx in the mempool?
 9917:30 <glozow> (on master, before this PR)
10017:30 <glozow> svav: i'm not sure, i think the review club for 18038 would be best place for info
10117:31 <glozow> (do we know what I mean when i say same-txid-different-wtxid?)
10217:31 <svav> it will call RelayTransaction() with the wtxid of the argument tx’s wtxid rather than the one in the mempool. This causes the relay to fail (INVs are not sent) for wtxid-relay peers because SendMessages() queries the mempool by wtxid, doesn’t find it, and drops the announcement.
10317:31 <LarryRuane> glozow yes I do get that distinction
10417:31 <glozow> svav: yes! well said
10517:32 <svav> I copied that lol
10617:32 <glozow> i know
10717:32 <glozow> :P
10817:32 <glozow> so before this PR, if you called `BroadcastTransaction()` with a transaction that has same-txid-different-wtxid as a tx in the mempool, it would get re-added to the unbroadcast set without even going through validation
10917:33 <glozow> not a huge issue, but not exactly the behavior we want
11017:33 <glozow> so the first fix in the PR is to move adding to unbroadcast set after a successful submission to mempool
11117:33 <svav> What is the fix then?
11217:34 <glozow> the other problem is that we would call `RelayTransaction()` with the wtxid of this transaction, rather than the one in the mempool
11317:35 <glozow> ("this" transaction = the argument to the function)
11417:35 <svav> For beginners can someone explain how two transaction can have the same txid but different wtxids?
11517:35 <michaelfolkson> I do agree with svav that a diagram would be nice here :)
11617:35 <svav> I love diagrams :)
11717:35 <michaelfolkson> The interaction between the mempool, the unbroadcast set etc
11817:35 <LarryRuane> I'm trying to understand why there would be two tx with the same txid but different witnesses (different wtxid) ... is it because there's some random element in the witness data? so if I submit the same tx twice (but signing each separately), then this can happen?
11917:35 <LarryRuane> svav yes you beat me to that question!
12017:36 <michaelfolkson> It was discussed yesterday in the L2 onchain workshop right? You find a cheaper witness
12117:36 <glozow> mm so the txid of a transaction doesn't commit to the witness data
12217:37 <glozow> i'm trying to find a good diagram
12317:37 <LarryRuane> right so IIUC the txid commits to the *effects* of a tx only, the wtxid commits also to the authorization
12417:37 <glozow> LarryRuane: correct
12517:37 <michaelfolkson> https://bitcoin.stackexchange.com/questions/99409/what-does-segwit-remove
12617:38 <michaelfolkson> At the bottom
12717:38 <michaelfolkson> The wtxids get collected up to the coinbase transaction and are independent of the txids
12817:38 <glozow> nono, that's the coinbase witness commitment
12917:38 <glozow> we're talking about what transaction data gets serialized to get the wtxid vs the txid
13017:39 <glozow> https://usercontent.irccloud-cdn.com/file/1XFdPlGD/image.png
13117:39 <glozow> this is the simplest diagram i can find
13217:40 <glozow> basically, you don't include witness data in txid haha
13317:40 <michaelfolkson> Right the coinbase tx commits to the witnesses but not the transaction attached to that witness
13417:40 <sipa> yes they do
13517:40 <glozow> er, we're not talking about the coinbase commitment
13617:41 <sipa> the coinbase tx commitments contains the merkle root of the wtxid tree
13717:41 <glozow> and ^
13817:41 <sipa> wtxids commit to the full transaction, witness and non-witness data
13917:41 <michaelfolkson> But you can change the wtxid without changing the txid of that transaction
14017:41 <sipa> yes, by changing the witness
14117:42 <michaelfolkson> The witness data is not committed to in the txid. Otherwise that would reintroduce malleability
14217:42 <sipa> yes
14317:42 <LarryRuane> sipa so was I correct in my earlier question, you can create an unlimited number of valid witnesses for a given tx (assuming you can create one)?
14417:42 <michaelfolkson> Ok thanks. I thought I understood that lol
14517:43 <sipa> LarryRuane: yes
14617:43 <sipa> LarryRuane: well, technically speaking there is only a finite number of valid witnesses, due to block weight being finite :)
14717:43 <glozow> teehee
14817:44 <glozow> hok
14917:44 <michaelfolkson> Ok sorry glozow let's get back to your question
15017:44 <glozow> so our peermanager's `RelayTransaction()` function takes a txid and wtxid parameter
15117:44 <glozow> (can anyone tell me why?)
15217:45 <glozow> what happens if we're trying to relay a transaction to our peer and can't find it in our mempool?
15317:46 <michaelfolkson> We can't find the right wtxid or we literally can't find the txid?
15417:46 <glozow> we can't find the transaction
15517:46 <glozow> we know the id
15617:47 <michaelfolkson> We can't relay it? We don't have it?
15717:47 <glozow> yup
15817:47 <michaelfolkson> You mean like it returns an error?
15917:47 <glozow> no, we just skip it
16017:47 <svav> wtxid parameter also used because some peers that prefer to receive wtxid announcements, i.e. wtxid-relay peers
16117:48 <glozow> it's fully possible that we want to announce a transaction but it gets evicted from mempool by the time we go to announce it
16217:48 <svav> ?
16317:48 <glozow> but in this case, it could be because we called `RelayTransaction()` with a wtxid of a transaction that isn't in our mempool
16417:48 <glozow> svav: exactly
16517:48 <michaelfolkson> What's the scenario here. The wallet doesn't know what the node is doing? That causes the confusion?
16617:49 <michaelfolkson> If they were perfectly in sync this wouldn't happen
16717:50 <glozow> ok the scenario is this: we have a transaction. the mempool has one with witness A and we call `BroadcastTransaction()` with the exact same tx, but different witness
16817:50 <lightlike> why would you malleate your own transactions?
16917:50 <glozow> idk 🤷 maybe there are other parties in it
17017:50 <glozow> it's a multisig, idk
17117:51 <michaelfolkson> lightlike: It is finding a cheaper witness after previously broadcasting a more expensive witness I think
17217:51 <glozow> maybe it has multiple spending paths and you/a counterparty both made transactions
17317:51 <LarryRuane> michaelfolkson cheaper meaning smaller (so lower fee)?
17417:51 <michaelfolkson> It isn't deliberate malleation. And obviously the txid isn't malleated post SegWit
17517:52 <michaelfolkson> LarryRuane: Smaller, so yeah lower fee
17617:52 <glozow> this isn't a show-stopping bug, it's just not the behavior we'd expect
17717:53 <glozow> either way, we should try to relay the transaction in our mempool
17817:53 <michaelfolkson> "Prior to this commit, if BroadcastTransaction() is called with
17917:53 <michaelfolkson> relay=true, then it'll call RelayTransaction() using the txid/wtxid of
18017:53 <michaelfolkson> the new tx, not the txid/wtxid of the mempool tx."
18117:53 <michaelfolkson> And the mempool doesn't have the new transaction?
18217:53 <glozow> yes
18317:54 <michaelfolkson> Ok
18417:54 — michaelfolkson sweats
18517:54 <LarryRuane> what's the worst effect (at a high level) of not fixing these bugs? transactions not being relayed around the network?
18617:55 <glozow> good question. anybody have ideas?
18717:55 <michaelfolkson> 2) just means less effective relaying right? More failures
18817:55 <michaelfolkson> 1) is wasteful. Unnecessary relaying
18917:55 <lightlike> for the first one, it would seem the opposite: tx beinged inv'ed too much (although the network already knows about it)
19017:57 <glozow> right
19117:58 <glozow> no consensus failures. but worth fixing, ya?
19217:58 <lightlike> If i understand it correctly, "transactions not being relayed" would rather be a downside of the first fix: if we happen to have bad peers, broadcast to them, get a GETDATA but they don't relay any further, then we'd currently try again after ~24hours (when we have better peers), after the fix we wouldn't do this anymore.
19317:58 <LarryRuane> we're about out of time, so this would be too long of a discussion, but I notice there's no new or changed test code with this PR .. would be interesting to get into why, and generally, what kind of PRs need tests and what kind don't
19417:58 <michaelfolkson> LarryRuane: If you read the review comments tests were discussed
19517:59 <LarryRuane> ok thanks (hadn't done that, will do)
19617:59 <michaelfolkson> duncandean is working on tests it appears
19717:59 <michaelfolkson> I think everyone agrees it should have tests
19818:00 <glozow> shameless plug: if you're interested in same-txid-diff-wtxid stuff, i also recommend https://github.com/bitcoin/bitcoin/pull/22253
19918:00 <LarryRuane> often the tests are harder to write than the code being tested, I've noticed!
20018:00 <glozow> that's all we have time for :)
20118:00 <glozow> #endmeeting
20218:00 <glozow> This is a test: https://github.com/glozow/bitcoin/commit/5069a834ed06f2df5f43c416dc5a501c3b010b33