Retry notfounds with more urgency (p2p)

Mar 25, 2020

https://github.com/bitcoin/bitcoin/pull/18238

Host: amitiuttarwar - PR author: ajtowns

The PR branch HEAD was a204d158 at the time of this review club meeting.

Notes

Transaction relay is a three step process: INV -> GETDATA -> TX:
1. the relaying node sends an INV message to the receiving node to announce a new transaction.
2. the receiving node sends a GETDATA message to the relaying node to request the full transaction.
3. the relaying node delivers a TX message to the receiving node. If the relaying node is no longer able to deliver the transaction, it responds with NOTFOUND instead of the TX.
You can learn more about these messages here.
A node will not have more than one GETDATA in flight for a particular transaction. The TxDownloadState struct is used to manage the state of transactions for each peer. This struct has a very descriptive code comment that gives an overview of how transaction download is managed.
If a GETDATA request is in flight, a node will wait up to 2 minutes before timing out and attempting to request the transaction from another peer.
Currently, if a node receives a NOTFOUND message, it does not use that information to try to speed up requesting it from another peer. Instead, it continues to wait until the 2 minute period is over before sending out another GETDATA.

This PR introduces logic to speed up a request to another peer when possible.
PR 15505 was a previous attempt at implementing a solution and was covered in PR Review Club 15505. Reading the notes / logs / PR can help shine light on some of the nuances that make this feature tricky to implement.
Having this retry speedup is mentioned as a pre-req for PR 17303. That PR is trying to remove mapRelay, a data structure that keeps track of recently announced transactions. Doing so would increase the likelihood of receiving a NOTFOUND message. For example, nodes with small mempools could announce a transaction, but then evict it before receiving the GETDATA. Without logic for the receiving node to quickly retry, the node with a small mempool could be wasting bandwidth and accidentally DoS-ing its peer.

Questions

Did you review the PRs? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
What are some scenarios that lead to NOTFOUND messages?
How does this PR implement the logic to speed up retries?
Upon receiving a NOTFOUND, why doesn’t the node immediately request the transaction from another peer? What does it do instead?
In human words, what information does TxDownloadState keep track of? What does this PR change?
PR 15505 review comments discussed various approaches for identifying which peer to request for the transaction after receiving a NOTFOUND. How does PR 18238 answer that question?
Why do we treat inbound and outbound peers differently? How does this PR manage preferential treatment?
Zooming back out to a birds-eye view, what are potential issues that could be introduced with this approach? What are other possiblities for how transaction relay could handle unresponsive peers?

Meeting Log

13:00

<amiti> #startmeeting

13:00

<jnewbery> hi

13:00

<gleb> hi!

13:00

<jkczyz> hi

13:00

<felixweis> hi

13:00

<lightlike> hi

13:00

<amiti> hi everyone! welcome to this weeks edition of PR review club

13:00

<nehan_> hi

13:01

<sipa> hi

13:01

<amiti> this weeks PR is ... quite a doozy!

13:01

<r251d> hi

13:01

<michaelfolkson> hi

13:01

<amiti> who has gotten a chance to review the pr? y/n

13:02

<gleb> y, I was ready to ACK module little things I asked about.

13:02

<jnewbery> 0.7 y

13:02

<lightlike> y

13:02

<jkczyz> y (briefly)

13:03

<michaelfolkson> y (also briefly)

13:03

<jonatack> hi

13:03

<nehan_> y but also briefly

13:03

<amiti> what are initial impressions?

13:03

<amiti> I saw some of you left concept ACKs on the PR

13:04

<nehan_> i'm a bit unclear on why this is important

13:05

<jnewbery> Concept ACK! I think I prefer the approach over #15505, which adds a counter to net

13:05

<amiti> can anyone shed light on some context that makes this PR relevant / helpful?

13:05

<lightlike> I like it better than the approach in the predecessor because it is less invasive.

13:06

<nehan_> what is the downside of waiting until the txn request times out to request from another peer?

13:06

<andrewtoth> hi

13:06

<jkczyz> Generally like the idea but implementation-wise would prefer if m_tx_announced did not have a dual use, if possible

13:07

<michaelfolkson> nehan: You'd want to get the transaction as fast as possible for performance reasons I'd guess. Especially if you're a miner

13:07

<amiti> nehan_: one explicit downside comes if we implement the changes in PR #17303, which is trying to remove mapRelay

13:07

<sipa> nehan_: are you asking why avoiding a 2 minute delay in tx propagation is desirable?

13:07

<gleb> Let's put it this way. How long would we wait before this PR even if NOTFOUND is received?

13:08

<nehan_> michaelfolk: amiti: sipa: thanks!

13:08

<nehan_> sipa: yes, in the cases this PR affects. perhaps this is obvious to everyone but me!

13:08

<jkczyz> nehan_: also some privacy benefits from 17303 which this PR is a prerequisite for

13:08

<lightlike> not wasting time seems a good idea in itself - I didn't completely understand the DOS connection with PR#17303, can someone explain this a bit more?

13:08

<sipa> if we were talking about a few seconds, this would probably be harmless, as our normal tx propagation mechanisms has delays of that order

13:09

<sipa> but minutes beings it in the same order of magnitude as blocks, so it very well may interfere with block propagation too (as bip 152 relay relies on having good tx propagation)

13:10

<jnewbery> lightlike: we currently keep a mapRelay of transactions we've recently announced. Even if a transaction is subsequently removed from the mempool (for expiry or some other reason), it's still available to be fetched by peers we've announced it to

13:10

<gleb> It's not super-clear it's 2 minutes though. I find the existing comment confusing.

13:10

<amiti> lightlike: my understanding is... mapRelay has a different logic for transactions it keeps than the mempool logic. so, in the case of a mempool with lots of tx churn, mapRelay might keep the txn for longer. if we remove it, then nodes with small mempools could more frequently be responding with `NOTFOUND`, and tying up the connection for upto the 2 minute period even though behaving honestly

13:11

<gleb> The comment says: "request unlike it's inflight from someone", and current code ALREADY cleans inflight when NOTFOUND is sent.

13:11

<jnewbery> 17303 proposed to remove this mapRelay. That means if the tx is removed from the mempool between the INV announcement and GETDATA request, then the node announcing it would have prevented the receiving node from fetching it from elsewhere for a minute

13:11

<gleb> While that comment says that currently, it's implicit, and the logic actually doesn't look at "inflight" variable

13:12

<sipa> amiti: that's my my understanding too

13:12

<lightlike> ah ok, thanks jnewbery, amiti

13:12

<sipa> mapRelay tries to implement a policy of "we promise to keep everything we've relayed recently available for fetching, regardless of what the mempool does"

13:13

<amiti> also, there's a lot of nuanced logic of how nodes are queued via the `process_time` that I'm just starting to wrap my head around... I'm interested in better familiarizing myself the implementation to understand different attack possibilities

13:13

<sipa> mapRelay predates the limited-size mempool though; i think its original purpose was making sure txn could be downloaded still even when a block confirmed them in the mean time

13:14

<amiti> so, can someone summarize this PR? how does it speed up retries?

13:14

<jnewbery> mapRelay existed in the initial git commit

13:15

<ariard> it speeds up retries by a) instead of relaying on GETDDATA_TX_INTERVAL for a non-received transaction and b) assuming we receive a NOTFOUND, we're gong to retry download with other peers

13:16

<andrewtoth> ajtowns' header comment is a good summary: Anytime we see a NOTFOUND in response to a request for a tx, look through

13:16

<andrewtoth> each of our peers for anyone else who announced the tx, find one who

13:16

<andrewtoth> doesn't already have its inflight tx count maxed out, and of those,

13:16

<andrewtoth> make the one who'd look at it first, look at it asap.

13:17

<jonatack> nehan_: thanks for your question; i reckon a non-trivial number of observers wondered the same or appreciated confirmation on the answers

13:17

<michaelfolkson> Yup definitely jonatack nehan_

13:18

<sipa> andrewtoth: so effectively, receiving a NOTFOUND makes us skip the timeout

13:18

<amiti> ariard, andrewtoth, sipa: exactly

13:18

<amiti> andrewtoth: you already mentioned this / OP mentions, but can someone make extra clear- how does this PR identify who to ask next for the txn?

13:19

<jnewbery> I was confused about how the 'asap' part is implemented. Comment here: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r397925056

13:19

<amiti> bonus points: how is this different than the previous attempt at this functionality? PR 15505

13:19

<jnewbery> it brings forward the m_tx_process_time, but doesn't change the g_already_asked_for time so I don't know how a new GETDATA request is triggered

13:19

<nehan_> jonatack: you're welcome! i get why this is better than #15505 and why one might want something like it for #17303 but i'm not yet convinced #17303 is that important. and "always reduce tx propagation times" doesn't seem like a good metric to use without considering 1) when things are actually slowed down and 2) additional code complexity.

13:20

<nehan_> nehan_: but i don't want to distract from actually reviewing the PR!

13:20

<michaelfolkson> We're requesting the tx from a peer who announced the tx

13:21

<michaelfolkson> Oh that's the same PR 15505

13:21

<lightlike> in 15505, we would retry only from outbound peers, and not look at the originially scheduled process time but try to request from those who had announced the TX (whenever that was) with the same probability.

13:22

<gleb> jnewbery: I think g_already_asked_for is time of the previous request, and we only care it's not something in future, so should be fine? Why would it ever be in future and prevent us from triggering?

13:22

<amiti> nehan_: I think these questions are great! questioning the fundamental WHY of a PR is a great approach :)

13:22

<sipa> nehan_: certainly a blanket "reduce tx propagation times" would also be terrible for privacy; it's just that the delays involved here are potentially far higher than the time delays we normally target for privacy protection

13:22

<sipa> (which is in the order of seconds)

13:22

<nehan_> sipa: but the original justification for this from sdaftuar was not reducing delays, it was not turning small mempool nodes into dos'ers

13:23

<nehan_> https://github.com/bitcoin/bitcoin/pull/17303#issuecomment-547589047

13:23

<gleb> jnewbery: now that I look at 'minus GETDATA_TX_INTERVAL', I think you're right...

13:24

<ariard> gleb: because it's previous request time - GETDATA_TX_INTERVAL, so this would prevent new retry if NOTFOUND sequence happens less than 1 min ago

13:24

<jnewbery> gleb: because in SendMessages(), we'll only send a GETDATA for transactions that were last requested over a minute ago: https://github.com/bitcoin/bitcoin/blob/60a39a96fc04c9bde98f0a55c0deb2a0b3fc25a7/src/net_processing.cpp#L4048

13:24

<ariard> there is 2 times check one against tx_process_time and on against g_already_asked_for

13:24

<gleb> yes, good catch! This is a good justification to have a test there :) I guess me and aj just missed this part.

13:25

<sipa> nehan_: right; the problem (going from memory here, haven't checked the comments lately) is that our desire to get rid of mapRelay would trigger requested transactions not being found in honest situations a lot more, effectively hurting tx propagation (by delaying things in the order of minutes)

13:26

<amiti> ok, moving forward... can someone explain in human words what information `TxDownloadState` keeps track of?

13:26

<sipa> what i meant is that there is nuance: short delays are not a problem, and (in some settings) even desirable for privacy - but very big ones actually hurt propagation

13:26

<amiti> and how it gets changed by this PR?

13:28

<jnewbery> nehan_: perhaps more context on 17303 is that mapRelay doesn't have great memory bounds, so we'd like to remove it. Suhas tried to use it for improving tx relay privacy in https://github.com/bitcoin/bitcoin/pull/14220, but had to abandon that approach because it would have made memory exhaustion possible

13:28

<nehan_> sipa: so "node with small mempool being a dos'er" actually means "node forcing me to wait a long time before getting this txn"? i see. i think i misunderstood and thought the small mempool nodes were actually sending more messages in some way.

13:28

<nehan_> jnewbery: thanks!

13:29

<gleb> nehan_: yeah, sipa's explanation couple messages ago also worked much better for the than that I read somewhere before.

13:29

<sipa> nehan_: that's my understanding

13:29

<amiti> I spent a bunch of time trying to understand this struct & how its used. its fundamental to the coordination of requesting txns from different peers, so is central to this implementation

13:29

<amiti> nehan_, sipa: thats my understanding too

13:30

<michaelfolkson> amiti: So it is a struct to manage the timing of when to download the announced transactions

13:30

<ariard> amiti: a per-peer state machine to manage tx relay bundled with timers ?

13:30

<jnewbery> amiti: TxDownloadState (like all things) started small, and then got more bloated and had its members start overlapping in meaning

13:30

<amiti> cool! that all sounds right

13:31

<amiti> so what is this PR changing?

13:31

<jnewbery> ariard's answer is better than mine

13:31

<gleb> Yeah, that overlapping is not something I saw much in software before, but I have an intuition that it might be the best way in this particular case.

100

13:31

<amiti> gleb: can you explain the overlapping?

101

13:31

<gleb> I have a big desire to get rid of the overlapping of maps, but i couldn't find a way to do it well.

102

13:31

<amiti> are you talking about the one introduced in this PR?

103

13:31

<gleb> ye

104

13:31

<amiti> can you make it explicit

105

13:32

<gleb> Well, we have tx->time, and time->tx maps... Let me pull it up again

106

13:32

<amiti> yup exactly, `m_tx_process_time` is a multimap of time -> tx

107

13:33

<gleb> m_tx_announced, and m_tx_in_flight often go together and we do similar things to them, for example.

108

13:33

<jnewbery> I think there are three states that a tx can be in: (i) the peer has announced it and we haven't requested it from that peer; (ii) the peer has announced it, we've requested it and are waiting for delivery; (iii) the peer announced it, we requested it, and it timed out

109

13:33

<amiti> and this PR is turning `m_tx_announced` into a map of tx -> time

110

13:34

<amiti> ah, and for clarity, when I say time, I mean `process_time`

111

13:34

<ariard> yes but here introducing time tracking in m_tx_announced is just now to select a peer for retry

112

13:35

<sipa> it may be useful to write a consistency-check function for all these data structures, and have a command-line option that causes it to be run occasionally (similar to -checkmempool)

113

13:35

<sipa> unless they can be simplified of course, which is always preferable

114

13:35

<gleb> yeah, good idea.

115

13:35

<jnewbery> sipa: or turn this into a class and have an interface to that class rather than code all over the place reaching in and manipulating the members

116

13:35

<sipa> jnewbery: yup

117

13:36

<sipa> absolutely

118

13:36

<amiti> sipa: oh interesting. I'm not familiar. so the idea is a user can invoke a cli option that kicks off periodic consistency checks?

119

13:36

<jnewbery> amiti: see -checkmempool

120

13:37

<jkczyz> I tend to agree that making the code explicit (and hopefully simpler) is preferable over dependencies between fields and explanatory comments

121

13:37

<amiti> I'm definitely in favor of simplifying the code. there's a lot of implicit expectations that require additional code to support when writing features, but are a bug if they occur

122

13:37

<sipa> amiti: see CTxMempool::check in particular

123

13:37

<jnewbery> (and nCheckFrequency in txmempool.cpp|h)

124

13:38

<sipa> it's useful too for reviewers, as it spells out explicitly what expectations there are

125

13:38

<sipa> though i wouldn't call that function in particular very readable...

126

13:38

<amiti> for example, `m_tx_process_time` is a multimap, so can technically be a many to many, but txns should only be queued for one time in the future (per node), so this really should be a many to one

127

13:38

<amiti> (👆🏽 fact check?)

128

13:39

<sipa> that sounds easy to verify, if true

129

13:39

<jnewbery> amiti: that should be true. If a tx appears more than once in m_tx_process_time, then there's a logic bug somewhere

130

13:40

<jonatack> for anyone looking, -checkmempool is a "hidden" debug option

131

13:40

<jonatack> bitcoind -help-debug | grep -A2 checkmempool

132

13:40

<gleb> Anyway, this probably should be a separate PR? There are probably a bunch of other consistency checks we may do. I would say the goal here should be to minimize cross-var dependencies.

133

13:40

<sipa> gleb: agreed

134

13:41

<amiti> ya, good point

135

13:41

<amiti> ok lets jump to another question about this PR

136

13:41

<amiti> what are potential issues around this approach?

137

13:42

<ariard> jnewbery: about you're other comment on a swarm of dishonest inbounds delaying tx relay, that's assume there isn't at least a honest peer with a announcement time better

138

13:42

<ariard> than malicisus ones ?

139

13:42

<jnewbery> I think the class would be a container or {txid, state (i, ii or iii above), timestamp} with indexes into those objects

140

13:42

<amiti> gleb: you and aj had some discussion about a DoS vector. can anybody (gleb or otherwise) explain the concern here?

141

13:43

<gleb> ariard: can you send a link to the comment? I can't find anything by searching for "swarm" lol

142

13:43

<ariard> gleb: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r397928486

143

13:43

<ariard> isn't only retrying with outbounds would be better to avoid this kind of attack or more severe ones ?

144

13:44

<jnewbery> ariard: the honest peer's time to retry gets reset here: https://github.com/bitcoin/bitcoin/blob/60a39a96fc04c9bde98f0a55c0deb2a0b3fc25a7/src/net_processing.cpp#L4063

145

13:44

<gleb> I think nullifying helps with not juggling?

146

13:44

<gleb> This comparison to chrono::zero helps to make sure we don't force-request from the already NOTFOUNDed peer.

147

13:44

<jnewbery> so as long as the adversary is continuing to juggle NOTFOUNDs/INVs, then he might be able to prevent the victim ever requesting the tx from the honest node

148

13:45

<amiti> ok time check- 15 minutes left

149

13:45

<ariard> jnewbery: okay and by constantly re-inving attack, he is sure to always trump honest peers

150

13:46

<jnewbery> gleb: comment is here: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r397928486. The m_tx_announced is not nulled for peers that send NOTFOUND

151

13:46

<amiti> just want to call out that this has turned into a fairly advanced / deep divey session, and if anybody is lurking and unclear on any fundamentals, I encourage speaking up and ask your questions. I bet you're not the only one wondering :)

152

13:46

<jnewbery> amiti: +1

153

13:47

<ariard> okay but what not requesting only from outbounds? As initial approach in 15505

154

13:48

<michaelfolkson> I'll throw a high level question into the mix. Is it worth trying to understand more about your peers e.g. whether they are pruned/unpruned, their mempool policy etc. The approach as I understand it is to just treat all your peers as if you know nothing about them

155

13:48

<jonatack> Great stuff, amiti. I think this discussion is going to be very helpful for reviewing the PR.

156

13:48

<ariard> I expect outbounds to have mempool of a consistant-size, given they already offer bandwidth/ports to the network

157

13:48

<gleb> jnewbery: i think you are right. I was thinking about the same issue, but thought it was handled for wrong reason i guess.

158

13:48

<fjahr> hi

159

13:48

<amiti> hi fjahr :)

160

13:48

<jnewbery> ariard: I guess in RetryProcessTx() we could change the 'best' heuristic to always prefer outbounds?

161

13:50

<jnewbery> michaelfolkson: pruned/unpruned doesn't make any difference here (we're talking about GETDATA requests for unconfirmed txs, not blocks)

162

13:50

<ariard> jnewbery: yes that's my point, but is doing so would achieve goal of PR, there is still worst-case hitting 1min window in case of really bad-connected outbounds?

163

13:50

<amiti> RE logic of the outbound vs the inbound, isn't there already a preference built in to when you initially schedule the process_time, and by finding the best time and bumping up, you'd be honoring the same order?

164

13:50

<amiti> am I missing something?

165

13:50

<jnewbery> ariard: no, in the case that no outbound peer has announced the tx, the best would be the best inbound

166

13:51

<jnewbery> amiti: see aj's comment here: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r396884515

167

13:51

<ariard> jnewbery: so iterate first on outbound, then inbound, but a malicious inbound can still make you juggle indefinetely?

168

13:51

<gleb> amiti: yeah, but it's possible (probably very early in the overall relay) for inbounds to occupy fastest slots, and I guess there's worrying about that scenario to be exploited further.

169

13:51

<ariard> (likely the announced tx is a malicious one)

170

13:52

<jnewbery> because the NOTFOUND can arrive at any time, we might have rescheduled our outbounds to be after our inbounds here: https://github.com/bitcoin/bitcoin/blob/60a39a96fc04c9bde98f0a55c0deb2a0b3fc25a7/src/net_processing.cpp#L4063

171

13:53

<amiti> ah. right.

172

13:53

<jnewbery> ariard: I think by null'ing rather than deleting entries from m_tx_announced here: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r397928486, the malicious peer can't juggle NOTFOUND/INVs

173

13:54

<gleb> We should absolutely prevent juggling. Once that is done, it's worth discussing other threats (which I think are probably minor)

174

13:54

<michaelfolkson> jnewbery: A NOTFOUND can be response for a transaction request from a block / confirmed transaction, just not related to this particular PR. Got it

175

13:54

<gleb> Like, an attacker would have to come really early in the relay, and the damage is InvBlock for a minute or two at most. I think that's desirable requirement.

176

13:55

<gleb> (similar to what is already possible)

177

13:57

<gleb> this PR turned out to be much more complicated than I anticipated, especially after all the reviews, triggered by this review club too :)

178

13:57

<amiti> 3 minutes left !

179

13:57

<amiti> yeah its suuuuper nuanced! effective tx download is HARD

180

13:58

<jkczyz> +1 IMHO, the PR exhibits why striving to reduce complexity (or hiding it behind an interface when not possible) is so important. Even dependencies between fields of a small struct can have an enormous impact on code understanding and review velocity, not to mention the potential for introducing bugs

181

13:58

<michaelfolkson> Still a bit confused. Why are there no "announcements" for confirmed transactions, transactions already in a block?

182

13:58

<jnewbery> jkczyz: absoluely agree!

183

13:58

<amiti> jkczyz: strong agree

184

13:58

<jonatack> jkczyz: ^ this