Retry notfounds with more urgency (p2p)

https://github.com/bitcoin/bitcoin/pulls/18238

Host: amitiuttarwar  -  PR author: ajtowns

The PR branch HEAD was a204d158 at the time of this review club meeting.

Notes

  • Transaction relay is a three step process: INV -> GETDATA -> TX:
    1. the relaying node sends an INV message to the receiving node to announce a new transaction.
    2. the receiving node sends a GETDATA message to the relaying node to request the full transaction.
    3. the relaying node delivers a TX message to the receiving node. If the relaying node is no longer able to deliver the transaction, it responds with NOTFOUND instead of the TX.

    You can learn more about these messages here.

  • A node will not have more than one GETDATA in flight for a particular transaction. The TxDownloadState struct is used to manage the state of transactions for each peer. This struct has a very descriptive code comment that gives an overview of how transaction download is managed.

  • If a GETDATA request is in flight, a node will wait up to 2 minutes before timing out and attempting to request the transaction from another peer.

  • Currently, if a node receives a NOTFOUND message, it does not use that information to try to speed up requesting it from another peer. Instead, it continues to wait until the 2 minute period is over before sending out another GETDATA.

    This PR introduces logic to speed up a request to another peer when possible.

  • PR 15505 was a previous attempt at implementing a solution and was covered in PR Review Club 15505. Reading the notes / logs / PR can help shine light on some of the nuances that make this feature tricky to implement.

  • Having this retry speedup is mentioned as a pre-req for PR 17303. That PR is trying to remove mapRelay, a data structure that keeps track of recently announced transactions. Doing so would increase the likelihood of receiving a NOTFOUND message. For example, nodes with small mempools could announce a transaction, but then evict it before receiving the GETDATA. Without logic for the receiving node to quickly retry, the node with a small mempool could be wasting bandwidth and accidentally DoS-ing its peer.

Questions

  1. Did you review the PRs? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. What are some scenarios that lead to NOTFOUND messages?

  3. How does this PR implement the logic to speed up retries?

  4. Upon receiving a NOTFOUND, why doesn’t the node immediately request the transaction from another peer? What does it do instead?

  5. In human words, what information does TxDownloadState keep track of? What does this PR change?

  6. PR 15505 review comments discussed various approaches for identifying which peer to request for the transaction after receiving a NOTFOUND. How does PR 18238 answer that question?

  7. Why do we treat inbound and outbound peers differently? How does this PR manage preferential treatment?

  8. Zooming back out to a birds-eye view, what are potential issues that could be introduced with this approach? What are other possiblities for how transaction relay could handle unresponsive peers?

Meeting Log

  113:00 <amiti> #startmeeting
  213:00 <jnewbery> hi
  313:00 <gleb> hi!
  413:00 <jkczyz> hi
  513:00 <felixweis> hi
  613:00 <lightlike> hi
  713:00 <amiti> hi everyone! welcome to this weeks edition of PR review club
  813:00 <nehan_> hi
  913:01 <sipa> hi
 1013:01 <amiti> this weeks PR is ... quite a doozy!
 1113:01 <r251d> hi
 1213:01 <michaelfolkson> hi
 1313:01 <amiti> who has gotten a chance to review the pr? y/n
 1413:02 <gleb> y, I was ready to ACK module little things I asked about.
 1513:02 <jnewbery> 0.7 y
 1613:02 <lightlike> y
 1713:02 <jkczyz> y (briefly)
 1813:03 <michaelfolkson> y (also briefly)
 1913:03 <jonatack> hi
 2013:03 <nehan_> y but also briefly
 2113:03 <amiti> what are initial impressions?
 2213:03 <amiti> I saw some of you left concept ACKs on the PR
 2313:04 <nehan_> i'm a bit unclear on why this is important
 2413:05 <jnewbery> Concept ACK! I think I prefer the approach over #15505, which adds a counter to net
 2513:05 <amiti> can anyone shed light on some context that makes this PR relevant / helpful?
 2613:05 <lightlike> I like it better than the approach in the predecessor because it is less invasive.
 2713:06 <nehan_> what is the downside of waiting until the txn request times out to request from another peer?
 2813:06 <andrewtoth> hi
 2913:06 <jkczyz> Generally like the idea but implementation-wise would prefer if m_tx_announced did not have a dual use, if possible
 3013:07 <michaelfolkson> nehan: You'd want to get the transaction as fast as possible for performance reasons I'd guess. Especially if you're a miner
 3113:07 <amiti> nehan_: one explicit downside comes if we implement the changes in PR #17303, which is trying to remove mapRelay
 3213:07 <sipa> nehan_: are you asking why avoiding a 2 minute delay in tx propagation is desirable?
 3313:07 <gleb> Let's put it this way. How long would we wait before this PR even if NOTFOUND is received?
 3413:08 <nehan_> michaelfolk: amiti: sipa: thanks!
 3513:08 <nehan_> sipa: yes, in the cases this PR affects. perhaps this is obvious to everyone but me!
 3613:08 <jkczyz> nehan_: also some privacy benefits from 17303 which this PR is a prerequisite for
 3713:08 <lightlike> not wasting time seems a good idea in itself - I didn't completely understand the DOS connection with PR#17303, can someone explain this a bit more?
 3813:08 <sipa> if we were talking about a few seconds, this would probably be harmless, as our normal tx propagation mechanisms has delays of that order
 3913:09 <sipa> but minutes beings it in the same order of magnitude as blocks, so it very well may interfere with block propagation too (as bip 152 relay relies on having good tx propagation)
 4013:10 <jnewbery> lightlike: we currently keep a mapRelay of transactions we've recently announced. Even if a transaction is subsequently removed from the mempool (for expiry or some other reason), it's still available to be fetched by peers we've announced it to
 4113:10 <gleb> It's not super-clear it's 2 minutes though. I find the existing comment confusing.
 4213:10 <amiti> lightlike: my understanding is... mapRelay has a different logic for transactions it keeps than the mempool logic. so, in the case of a mempool with lots of tx churn, mapRelay might keep the txn for longer. if we remove it, then nodes with small mempools could more frequently be responding with `NOTFOUND`, and tying up the connection for upto the 2 minute period even though behaving honestly
 4313:11 <gleb> The comment says: "request unlike it's inflight from someone", and current code ALREADY cleans inflight when NOTFOUND is sent.
 4413:11 <jnewbery> 17303 proposed to remove this mapRelay. That means if the tx is removed from the mempool between the INV announcement and GETDATA request, then the node announcing it would have prevented the receiving node from fetching it from elsewhere for a minute
 4513:11 <gleb> While that comment says that currently, it's implicit, and the logic actually doesn't look at "inflight" variable
 4613:12 <sipa> amiti: that's my my understanding too
 4713:12 <lightlike> ah ok, thanks jnewbery, amiti
 4813:12 <sipa> mapRelay tries to implement a policy of "we promise to keep everything we've relayed recently available for fetching, regardless of what the mempool does"
 4913:13 <amiti> also, there's a lot of nuanced logic of how nodes are queued via the `process_time` that I'm just starting to wrap my head around... I'm interested in better familiarizing myself the implementation to understand different attack possibilities
 5013:13 <sipa> mapRelay predates the limited-size mempool though; i think its original purpose was making sure txn could be downloaded still even when a block confirmed them in the mean time
 5113:14 <amiti> so, can someone summarize this PR? how does it speed up retries?
 5213:14 <jnewbery> mapRelay existed in the initial git commit
 5313:15 <ariard> it speeds up retries by a) instead of relaying on GETDDATA_TX_INTERVAL for a non-received transaction and b) assuming we receive a NOTFOUND, we're gong to retry download with other peers
 5413:16 <andrewtoth> ajtowns' header comment is a good summary: Anytime we see a NOTFOUND in response to a request for a tx, look through
 5513:16 <andrewtoth> each of our peers for anyone else who announced the tx, find one who
 5613:16 <andrewtoth> doesn't already have its inflight tx count maxed out, and of those,
 5713:16 <andrewtoth> make the one who'd look at it first, look at it asap.
 5813:17 <jonatack> nehan_: thanks for your question; i reckon a non-trivial number of observers wondered the same or appreciated confirmation on the answers
 5913:17 <michaelfolkson> Yup definitely jonatack nehan_
 6013:18 <sipa> andrewtoth: so effectively, receiving a NOTFOUND makes us skip the timeout
 6113:18 <amiti> ariard, andrewtoth, sipa: exactly
 6213:18 <amiti> andrewtoth: you already mentioned this / OP mentions, but can someone make extra clear- how does this PR identify who to ask next for the txn?
 6313:19 <jnewbery> I was confused about how the 'asap' part is implemented. Comment here: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r397925056
 6413:19 <amiti> bonus points: how is this different than the previous attempt at this functionality? PR 15505
 6513:19 <jnewbery> it brings forward the m_tx_process_time, but doesn't change the g_already_asked_for time so I don't know how a new GETDATA request is triggered
 6613:19 <nehan_> jonatack: you're welcome! i get why this is better than #15505 and why one might want something like it for #17303 but i'm not yet convinced #17303 is that important. and "always reduce tx propagation times" doesn't seem like a good metric to use without considering 1) when things are actually slowed down and 2) additional code complexity.
 6713:20 <nehan_> nehan_: but i don't want to distract from actually reviewing the PR!
 6813:20 <michaelfolkson> We're requesting the tx from a peer who announced the tx
 6913:21 <michaelfolkson> Oh that's the same PR 15505
 7013:21 <lightlike> in 15505, we would retry only from outbound peers, and not look at the originially scheduled process time but try to request from those who had announced the TX (whenever that was) with the same probability.
 7113:22 <gleb> jnewbery: I think g_already_asked_for is time of the previous request, and we only care it's not something in future, so should be fine? Why would it ever be in future and prevent us from triggering?
 7213:22 <amiti> nehan_: I think these questions are great! questioning the fundamental WHY of a PR is a great approach :)
 7313:22 <sipa> nehan_: certainly a blanket "reduce tx propagation times" would also be terrible for privacy; it's just that the delays involved here are potentially far higher than the time delays we normally target for privacy protection
 7413:22 <sipa> (which is in the order of seconds)
 7513:22 <nehan_> sipa: but the original justification for this from sdaftuar was not reducing delays, it was not turning small mempool nodes into dos'ers
 7613:23 <nehan_> https://github.com/bitcoin/bitcoin/pull/17303#issuecomment-547589047
 7713:23 <gleb> jnewbery: now that I look at 'minus GETDATA_TX_INTERVAL', I think you're right...
 7813:24 <ariard> gleb: because it's previous request time - GETDATA_TX_INTERVAL, so this would prevent new retry if NOTFOUND sequence happens less than 1 min ago
 7913:24 <jnewbery> gleb: because in SendMessages(), we'll only send a GETDATA for transactions that were last requested over a minute ago: https://github.com/bitcoin/bitcoin/blob/60a39a96fc04c9bde98f0a55c0deb2a0b3fc25a7/src/net_processing.cpp#L4048
 8013:24 <ariard> there is 2 times check one against tx_process_time and on against g_already_asked_for
 8113:24 <gleb> yes, good catch! This is a good justification to have a test there :) I guess me and aj just missed this part.
 8213:25 <sipa> nehan_: right; the problem (going from memory here, haven't checked the comments lately) is that our desire to get rid of mapRelay would trigger requested transactions not being found in honest situations a lot more, effectively hurting tx propagation (by delaying things in the order of minutes)
 8313:26 <amiti> ok, moving forward... can someone explain in human words what information `TxDownloadState` keeps track of?
 8413:26 <sipa> what i meant is that there is nuance: short delays are not a problem, and (in some settings) even desirable for privacy - but very big ones actually hurt propagation
 8513:26 <amiti> and how it gets changed by this PR?
 8613:28 <jnewbery> nehan_: perhaps more context on 17303 is that mapRelay doesn't have great memory bounds, so we'd like to remove it. Suhas tried to use it for improving tx relay privacy in https://github.com/bitcoin/bitcoin/pull/14220, but had to abandon that approach because it would have made memory exhaustion possible
 8713:28 <nehan_> sipa: so "node with small mempool being a dos'er" actually means "node forcing me to wait a long time before getting this txn"? i see. i think i misunderstood and thought the small mempool nodes were actually sending more messages in some way.
 8813:28 <nehan_> jnewbery: thanks!
 8913:29 <gleb> nehan_: yeah, sipa's explanation couple messages ago also worked much better for the than that I read somewhere before.
 9013:29 <sipa> nehan_: that's my understanding
 9113:29 <amiti> I spent a bunch of time trying to understand this struct & how its used. its fundamental to the coordination of requesting txns from different peers, so is central to this implementation
 9213:29 <amiti> nehan_, sipa: thats my understanding too
 9313:30 <michaelfolkson> amiti: So it is a struct to manage the timing of when to download the announced transactions
 9413:30 <ariard> amiti: a per-peer state machine to manage tx relay bundled with timers ?
 9513:30 <jnewbery> amiti: TxDownloadState (like all things) started small, and then got more bloated and had its members start overlapping in meaning
 9613:30 <amiti> cool! that all sounds right
 9713:31 <amiti> so what is this PR changing?
 9813:31 <jnewbery> ariard's answer is better than mine
 9913:31 <gleb> Yeah, that overlapping is not something I saw much in software before, but I have an intuition that it might be the best way in this particular case.
10013:31 <amiti> gleb: can you explain the overlapping?
10113:31 <gleb> I have a big desire to get rid of the overlapping of maps, but i couldn't find a way to do it well.
10213:31 <amiti> are you talking about the one introduced in this PR?
10313:31 <gleb> ye
10413:31 <amiti> can you make it explicit
10513:32 <gleb> Well, we have tx->time, and time->tx maps... Let me pull it up again
10613:32 <amiti> yup exactly, `m_tx_process_time` is a multimap of time -> tx
10713:33 <gleb> m_tx_announced, and m_tx_in_flight often go together and we do similar things to them, for example.
10813:33 <jnewbery> I think there are three states that a tx can be in: (i) the peer has announced it and we haven't requested it from that peer; (ii) the peer has announced it, we've requested it and are waiting for delivery; (iii) the peer announced it, we requested it, and it timed out
10913:33 <amiti> and this PR is turning `m_tx_announced` into a map of tx -> time
11013:34 <amiti> ah, and for clarity, when I say time, I mean `process_time`
11113:34 <ariard> yes but here introducing time tracking in m_tx_announced is just now to select a peer for retry
11213:35 <sipa> it may be useful to write a consistency-check function for all these data structures, and have a command-line option that causes it to be run occasionally (similar to -checkmempool)
11313:35 <sipa> unless they can be simplified of course, which is always preferable
11413:35 <gleb> yeah, good idea.
11513:35 <jnewbery> sipa: or turn this into a class and have an interface to that class rather than code all over the place reaching in and manipulating the members
11613:35 <sipa> jnewbery: yup
11713:36 <sipa> absolutely
11813:36 <amiti> sipa: oh interesting. I'm not familiar. so the idea is a user can invoke a cli option that kicks off periodic consistency checks?
11913:36 <jnewbery> amiti: see -checkmempool
12013:37 <jkczyz> I tend to agree that making the code explicit (and hopefully simpler) is preferable over dependencies between fields and explanatory comments
12113:37 <amiti> I'm definitely in favor of simplifying the code. there's a lot of implicit expectations that require additional code to support when writing features, but are a bug if they occur
12213:37 <sipa> amiti: see CTxMempool::check in particular
12313:37 <jnewbery> (and nCheckFrequency in txmempool.cpp|h)
12413:38 <sipa> it's useful too for reviewers, as it spells out explicitly what expectations there are
12513:38 <sipa> though i wouldn't call that function in particular very readable...
12613:38 <amiti> for example, `m_tx_process_time` is a multimap, so can technically be a many to many, but txns should only be queued for one time in the future (per node), so this really should be a many to one
12713:38 <amiti> (👆🏽 fact check?)
12813:39 <sipa> that sounds easy to verify, if true
12913:39 <jnewbery> amiti: that should be true. If a tx appears more than once in m_tx_process_time, then there's a logic bug somewhere
13013:40 <jonatack> for anyone looking, -checkmempool is a "hidden" debug option
13113:40 <jonatack> bitcoind -help-debug | grep -A2 checkmempool
13213:40 <gleb> Anyway, this probably should be a separate PR? There are probably a bunch of other consistency checks we may do. I would say the goal here should be to minimize cross-var dependencies.
13313:40 <sipa> gleb: agreed
13413:41 <amiti> ya, good point
13513:41 <amiti> ok lets jump to another question about this PR
13613:41 <amiti> what are potential issues around this approach?
13713:42 <ariard> jnewbery: about you're other comment on a swarm of dishonest inbounds delaying tx relay, that's assume there isn't at least a honest peer with a announcement time better
13813:42 <ariard> than malicisus ones ?
13913:42 <jnewbery> I think the class would be a container or {txid, state (i, ii or iii above), timestamp} with indexes into those objects
14013:42 <amiti> gleb: you and aj had some discussion about a DoS vector. can anybody (gleb or otherwise) explain the concern here?
14113:43 <gleb> ariard: can you send a link to the comment? I can't find anything by searching for "swarm" lol
14213:43 <ariard> gleb: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r397928486
14313:43 <ariard> isn't only retrying with outbounds would be better to avoid this kind of attack or more severe ones ?
14413:44 <jnewbery> ariard: the honest peer's time to retry gets reset here: https://github.com/bitcoin/bitcoin/blob/60a39a96fc04c9bde98f0a55c0deb2a0b3fc25a7/src/net_processing.cpp#L4063
14513:44 <gleb> I think nullifying helps with not juggling?
14613:44 <gleb> This comparison to chrono::zero helps to make sure we don't force-request from the already NOTFOUNDed peer.
14713:44 <jnewbery> so as long as the adversary is continuing to juggle NOTFOUNDs/INVs, then he might be able to prevent the victim ever requesting the tx from the honest node
14813:45 <amiti> ok time check- 15 minutes left
14913:45 <ariard> jnewbery: okay and by constantly re-inving attack, he is sure to always trump honest peers
15013:46 <jnewbery> gleb: comment is here: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r397928486. The m_tx_announced is not nulled for peers that send NOTFOUND
15113:46 <amiti> just want to call out that this has turned into a fairly advanced / deep divey session, and if anybody is lurking and unclear on any fundamentals, I encourage speaking up and ask your questions. I bet you're not the only one wondering :)
15213:46 <jnewbery> amiti: +1
15313:47 <ariard> okay but what not requesting only from outbounds? As initial approach in 15505
15413:48 <michaelfolkson> I'll throw a high level question into the mix. Is it worth trying to understand more about your peers e.g. whether they are pruned/unpruned, their mempool policy etc. The approach as I understand it is to just treat all your peers as if you know nothing about them
15513:48 <jonatack> Great stuff, amiti. I think this discussion is going to be very helpful for reviewing the PR.
15613:48 <ariard> I expect outbounds to have mempool of a consistant-size, given they already offer bandwidth/ports to the network
15713:48 <gleb> jnewbery: i think you are right. I was thinking about the same issue, but thought it was handled for wrong reason i guess.
15813:48 <fjahr> hi
15913:48 <amiti> hi fjahr :)
16013:48 <jnewbery> ariard: I guess in RetryProcessTx() we could change the 'best' heuristic to always prefer outbounds?
16113:50 <jnewbery> michaelfolkson: pruned/unpruned doesn't make any difference here (we're talking about GETDATA requests for unconfirmed txs, not blocks)
16213:50 <ariard> jnewbery: yes that's my point, but is doing so would achieve goal of PR, there is still worst-case hitting 1min window in case of really bad-connected outbounds?
16313:50 <amiti> RE logic of the outbound vs the inbound, isn't there already a preference built in to when you initially schedule the process_time, and by finding the best time and bumping up, you'd be honoring the same order?
16413:50 <amiti> am I missing something?
16513:50 <jnewbery> ariard: no, in the case that no outbound peer has announced the tx, the best would be the best inbound
16613:51 <jnewbery> amiti: see aj's comment here: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r396884515
16713:51 <ariard> jnewbery: so iterate first on outbound, then inbound, but a malicious inbound can still make you juggle indefinetely?
16813:51 <gleb> amiti: yeah, but it's possible (probably very early in the overall relay) for inbounds to occupy fastest slots, and I guess there's worrying about that scenario to be exploited further.
16913:51 <ariard> (likely the announced tx is a malicious one)
17013:52 <jnewbery> because the NOTFOUND can arrive at any time, we might have rescheduled our outbounds to be after our inbounds here: https://github.com/bitcoin/bitcoin/blob/60a39a96fc04c9bde98f0a55c0deb2a0b3fc25a7/src/net_processing.cpp#L4063
17113:53 <amiti> ah. right.
17213:53 <jnewbery> ariard: I think by null'ing rather than deleting entries from m_tx_announced here: https://github.com/bitcoin/bitcoin/pull/18238#discussion_r397928486, the malicious peer can't juggle NOTFOUND/INVs
17313:54 <gleb> We should absolutely prevent juggling. Once that is done, it's worth discussing other threats (which I think are probably minor)
17413:54 <michaelfolkson> jnewbery: A NOTFOUND can be response for a transaction request from a block / confirmed transaction, just not related to this particular PR. Got it
17513:54 <gleb> Like, an attacker would have to come really early in the relay, and the damage is InvBlock for a minute or two at most. I think that's desirable requirement.
17613:55 <gleb> (similar to what is already possible)
17713:57 <gleb> this PR turned out to be much more complicated than I anticipated, especially after all the reviews, triggered by this review club too :)
17813:57 <amiti> 3 minutes left !
17913:57 <amiti> yeah its suuuuper nuanced! effective tx download is HARD
18013:58 <jkczyz> +1 IMHO, the PR exhibits why striving to reduce complexity (or hiding it behind an interface when not possible) is so important. Even dependencies between fields of a small struct can have an enormous impact on code understanding and review velocity, not to mention the potential for introducing bugs
18113:58 <michaelfolkson> Still a bit confused. Why are there no "announcements" for confirmed transactions, transactions already in a block?
18213:58 <jnewbery> jkczyz: absoluely agree!
18313:58 <amiti> jkczyz: strong agree
18413:58 <jonatack> jkczyz: ^ this
18513:58 <sipa> michaelfolkson: the block is the announcement?
18613:59 <michaelfolkson> But a a node could be really behind and announce a transaction that everyone already knows about and has been included in a block?
18713:59 <jnewbery> michaelfolkson: transaction relay is for unconfirmed transactions. Confirmed transactions are included in the BLOCK message
18813:59 <nehan_> thanks amiti!
18914:00 <jnewbery> michaelfolkson: tranaction wouldn't be valid for nodes that are caught up because its inputs have been spent
19014:00 <amiti> alright folks! thats time
19114:00 <ariard> michaelfolkson: see g_recent_confirmed_transactions, to avoid reqquesting tx already confirmed
19214:00 <amiti> #endmeeting
19314:00 <amiti> thank you all :)
19414:00 <jnewbery> thanks amiti. That was fun!
19514:00 <lightlike> thanks!
19614:00 <michaelfolkson> Cool, thanks fro the answers
19714:00 <michaelfolkson> *for
19814:00 <amiti> that was a fun discussion. I have a LOT of questions to go investigate :)
19914:00 <michaelfolkson> And thanks amiti. Great hosting
20014:01 <sipa> thanks amiti!
20114:01 <jonatack> thanks amiti and everyone!
20214:01 <jkczyz> amiti: thanks! Learning a lot
20314:01 <felixweis> thanks!
20414:01 <emzy> Thanks amiti and everyone!
20514:03 <sipa> [A
20614:04 <andrewtoth> thanks amiti and everyone!