A node will not have more than one GETDATA in flight for a particular
transaction. The TxDownloadState struct
is used to manage the state of transaction downloads for each peer. This struct has a
very descriptive code comment that gives an overview of how transaction download works.
If a GETDATA request is in flight, a node will wait up to 2 minutes before timing
out and attempting to request the transaction from another peer.
Currently, if a node receives a NOTFOUND message, it does not use that
information to speed up requesting the transaction from another peer. Instead, it
continues to wait until the 2-minute period is over before sending out another
GETDATA.
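This current, timeout-driven behavior can be pictured with a small sketch. This is a hypothetical, simplified model for illustration only; the names (`TxRequestTracker`, `REQUEST_TIMEOUT`, `next_request`) are invented here and are not Bitcoin Core's actual identifiers:

```python
# Hypothetical, simplified model of the pre-PR behavior: at most one
# GETDATA in flight per transaction, and a fixed timeout must elapse
# before retrying from another announcing peer. Illustrative names only.
REQUEST_TIMEOUT = 120  # seconds (the "2 minute" period from the notes)

class TxRequestTracker:
    def __init__(self):
        self.announcers = {}   # txid -> peers that sent an INV, in order
        self.in_flight = {}    # txid -> (peer, time the GETDATA was sent)

    def on_inv(self, txid, peer):
        self.announcers.setdefault(txid, []).append(peer)

    def next_request(self, txid, now):
        """Return the peer to send a GETDATA to, or None if a request is
        already in flight and has not yet timed out."""
        if txid in self.in_flight:
            peer, sent_at = self.in_flight[txid]
            if now - sent_at < REQUEST_TIMEOUT:
                return None  # wait out the full timeout, even after NOTFOUND
            self.announcers[txid].remove(peer)  # give up on this peer
            del self.in_flight[txid]
        candidates = self.announcers.get(txid, [])
        if not candidates:
            return None
        peer = candidates[0]
        self.in_flight[txid] = (peer, now)
        return peer
```

In this model, if peer A's GETDATA is outstanding, the transaction cannot be requested from peer B until the full 120-second window has passed, no matter what A replies.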
This PR introduces logic to speed up a request to another peer when possible.
PR 15505 was a previous attempt
at implementing a solution and was covered in PR Review Club
15505. Reading the notes, logs, and
PR can help shed light on some of the nuances that make this feature
tricky to implement.
Having this retry speedup is mentioned as a prerequisite for PR 17303.
That PR aims to remove mapRelay, a data structure that keeps track of
recently announced transactions. Doing so would increase the likelihood of
receiving a NOTFOUND message. For example, a node with a small mempool could announce a
transaction, but then evict it before receiving the GETDATA. Without logic
for the receiving node to quickly retry, the node with the small mempool could waste
bandwidth and accidentally DoS its peer.
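One way to picture the speedup being introduced, as a sketch rather than the PR's actual implementation: when a NOTFOUND arrives, the failed request is dropped immediately so the transaction can be requested from the next announcing peer, instead of sitting out the remainder of the 2-minute window. All identifiers below are hypothetical:

```python
# Hypothetical sketch of the retry speedup: a NOTFOUND immediately frees
# the transaction to be requested from the next peer that announced it.
# Illustrative only; not Bitcoin Core's actual code or identifiers.

class SpeedyTxRequestTracker:
    def __init__(self):
        self.announcers = {}   # txid -> peers that announced it, in order
        self.in_flight = {}    # txid -> peer the GETDATA was sent to

    def on_inv(self, txid, peer):
        self.announcers.setdefault(txid, []).append(peer)

    def request_from_next(self, txid):
        """Conceptually send a GETDATA to the next candidate peer."""
        if txid in self.in_flight:
            return None  # still have one GETDATA outstanding
        candidates = self.announcers.get(txid, [])
        if not candidates:
            return None
        self.in_flight[txid] = candidates[0]
        return candidates[0]

    def on_notfound(self, txid, peer):
        """Instead of waiting out the ~2 minute timeout, immediately drop
        the failed request so the next announcer can be tried."""
        if self.in_flight.get(txid) == peer:
            del self.in_flight[txid]
            self.announcers[txid].remove(peer)
            return self.request_from_next(txid)
        return None
```

The design question the meeting log below digs into is exactly where this simple picture gets complicated: which peer to pick next (outbound vs. inbound preference), how the retry interacts with the already-scheduled `process_time`, and what an adversary can do with repeated NOTFOUNDs.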
<jnewbery> lightlike: we currently keep a mapRelay of transactions we've recently announced. Even if a transaction is subsequently removed from the mempool (for expiry or some other reason), it's still available to be fetched by peers we've announced it to
<amiti> lightlike: my understanding is... mapRelay has a different logic for transactions it keeps than the mempool logic. so, in the case of a mempool with lots of tx churn, mapRelay might keep the txn for longer. if we remove it, then nodes with small mempools could more frequently be responding with `NOTFOUND`, and tying up the connection for upto the 2 minute period even though behaving honestly
<jnewbery> 17303 proposed to remove this mapRelay. That means if the tx is removed from the mempool between the INV announcement and GETDATA request, then the node announcing it would have prevented the receiving node from fetching it from elsewhere for a minute
<amiti> also, there's a lot of nuanced logic of how nodes are queued via the `process_time` that I'm just starting to wrap my head around... I'm interested in better familiarizing myself the implementation to understand different attack possibilities
<nehan_> jonatack: you're welcome! i get why this is better than #15505 and why one might want something like it for #17303 but i'm not yet convinced #17303 is that important. and "always reduce tx propagation times" doesn't seem like a good metric to use without considering 1) when things are actually slowed down and 2) additional code complexity.
<lightlike> in 15505, we would retry only from outbound peers, and not look at the originially scheduled process time but try to request from those who had announced the TX (whenever that was) with the same probability.
<gleb> jnewbery: I think g_already_asked_for is time of the previous request, and we only care it's not something in future, so should be fine? Why would it ever be in future and prevent us from triggering?
<sipa> nehan_: certainly a blanket "reduce tx propagation times" would also be terrible for privacy; it's just that the delays involved here are potentially far higher than the time delays we normally target for privacy protection
<sipa> nehan_: right; the problem (going from memory here, haven't checked the comments lately) is that our desire to get rid of mapRelay would trigger requested transactions not being found in honest situations a lot more, effectively hurting tx propagation (by delaying things in the order of minutes)
<jnewbery> nehan_: perhaps more context on 17303 is that mapRelay doesn't have great memory bounds, so we'd like to remove it. Suhas tried to use it for improving tx relay privacy in https://github.com/bitcoin/bitcoin/pull/14220, but had to abandon that approach because it would have made memory exhaustion possible
<nehan_> sipa: so "node with small mempool being a dos'er" actually means "node forcing me to wait a long time before getting this txn"? i see. i think i misunderstood and thought the small mempool nodes were actually sending more messages in some way.
<jnewbery> I think there are three states that a tx can be in: (i) the peer has announced it and we haven't requested it from that peer; (ii) the peer has announced it, we've requested it and are waiting for delivery; (iii) the peer announced it, we requested it, and it timed out
<amiti> for example, `m_tx_process_time` is a multimap, so can technically be a many to many, but txns should only be queued for one time in the future (per node), so this really should be a many to one
<amiti> just want to call out that this has turned into a fairly advanced / deep divey session, and if anybody is lurking and unclear on any fundamentals, I encourage speaking up and ask your questions. I bet you're not the only one wondering :)
<michaelfolkson> I'll throw a high level question into the mix. Is it worth trying to understand more about your peers e.g. whether they are pruned/unpruned, their mempool policy etc. The approach as I understand it is to just treat all your peers as if you know nothing about them
<amiti> RE logic of the outbound vs the inbound, isn't there already a preference built in to when you initially schedule the process_time, and by finding the best time and bumping up, you'd be honoring the same order?
<jkczyz> +1 IMHO, the PR exhibits why striving to reduce complexity (or hiding it behind an interface when not possible) is so important. Even dependencies between fields of a small struct can have an enormous impact on code understanding and review velocity, not to mention the potential for introducing bugs