The PR branch HEAD was 17f2822c at the time of this review club meeting.
Notes
On startup, if a node’s headers chain is more than a day behind the current time, it will pick one peer to sync headers with until its best headers chain is caught up. (See here.)
If a node’s headers chain is not caught up within 15 min + (~number of missing headers × 1 ms) — either because the chosen peer was unable to deliver the headers or because it purposefully stalled the download — it will disconnect the chosen peer for stalling and pick a new peer to sync headers with. Blocks are only downloaded and validated after the node’s headers chain has sufficient work (nMinimumChainWork) to protect against DoS attacks.
When a node receives a block announcement via an inv during headers sync, it will add all announcing peers as additional headers sync peers. PR #25720 changes this logic such that only one of the announcing peers is added for headers sync.
BIP 130 (implemented in PR #7129) introduced the sendheaders message, which indicates that a node prefers receiving block announcements via a headers message rather than an inv.
Why do nodes (mostly) receive inv block announcements while they are doing initial headers sync, even though they indicated a preference for headers announcements (BIP 130)?
Why is bandwidth wasted (during initial headers sync) by adding all peers that announce a block to us via an inv as headers sync peers?
What would be your estimate (lower/upper bound) of how much bandwidth is wasted? (There is no one true answer here; the estimate depends on a couple of variables.)
What’s the purpose of CNodeState’s members fSyncStarted and m_headers_sync_timeout, and of PeerManagerImpl::nSyncStarted? If we start syncing headers with peers that announce a block to us via an inv, why do we not increase nSyncStarted, set fSyncStarted = true, and update m_headers_sync_timeout?
An alternative to the approach taken in the PR would be to add an additional headers sync peer after a timeout (fixed or random). What is the benefit of the approach taken in the PR over this alternative?
<dergoegge> Next question: Why do nodes (mostly) receive inv block announcements while they are doing initial headers sync, even though they indicated preference for headers announcements (BIP 130)?
<juancama> It’s important to blocks-first nodes that the blocks be requested and sent in order because each block header references the header hash of the preceding block. That means the IBD node can’t fully validate a block until its parent block has been received.
<Amirreza> @dergoegge, I mean while I'm syncing my headers a new block can be mined, so why don't other nodes notify me to get that newly mined block?
<dergoegge> the reason is that a node will not announce a new block to a peer using a headers message if the peer has not previously sent a header to which the new header connects
<juancama> nodes (mostly) receive inv block announcements while they are doing initial headers sync because block inventories show up in the inv message in the same order they appear in the chain; the first inv message contains inventories for blocks 1 - 501. The IBD node uses the received inventories to request 128 blocks from the sync node in the "getdata" msg.
<dergoegge> Next question: Why is bandwidth wasted (during initial headers sync) by adding all peers that announce a block to us via an inv as headers sync peers?
<larryruane_> lightlike: thanks, I didn't know about that, but the attempted fix that had to be reverted in 2016 involved not just one peer but across multiple peers (like the current PR)
<dergoegge> next question: What would be your estimate (lower/upper bound) of how much bandwidth is wasted? (There is no one true answer here, the estimate depends on a couple of variables)
<lightlike> I think it's weird that we tie the act of adding more headers-sync peers to a random event (when a block is found). Sometimes it's happening multiple times within 10 minutes, sometimes it doesn't happen for an hour or so. Waiting for a fixed period (e.g. 10 minutes) before adding another peer would seem more natural to me.
<larryruane_> lightlike: I think it's always been more of an accidental kind of thing... I traced through this code once, and when we get a new block from a peer, that naturally begins a new "thread" of getheaders and headers reply sequence to and from that peer
<dergoegge> question 5: What’s the purpose of CNodeState’s members fSyncStarted and m_headers_sync_timeout, and PeerManagerImpl::nSyncStarted? If we start syncing headers with peers that announce a block to us via an inv, why do we not increase nSyncStarted and set fSyncStarted = true and update m_headers_sync_timeout?
<glozow> wait dergoegge, when you said "yeah once our header chain is close to today we request headers from all peers" were you saying that when we're not close to today we try fewer peers?
<dergoegge> glozow: when we are not close to today, we choose one peer to sync the headers from (and add additional peers if an inv announcement is received)
<larryruane_> if we're in that mode, not close to today, would it be good to round-robin among the peers that we request headers from? (instead of just one peer)?
<juancama> fSyncStarted tells us whether we've started headers synchronization with this peer; m_headers_sync_timeout tells us when to potentially disconnect a peer for stalling headers download
<lightlike> dergoegge: Do I understand it correctly that it doesn't really matter whether our peer sends us nothing at all for 20 minutes, or 99% of the headers within that time: they will get disconnected after 20 minutes for stalling either way.
<hernanmarino> might it be the case that we only don't trust our randomly chosen node, because it will eventually be replaced if unresponsive, and we do not want to do that with the other peers we are adding with this new logic?
<lightlike> dergoegge: I think it could make sense to also require some progress for them, and disconnecting them much earlier than 20min if they are just completely unresponsive.
<juancama> For the second part of question 5, do we not increase nSyncStarted and set fSyncStarted = true and update m_headers_sync_timeout if we start syncing headers with peers that announce a block to us via an inv because it would lead to even more wasted bandwidth?
<sipa> But it turned out that (a) we actually do and (b) this is actually somewhat advantageous because it prevents a situation where the singular chosen nSyncStarted peer is malicious/broken/infinitely slow, and stalls your headers sync forever.
<sipa> (I can say this as the person who first introduced the headers syncing and nSyncStarted concept, until not too recently I wasn't actually aware that we'd start fetching headers from other peers as well if they'd announce a new block to us)
<larryruane_> Just for my historical understanding, it used to be that header and block download would proceed simultaneously (with headers obviously being ahead of blocks in height) ... then some years ago it was changed to download only headers until we're close to today, then start downloading blocks (this makes a lot more sense)
<larryruane_> (i say makes more sense because the old way, we could end up downloading a bunch of blocks that turn out not to be part of the best chain)
<dergoegge> lets use the last 5 minutes for approach discussion: An alternative to the approach taken in the PR would be to add an additional headers sync peer after a timeout (fixed or random). What is the benefit of the approach taken in the PR over this alternative?
<lightlike> I think one benefit is that the peer that manages to send us the block inv first is often also a very fast peer. So we'd not pick another slow peer if for some reason our initial peer is slow.