The PR branch HEAD was 17f2822c at the time of this review club meeting.
On startup, if a node’s headers chain is more than a day behind the current
time, it will pick one peer to sync headers with until its best headers chain
is caught up.
If a node’s headers chain is not caught up within 15 min + ~(num headers
missing * 1 ms) (either because the chosen peer was unable to deliver the
headers or because they purposefully stalled the download), it will
disconnect the chosen peer for stalling and pick a new peer to sync headers with. Blocks
are only downloaded and validated after the node’s headers chain has
sufficient work (nMinimumChainWork) to protect against DoS attacks.
When a node receives a block announcement via an inv during headers sync, it
will add all announcing peers as additional headers sync peers. PR
#25720 changes this logic
such that only one of the announcing peers is added for headers sync.
BIP 130 (implemented in PR #7129)
introduced the sendheaders message, which indicates that a node prefers
receiving block announcements via a headers message rather than an inv.
Why do nodes (mostly) receive inv block announcements while they are doing
initial headers sync, even though they indicated preference for headers
announcements (BIP 130)?
Why is bandwidth wasted (during initial headers sync) by adding all peers
that announce a block to us via an inv as headers sync peers?
What would be your estimate (lower/upper bound) of how much bandwidth is
wasted? (There is no one true answer here; the estimate depends on a couple
of factors.)
What’s the purpose of CNodeState’s members fSyncStarted and
m_headers_sync_timeout, and PeerManagerImpl::nSyncStarted? If we start
syncing headers with peers that announce a block to us via an inv, why do
we not increase nSyncStarted and set fSyncStarted = true and update
m_headers_sync_timeout?
An alternative to the approach taken in the PR would be to add an additional
headers sync peer after a timeout (fixed or random). What is the benefit of
the approach taken in the PR over this alternative?
<juancama> It’s important to blocks-first nodes that the blocks be requested and sent in order because each block header references the header hash of the preceding block. That means the IBD node can’t fully validate a block until its parent block has been received.
<juancama> Nodes (mostly) receive inv block announcements while they are doing initial headers sync because block inventories show up in the inv message in the same order they appear in the chain; the first inv message contains inventories for blocks 1 - 501. The IBD node uses the received inventories to request 128 blocks from the sync node in the “getdata” msg.
<lightlike> I think it's weird that we tie the act of adding more headers-sync peers to a random event (when a block is found). Sometimes it's happening multiple times within 10 minutes, sometimes it doesn't happen for an hour or so. Waiting for a fixed period (e.g. 10 minutes) before adding another peer would seem more natural to me.
<larryruane_> lightlike: I think it's always been more of an accidental kind of thing... I traced through this code once, and when we get a new block from a peer, that naturally begins a new "thread" of getheaders and headers reply sequence to and from that peer
<dergoegge> question 5: What’s the purpose of CNodeState’s members fSyncStarted and m_headers_sync_timeout, and PeerManagerImpl::nSyncStarted? If we start syncing headers with peers that announce a block to us via an inv, why do we not increase nSyncStarted and set fSyncStarted = true and update m_headers_sync_timeout?
<lightlike> dergoegge: Do I understand it correctly that it doesn't really matter if our peer sends us nothing at all for 20 minutes, or 99% of the headers within that time - they will get disconnected after 20 minutes for stalling either way.
<hernanmarino> might it be the case that we only don't trust our randomly chosen node, because it will eventually be replaced if unresponsive, and we do not want to do that with the other peers we are adding with this new logic?
<juancama> For the second part of question 5, do we not increase nSyncStarted and set fSyncStarted = true and update m_headers_sync_timeout if we start syncing headers with peers that announce a block to us via an inv because it would lead to even more wasted bandwidth?
<sipa> But it turned out that (a) we actually do and (b) this is actually somewhat advantageous because it prevents a situation where the singular chosen nSyncStarted peer is malicious/broken/infinitely slow, and stalls your headers sync forever.
<sipa> (I can say this as the person who first introduced the headers syncing and nSyncStarted concept, until not too recently I wasn't actually aware that we'd start fetching headers from other peers as well if they'd announce a new block to us)
<larryruane_> Just for my historical understanding, it used to be that header and block download would proceed simultaneously (with headers obviously being ahead of blocks in height) ... then some years ago it was changed to download only headers until we're close to today, then start downloading blocks (this makes a lot more sense)
<dergoegge> lets use the last 5 minutes for approach discussion: An alternative to the approach taken in the PR would be to add an additional headers sync peer after a timeout (fixed or random). What is the benefit of the approach taken in the PR over this alternative?
<lightlike> I think one benefit is that the peer that manages to send us the block inv first is often also a very fast peer. So we'd not pick another slow peer if for some reason our initial peer is slow.