Aug 11, 2021
PR author: Sjors
The PR branch HEAD was 30e2ba6c at the time of this review club meeting.
Bitcoin Core uses headers-first synchronization (implemented in
4468). It first downloads
headers from its peers to build a tree of verified headers. Once the headers
are downloaded and verified, it requests blocks that build towards the
most-work chain. The decision whether to request a block for download is made
based on the headers tree.
This automatic download behavior is beneficial for the normal operation
of a node, saving bandwidth and protecting it against DoS attacks.
On the other hand, this also restricts the possibilities for analysis of blocks
that have not been chosen for download. Examples are
blocks (i.e. blocks that have
not become part of the best chain).
This PR introduces a new RPC
getblockfrompeer that manually attempts to fetch
a block from a peer, while specifying the block hash and the id of the peer.
This is achieved by constructing a GETDATA message for the block and sending it
to the selected peer.
Did you review the PR?
Concept ACK, approach ACK, tested ACK, or NACK?
What are the possible applications of the new RPC
Where in the codebase does a node make the autonomous decision whether to
request a block for download after hearing about it from a peer?
What is the main decision criterion?
first commit collects various
Ensure* helper functions and moves them
rpc/server_util.h. What is the purpose of these functions, and why did this PR move them?
What happens if we request a block from a peer, but the peer doesn’t have the
This PR requests a block but adds no additional logic for dealing with the
answer to the manual block request. Why is this not necessary?
What would happen if we’d use
getblockfrompeer to request a block for which
our node didn’t previously know the header?
Can you think of ways to further improve the new RPC command in the future?
(some ideas are mentioned in the PR discussion)
1 17:00 <lightlike> #startmeeting
3 17:00 <lightlike> Hi everyone! Welcome to the PR Review club!
11 17:01 <lightlike> Is anybody here for the first time this week?
13 17:02 <lightlike> Seems to be not the case
14 17:02 <lightlike> So, today we'll be looking at #20295 (getblockfrompeer).
16 17:02 <lightlike> Did you review the PR? (y/n)
23 17:03 <lightlike> For those who had the chance to take a look, what is your impression? (Concept ACK, approach ACK, tested ACK, or NACK?)
25 17:04 <raj> tested ACK. Seems like an useful rpc command. Though might warrant some future improvements.
26 17:04 <larryruane> tested ACK, although I don't have a strong understanding of the motivation
27 17:05 <Naiza> tested ACK, but couldn't understand the real application of the added feature.
28 17:05 <lightlike> ok, let's begin with the questions - the first one is about the motivation:
30 17:06 <lightlike> What are the possible applications of the new RPC getblockfrompeer
31 17:06 <raj> lightlike, majorly to get blocks from the orphan chain?
32 17:06 <larryruane> the PR mentions testing, but it's unclear to me exactly how it helps with testing
33 17:06 <b10c> Concept ACK, I see how it's useful for the forkmonitor and other applications that require blocks from stale chains
34 17:07 <lightlike> raj: I agree, that is one application.
36 17:07 <glozow> i was a bit unclear on this - is it common for regular users to want data for stale blocks?
37 17:07 <sipa_> raj: i'll only mention this once, but a per peeve of mine is that blocks in a non-active branch of the chain shouldn't be called orphaned. it's not like they don't have parents...
38 17:08 <theStack> getting pruned blocks seems to be another application (discussed in one of the comments on top by Fi3 and Sjors)
39 17:08 <glozow> sipa_: do you call them stale? or?
41 17:08 <raj> sipa_, yes thats correct. I should have wrote stale blocks, but i mixed it up with orphan transactions.. :D
42 17:09 <glozow> great reason to not call them orphan blocks hoho
43 17:09 <lightlike> larryruane: Not sure about testing, I think it's more about being able to better observe and analyze what is happening on mainnet with stale blocks.
44 17:09 <glozow> theStack: oh, getting pruned blocks seems to be a good reason
45 17:09 <lightlike> One other application that was mentioned was to be able to retrieve old blocks on a pruned node.
46 17:10 <lightlike> theStack: yes, what you said (missed your post)
47 17:10 <raj> lightlike, it also seems there should be a way to fetch any blocks from the network (as long as it known to be in the tree). Isn't there any current way to do that?
48 17:10 <glozow> wait, if you manually download a pruned block from a peer, do you prune it again immediately if it's old?
49 17:11 <schmidty> that was my original question as well ^
50 17:11 <lightlike> raj: Not manually, I don't think so.
51 17:11 <jnewbery> right, you could imagine having a full node and wanting to fetch a transaction, but the block that it's in has been pruned. You could run this commend to redownload the block, and then call getrawtransaction with the txid and blockhash
52 17:12 <lightlike> glozow: I'm not sure how quickly it would be pruned.
53 17:12 <larryruane> glozow: I wondered the same thing, so I fetched an older block (my node is pruned) about an hour ago and it's still there (`getblock` shows it)
54 17:12 <lightlike> I think at the latest it would be pruned at the next restart.
55 17:13 <larryruane> jnewbery: that's cool, so it's not just for accessing stale blocks
56 17:14 <lightlike> next question is about the automatic block download algorithm:
57 17:14 <lightlike> Where in the codebase does a node make the autonomous decision whether to request a block for download after hearing about it from a peer? What is the main decision criterion?
58 17:14 <sipa_> but i assume there is no guarantee about how long it'll stay available when in pruning mode?
59 17:15 <larryruane> lightlike: "... pruned at the next restart" I just restarted my node, and that old block (beyond my prune window) is still there... I think it's because it's now in the `blocks` directory (which is persisted)
60 17:16 <jnewbery> I think that if we redownloaded a block we'd generally keep it for at least a couple of days. We'd write it to the latest block file (after the tip block), and we'd only prune that file once the tip had been buried by 288 blocks.
61 17:16 <sipa_> larryruane: normally downloaded blocks are also stored in the blocks directory, and they are subject to pruning
62 17:17 <larryruane> jnewbery: that makes sense ... i know it never removes blocks from the middle `blk?????.dat` files
63 17:17 <b10c> jnewbery: uh that makes sense
64 17:17 <glozow> jnewbery: so pruning is per-file basis?
65 17:17 <sipa_> yes, pruning is per file
67 17:17 <b10c> on the other hand, if we call getblockfrompeer on many pruned blocks, could we cause the current chain tip to be pruned?
68 17:18 <theStack> is it already possible now to explicitely keep old blocks on a pruned node? or would this PR enable the first time ever to have "gaps" between blocks?
69 17:18 <sipa_> theStack: gaps are always possible, because blocks are downloafed out of.order
70 17:18 <sipa_> and files are deleted when they only contain blocks that are old eno7gh
71 17:19 <theStack> sipa_: ah, good to know. i was assuming the download order is strictly linear
72 17:19 <b10c> sipa_: thanks, that answers my question
73 17:19 <sipa_> that's hard to do when you're downloading from multiple peers in parallel :)
74 17:19 <sipa_> downloading is parallel since headers-fire fetching was introduced in 0.10
75 17:19 <larryruane> sipa_: but isn't it the case that if we advertise ourself as a NETWORK_LIMITED node, we may not send a block even though we have it?
76 17:20 <sipa_> larryruane: unsure
77 17:20 <larryruane> (but we still do have access to it locally of course)
78 17:20 <jnewbery> I may not have got that exactly 100% accurate, but roughly I'd expect the block to stay around for a short while due to the way files are pruned and the minimum prune value
79 17:20 <larryruane> i thought murch said that a week or 2 ago
80 17:20 <glozow> larryruane: think so, only send last 288 blocks
81 17:21 <larryruane> yeah i think it had to do with preventing fingerprinting
84 17:22 <jnewbery> *blocks below more than a certain depth
85 17:22 <lightlike> That's good, if it stays long enough to make analysis via other RPCs possible, it's a valid use case for the RPC imo.
86 17:22 <glozow> yeah makes sense
88 17:24 <jnewbery> ah yes, here's the constant for keeping all files that contain a block within the last 2 days: /** Block files containing a block-height within MIN_BLOCKS_TO_KEEP of ::ChainActive().Tip() will not be pruned. */
89 17:24 <jnewbery> static const unsigned int MIN_BLOCKS_TO_KEEP = 288;
90 17:24 <raj> lightlike, thanks. I was trying but was lost in the maze..
91 17:24 <larryruane> while testing this PR, i ran `getchaintips` but was surprised that i was not able to fetch any of those stale blocks from my peers (although i didn't try every one) ... maybe those peers just didn't ever consider those blocks to be part of best chain?
92 17:24 <lightlike> So, if the last header of a chain of headers doesn't have at least as much work as our chain, we wouldn't download the block automatically.
94 17:25 <theStack> hope that my question is not too much off-topic, but i wondered (more than once) what is rationale of the 550 MB minimum prune size? i expected more like a multiple of 144 :) (i guess it's correlated with MIN_BLOCKS_TO_KEEP somehow though?)
96 17:27 <lightlike> larryruane: yes, maybe they never downloaded the blocks either.
97 17:27 <jnewbery> (it's very conservatively trying to keep at least the most recent 288 blocks and undo data)
98 17:27 <theStack> jnewbery: thanks! seems i was blind in the past :D
99 17:27 <larryruane> in case this is helpful to anyone, here's a command you can use to see which of your peers are non-pruning (look for NETWORK): `bitcoin-cli getpeerinfo|jq '.|(.id,.servicesnames)'`
100 17:27 <lightlike> Next q: The first commit collects various Ensure* helper functions and moves them to rpc/server_util.h. What is the purpose of these functions, and why did this PR move them?
101 17:27 <larryruane> (i love jq)
102 17:28 <raj> lightlike, this begs the question, if most nodes doesn't download stale blocks, doesn't that effectively reduces usefulness of this rpc?
103 17:28 <larryruane> lightlike: rather than asserting, if we're handling an RPC, we just want to throw (user error, not internal error)
104 17:29 <larryruane> but one thing i'm unclear on is why we would sometimes not have those things (like a mempool or a node context)
105 17:30 <lightlike> raj: yes, there seems to be some manual steps possible (try existing peers, if you don't get the block you want disconnect and them and try others) that could be automated
106 17:30 <b10c> ray: when the forkmonitor detects two blocks, at least one of their nodes will have the block. And for everyone else it can still be useful to get pruned blocks
107 17:30 <b10c> detects a stale block*
108 17:31 <jnewbery> b10c: because at least one of their peers must have sent a header or cmpctblock for the stale block?
109 17:32 <lightlike> larryruane: yes - also I think it's possible (or will be with increasing modularity) not to have certain parts (mempool, peerman etc.), in which case the specific rpcs don't make sense but an assert would be wrong.
110 17:33 <larryruane> there use of `std::any` with those Ensure functions that i don't understand, but that's probably too deep of a rabbit hole
111 17:33 <glozow> lightlike: i agree about the automating, could just try all existing peers instead of requiring a nodeid to be specified
112 17:34 <larryruane> i think the commit to move the Ensure functions fixes a circular dependency, but I don't understand very well
113 17:34 <lightlike> glozow: yes - I think the author didn't include that functionality on purpose, which is connected to the next question:
114 17:35 <lightlike> What happens if we request a block from a peer (via getblockfrompeer), but the peer doesn’t have the block either?
115 17:35 <jnewbery> glozow: if it was automated how long should the node wait before trying the next peer?
116 17:35 <larryruane> lightlike: by enabling "net" logging, the answer seems to be that the peer just doesn't reply
117 17:36 <larryruane> (i wonder if it would ban us if we ask too many times?)
118 17:36 <lightlike> larryruane: correct! This PR uses EnsurePeerman() in rpc/blockchain, which was located in rpc/net before. But rpc/net includes other stuff from rpc/blockchain, hence the circular dependency
119 17:37 <larryruane> and there's a tool that detects those dependencies, but there's a way to add exceptions, and we'd like to keep the exceptions to a minimum (i think)
120 17:37 <lightlike> larryruane: yes - there is no NOTFOUND send like it would be for transactions if the peer doesnt have the block.
121 17:38 <glozow> jnewbery: idk oops
122 17:38 <raj> Cant the node just say "sorry I dont have the block"?
123 17:39 <raj> jnewbery, then we would also know how long to wait and then try the next one?
124 17:39 <lightlike> so that means automation would either send to all peers at the same time (and potentially receive the block many times which would be wasteful), or it would need to define some kind of waiting period after requesting from one peer, which introduces some complications.
125 17:39 <jnewbery> larryruane: right. Do you know where the logic is to serve blocks to peers?
126 17:40 <larryruane> net_processing i would say, but i'll have to look it up
127 17:41 <larryruane> `PeerManagerImpl::ProcessGetBlockData()` is one place
129 17:42 <lightlike> raj: I think it could (thats exactly what NOTFOUND does for transactions), but this is not currently implemented (probably because it is not such a common situation for blocks?)
130 17:42 <lightlike> Next q: This PR requests a block but adds no additional logic for dealing with the answer to the manual block request. Why is this not necessary?
132 17:45 <raj> lightlike, yes that makes sense..
135 17:47 <larryruane> it's a cool design that when we receive a block, we don't really care *why* we're receiving it ... we're like, oh cool, here's another block i can learn about!
136 17:47 <lightlike> larryruane: yes, in particular AcceptBlock() in validation just deals fine with saving the block to disk, but only because we have marked it as requested before (BlockRequested())
137 17:48 <glozow> larryruane: as long as it's a block we asked for
138 17:48 <lightlike> which leads to the next q: What would happen if we’d use getblockfrompeer to request a block for which our node didn’t previously know the header?
139 17:48 <larryruane> glozow: lightlike: +1 thanks didn't know that
141 17:50 <larryruane> lightlike: again I enabled "net" logging and tried this (by just specifying a nonexistent block hash arg) and it did send it out to our peer... who just ignored it (didn't reply)
142 17:50 <larryruane> so our node does send it out (even if we don't have it in our block index)
143 17:51 <lightlike> larryruane: yes - but that could be because the peer doesnt have the block, not because you didnt know the header.
144 17:51 <lightlike> i tried it too by extending the functional test.
145 17:52 <lightlike> But yes, the node sends it out - and if the peer knows about the block it will send it to us anyway - but we will drop it because we didn't call BlockRequested() in this case.
146 17:53 <jnewbery> lightlike: does validation ignore the block because force_processing is set to false?
147 17:53 <jnewbery> (in ProcessNewBlock())
150 17:55 <lightlike> but in this case, I wonder why we even bother requesting it, and don't just return an error for the RPC.
151 17:56 <jnewbery> haha snap
152 17:56 <jnewbery> interestingly if that TODO in AcceptBlock is implemented, then I don't think this RPC could ever work
153 17:56 <jnewbery> // TODO: Decouple this function from the block download logic by removing fRequested
154 17:56 <jnewbery> // This requires some new chain data structure to efficiently look up if a
155 17:56 <jnewbery> // block is in a chain leading to a candidate for best tip, despite not
156 17:56 <jnewbery> // being such a candidate itself.
157 17:57 <lightlike> since time is getting low, last q: Can you think of ways to further improve the new RPC command in the future? (some ideas are mentioned in the PR discussion)
158 17:58 <larryruane> jnewbery: that logic in AcceptBlock() seems to prevent DoS vector, because those checks are very quick (we don't even look at the transactions for example)
159 17:59 <larryruane> jnewbery: good find, that TODO should be mentioned in the PR!
160 18:00 <jnewbery> larryruane: I imagine sjors didn't know about it (I didn't know about it until 3 minutes ago)
161 18:00 <larryruane> I just want to say, the review that jnewbery did on this PR is amazing, I hope I can be half as good someday! If you haven't already, check out the resolved comments
162 18:00 <lightlike> I think some steps toward automation would make sense, it seems bothersome to request from each peer manually.
163 18:01 <lightlike> but time's up
164 18:01 <lightlike> #endmeeting