zmq)

Aug 11, 2021

https://github.com/bitcoin/bitcoin/pull/20295

Host: mzumsande - PR author: Sjors

The PR branch HEAD was 30e2ba6c at the time of this review club meeting.

Notes

Bitcoin Core uses headers-first synchronization (implemented in PR 4468). It first downloads headers from its peers to build a tree of verified headers. Once the headers are downloaded and verified, it requests blocks that build towards the most-work chain. The decision whether to request a block for download is made based on the headers tree.
This automatic download behavior is beneficial for the normal operation of a node, saving bandwidth and protecting it against DoS attacks.
On the other hand, this also restricts the possibilities for analysis of blocks that have not been chosen for download. Examples are stale blocks (i.e. blocks that have not become part of the best chain).
This PR introduces a new RPC getblockfrompeer that manually attempts to fetch a block from a peer, while specifying the block hash and the id of the peer. This is achieved by constructing a GETDATA message for the block and sending it to the selected peer.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
What are the possible applications of the new RPC getblockfrompeer?
Where in the codebase does a node make the autonomous decision whether to request a block for download after hearing about it from a peer? What is the main decision criterion?
The first commit collects various Ensure* helper functions and moves them to rpc/server_util.h. What is the purpose of these functions, and why did this PR move them?
What happens if we request a block from a peer, but the peer doesn’t have the block either?
This PR requests a block but adds no additional logic for dealing with the answer to the manual block request. Why is this not necessary?
What would happen if we’d use getblockfrompeer to request a block for which our node didn’t previously know the header?
Can you think of ways to further improve the new RPC command in the future? (some ideas are mentioned in the PR discussion)

Meeting Log

17:00

<lightlike> #startmeeting

17:00

<emzy> hi

17:00

<lightlike> Hi everyone! Welcome to the PR Review club!

17:00

<schmidty> hi

17:00

<theStack> hi

17:00

<glozow> hi!

17:00

<b10c> hi!

17:00

<janb> hi

17:00

<Naiza> hi!

17:00

<larryruane> hi

17:01

<lightlike> Is anybody here for the first time this week?

17:01

<raj> hi

17:02

<lightlike> Seems to be not the case

17:02

<lightlike> So, today we'll be looking at #20295 (getblockfrompeer).

17:02

<lightlike> Notes and questions are at the website (https://bitcoincore.reviews/20295 ) as usual.

17:02

<lightlike> Did you review the PR? (y/n)

17:03

<raj> y

17:03

<larryruane> y

17:03

<janb> n

17:03

<theStack> n

17:03

<emzy> n

17:03

<Naiza> y

17:03

<lightlike> For those who had the chance to take a look, what is your impression? (Concept ACK, approach ACK, tested ACK, or NACK?)

17:04

<glozow> y

17:04

<raj> tested ACK. Seems like an useful rpc command. Though might warrant some future improvements.

17:04

<larryruane> tested ACK, although I don't have a strong understanding of the motivation

17:05

<Naiza> tested ACK, but couldn't understand the real application of the added feature.

17:05

<lightlike> ok, let's begin with the questions - the first one is about the motivation:

17:05

<jnewbery> hi

17:06

<lightlike> What are the possible applications of the new RPC getblockfrompeer

17:06

<raj> lightlike, majorly to get blocks from the orphan chain?

17:06

<larryruane> the PR mentions testing, but it's unclear to me exactly how it helps with testing

17:06

<b10c> Concept ACK, I see how it's useful for the forkmonitor and other applications that require blocks from stale chains

17:07

<lightlike> raj: I agree, that is one application.

17:07

<larryruane> also the PR mentions the fork monitor https://forkmonitor.info/nodes/btc (which I think the author maintains)

17:07

<glozow> i was a bit unclear on this - is it common for regular users to want data for stale blocks?

17:07

<sipa_> raj: i'll only mention this once, but a per peeve of mine is that blocks in a non-active branch of the chain shouldn't be called orphaned. it's not like they don't have parents...

17:08

<theStack> getting pruned blocks seems to be another application (discussed in one of the comments on top by Fi3 and Sjors)

17:08

<glozow> sipa_: do you call them stale? or?

17:08

<jnewbery> There's a good writeup of stale blocks here: https://bitcoin.stackexchange.com/questions/5859/what-are-orphaned-and-stale-blocks/5869#5869

17:08

<raj> sipa_, yes thats correct. I should have wrote stale blocks, but i mixed it up with orphan transactions.. :D

17:09

<glozow> great reason to not call them orphan blocks hoho

17:09

<lightlike> larryruane: Not sure about testing, I think it's more about being able to better observe and analyze what is happening on mainnet with stale blocks.

17:09

<glozow> theStack: oh, getting pruned blocks seems to be a good reason

17:09

<lightlike> One other application that was mentioned was to be able to retrieve old blocks on a pruned node.

17:10

<lightlike> theStack: yes, what you said (missed your post)

17:10

<raj> lightlike, it also seems there should be a way to fetch any blocks from the network (as long as it known to be in the tree). Isn't there any current way to do that?

17:10

<glozow> wait, if you manually download a pruned block from a peer, do you prune it again immediately if it's old?

17:11

<schmidty> that was my original question as well ^

17:11

<lightlike> raj: Not manually, I don't think so.

17:11

<jnewbery> right, you could imagine having a full node and wanting to fetch a transaction, but the block that it's in has been pruned. You could run this commend to redownload the block, and then call getrawtransaction with the txid and blockhash

17:12

<lightlike> glozow: I'm not sure how quickly it would be pruned.

17:12

<larryruane> glozow: I wondered the same thing, so I fetched an older block (my node is pruned) about an hour ago and it's still there (`getblock` shows it)

17:12

<lightlike> I think at the latest it would be pruned at the next restart.

17:13

<larryruane> jnewbery: that's cool, so it's not just for accessing stale blocks

17:14

<lightlike> next question is about the automatic block download algorithm:

17:14

<lightlike> Where in the codebase does a node make the autonomous decision whether to request a block for download after hearing about it from a peer? What is the main decision criterion?

17:14

<sipa_> but i assume there is no guarantee about how long it'll stay available when in pruning mode?

17:15

<larryruane> lightlike: "... pruned at the next restart" I just restarted my node, and that old block (beyond my prune window) is still there... I think it's because it's now in the `blocks` directory (which is persisted)

17:16

<jnewbery> I think that if we redownloaded a block we'd generally keep it for at least a couple of days. We'd write it to the latest block file (after the tip block), and we'd only prune that file once the tip had been buried by 288 blocks.

17:16

<sipa_> larryruane: normally downloaded blocks are also stored in the blocks directory, and they are subject to pruning

17:17

<larryruane> jnewbery: that makes sense ... i know it never removes blocks from the middle `blk?????.dat` files

17:17

<b10c> jnewbery: uh that makes sense

17:17

<glozow> jnewbery: so pruning is per-file basis?

17:17

<sipa_> yes, pruning is per file

17:17

<jnewbery> the minimum you can set pruning to is 550MB. Rationale is here: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/validation.h#L91-L99

17:17

<b10c> on the other hand, if we call getblockfrompeer on many pruned blocks, could we cause the current chain tip to be pruned?

17:18

<theStack> is it already possible now to explicitely keep old blocks on a pruned node? or would this PR enable the first time ever to have "gaps" between blocks?

17:18

<sipa_> theStack: gaps are always possible, because blocks are downloafed out of.order

17:18

<sipa_> and files are deleted when they only contain blocks that are old eno7gh

17:19

<theStack> sipa_: ah, good to know. i was assuming the download order is strictly linear

17:19

<b10c> sipa_: thanks, that answers my question

17:19

<sipa_> that's hard to do when you're downloading from multiple peers in parallel :)

17:19

<sipa_> downloading is parallel since headers-fire fetching was introduced in 0.10

17:19

<larryruane> sipa_: but isn't it the case that if we advertise ourself as a NETWORK_LIMITED node, we may not send a block even though we have it?

17:20

<sipa_> larryruane: unsure

17:20

<larryruane> (but we still do have access to it locally of course)

17:20

<jnewbery> I may not have got that exactly 100% accurate, but roughly I'd expect the block to stay around for a short while due to the way files are pruned and the minimum prune value

17:20

<larryruane> i thought murch said that a week or 2 ago

17:20

<glozow> larryruane: think so, only send last 288 blocks

17:21

<larryruane> yeah i think it had to do with preventing fingerprinting

17:21

<glozow> per BIP159 https://github.com/bitcoin/bips/blob/master/bip-0159.mediawiki

17:21

<jnewbery> larryruane: yes, if we're NODE_NETWORK_LIMITED, we won't provide blocks more than a certain depth: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/net_processing.cpp#L1787-L1794

17:22

<jnewbery> *blocks below more than a certain depth

17:22

<lightlike> That's good, if it stays long enough to make analysis via other RPCs possible, it's a valid use case for the RPC imo.

17:22

<glozow> yeah makes sense

17:23

<lightlike> As for the last q: I think the relevant logic is in ProcessHeadersMessage() https://github.com/bitcoin/bitcoin/blob/0b5344b0d18788e011f2d4a279c8c12a29f1428a/src/net_processing.cpp#L2126-L2140

17:24

<jnewbery> ah yes, here's the constant for keeping all files that contain a block within the last 2 days: /** Block files containing a block-height within MIN_BLOCKS_TO_KEEP of ::ChainActive().Tip() will not be pruned. */

17:24

<jnewbery> static const unsigned int MIN_BLOCKS_TO_KEEP = 288;

17:24

<raj> lightlike, thanks. I was trying but was lost in the maze..

17:24

<larryruane> while testing this PR, i ran `getchaintips` but was surprised that i was not able to fetch any of those stale blocks from my peers (although i didn't try every one) ... maybe those peers just didn't ever consider those blocks to be part of best chain?

17:24

<lightlike> So, if the last header of a chain of headers doesn't have at least as much work as our chain, we wouldn't download the block automatically.

17:24

<jnewbery> oops, meant to paste the link: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/validation.h#L87-L88

17:25

<theStack> hope that my question is not too much off-topic, but i wondered (more than once) what is rationale of the 550 MB minimum prune size? i expected more like a multiple of 144 :) (i guess it's correlated with MIN_BLOCKS_TO_KEEP somehow though?)

17:25

<jnewbery> theStack: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/validation.h#L91-L99

17:27

<lightlike> larryruane: yes, maybe they never downloaded the blocks either.

17:27

<jnewbery> (it's very conservatively trying to keep at least the most recent 288 blocks and undo data)

17:27

<theStack> jnewbery: thanks! seems i was blind in the past :D

17:27

<larryruane> in case this is helpful to anyone, here's a command you can use to see which of your peers are non-pruning (look for NETWORK): `bitcoin-cli getpeerinfo|jq '.[]|(.id,.servicesnames)'`

100

17:27

<lightlike> Next q: The first commit collects various Ensure* helper functions and moves them to rpc/server_util.h. What is the purpose of these functions, and why did this PR move them?

101

17:27

<larryruane> (i love jq)

102

17:28

<raj> lightlike, this begs the question, if most nodes doesn't download stale blocks, doesn't that effectively reduces usefulness of this rpc?

103

17:28

<larryruane> lightlike: rather than asserting, if we're handling an RPC, we just want to throw (user error, not internal error)

104

17:29

<larryruane> but one thing i'm unclear on is why we would sometimes not have those things (like a mempool or a node context)

105

17:30

<lightlike> raj: yes, there seems to be some manual steps possible (try existing peers, if you don't get the block you want disconnect and them and try others) that could be automated

106

17:30

<b10c> ray: when the forkmonitor detects two blocks, at least one of their nodes will have the block. And for everyone else it can still be useful to get pruned blocks

107

17:30

<b10c> detects a stale block*

108

17:31

<jnewbery> b10c: because at least one of their peers must have sent a header or cmpctblock for the stale block?

109

17:32

<lightlike> larryruane: yes - also I think it's possible (or will be with increasing modularity) not to have certain parts (mempool, peerman etc.), in which case the specific rpcs don't make sense but an assert would be wrong.

110

17:33

<larryruane> there use of `std::any` with those Ensure functions that i don't understand, but that's probably too deep of a rabbit hole

111

17:33

<glozow> lightlike: i agree about the automating, could just try all existing peers instead of requiring a nodeid to be specified

112

17:34

<larryruane> i think the commit to move the Ensure functions fixes a circular dependency, but I don't understand very well

113

17:34

<lightlike> glozow: yes - I think the author didn't include that functionality on purpose, which is connected to the next question:

114

17:35

<lightlike> What happens if we request a block from a peer (via getblockfrompeer), but the peer doesn’t have the block either?

115

17:35

<jnewbery> glozow: if it was automated how long should the node wait before trying the next peer?

116

17:35

<larryruane> lightlike: by enabling "net" logging, the answer seems to be that the peer just doesn't reply

117

17:36

<larryruane> (i wonder if it would ban us if we ask too many times?)

118

17:36

<lightlike> larryruane: correct! This PR uses EnsurePeerman() in rpc/blockchain, which was located in rpc/net before. But rpc/net includes other stuff from rpc/blockchain, hence the circular dependency

119

17:37

<larryruane> and there's a tool that detects those dependencies, but there's a way to add exceptions, and we'd like to keep the exceptions to a minimum (i think)

120

17:37

<lightlike> larryruane: yes - there is no NOTFOUND send like it would be for transactions if the peer doesnt have the block.

121

17:38

<glozow> jnewbery: idk oops

122

17:38

<raj> Cant the node just say "sorry I dont have the block"?

123

17:39

<raj> jnewbery, then we would also know how long to wait and then try the next one?

124

17:39

<lightlike> so that means automation would either send to all peers at the same time (and potentially receive the block many times which would be wasteful), or it would need to define some kind of waiting period after requesting from one peer, which introduces some complications.

125

17:39

<jnewbery> larryruane: right. Do you know where the logic is to serve blocks to peers?

126

17:40

<larryruane> net_processing i would say, but i'll have to look it up

127

17:41

<larryruane> `PeerManagerImpl::ProcessGetBlockData()` is one place

128

17:42

<jnewbery> larryruane: yes, it's ProcessGetBlockData(). Here's where we drop the request if we don't have the block: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/net_processing.cpp#L1768-L1771

129

17:42

<lightlike> raj: I think it could (thats exactly what NOTFOUND does for transactions), but this is not currently implemented (probably because it is not such a common situation for blocks?)

130

17:42

<lightlike> Next q: This PR requests a block but adds no additional logic for dealing with the answer to the manual block request. Why is this not necessary?

131

17:44

<jnewbery> notfound has never been sent in response to a block request. It was added here for transactions: https://github.com/bitcoin/bitcoin/pull/2192/files

132

17:45

<raj> lightlike, yes that makes sense..

133

17:46

<glozow> "getdata obviously needs a response" ? https://github.com/bitcoin/bitcoin/pull/2192#issuecomment-12540792

134

17:46

<larryruane> lightlike: ".. not necessary?" because it's seamlessly handled here https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L3711

135

17:47

<larryruane> it's a cool design that when we receive a block, we don't really care *why* we're receiving it ... we're like, oh cool, here's another block i can learn about!

136

17:47

<lightlike> larryruane: yes, in particular AcceptBlock() in validation just deals fine with saving the block to disk, but only because we have marked it as requested before (BlockRequested())

137

17:48

<glozow> larryruane: as long as it's a block we asked for

138

17:48

<lightlike> which leads to the next q: What would happen if we’d use getblockfrompeer to request a block for which our node didn’t previously know the header?

139

17:48

<larryruane> glozow: lightlike: +1 thanks didn't know that

140

17:49

<lightlike> relevant code is https://github.com/bitcoin/bitcoin/blob/0b5344b0d18788e011f2d4a279c8c12a29f1428a/src/validation.cpp#L3421-L3431

141

17:50

<larryruane> lightlike: again I enabled "net" logging and tried this (by just specifying a nonexistent block hash arg) and it did send it out to our peer... who just ignored it (didn't reply)

142

17:50

<larryruane> so our node does send it out (even if we don't have it in our block index)

143

17:51

<lightlike> larryruane: yes - but that could be because the peer doesnt have the block, not because you didnt know the header.

144

17:51

<lightlike> i tried it too by extending the functional test.

145

17:52

<lightlike> But yes, the node sends it out - and if the peer knows about the block it will send it to us anyway - but we will drop it because we didn't call BlockRequested() in this case.

146

17:53

<jnewbery> lightlike: does validation ignore the block because force_processing is set to false?

147

17:53

<jnewbery> (in ProcessNewBlock())

148

17:55

<lightlike> jnewbery: not sure, I thought AcceptBlock() would ignore it because of the last criterion in https://github.com/bitcoin/bitcoin/blob/0b5344b0d18788e011f2d4a279c8c12a29f1428a/src/validation.cpp#L3421-L3431 (if it has lower work)

149

17:55

<jnewbery> here's the logic in AcceptBlock() that doesn't save the block if we didn't request it and it isn't part of the best chain: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/validation.cpp#L3413-L3424

150

17:55

<lightlike> but in this case, I wonder why we even bother requesting it, and don't just return an error for the RPC.

151

17:56

<jnewbery> haha snap

152

17:56

<jnewbery> interestingly if that TODO in AcceptBlock is implemented, then I don't think this RPC could ever work

153

17:56

<jnewbery> // TODO: Decouple this function from the block download logic by removing fRequested

154

17:56

<jnewbery> // This requires some new chain data structure to efficiently look up if a

155

17:56

<jnewbery> // block is in a chain leading to a candidate for best tip, despite not

156

17:56

<jnewbery> // being such a candidate itself.

157

17:57

<lightlike> since time is getting low, last q: Can you think of ways to further improve the new RPC command in the future? (some ideas are mentioned in the PR discussion)

158

17:58

<larryruane> jnewbery: that logic in AcceptBlock() seems to prevent DoS vector, because those checks are very quick (we don't even look at the transactions for example)

159

17:59

<larryruane> jnewbery: good find, that TODO should be mentioned in the PR!

160

18:00

<jnewbery> larryruane: I imagine sjors didn't know about it (I didn't know about it until 3 minutes ago)

161

18:00

<larryruane> I just want to say, the review that jnewbery did on this PR is amazing, I hope I can be half as good someday! If you haven't already, check out the resolved comments

162

18:00

<lightlike> I think some steps toward automation would make sense, it seems bothersome to request from each peer manually.

163

18:01

<lightlike> but time's up

164

18:01

<lightlike> #endmeeting