rpc: getblockfrompeer (p2p, rpc/rest/zmq)

https://github.com/bitcoin/bitcoin/pull/20295

Host: mzumsande  -  PR author: Sjors

The PR branch HEAD was 30e2ba6c at the time of this review club meeting.

Notes

  • Bitcoin Core uses headers-first synchronization (implemented in PR 4468). It first downloads headers from its peers to build a tree of verified headers. Once the headers are downloaded and verified, it requests blocks that build towards the most-work chain. The decision whether to request a block for download is made based on the headers tree.

  • This automatic download behavior is beneficial for the normal operation of a node, saving bandwidth and protecting it against DoS attacks.

  • On the other hand, this also restricts the possibilities for analysis of blocks that have not been chosen for download. Examples are stale blocks (i.e. blocks that have not become part of the best chain).

  • This PR introduces a new RPC getblockfrompeer that manually attempts to fetch a block from a peer, while specifying the block hash and the id of the peer. This is achieved by constructing a GETDATA message for the block and sending it to the selected peer.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. What are the possible applications of the new RPC getblockfrompeer?

  3. Where in the codebase does a node make the autonomous decision whether to request a block for download after hearing about it from a peer? What is the main decision criterion?

  4. The first commit collects various Ensure* helper functions and moves them to rpc/server_util.h. What is the purpose of these functions, and why did this PR move them?

  5. What happens if we request a block from a peer, but the peer doesn’t have the block either?

  6. This PR requests a block but adds no additional logic for dealing with the answer to the manual block request. Why is this not necessary?

  7. What would happen if we’d use getblockfrompeer to request a block for which our node didn’t previously know the header?

  8. Can you think of ways to further improve the new RPC command in the future? (some ideas are mentioned in the PR discussion)

Meeting Log

  117:00 <lightlike> #startmeeting
  217:00 <emzy> hi
  317:00 <lightlike> Hi everyone! Welcome to the PR Review club!
  417:00 <schmidty> hi
  517:00 <theStack> hi
  617:00 <glozow> hi!
  717:00 <b10c> hi!
  817:00 <janb> hi
  917:00 <Naiza> hi!
 1017:00 <larryruane> hi
 1117:01 <lightlike> Is anybody here for the first time this week?
 1217:01 <raj> hi
 1317:02 <lightlike> Seems to be not the case
 1417:02 <lightlike> So, today we'll be looking at #20295 (getblockfrompeer).
 1517:02 <lightlike> Notes and questions are at the website (https://bitcoincore.reviews/20295 ) as usual.
 1617:02 <lightlike> Did you review the PR? (y/n)
 1717:03 <raj> y
 1817:03 <larryruane> y
 1917:03 <janb> n
 2017:03 <theStack> n
 2117:03 <emzy> n
 2217:03 <Naiza> y
 2317:03 <lightlike> For those who had the chance to take a look, what is your impression? (Concept ACK, approach ACK, tested ACK, or NACK?)
 2417:04 <glozow> y
 2517:04 <raj> tested ACK. Seems like an useful rpc command. Though might warrant some future improvements.
 2617:04 <larryruane> tested ACK, although I don't have a strong understanding of the motivation
 2717:05 <Naiza> tested ACK, but couldn't understand the real application of the added feature.
 2817:05 <lightlike> ok, let's begin with the questions - the first one is about the motivation:
 2917:05 <jnewbery> hi
 3017:06 <lightlike> What are the possible applications of the new RPC getblockfrompeer
 3117:06 <raj> lightlike, majorly to get blocks from the orphan chain?
 3217:06 <larryruane> the PR mentions testing, but it's unclear to me exactly how it helps with testing
 3317:06 <b10c> Concept ACK, I see how it's useful for the forkmonitor and other applications that require blocks from stale chains
 3417:07 <lightlike> raj: I agree, that is one application.
 3517:07 <larryruane> also the PR mentions the fork monitor https://forkmonitor.info/nodes/btc (which I think the author maintains)
 3617:07 <glozow> i was a bit unclear on this - is it common for regular users to want data for stale blocks?
 3717:07 <sipa_> raj: i'll only mention this once, but a per peeve of mine is that blocks in a non-active branch of the chain shouldn't be called orphaned. it's not like they don't have parents...
 3817:08 <theStack> getting pruned blocks seems to be another application (discussed in one of the comments on top by Fi3 and Sjors)
 3917:08 <glozow> sipa_: do you call them stale? or?
 4017:08 <jnewbery> There's a good writeup of stale blocks here: https://bitcoin.stackexchange.com/questions/5859/what-are-orphaned-and-stale-blocks/5869#5869
 4117:08 <raj> sipa_, yes thats correct. I should have wrote stale blocks, but i mixed it up with orphan transactions.. :D
 4217:09 <glozow> great reason to not call them orphan blocks hoho
 4317:09 <lightlike> larryruane: Not sure about testing, I think it's more about being able to better observe and analyze what is happening on mainnet with stale blocks.
 4417:09 <glozow> theStack: oh, getting pruned blocks seems to be a good reason
 4517:09 <lightlike> One other application that was mentioned was to be able to retrieve old blocks on a pruned node.
 4617:10 <lightlike> theStack: yes, what you said (missed your post)
 4717:10 <raj> lightlike, it also seems there should be a way to fetch any blocks from the network (as long as it known to be in the tree). Isn't there any current way to do that?
 4817:10 <glozow> wait, if you manually download a pruned block from a peer, do you prune it again immediately if it's old?
 4917:11 <schmidty> that was my original question as well ^
 5017:11 <lightlike> raj: Not manually, I don't think so.
 5117:11 <jnewbery> right, you could imagine having a full node and wanting to fetch a transaction, but the block that it's in has been pruned. You could run this commend to redownload the block, and then call getrawtransaction with the txid and blockhash
 5217:12 <lightlike> glozow: I'm not sure how quickly it would be pruned.
 5317:12 <larryruane> glozow: I wondered the same thing, so I fetched an older block (my node is pruned) about an hour ago and it's still there (`getblock` shows it)
 5417:12 <lightlike> I think at the latest it would be pruned at the next restart.
 5517:13 <larryruane> jnewbery: that's cool, so it's not just for accessing stale blocks
 5617:14 <lightlike> next question is about the automatic block download algorithm:
 5717:14 <lightlike> Where in the codebase does a node make the autonomous decision whether to request a block for download after hearing about it from a peer? What is the main decision criterion?
 5817:14 <sipa_> but i assume there is no guarantee about how long it'll stay available when in pruning mode?
 5917:15 <larryruane> lightlike: "... pruned at the next restart" I just restarted my node, and that old block (beyond my prune window) is still there... I think it's because it's now in the `blocks` directory (which is persisted)
 6017:16 <jnewbery> I think that if we redownloaded a block we'd generally keep it for at least a couple of days. We'd write it to the latest block file (after the tip block), and we'd only prune that file once the tip had been buried by 288 blocks.
 6117:16 <sipa_> larryruane: normally downloaded blocks are also stored in the blocks directory, and they are subject to pruning
 6217:17 <larryruane> jnewbery: that makes sense ... i know it never removes blocks from the middle `blk?????.dat` files
 6317:17 <b10c> jnewbery: uh that makes sense
 6417:17 <glozow> jnewbery: so pruning is per-file basis?
 6517:17 <sipa_> yes, pruning is per file
 6617:17 <jnewbery> the minimum you can set pruning to is 550MB. Rationale is here: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/validation.h#L91-L99
 6717:17 <b10c> on the other hand, if we call getblockfrompeer on many pruned blocks, could we cause the current chain tip to be pruned?
 6817:18 <theStack> is it already possible now to explicitely keep old blocks on a pruned node? or would this PR enable the first time ever to have "gaps" between blocks?
 6917:18 <sipa_> theStack: gaps are always possible, because blocks are downloafed out of.order
 7017:18 <sipa_> and files are deleted when they only contain blocks that are old eno7gh
 7117:19 <theStack> sipa_: ah, good to know. i was assuming the download order is strictly linear
 7217:19 <b10c> sipa_: thanks, that answers my question
 7317:19 <sipa_> that's hard to do when you're downloading from multiple peers in parallel :)
 7417:19 <sipa_> downloading is parallel since headers-fire fetching was introduced in 0.10
 7517:19 <larryruane> sipa_: but isn't it the case that if we advertise ourself as a NETWORK_LIMITED node, we may not send a block even though we have it?
 7617:20 <sipa_> larryruane: unsure
 7717:20 <larryruane> (but we still do have access to it locally of course)
 7817:20 <jnewbery> I may not have got that exactly 100% accurate, but roughly I'd expect the block to stay around for a short while due to the way files are pruned and the minimum prune value
 7917:20 <larryruane> i thought murch said that a week or 2 ago
 8017:20 <glozow> larryruane: think so, only send last 288 blocks
 8117:21 <larryruane> yeah i think it had to do with preventing fingerprinting
 8217:21 <glozow> per BIP159 https://github.com/bitcoin/bips/blob/master/bip-0159.mediawiki
 8317:21 <jnewbery> larryruane: yes, if we're NODE_NETWORK_LIMITED, we won't provide blocks more than a certain depth: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/net_processing.cpp#L1787-L1794
 8417:22 <jnewbery> *blocks below more than a certain depth
 8517:22 <lightlike> That's good, if it stays long enough to make analysis via other RPCs possible, it's a valid use case for the RPC imo.
 8617:22 <glozow> yeah makes sense
 8717:23 <lightlike> As for the last q: I think the relevant logic is in ProcessHeadersMessage() https://github.com/bitcoin/bitcoin/blob/0b5344b0d18788e011f2d4a279c8c12a29f1428a/src/net_processing.cpp#L2126-L2140
 8817:24 <jnewbery> ah yes, here's the constant for keeping all files that contain a block within the last 2 days: /** Block files containing a block-height within MIN_BLOCKS_TO_KEEP of ::ChainActive().Tip() will not be pruned. */
 8917:24 <jnewbery> static const unsigned int MIN_BLOCKS_TO_KEEP = 288;
 9017:24 <raj> lightlike, thanks. I was trying but was lost in the maze..
 9117:24 <larryruane> while testing this PR, i ran `getchaintips` but was surprised that i was not able to fetch any of those stale blocks from my peers (although i didn't try every one) ... maybe those peers just didn't ever consider those blocks to be part of best chain?
 9217:24 <lightlike> So, if the last header of a chain of headers doesn't have at least as much work as our chain, we wouldn't download the block automatically.
 9317:24 <jnewbery> oops, meant to paste the link: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/validation.h#L87-L88
 9417:25 <theStack> hope that my question is not too much off-topic, but i wondered (more than once) what is rationale of the 550 MB minimum prune size? i expected more like a multiple of 144 :) (i guess it's correlated with MIN_BLOCKS_TO_KEEP somehow though?)
 9517:25 <jnewbery> theStack: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/validation.h#L91-L99
 9617:27 <lightlike> larryruane: yes, maybe they never downloaded the blocks either.
 9717:27 <jnewbery> (it's very conservatively trying to keep at least the most recent 288 blocks and undo data)
 9817:27 <theStack> jnewbery: thanks! seems i was blind in the past :D
 9917:27 <larryruane> in case this is helpful to anyone, here's a command you can use to see which of your peers are non-pruning (look for NETWORK): `bitcoin-cli getpeerinfo|jq '.[]|(.id,.servicesnames)'`
10017:27 <lightlike> Next q: The first commit collects various Ensure* helper functions and moves them to rpc/server_util.h. What is the purpose of these functions, and why did this PR move them?
10117:27 <larryruane> (i love jq)
10217:28 <raj> lightlike, this begs the question, if most nodes doesn't download stale blocks, doesn't that effectively reduces usefulness of this rpc?
10317:28 <larryruane> lightlike: rather than asserting, if we're handling an RPC, we just want to throw (user error, not internal error)
10417:29 <larryruane> but one thing i'm unclear on is why we would sometimes not have those things (like a mempool or a node context)
10517:30 <lightlike> raj: yes, there seems to be some manual steps possible (try existing peers, if you don't get the block you want disconnect and them and try others) that could be automated
10617:30 <b10c> ray: when the forkmonitor detects two blocks, at least one of their nodes will have the block. And for everyone else it can still be useful to get pruned blocks
10717:30 <b10c> detects a stale block*
10817:31 <jnewbery> b10c: because at least one of their peers must have sent a header or cmpctblock for the stale block?
10917:32 <lightlike> larryruane: yes - also I think it's possible (or will be with increasing modularity) not to have certain parts (mempool, peerman etc.), in which case the specific rpcs don't make sense but an assert would be wrong.
11017:33 <larryruane> there use of `std::any` with those Ensure functions that i don't understand, but that's probably too deep of a rabbit hole
11117:33 <glozow> lightlike: i agree about the automating, could just try all existing peers instead of requiring a nodeid to be specified
11217:34 <larryruane> i think the commit to move the Ensure functions fixes a circular dependency, but I don't understand very well
11317:34 <lightlike> glozow: yes - I think the author didn't include that functionality on purpose, which is connected to the next question:
11417:35 <lightlike> What happens if we request a block from a peer (via getblockfrompeer), but the peer doesn’t have the block either?
11517:35 <jnewbery> glozow: if it was automated how long should the node wait before trying the next peer?
11617:35 <larryruane> lightlike: by enabling "net" logging, the answer seems to be that the peer just doesn't reply
11717:36 <larryruane> (i wonder if it would ban us if we ask too many times?)
11817:36 <lightlike> larryruane: correct! This PR uses EnsurePeerman() in rpc/blockchain, which was located in rpc/net before. But rpc/net includes other stuff from rpc/blockchain, hence the circular dependency
11917:37 <larryruane> and there's a tool that detects those dependencies, but there's a way to add exceptions, and we'd like to keep the exceptions to a minimum (i think)
12017:37 <lightlike> larryruane: yes - there is no NOTFOUND send like it would be for transactions if the peer doesnt have the block.
12117:38 <glozow> jnewbery: idk oops
12217:38 <raj> Cant the node just say "sorry I dont have the block"?
12317:39 <raj> jnewbery, then we would also know how long to wait and then try the next one?
12417:39 <lightlike> so that means automation would either send to all peers at the same time (and potentially receive the block many times which would be wasteful), or it would need to define some kind of waiting period after requesting from one peer, which introduces some complications.
12517:39 <jnewbery> larryruane: right. Do you know where the logic is to serve blocks to peers?
12617:40 <larryruane> net_processing i would say, but i'll have to look it up
12717:41 <larryruane> `PeerManagerImpl::ProcessGetBlockData()` is one place
12817:42 <jnewbery> larryruane: yes, it's ProcessGetBlockData(). Here's where we drop the request if we don't have the block: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/net_processing.cpp#L1768-L1771
12917:42 <lightlike> raj: I think it could (thats exactly what NOTFOUND does for transactions), but this is not currently implemented (probably because it is not such a common situation for blocks?)
13017:42 <lightlike> Next q: This PR requests a block but adds no additional logic for dealing with the answer to the manual block request. Why is this not necessary?
13117:44 <jnewbery> notfound has never been sent in response to a block request. It was added here for transactions: https://github.com/bitcoin/bitcoin/pull/2192/files
13217:45 <raj> lightlike, yes that makes sense..
13317:46 <glozow> "getdata obviously needs a response" ? https://github.com/bitcoin/bitcoin/pull/2192#issuecomment-12540792
13417:46 <larryruane> lightlike: ".. not necessary?" because it's seamlessly handled here https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L3711
13517:47 <larryruane> it's a cool design that when we receive a block, we don't really care *why* we're receiving it ... we're like, oh cool, here's another block i can learn about!
13617:47 <lightlike> larryruane: yes, in particular AcceptBlock() in validation just deals fine with saving the block to disk, but only because we have marked it as requested before (BlockRequested())
13717:48 <glozow> larryruane: as long as it's a block we asked for
13817:48 <lightlike> which leads to the next q: What would happen if we’d use getblockfrompeer to request a block for which our node didn’t previously know the header?
13917:48 <larryruane> glozow: lightlike: +1 thanks didn't know that
14017:49 <lightlike> relevant code is https://github.com/bitcoin/bitcoin/blob/0b5344b0d18788e011f2d4a279c8c12a29f1428a/src/validation.cpp#L3421-L3431
14117:50 <larryruane> lightlike: again I enabled "net" logging and tried this (by just specifying a nonexistent block hash arg) and it did send it out to our peer... who just ignored it (didn't reply)
14217:50 <larryruane> so our node does send it out (even if we don't have it in our block index)
14317:51 <lightlike> larryruane: yes - but that could be because the peer doesnt have the block, not because you didnt know the header.
14417:51 <lightlike> i tried it too by extending the functional test.
14517:52 <lightlike> But yes, the node sends it out - and if the peer knows about the block it will send it to us anyway - but we will drop it because we didn't call BlockRequested() in this case.
14617:53 <jnewbery> lightlike: does validation ignore the block because force_processing is set to false?
14717:53 <jnewbery> (in ProcessNewBlock())
14817:55 <lightlike> jnewbery: not sure, I thought AcceptBlock() would ignore it because of the last criterion in https://github.com/bitcoin/bitcoin/blob/0b5344b0d18788e011f2d4a279c8c12a29f1428a/src/validation.cpp#L3421-L3431 (if it has lower work)
14917:55 <jnewbery> here's the logic in AcceptBlock() that doesn't save the block if we didn't request it and it isn't part of the best chain: https://github.com/bitcoin/bitcoin/blob/77e23ca945030d557559a7391cb8bd368693548c/src/validation.cpp#L3413-L3424
15017:55 <lightlike> but in this case, I wonder why we even bother requesting it, and don't just return an error for the RPC.
15117:56 <jnewbery> haha snap
15217:56 <jnewbery> interestingly if that TODO in AcceptBlock is implemented, then I don't think this RPC could ever work
15317:56 <jnewbery> // TODO: Decouple this function from the block download logic by removing fRequested
15417:56 <jnewbery> // This requires some new chain data structure to efficiently look up if a
15517:56 <jnewbery> // block is in a chain leading to a candidate for best tip, despite not
15617:56 <jnewbery> // being such a candidate itself.
15717:57 <lightlike> since time is getting low, last q: Can you think of ways to further improve the new RPC command in the future? (some ideas are mentioned in the PR discussion)
15817:58 <larryruane> jnewbery: that logic in AcceptBlock() seems to prevent DoS vector, because those checks are very quick (we don't even look at the transactions for example)
15917:59 <larryruane> jnewbery: good find, that TODO should be mentioned in the PR!
16018:00 <jnewbery> larryruane: I imagine sjors didn't know about it (I didn't know about it until 3 minutes ago)
16118:00 <larryruane> I just want to say, the review that jnewbery did on this PR is amazing, I hope I can be half as good someday! If you haven't already, check out the resolved comments
16218:00 <lightlike> I think some steps toward automation would make sense, it seems bothersome to request from each peer manually.
16318:01 <lightlike> but time's up
16418:01 <lightlike> #endmeeting