BIP 157: Signal support for compact block filters with NODE_COMPACT_FILTERS (p2p)

https://github.com/bitcoin/bitcoin/pulls/19070

Host: jnewbery  -  PR author: jnewbery

The PR branch HEAD was f5c003d3e at the time of this review club meeting.

This is the fourth PR in our special series on the BIP 157 implementation. See the previous meetings on PR 18877, PR 18960 and PR 19010.

Notes

  • This PR adds support for the NODE_COMPACT_FILTERS service bit, used in version and addr messages to signal a node’s capabilities. The existing service bit are defined in protocol.h.

  • Prior to this PR, even if a node is configured to serve cfilter, cfheaders and cfcheckpts messages, it won’t signal that capability to peers on the network.

  • This PR includes two important changes from the original implementation in PR 16442:

    • PrepareBlockFilterRequest() no longer calls BlockUntilSyncedToCurrentChain() to avoid blocking the message handler thread.

    • -peerblockfilters can be set and NODE_COMPACT_FILTERS can be signaled before the index is fully built.

    See the justification for those changes and jimpo’s response in the PR. See also the summary of prior discussion in PR 16442.

Questions

  • What are the implications of PrepareBlockFilterRequest() calling into BlockUntilSyncedToCurrentChain()? What are the implications of not calling BlockUntilSyncedToCurrentChain()? Which is preferable? Are there any alternative solutions?

  • What are the potential problems with a node signaling NODE_COMPACT_FILTERS before its compact block filters index is built? What are the potential issues if a node changes its service bits dynamically?

Meeting Log

  113:00 <jnewbery> #startmeeting
  213:00 <jnewbery> Hi! Welcome to the final special BIP 157 review club meeting.
  313:00 <michaelfolkson> hi
  413:00 <jnewbery> Notes and questions are here: https://bitcoincore.reviews/19070.html
  513:01 <jkczyz> hi
  613:01 <jnewbery> I think this might be the most interesting of the BIP 157 PRs, since it touches on some protocol design questions
  713:02 <jnewbery> The code changes are small and simple, but the design decisions have important implications for BIP 157 clients
  813:02 <jnewbery> is someone able to summarize the changes?
  913:03 <jonatack> hi!
 1013:03 <jnewbery> hi jon!
 1113:03 <michaelfolkson> You mean the changes between jimpo's code and your code or this PR generally?
 1213:04 <jnewbery> ok, I'll summarize. It's adding signaling for BIP 157 support, using the NODE_COMPACT_FILTERS service bit in version and addr messages
 1313:05 <jnewbery> michaelfolkson: I meant generally in this PR, but if you want to summarize the differences from Jimpo's implementation, that'd be great
 1413:06 <michaelfolkson> Ok. One of the changes is that your code signals a node can provide filters for temporarily when it actually can't (as I understand)
 1513:07 <jnewbery> That's not exactly it
 1613:07 <jnewbery> it'll signal support for filters before it's fully built the cache. It's able to serve filters from blocks that it's processed
 1713:08 <michaelfolkson> Ah ok gotcha
 1813:09 <jnewbery> sorry, s/cache/index
 1913:09 <michaelfolkson> So it is not as simple saying willing to and unable to versus willing to and able to
 2013:09 <jnewbery> no
 2113:10 <jnewbery> with this PR, a node that signals NODE_COMPACT_FILTERS will always be able to serve the filters that it has built, it's just possible that it hasn't fully built the index yet
 2213:11 <michaelfolkson> I'm stuck in this IBD paradigm. Although there are some parallels they don't map perfectly
 2313:11 <jnewbery> There's one other change, which is that PrepareBlockFilterRequest() no longer calls BlockUntilSyncedToCurrentChain().
 2413:11 <jnewbery> What are the implications of PrepareBlockFilterRequest() calling into BlockUntilSyncedToCurrentChain()? What are the implications of not calling BlockUntilSyncedToCurrentChain()? Which is preferable? Are there any alternative solutions?
 2513:12 <jnewbery> michaelfolkson: it's very similar to IBD. A node will signal NODE_NETWORK before it has completed IBD
 2613:12 <jkczyz> Calling it can prevent other messages from being processed. Not calling it may lead to requests not being served if the index does not contain the filter for the requested block
 2713:12 <jnewbery> that means "I can serve blocks". It doesn't mean "I'll be able to serve you any block that you request"
 2813:13 <jnewbery> jkczyz: exactly. Calling that function blocks the message handling thread on the scheduler thread, which I think we should avoid unless absolutely necessary
 2913:13 <jnewbery> the scheduler thread may be doing slow things like flushing the wallet to disk
 3013:14 <jkczyz> jnewbery: What if only some of the requested filters are available? Will we respond with a partial set?
 3113:15 <jkczyz> Haven't looked into this myself but it just came to mind
 3213:15 <jnewbery> And not blocking on the scheduler thread means that if we receive a request before the BlockConnected callback to the compact block index has been called, we won't be able to respond to that thread
 3313:16 <jnewbery> I think we'd reply with just a partial set. Let me check
 3413:17 <jnewbery> oh no, I'm wrong, we wouldn't respond with any of the filters
 3513:18 <michaelfolkson> I'm going to keep hammering away at this IBD comparison. I know you discuss it here https://github.com/bitcoin/bitcoin/pull/19070#issuecomment-637056107
 3613:19 <jnewbery> we return early here: https://github.com/bitcoin/bitcoin/blob/f5c003d3ead182335252558c5c6c9b9ca8968065/src/index/blockfilterindex.cpp#L426-L428
 3713:19 <jonatack> the original impl would have also entailed full node users having to stop and restart the node once the index finished building
 3813:19 <jkczyz> michaelfolkson: I was trying to think if there is some analogy there. As in, if a node knows it shouldn't request a block from a peer, could there be a similar mechanism for filters
 3913:20 <jnewbery> jonatack: right. Well actually the original original implementation toggled the service bit dynamically. Some reviewers (including me) didn't like that, so jimpo changed it to be toggled manually
 4013:20 <michaelfolkson> But what is the reason for not responding with a partial set? Because filters are smaller than blocks and therefore you should expect to receive all of them in one go?
 4113:20 <jonatack> yeah, dynamic isn't better, for the reasons everyone stated
 4213:22 <jnewbery> the main difference with NODE_NETWORK is that in the initial version handshake, a node will tell its peer what its best block height is: https://btcinformation.org/en/developer-reference#version
 4313:22 <jnewbery> so the peer knows not to ask for higher blocks than that
 4413:22 <michaelfolkson> jkczyz: Yeah they do not map perfectly but I think they share enough characteristics such that there are a few questions like "Well IBD doesn't do what filter serving is doing. Why not?"
 4513:22 <MarcoFalke> with block filters the peer also knows not to ask for filters of block that the node hasn't announced, no?
 4613:24 <jnewbery> MarcoFalke: that's what the BIP says, but I think it's not a great design. I think it should be fine for a node to receive and relay a block before it's built the filter
 4713:24 <jnewbery> tying those two functions together seems like a bad idea
 4813:26 <jnewbery> it constrains implementers to couple together internal components that needn't be (eg here, blocking message handling on the scheduler thread so that the index gets built)
 4913:26 <jkczyz> jnewbery: ah, so doing so would also require re-introducing BlockUntilSyncedToCurrentChain?
 5013:26 <MarcoFalke> I see your point, but the validation interface queue needs to be processed either way at some point. Otherwise it will "overflow"
 5113:27 <jonatack> is it relatively common to see implementation development pushback and drive BIP updates?
 5213:27 <jnewbery> jkczyz: it requires some kind of sychronicity
 5313:27 <MarcoFalke> jonatack: Often issues with a BIP are only realized after it has been deployed. (E.g. bloomfilters, ouch)
 5413:28 <sipa> MarcoFalke: that's a fair example, but the issue there wasn't really discovered through implementation
 5513:28 <michaelfolkson> Steelmanning jimpo's argument. Changing the BIP is no big deal (imo).
 5613:28 <MarcoFalke> jup, not implementation, but more analysis
 5713:28 <michaelfolkson> "The BIP was designed to expose a consistent interface to clients, which is conceptually simpler, making it easier to implement client code correctly."
 5813:29 <michaelfolkson> You disagree with this jnewbery?
 5913:29 <jnewbery> one other way to achieve the same would be to have some kind of 'queue' of outstanding BIP157 requests, and if they can't be serviced, delay processing them (I put queue in quotes because I think we'd want to limit it to one request)
 6013:29 <sipa> jnewbery: i think you can also look at it the other way... why should an implementation detail in core affect the BIP?
 6113:29 <jonatack> MarcoFalke: thanks. that's not terribly surprising.
 6213:30 <sipa> for all kinds of network-servicable data peers can reasonable expect synchronicity... if you've processed a block, you can give it to others
 6313:30 <sipa> the bloom filter is weird in that it's started as an optional index, but now is becoming a network-servicable thing
 6413:30 <jnewbery> that'd be similar to the getdata queue, which can be processed out-of-order with other peers' requests: https://github.com/bitcoin/bitcoin/blob/365f1082e1e6ff1c2f53552c3871223e87a9d43f/src/net_processing.cpp#L3600-L3601
 6513:30 <jnewbery> (requests from a single peer are still treated serially)
 6613:31 <jnewbery> I don't think we should do this because it adds complexity, but there are potentially other ways to achieve what's wanted
 6713:32 <sipa> and if we want to say do block processing in a separate thread as well, like index building, we'll be faced with the same issues, and they need to be solved in a way that doesn't break the protocol synchronicity guarantees
 6813:32 <sipa> so it feels a bit strange to use a current implementation issue with filters specially
 6913:33 <jnewbery> sipa: I think my point is that we should avoid synchronicity between the core functionality of relaying blocks and the optional functionality of serving filters
 7013:33 <sipa> jnewbery: well we already have that, as long as it's an RPC interface
 7113:33 <sipa> if it's a P2P thing, it should behave P2P like
 7213:34 <jnewbery> I don't think the block example is the same. We wouldn't relay a block through headers if we hadn't yet processed it
 7313:35 <jkczyz> I've heard discussion around eventually committing the filter in the block (sorry if that terminology is no precise) -- at that point, does this problem simply go away (i.e., would the index be updated when the block is connected)? Or does the problem still exist for older blocks?
 7413:35 <sipa> jnewbery: i'm not sure i see the difference
 7513:37 <jnewbery> for compact blocks we do relay before processing, and I think that's the one place in net processing that we currently block on the scheduler thread: https://github.com/bitcoin/bitcoin/blob/365f1082e1e6ff1c2f53552c3871223e87a9d43f/src/net_processing.cpp#L1479-L1500
 7613:38 <jnewbery> sipa: if we send a header to a peer, we've already processed the block, so we can respond to a getdata request for it
 7713:38 <jnewbery> the header message itself is an indication that we can serve the block
 7813:38 <jnewbery> but why should it also be an indication that we have built the filter
 7913:39 <sipa> so perhaps we shouldn't send a header message to BIP157 peers before we've built the filter for it?
 8013:39 <jnewbery> we don't know which peers are BIP157 clients
 8113:39 <sipa> oh
 8213:39 <sipa> that's annoying
 8313:39 <jnewbery> the service bit is on the server side
 8413:41 <jnewbery> jkczyz: if the filter is committed, then presumably it's required for validation, and therefore a peer sending you a header for that block indicates that they have the filter
 8513:41 <MarcoFalke> We know they are clients when they send the first request, so what about disconnecting them instead of having them turn thumbs for 20 seconds (or whatever the timeout is)
 8613:41 <jnewbery> (again, it's slightly different for compact blocks where we relay a compact block before fully validating)
 8713:42 <sipa> yeah, compact blocks are explicitly asynchronously processed, as an exception to the usual protocol guarantees, with a good reason
 8813:42 <sipa> it seems BIP157 is written with an assumption of synchronous processing
 8913:43 <jnewbery> sipa: right, and with an assumption that servers will always be able to service any request
 9013:44 <sipa> and actually implementing that seems to require not announcing headers until the index has caught up?
 9113:44 <jnewbery> I'd prefer to not add this message-handler thread blocking, but I don't think it's the end of the world if we do. We're talking about very uncommon circumstances (usually I expect we'd be able to build the filter in the round-trip time between sending a header and receiving a getcfilters request)
 9213:45 <sipa> given that this functionality is opt-in, perhaps that's not too crazy?
 9313:45 <MarcoFalke> sipa: Yes, that's a different issue from the thread blocking. I'd say we should be nice to peers and disconnect them
 9413:46 <sipa> MarcoFalke: i agree
 9513:46 <jonatack> MarcoFalke: agree
 9613:46 <sipa> (independently of the thread blocking issue)
 9713:46 <MarcoFalke> (when the index itself is in "ibd" or syncing phase)
 9813:46 <MarcoFalke> which might happen when the node itself is already out of ibd
 9913:48 <jnewbery> sorry, to be clear about what you're all agreeing to: you think that if we've completed IBD but not yet finished building the index, we should disconnect and peers that request cfilters?
10013:48 <MarcoFalke> jup, any blockfilter message that we can't service
10113:48 <sipa> i think that whenever we would be not responding to a cfilters request, we should disconnect instead
10213:50 <sipa> i also think it would be cleaner that whenever reasonable (post IBD?), we should wait for the index to sync and respond, instead of not responding/disconnecting
10313:50 <jnewbery> I think that's reasonable, but would need to think about it
10413:50 <MarcoFalke> if the peer asks for the filters from block [1000, 1500] and we can send them, sure. Though if we are out of ibd and at block 700k and they ask for the filter for block 700k -> disconnect
10513:50 <jonatack> In general, when faced with a necessary reasonable tradeoff between user experience for full node users versus that of light clients, I'd favor the full node users.
10613:50 <jnewbery> sipa: so add the BlockUntilSyncedToCurrentChain() back?
10713:51 <MarcoFalke> jnewbery: That one can't be solved by a call to Sync()
10813:51 <jnewbery> MarcoFalke: I understand. Trying to clarify sipa's "we should wait for the index to sync and respond"
10913:52 <MarcoFalke> ah, missed that msg
11013:52 <sipa> it feels to me that BIP157, as written, assumes synchronicity - so if we should either implement it that way, or change the BIP
11113:52 <sipa> or not implement it
11213:52 <sipa> but this is not a trivial change
11313:53 <jnewbery> We only have 10 minutes left, but I did want to cover the second question
11413:53 <jnewbery> What are the potential problems with a node signaling NODE_COMPACT_FILTERS before its compact block filters index is built? What are the potential issues if a node changes its service bits dynamically?
11513:53 <sipa> and given that the protocol is an entirely opt-in extension, i think just adding syncs to make it act synchronously is not a big deal
11613:53 <jnewbery> We've already talked quite a bit about some of the implications, but I was specifically interested in how the service bits are cached in various places
11713:54 <jnewbery> like in other nodes' addrmans, and in the seed nodes
11813:54 <sipa> they're overwritten on connection, so i don't think changing them on the fly is a big deal - as long as a reasonable degree of caching remains useful
11913:54 <jnewbery> if the service bits change dynamically, is it possible that other nodes will be gossipping our old service bits for a long time?
12013:55 <sipa> if you'd expect service bits to flip-flop constantly, there is a problem
12113:56 <sipa> but i don't see much of a problem with changing them once a condition is fulfilled so that they won't change back
12213:56 <MarcoFalke> It shouldn't matter much either way
12313:56 <michaelfolkson> jnewbery: I don't know... It depends on how effective gossip floods the entire network?
12413:56 <michaelfolkson> *effectively
12513:57 <michaelfolkson> It is not flapping. It is old service bit -> new service bit right?
12613:58 <michaelfolkson> It is one way. It doesn't return back to the old service bit ever
12713:58 <jnewbery> Right. It's going from not present to present
12813:58 <MarcoFalke> It latches to true (assuming no restart)
12913:58 <sipa> i think (but i'm not sure) that the propagation of the gossip addr messages with the new flag is just as fast as the original announcement
13013:59 <jnewbery> how does the new flag overwrite the old flag?
13114:00 <sipa> the addr message with the new flag in is gossiped, which will overwrite the entry in addrman
13214:00 <sipa> upon connection to the actual peer it's also overwritten
13314:00 <MarcoFalke> sipa: But the light peer won't connect if the flag isn't set
13414:00 <sipa> sure
13514:01 <MarcoFalke> So it should be slightly slower
13614:01 <jnewbery> I think I need to look more into how addrs are gossipped and how old data is refreshed
13714:01 <sipa> i don't think there is a difference really
13814:02 <sipa> if you initially gossip with flag "incorrectly" on, feature-needing peers will connect, and need to disconnect and retry a different peer
13914:02 <MarcoFalke> sipa: Oh your point is that the propagation into addrmans does not slow down?
14014:02 <MarcoFalke> I assumed "light clients will connect to you just as fast as if you were setting the flag a day in advance"
14114:02 <sipa> yeah, my claim - but i'd need to look this up myself - is that the "updating" of the flag is not slower whether or not it has been broadcast already in the past
14214:03 <jonatack> hmmmm
14314:03 <sipa> so there is just the window during which peers think you're still on the old feature, but you're really already synced up to the new one, that matters
14414:03 <jnewbery> ok, that's time! Thanks for input, everyone.
14514:04 <sipa> thanks!
14614:04 <michaelfolkson> Thanks jnewbery
14714:04 * jnewbery goes to dig into addr gossip
14814:04 <MarcoFalke> jnewbery: Thanks for hosting. And thanks for picking up this pull request. I don't think it would have made it this far by now if you hadn't put in the effort!
14914:04 <jonatack> this ^
15014:05 <jkczyz> yeah, thanks jnewbery
15114:05 <jonatack> thanks! very interesting
15214:06 <jnewbery> MarcoFalke: thanks, and thanks for all your review :)