Bitcoin Core PR Review Club

BIP 157: Signal support for compact block filters with NODE_COMPACT_FILTERS (p2p)

Jun 4, 2020

https://github.com/bitcoin/bitcoin/pull/19070

Host: jnewbery - PR author: jnewbery

The PR branch HEAD was f5c003d3e at the time of this review club meeting.

This is the fourth PR in our special series on the BIP 157 implementation. See the previous meetings on PR 18877, PR 18960 and PR 19010.

Notes

This PR adds support for the NODE_COMPACT_FILTERS service bit, used in version and addr messages to signal a node’s capabilities. The existing service bit are defined in protocol.h.
Prior to this PR, even if a node is configured to serve cfilter, cfheaders and cfcheckpts messages, it won’t signal that capability to peers on the network.
This PR includes two important changes from the original implementation in PR 16442:
- PrepareBlockFilterRequest() no longer calls BlockUntilSyncedToCurrentChain() to avoid blocking the message handler thread.
- -peerblockfilters can be set and NODE_COMPACT_FILTERS can be signaled before the index is fully built.
See the justification for those changes and jimpo’s response in the PR. See also the summary of prior discussion in PR 16442.

Questions

What are the implications of PrepareBlockFilterRequest() calling into BlockUntilSyncedToCurrentChain()? What are the implications of not calling BlockUntilSyncedToCurrentChain()? Which is preferable? Are there any alternative solutions?
What are the potential problems with a node signaling NODE_COMPACT_FILTERS before its compact block filters index is built? What are the potential issues if a node changes its service bits dynamically?

Meeting Log

13:00

<jnewbery> #startmeeting

13:00

<jnewbery> Hi! Welcome to the final special BIP 157 review club meeting.

13:00

<michaelfolkson> hi

13:00

<jnewbery> Notes and questions are here: https://bitcoincore.reviews/19070.html

13:01

<jkczyz> hi

13:01

<jnewbery> I think this might be the most interesting of the BIP 157 PRs, since it touches on some protocol design questions

13:02

<jnewbery> The code changes are small and simple, but the design decisions have important implications for BIP 157 clients

13:02

<jnewbery> is someone able to summarize the changes?

13:03

<jonatack> hi!

13:03

<jnewbery> hi jon!

13:03

<michaelfolkson> You mean the changes between jimpo's code and your code or this PR generally?

13:04

<jnewbery> ok, I'll summarize. It's adding signaling for BIP 157 support, using the NODE_COMPACT_FILTERS service bit in version and addr messages

13:05

<jnewbery> michaelfolkson: I meant generally in this PR, but if you want to summarize the differences from Jimpo's implementation, that'd be great

13:06

<michaelfolkson> Ok. One of the changes is that your code signals a node can provide filters for temporarily when it actually can't (as I understand)

13:07

<jnewbery> That's not exactly it

13:07

<jnewbery> it'll signal support for filters before it's fully built the cache. It's able to serve filters from blocks that it's processed

13:08

<michaelfolkson> Ah ok gotcha

13:09

<jnewbery> sorry, s/cache/index

13:09

<michaelfolkson> So it is not as simple saying willing to and unable to versus willing to and able to

13:09

<jnewbery> no

13:10

<jnewbery> with this PR, a node that signals NODE_COMPACT_FILTERS will always be able to serve the filters that it has built, it's just possible that it hasn't fully built the index yet

13:11

<michaelfolkson> I'm stuck in this IBD paradigm. Although there are some parallels they don't map perfectly

13:11

<jnewbery> There's one other change, which is that PrepareBlockFilterRequest() no longer calls BlockUntilSyncedToCurrentChain().

13:11

<jnewbery> What are the implications of PrepareBlockFilterRequest() calling into BlockUntilSyncedToCurrentChain()? What are the implications of not calling BlockUntilSyncedToCurrentChain()? Which is preferable? Are there any alternative solutions?

13:12

<jnewbery> michaelfolkson: it's very similar to IBD. A node will signal NODE_NETWORK before it has completed IBD

13:12

<jkczyz> Calling it can prevent other messages from being processed. Not calling it may lead to requests not being served if the index does not contain the filter for the requested block

13:12

<jnewbery> that means "I can serve blocks". It doesn't mean "I'll be able to serve you any block that you request"

13:13

<jnewbery> jkczyz: exactly. Calling that function blocks the message handling thread on the scheduler thread, which I think we should avoid unless absolutely necessary

13:13

<jnewbery> the scheduler thread may be doing slow things like flushing the wallet to disk

13:14

<jkczyz> jnewbery: What if only some of the requested filters are available? Will we respond with a partial set?

13:15

<jkczyz> Haven't looked into this myself but it just came to mind

13:15

<jnewbery> And not blocking on the scheduler thread means that if we receive a request before the BlockConnected callback to the compact block index has been called, we won't be able to respond to that thread

13:16

<jnewbery> I think we'd reply with just a partial set. Let me check

13:17

<jnewbery> oh no, I'm wrong, we wouldn't respond with any of the filters

13:18

<michaelfolkson> I'm going to keep hammering away at this IBD comparison. I know you discuss it here https://github.com/bitcoin/bitcoin/pull/19070#issuecomment-637056107

13:19

<jnewbery> we return early here: https://github.com/bitcoin/bitcoin/blob/f5c003d3ead182335252558c5c6c9b9ca8968065/src/index/blockfilterindex.cpp#L426-L428

13:19

<jonatack> the original impl would have also entailed full node users having to stop and restart the node once the index finished building

13:19

<jkczyz> michaelfolkson: I was trying to think if there is some analogy there. As in, if a node knows it shouldn't request a block from a peer, could there be a similar mechanism for filters

13:20

<jnewbery> jonatack: right. Well actually the original original implementation toggled the service bit dynamically. Some reviewers (including me) didn't like that, so jimpo changed it to be toggled manually

13:20

<michaelfolkson> But what is the reason for not responding with a partial set? Because filters are smaller than blocks and therefore you should expect to receive all of them in one go?

13:20

<jonatack> yeah, dynamic isn't better, for the reasons everyone stated

13:22

<jnewbery> the main difference with NODE_NETWORK is that in the initial version handshake, a node will tell its peer what its best block height is: https://btcinformation.org/en/developer-reference#version

13:22

<jnewbery> so the peer knows not to ask for higher blocks than that

13:22

<michaelfolkson> jkczyz: Yeah they do not map perfectly but I think they share enough characteristics such that there are a few questions like "Well IBD doesn't do what filter serving is doing. Why not?"

13:22

<MarcoFalke> with block filters the peer also knows not to ask for filters of block that the node hasn't announced, no?

13:24

<jnewbery> MarcoFalke: that's what the BIP says, but I think it's not a great design. I think it should be fine for a node to receive and relay a block before it's built the filter

13:24

<jnewbery> tying those two functions together seems like a bad idea

13:26

<jnewbery> it constrains implementers to couple together internal components that needn't be (eg here, blocking message handling on the scheduler thread so that the index gets built)

13:26

<jkczyz> jnewbery: ah, so doing so would also require re-introducing BlockUntilSyncedToCurrentChain?

13:26

<MarcoFalke> I see your point, but the validation interface queue needs to be processed either way at some point. Otherwise it will "overflow"

13:27

<jonatack> is it relatively common to see implementation development pushback and drive BIP updates?

13:27

<jnewbery> jkczyz: it requires some kind of sychronicity

13:27

<MarcoFalke> jonatack: Often issues with a BIP are only realized after it has been deployed. (E.g. bloomfilters, ouch)

13:28

<sipa> MarcoFalke: that's a fair example, but the issue there wasn't really discovered through implementation

13:28

<michaelfolkson> Steelmanning jimpo's argument. Changing the BIP is no big deal (imo).

13:28

<MarcoFalke> jup, not implementation, but more analysis

13:28

<michaelfolkson> "The BIP was designed to expose a consistent interface to clients, which is conceptually simpler, making it easier to implement client code correctly."

13:29

<michaelfolkson> You disagree with this jnewbery?

13:29

<jnewbery> one other way to achieve the same would be to have some kind of 'queue' of outstanding BIP157 requests, and if they can't be serviced, delay processing them (I put queue in quotes because I think we'd want to limit it to one request)

13:29

<sipa> jnewbery: i think you can also look at it the other way... why should an implementation detail in core affect the BIP?

13:29

<jonatack> MarcoFalke: thanks. that's not terribly surprising.

13:30

<sipa> for all kinds of network-servicable data peers can reasonable expect synchronicity... if you've processed a block, you can give it to others

13:30

<sipa> the bloom filter is weird in that it's started as an optional index, but now is becoming a network-servicable thing

13:30

<jnewbery> that'd be similar to the getdata queue, which can be processed out-of-order with other peers' requests: https://github.com/bitcoin/bitcoin/blob/365f1082e1e6ff1c2f53552c3871223e87a9d43f/src/net_processing.cpp#L3600-L3601

13:30

<jnewbery> (requests from a single peer are still treated serially)

13:31

<jnewbery> I don't think we should do this because it adds complexity, but there are potentially other ways to achieve what's wanted

13:32

<sipa> and if we want to say do block processing in a separate thread as well, like index building, we'll be faced with the same issues, and they need to be solved in a way that doesn't break the protocol synchronicity guarantees

13:32

<sipa> so it feels a bit strange to use a current implementation issue with filters specially

13:33

<jnewbery> sipa: I think my point is that we should avoid synchronicity between the core functionality of relaying blocks and the optional functionality of serving filters

13:33

<sipa> jnewbery: well we already have that, as long as it's an RPC interface

13:33

<sipa> if it's a P2P thing, it should behave P2P like

13:34

<jnewbery> I don't think the block example is the same. We wouldn't relay a block through headers if we hadn't yet processed it

13:35

<jkczyz> I've heard discussion around eventually committing the filter in the block (sorry if that terminology is no precise) -- at that point, does this problem simply go away (i.e., would the index be updated when the block is connected)? Or does the problem still exist for older blocks?

13:35

<sipa> jnewbery: i'm not sure i see the difference

13:37

<jnewbery> for compact blocks we do relay before processing, and I think that's the one place in net processing that we currently block on the scheduler thread: https://github.com/bitcoin/bitcoin/blob/365f1082e1e6ff1c2f53552c3871223e87a9d43f/src/net_processing.cpp#L1479-L1500

13:38

<jnewbery> sipa: if we send a header to a peer, we've already processed the block, so we can respond to a getdata request for it

13:38

<jnewbery> the header message itself is an indication that we can serve the block

13:38

<jnewbery> but why should it also be an indication that we have built the filter

13:39

<sipa> so perhaps we shouldn't send a header message to BIP157 peers before we've built the filter for it?

13:39

<jnewbery> we don't know which peers are BIP157 clients

13:39

<sipa> oh

13:39

<sipa> that's annoying

13:39

<jnewbery> the service bit is on the server side

13:41

<jnewbery> jkczyz: if the filter is committed, then presumably it's required for validation, and therefore a peer sending you a header for that block indicates that they have the filter

13:41

<MarcoFalke> We know they are clients when they send the first request, so what about disconnecting them instead of having them turn thumbs for 20 seconds (or whatever the timeout is)

13:41

<jnewbery> (again, it's slightly different for compact blocks where we relay a compact block before fully validating)

13:42

<sipa> yeah, compact blocks are explicitly asynchronously processed, as an exception to the usual protocol guarantees, with a good reason

13:42

<sipa> it seems BIP157 is written with an assumption of synchronous processing

13:43

<jnewbery> sipa: right, and with an assumption that servers will always be able to service any request

13:44

<sipa> and actually implementing that seems to require not announcing headers until the index has caught up?

13:44

<jnewbery> I'd prefer to not add this message-handler thread blocking, but I don't think it's the end of the world if we do. We're talking about very uncommon circumstances (usually I expect we'd be able to build the filter in the round-trip time between sending a header and receiving a getcfilters request)

13:45

<sipa> given that this functionality is opt-in, perhaps that's not too crazy?

13:45

<MarcoFalke> sipa: Yes, that's a different issue from the thread blocking. I'd say we should be nice to peers and disconnect them

13:46

<sipa> MarcoFalke: i agree

13:46

<jonatack> MarcoFalke: agree

13:46

<sipa> (independently of the thread blocking issue)

13:46

<MarcoFalke> (when the index itself is in "ibd" or syncing phase)

13:46

<MarcoFalke> which might happen when the node itself is already out of ibd

13:48

<jnewbery> sorry, to be clear about what you're all agreeing to: you think that if we've completed IBD but not yet finished building the index, we should disconnect and peers that request cfilters?

13:48

<MarcoFalke> jup, any blockfilter message that we can't service

13:48

<sipa> i think that whenever we would be not responding to a cfilters request, we should disconnect instead

13:50

<sipa> i also think it would be cleaner that whenever reasonable (post IBD?), we should wait for the index to sync and respond, instead of not responding/disconnecting

13:50

<jnewbery> I think that's reasonable, but would need to think about it

13:50

<MarcoFalke> if the peer asks for the filters from block [1000, 1500] and we can send them, sure. Though if we are out of ibd and at block 700k and they ask for the filter for block 700k -> disconnect

13:50

<jonatack> In general, when faced with a necessary reasonable tradeoff between user experience for full node users versus that of light clients, I'd favor the full node users.

13:50

<jnewbery> sipa: so add the BlockUntilSyncedToCurrentChain() back?

13:51

<MarcoFalke> jnewbery: That one can't be solved by a call to Sync()

13:51

<jnewbery> MarcoFalke: I understand. Trying to clarify sipa's "we should wait for the index to sync and respond"

13:52

<MarcoFalke> ah, missed that msg

13:52

<sipa> it feels to me that BIP157, as written, assumes synchronicity - so if we should either implement it that way, or change the BIP

13:52

<sipa> or not implement it

13:52

<sipa> but this is not a trivial change

13:53

<jnewbery> We only have 10 minutes left, but I did want to cover the second question

13:53

<jnewbery> What are the potential problems with a node signaling NODE_COMPACT_FILTERS before its compact block filters index is built? What are the potential issues if a node changes its service bits dynamically?

13:53

<sipa> and given that the protocol is an entirely opt-in extension, i think just adding syncs to make it act synchronously is not a big deal

13:53

<jnewbery> We've already talked quite a bit about some of the implications, but I was specifically interested in how the service bits are cached in various places

13:54

<jnewbery> like in other nodes' addrmans, and in the seed nodes

13:54

<sipa> they're overwritten on connection, so i don't think changing them on the fly is a big deal - as long as a reasonable degree of caching remains useful

13:54

<jnewbery> if the service bits change dynamically, is it possible that other nodes will be gossipping our old service bits for a long time?

13:55

<sipa> if you'd expect service bits to flip-flop constantly, there is a problem

13:56

<sipa> but i don't see much of a problem with changing them once a condition is fulfilled so that they won't change back

13:56

<MarcoFalke> It shouldn't matter much either way

13:56

<michaelfolkson> jnewbery: I don't know... It depends on how effective gossip floods the entire network?

13:56

<michaelfolkson> *effectively

13:57

<michaelfolkson> It is not flapping. It is old service bit -> new service bit right?

13:58

<michaelfolkson> It is one way. It doesn't return back to the old service bit ever

13:58

<jnewbery> Right. It's going from not present to present

13:58

<MarcoFalke> It latches to true (assuming no restart)

13:58

<sipa> i think (but i'm not sure) that the propagation of the gossip addr messages with the new flag is just as fast as the original announcement

13:59

<jnewbery> how does the new flag overwrite the old flag?

14:00

<sipa> the addr message with the new flag in is gossiped, which will overwrite the entry in addrman

14:00

<sipa> upon connection to the actual peer it's also overwritten

14:00

<MarcoFalke> sipa: But the light peer won't connect if the flag isn't set

14:00

<sipa> sure

14:01

<MarcoFalke> So it should be slightly slower

14:01

<jnewbery> I think I need to look more into how addrs are gossipped and how old data is refreshed

14:01

<sipa> i don't think there is a difference really

14:02

<sipa> if you initially gossip with flag "incorrectly" on, feature-needing peers will connect, and need to disconnect and retry a different peer

14:02

<MarcoFalke> sipa: Oh your point is that the propagation into addrmans does not slow down?

14:02

<MarcoFalke> I assumed "light clients will connect to you just as fast as if you were setting the flag a day in advance"

14:02

<sipa> yeah, my claim - but i'd need to look this up myself - is that the "updating" of the flag is not slower whether or not it has been broadcast already in the past

14:03

<jonatack> hmmmm

14:03

<sipa> so there is just the window during which peers think you're still on the old feature, but you're really already synced up to the new one, that matters

14:03

<jnewbery> ok, that's time! Thanks for input, everyone.

14:04

<sipa> thanks!

14:04

<michaelfolkson> Thanks jnewbery

14:04

* jnewbery goes to dig into addr gossip

14:04

<MarcoFalke> jnewbery: Thanks for hosting. And thanks for picking up this pull request. I don't think it would have made it this far by now if you hadn't put in the effort!

14:04

<jonatack> this ^

14:05

<jkczyz> yeah, thanks jnewbery

14:05

<jonatack> thanks! very interesting

14:06

<jnewbery> MarcoFalke: thanks, and thanks for all your review :)