BIP 157: Add compact block filter headers cache (utxo db and indexes)

https://github.com/bitcoin/bitcoin/pull/18960

Host: jnewbery  -  PR authors: jnewbery, jimpo

This is the second special Bitcoin Core PR Review Club looking at the BIP 157 implementation. Today, we’ll look at the second PR in the series: Add compact block filter headers cache.

Notes

  • BIP 157 defines checkpoints, which are compact block filter headers at 1000-block intervals. Newly syncing clients can fetch the checkpoints first (potentially from a trusted source), then fill in the intermediate filter headers, and finally fetch the filters themselves. Because the filter headers form a hash chain, the client can verify that it has been given the correct filters and headers by checking against the checkpoints (a sketch of this verification follows these notes).

  • Since the checkpoint filter headers are potentially the most frequently served data, it makes sense to cache them in memory rather than read them from disk every time.

  • I’ve taken a different approach from jimpo’s original implementation, so in this review club we’ll compare three different approaches:

    1. The original implementation, which added an active_chain_cf_headers vector in net_processing to cache the block filter headers, and updated that cache in the UpdatedBlockTip() callback.

    2. A second implementation, which uses the same active_chain_cf_headers, but updates it in the ProcessGetCFCheckPt() function.

    3. A third implementation, which moves the cache into the BlockFilterIndex class and doesn’t change net processing logic (a sketch of this approach is given below).
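
As a concrete illustration of the hash-chain verification described in the first note, here is a minimal Python sketch (the function names are illustrative, not from the codebase). It assumes the client already has the previous checkpoint and the filter hashes for one 1000-block interval; per BIP 157, each filter header is the double-SHA256 of the filter hash concatenated with the previous filter header.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    """Bitcoin-style double-SHA256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_checkpoint_interval(prev_checkpoint: bytes,
                               filter_hashes: list,
                               checkpoint: bytes) -> bool:
    """Recompute the filter header chain across one 1000-block interval
    and check that it ends at the expected checkpoint header."""
    header = prev_checkpoint
    for filter_hash in filter_hashes:
        # filter_header[i] = dsha256(filter_hash[i] || filter_header[i-1])
        header = dsha256(filter_hash + header)
    return header == checkpoint
```

If any filter hash or intermediate header is wrong, the recomputed chain no longer ends at the checkpoint, which is what lets a client detect bad data from untrusted peers.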

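To make the comparison more concrete, here is a rough Python sketch of the shape of the third approach (the actual PR code is C++ in the BlockFilterIndex class; the names and the lazy-fill details here are illustrative assumptions, not the PR's exact logic). The cache is keyed by block hash, lives entirely inside the index, and is filled on lookup, which is also why the lock question below arises: lookups can come from more than one thread.

```python
import threading

class BlockFilterIndexSketch:
    """Rough shape of approach (3): the checkpoint header cache lives in
    the index itself and is populated lazily on lookup."""

    def __init__(self, disk_lookup):
        self._disk_lookup = disk_lookup           # stands in for the on-disk (LevelDB) read
        self._headers_cache_lock = threading.Lock()
        self._headers_cache = {}                  # block hash -> filter header

    def lookup_filter_header(self, block_hash, height):
        is_checkpoint = height % 1000 == 0
        if is_checkpoint:
            # First locked section: serve from the cache if we can.
            with self._headers_cache_lock:
                if block_hash in self._headers_cache:
                    return self._headers_cache[block_hash]
        # Read from the on-disk index without holding the lock.
        header = self._disk_lookup(block_hash)
        if is_checkpoint and header is not None:
            # Second locked section: store the result. Another thread may
            # have inserted the same entry in the meantime; setdefault()
            # leaves an existing entry untouched (compare the emplace()
            # discussion in the meeting log below).
            with self._headers_cache_lock:
                self._headers_cache.setdefault(block_hash, header)
        return header
```
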
Questions

  • Why is a lock added in the first and second implementations? Is it also needed in the third implementation?

  • The first implementation could cause a data race on shutdown (full log). Can you find where the data race is?

  • What are the main differences between the implementations? Which approach do you prefer?

Meeting Log

  13:00 <jnewbery> #startmeeting
  13:00 <jnewbery> hi
  13:00 <michaelfolkson> hi
  13:00 <theStack> hi
  13:00 <troygiorshev> hi
  13:00 <fjahr> hi
  13:01 <jkczyz> hi
  13:01 <jonatack> hi
  13:01 <Caralie> hi
  13:01 <lightlike> hi
  13:01 <MarcoFalke> hi
  13:01 <andrewtoth> hi
  13:01 <jnewbery> thanks for joining pt 2 of the BIP 157 special edition review club series!
  13:01 <jnewbery> notes/questions in the normal place: https://bitcoincore.reviews/18960.html
  13:02 <jnewbery> who had a chance to review the PR? y/n
  13:02 <thomasb06> hi
  13:02 <theStack> y
  13:02 <troygiorshev> n
  13:02 <andrewtoth> n
  13:02 <jkczyz> y
  13:02 <thomasb06> n
  13:03 <fjahr> y earlier but missed that there were notes
  13:03 <jnewbery> and for those that did review it, did you look at the alternative cache implementations in https://github.com/bitcoin-core-review-club/bitcoin/releases/tag/pr18960.1 and https://github.com/bitcoin-core-review-club/bitcoin/releases/tag/pr18960.2 ?
  13:03 <jkczyz> yup
  13:04 <michaelfolkson> Somewhat. Not to a point where I understood exactly what was going on in all 3
  13:04 <jnewbery> great. So PR 18960 is fairly straightforward and small, so I think the interesting discussion this week will be around how the implementations differ
  13:05 <jnewbery> Is anyone able to give a high-level summary of the different approaches?
  13:06 <jkczyz> Roughly: (1) populates the cache as blocks are connected and needs to manage reorgs (2) populates when filters are requested in the message handler (3) moves the cache to the BlockFilterIndex
  13:06 <theStack> 1) update cache on block tip update, cache by height 2) update cache on getcfcheckpt request, cache by height 3) update cache on getcfcheckpt request, cache by block hash
  13:07 <jnewbery> jkczyz theStack: great summaries. Thanks!
  13:07 <jkczyz> theStack: ah yes, height vs hash too
  13:07 <jnewbery> So q1: why are locks added? Are they needed?
  13:08 <jkczyz> I suppose it would be needed if populating and querying are done on separate threads, which is not in (3) but I assume so in (1) and (2)
  13:09 <theStack> they are needed because LookupFilterHeader could also be called from a different thread -- currently an RPC call i think
  13:09 <jnewbery> jkczyz: yes. We don't want to read/write the same data in different threads without a lock
  13:10 <jnewbery> theStack: good spot. I think we can call LookupFilterHeader() from both the message handler thread (net processing) and an RPC thread
  13:10 <jnewbery> so without a lock there could be a race
  13:11 <jnewbery> I guess I should say http helper thread, since after https://github.com/bitcoin/bitcoin/pull/17631, the filter headers will also be fetchable over REST
  13:12 <jkczyz> so needed in (3) actually
  13:12 <jnewbery> I think implementation (2) strictly doesn't need a lock since all the reading/writing of the cache happens in the message handler thread
  13:12 <jnewbery> jkczyz: right
  13:13 <jnewbery> ok, onto q2. The first implementation could cause a data race on shutdown (full log). Can you find where the data race is?
  13:14 <jnewbery> (and as always, feel free to ask any questions any time. I only put the set questions out to prompt discussion)
  13:15 <theStack> i saw that the original implementation used std::mutex, while in the codebase mostly our custom (Recursive)Mutex seems to be used -- is there any guide on when to use what?
  13:16 <jnewbery> MarcoFalke is probably best placed to answer this, if he's still around
  13:17 <jnewbery> RecursiveMutex allows us to take locks multiple times, which happens in some places. I think a long term goal might be to get rid of that usage pattern?
  13:18 <aj> our custom Mutex (and RecursiveMutex) are just the standard ones, with extra code to debug lock order violations (which might result in deadlocks) and to support clang's thread safety compile-time checks
  13:20 <jnewbery> The data race on shutdown was in a UpdatedBlockTip() callback. See L5727 of https://travis-ci.org/github/bitcoin/bitcoin/jobs/665594933. I don't know exactly why that happens, but I guess it's what caused jimpo to move to implementation (2), which doesn't use that callback
  13:20 <theStack> aj: ok, so i deduce from that that it's generally a good idea to use them
  13:21 <jnewbery> perhaps it'd be a good idea to talk about performance. theStack, I see you've done some tests. Can you summarise what you did?
  13:21 <sipa> for a mutex which is a private field in a class, which is not exposed to any external modules, shouldn't need our lock assertion wrappers
  13:23 <aj> sipa: and none of the class's member functions take any other locks, or call functions that take any other locks? otherwise you could still get lock order violations?
  13:23 <fjahr> here is the change he pushed after sipa pointed out the data race: https://github.com/bitcoin/bitcoin/compare/f68aa2c1a40afb49969b9120ec43e2a5040724f1..7608b32fc27254c84c16090af9a32d6bbdf60ec7#diff-eff7adeaec73a769788bb78858815c91
  13:23 <theStack> jnewbery: sure. basically the idea was to repeatedly send getcfcheckpt requests to a full node (as it has enough blocks to take a substantial amount of time, compared to e.g. a regtest node) and measure how long it would take -- with and without the cache
  13:24 <theStack> i summarized the results here, but they are not very exciting, there doesn't seem to be much difference: https://github.com/bitcoin/bitcoin/pull/18960#issuecomment-628761578
  13:24 <theStack> maybe it's my testing approach though :)
  13:26 <jnewbery> theStack: thanks for sharing. I wonder how much of the time is parsing/serialization/message sending overhead
  13:27 <jnewbery> I think a better test might be to add LogPrint statements before and after the call to ProcessGetCFCheckPt() and then check debug.log
  13:28 <theStack> jnewbery: yes the mentioned overhead could definitely distort the result; also the ping part of send_and_ping(...) is included in the time, maybe it would be better to directly wait for the cfcheckpt answer
  13:28 <theStack> jnewbery: ah, nice idea with the LogPrint statements!
  13:28 <fjahr> testing this in isolation like this guarantees that the leveldb cache will be hit in the version where there is no in-memory cache, right? I am not sure if that would always be the case for a node running in the network.
  13:29 <fjahr> but probably more likely the more adoption we see, so I guess it's ok?
  13:29 <jnewbery> fjahr: right. I think that's where testing it gets tricky. I don't know where other caching happens
  13:31 <jnewbery> Perhaps running this on a mainnet node and periodically sending getcfcheckpt messages is a more realistic test
  13:31 <jkczyz> theStack: did you notice much difference between the first and subsequent runs? One difference between the implementations is (3) needs to warm up the cache on the initial fetch
  13:32 <jkczyz> I guess (2) does as well
  13:33 <jnewbery> jkczyz: right. (2) and (3) need an initial getcfcheckpt request to warm the cache. (1) needs a new block to warm the cache
  13:33 <theStack> jkczyz: not really, on the posted result the request for up to block 630000 (i.e. 630 checkpoint headers) takes initially 50.92ms and the average of 99 more subsequent requests is 50.8ms
  13:34 <jnewbery> It's possible to modify (3) to automatically fill the cache on startup, but I didn't think it was worth adding the code just to make the first getcfcheckpt response faster
  13:35 <jnewbery> theStack: I expect much of that 50ms is overhead. If you think about how the message gets sent from the python test framework to the node, and then added to the peer's receive buffer, and then the buffer processed on a message handler thread loop, that's got to be a lot of it.
  13:36 <theStack> jnewbery: yes, that's also my assumption here
  13:36 <jnewbery> There are 100ms sleeps in the message handler thread between looping through all peers: https://github.com/jnewbery/bitcoin/blob/7ad83ed252914d6d5b8ed81f103aecf052c68fb7/src/net.cpp#L2061
  13:37 <thomasb06> where are the corresponding tests?
  13:37 <jnewbery> on a non-busy node, 50ms is the average time you have to wait for the message handler thread to wake up
  13:37 <theStack> it's a pity that there is no easy way to just create a node with a huge block count
  13:37 <theStack> jnewbery: oh, that explains a lot then
  13:38 <jnewbery> thomasb06: https://github.com/jnewbery/bitcoin/blob/7ad83ed252914d6d5b8ed81f103aecf052c68fb7/test/functional/p2p_blockfilters.py#L71
  13:38 <thomasb06> thanks
  13:38 <jnewbery> theStack: yeah, I only just realised that myself
  13:38 <theStack> then i guess your suggested approach with logging directly in the C++ code itself is really the way to go; given that the time resolution in the log files is small enough
  13:39 <theStack> (if not, one can surely find another way to directly measure the elapsed time)
  13:39 <jnewbery> yes, time resolution in the logging files can be microseconds
  13:40 <jnewbery> take a look at https://github.com/jnewbery/bitcoin/blob/7ad83ed252914d6d5b8ed81f103aecf052c68fb7/test/functional/test_framework/test_node.py#L355. Perhaps that can be modified to take timestamps from the debug log.
  13:40 <jnewbery> it's a way for the test framework to read the node's debug log during a test
  13:43 <jkczyz> Is the cache recommended in BIP 157? I couldn't find it just now but thought I had read about it awhile back
  13:43 <jnewbery> ok, final question from me: Which approach do you prefer? Why? 1/2 are acceptable answers :)
  13:43 <theStack> jnewbery: good idea. i was thinking about manually analyzing the log in the worst case, but of course if that can be automated as well it'd be awesome
  13:43 <jnewbery> jkczyz: I thought I remembered reading it in a bip or somewhere else, but I couldn't see it last time I checked
  13:44 <theStack> jkczyz: i don't think BIPs usually describe implementation details about performance improvements in such a detailed manner
  13:44 <jkczyz> I like (3) for its simplicity (no need to worry about re-orgs) and because it separates caching from the network layer
  13:45 <jnewbery> It doesn't seem to be in the BIP history, so maybe I imagined it
  13:45 <jnewbery> ah. Here it is: https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki#node-operation: "They also SHOULD keep every checkpoint header in memory, so that getcfcheckpt requests do not result in many random-access disk reads."
  13:46 <thomasb06> talking about node operation, what are filters for?
  13:47 <theStack> jnewbery: also prefer (3) for its absolute simplicity! from an algorithmic perspective, the cache access in (1)/(2) happens in O(1) though, while in (3) it's O(log N), but considering the small number of elements in the cache, this doesn't really matter
  13:47 <fjahr> I also prefer (3) for its simplicity
  13:47 <michaelfolkson> Yeah 3 does seem better. Did you write it from scratch jnewbery or did you look at the first two and think there are better ways to do this?
  13:48 <michaelfolkson> thomasb06: Filters are for light clients. So they can work out which blocks they need to request from full nodes
  13:48 <thomasb06> michaelfolkson: ok
  13:49 <nehan> nice property of (3) (i think) is that the cache is only populated if this feature is actually used (requests are made)
  13:49 <jnewbery> theStack: BIPs do often go into implementation details that are not strictly necessary to the protocol. I'd argue that the Schnorr BIP's section on signing: https://github.com/bitcoin/bips/blob/master/bip-0340.mediawiki#default-signing, which documents lots of footguns, is similar in some ways.
  13:51 <jnewbery> michaelfolkson: moving the cache to the indexer was one of my requests in the original PR: https://github.com/bitcoin/bitcoin/pull/16442#issuecomment-555256244, so I wanted to implement it to see what it would look like
  13:51 <nehan> i had a question about (3): the lock is taken out twice in LookupFilterHeader(). what might have changed between those two locks? https://github.com/bitcoin-core-review-club/bitcoin/commit/7ad83ed252914d6d5b8ed81f103aecf052c68fb7#diff-5ad30723a078a5a4b5f92c9aa0369ac2R412
  13:52 <thomasb06> what is a DoS attack, denial of service?
  13:52 <nehan> could the header have been inserted somewhere else?
  13:52 <michaelfolkson> thomasb06: Yes
  13:53 <jnewbery> nehan: you're right that the cache is only populated if it's used in implementation (3). I think that's also true in implementation (2), but not (1), which will always fill the cache when the chain advances
  13:53 <jkczyz> nehan: I think it may be to avoid holding the lock when it is not necessary to do so
  13:53 <jnewbery> nehan: that's a great question! Does anyone have any thoughts?
  13:54 <thomasb06> michaelfolkson: so what's wrong if nodes generate filters dynamically?
  13:54 <jnewbery> when you say 'could the header have been inserted somewhere else?' do you mean in a different thread?
  13:54 <nehan> jnewbery: yes
  13:54 <theStack> i'd say it's generally a good practice to not hold locks longer than necessary
  13:55 <nehan> releasing and taking out a lock again in the same function means the world might have changed out from under you.
  13:55 <michaelfolkson> For the benefit of future code changes theStack?
  13:56 <sipa> michaelfolkson: to reduce latency of other code waiting on the lock
  13:56 <michaelfolkson> Ah ok
  13:56 <jnewbery> right. So we'd need to check where LookupFilterHeader() could be called from, which is net_processing (when a getcfcheckpt request is received) or rpc/blockchain (when the getblockfilter RPC is called)
  13:56 <jnewbery> so there is the possibility that there are two threads executing this function simultaneously. Is that a problem?
  13:57 <nehan> i guess they'd just both put the same thing in the map
  13:57 <jnewbery> does 'the world changing out from under us' cause a problem in this case?
  13:57 <fjahr> nehan: the operations don't depend on each other so I think it does not matter
  13:57 <nehan> fjahr: the operations definitely depend on each other. i'm not sure i understand what you're saying
  13:58 <michaelfolkson> thomasb06: I don't know what you mean in this example by "dynamically". What doc are you referring to?
  13:58 <nehan> fjahr: it's a get - no return - put paradigm. the thing might have been put in there by another thread
  13:59 <thomasb06> michaelfolkson: the Node Operation section: "Nodes SHOULD NOT generate filters dynamically on request, as malicious peers may be able to perform DoS attacks by requesting small filters derived from large blocks."
  13:59 <jnewbery> nehan: right. So we could have two threads not able to get from the cache and both going through this logic: https://github.com/jnewbery/bitcoin/blob/7ad83ed252914d6d5b8ed81f103aecf052c68fb7/src/index/blockfilterindex.cpp#L405-L410
  13:59 <jnewbery> and then both trying to put in the cache
  13:59 <jnewbery> so the second thread tries to emplace the same object into the map that's already there
  14:00 <jnewbery> which isn't a problem. emplace() is a no-op on the map if there's already something with that key: https://en.cppreference.com/w/cpp/container/map/emplace
  14:00 <jnewbery> That was a great question. I think it probably deserves a code comment to explain that.
  14:01 <nehan> jnewbery: yes that is a scary code pattern to me
  14:02 <michaelfolkson> thomasb06: I think that is explained in the preceding paragraph. Nodes should generate filters and persist them rather than generate them on request
  14:02 <jnewbery> There's no harm in locking for the entire function if you think that's better. There's going to be ~0 lock contention
  14:02 <fjahr> so the second pattern does not fail depending on the first pattern, that's what I meant
  14:02 <jnewbery> since it's only when a P2P message and RPC request arrive at the same time
  14:02 <jnewbery> ok, that's time. Thanks for participating, everyone!
  14:02 <jnewbery> See you all next week
  14:02 <nehan> jnewbery: less cognitive overhead for the next developer who reads the code if you just lock the whole function
  14:02 <jkczyz> drawback is you need to hold the lock even when not accessing the cache, unless refactoring and duplicating a little code
  14:02 <nehan> thanks!
  14:03 <jnewbery> and don't forget to leave your comments/review on the PR
  14:03 <jkczyz> thanks, jnewbery!
  14:03 <fjahr> Thanks jnewbery!
  14:03 <theStack> thanks for hosting jnewbery!
  14:03 <Caralie> thank you!
  14:03 <troygiorshev> thanks jnewbery
  14:03 <jnewbery> #endmeeting