This is the second special Bitcoin Core PR Review Club looking at the BIP 157
implementation. Today, we'll review the second PR in the series: Add compact
block filter headers cache.
Notes
BIP 157
defines checkpoints, which are compact block filter headers at 1000-block
intervals. Newly syncing clients can fetch the checkpoints first (potentially
from a trusted source), then fill in the intermediate filter headers, and
finally fetch the filters themselves. Because the filter headers form a hash
chain, the client can verify that it has been provided with the correct
filters and headers by checking against the checkpoints.
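That chain check is simple to sketch. The following is a minimal illustration (not Bitcoin Core code) of how a client could verify one 1000-header interval against its bounding checkpoints. Per BIP 157, each filter header is the double-SHA256 of the filter hash concatenated with the previous filter header; `DoubleSHA256` here is an assumed stand-in for a real implementation.

```cpp
#include <array>
#include <vector>

using Hash256 = std::array<unsigned char, 32>;

// Assumed helper: double-SHA256 over a byte buffer (two applications of SHA-256).
Hash256 DoubleSHA256(const std::vector<unsigned char>& data);

// Derive the next filter header from a filter hash and the previous header.
Hash256 NextFilterHeader(const Hash256& filter_hash, const Hash256& prev_header)
{
    std::vector<unsigned char> buf;
    buf.insert(buf.end(), filter_hash.begin(), filter_hash.end());
    buf.insert(buf.end(), prev_header.begin(), prev_header.end());
    return DoubleSHA256(buf);
}

// Verify that a run of 1000 filter hashes links one checkpoint to the next.
bool VerifyInterval(const Hash256& checkpoint_prev,
                    const std::vector<Hash256>& filter_hashes,
                    const Hash256& checkpoint_next)
{
    Hash256 header = checkpoint_prev;
    for (const auto& filter_hash : filter_hashes) {
        header = NextFilterHeader(filter_hash, header);
    }
    return header == checkpoint_next;
}
```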
Since the checkpoint filter headers are potentially the most frequently
served data, it makes sense to cache them in memory rather than read them
from disk every time.
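To put a number on that: at 1000-block intervals, a chain of 630,000 blocks (roughly the height discussed in the meeting below) has 630 checkpoint headers, and each header is a 32-byte hash, so the whole cache is about 20 KB. Holding that in memory is cheap, and much cheaper than repeated disk reads.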
I’ve taken a different approach from jimpo’s original implementation, so in
this review club, we’ll take a look at three different approaches:

1. update the cache (keyed by height) whenever the block chain tip is updated, which requires handling reorgs (jimpo’s first implementation);
2. update the cache (keyed by height) when a getcfcheckpt request is handled in net_processing (jimpo’s second implementation);
3. move the cache (keyed by block hash) into BlockFilterIndex and fill it on lookup (this PR).
<jnewbery> great. So PR 18960 is fairly straightforward and small, so I think the interesting discussion this week will be around how the implementations differ
<jkczyz> Roughly: (1) populates the cache as blocks are connected and needs to manage reorgs (2) populates when filters are requested in the message handler (3) moves the cache to the BlockFilterIndex
<theStack> 1) update cache on block tip update, cache by height 2) update cache on getcfcheckpt request, cache by height 3) update cache on getcfcheckpt request, cache by block hash
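As a rough illustration of approach (3), here is a minimal sketch (names are illustrative, not the PR's actual code) of a block-hash-keyed cache that fills lazily inside the filter header lookup. Only headers at checkpoint heights are cached, and an entry for a reorged-out block is harmless because its hash is simply never requested again, so no reorg handling is needed.

```cpp
#include <array>
#include <map>
#include <optional>

using Hash256 = std::array<unsigned char, 32>;
constexpr int CFCHECKPT_INTERVAL{1000};

class FilterIndexSketch
{
    std::map<Hash256, Hash256> m_headers_cache; // block hash -> filter header

    // Stand-in for the index's existing disk lookup.
    std::optional<Hash256> ReadFilterHeaderFromDisk(const Hash256& /*block_hash*/)
    {
        return std::nullopt; // the real index would read the header from disk
    }

public:
    std::optional<Hash256> LookupFilterHeader(int height, const Hash256& block_hash)
    {
        const bool is_checkpoint{height % CFCHECKPT_INTERVAL == 0};

        if (is_checkpoint) {
            // Serve checkpoint headers from memory when possible.
            auto it = m_headers_cache.find(block_hash);
            if (it != m_headers_cache.end()) return it->second;
        }

        std::optional<Hash256> header{ReadFilterHeaderFromDisk(block_hash)};

        if (is_checkpoint && header) {
            // Populate the cache on first use.
            m_headers_cache.emplace(block_hash, *header);
        }
        return header;
    }
};
```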
<theStack> i saw that the original implementation used std::mutex, while in the codebase mostly our custom (Recursive)Mutex seems to be used -- is there any guide on when to use what?
<jnewbery> RecursiveMutex allows us to take locks multiple times, which happens in some places. I think a long term goal might be to get rid of that usage pattern?
<aj> our custom Mutex (and RecursiveMutex) are just the standard ones, with extra code to debug lock order violations (which might result in deadlocks) and to support clang's thread safety compile-time checks
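In practice that annotation pattern looks something like the following sketch, assuming Bitcoin Core's sync.h macros (member names are illustrative). GUARDED_BY ties the data to its lock so that clang's thread safety analysis can flag any unlocked access at compile time.

```cpp
#include <sync.h>    // Bitcoin Core's Mutex, LOCK, GUARDED_BY
#include <uint256.h>

#include <map>
#include <optional>

class CacheSketch
{
    Mutex m_cache_mutex;
    // Compile-time check: accessing m_cache without m_cache_mutex held is an error.
    std::map<uint256, uint256> m_cache GUARDED_BY(m_cache_mutex);

public:
    std::optional<uint256> Lookup(const uint256& block_hash)
    {
        LOCK(m_cache_mutex); // RAII lock, also recorded for lock-order debugging
        auto it = m_cache.find(block_hash);
        if (it == m_cache.end()) return std::nullopt;
        return it->second;
    }
};
```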
<jnewbery> The data race on shutdown was in an UpdatedBlockTip() callback. See L5727 of https://travis-ci.org/github/bitcoin/bitcoin/jobs/665594933. I don't know exactly why that happens, but I guess it's what caused jimpo to move to implementation (2), which doesn't use that callback
<aj> sipa: and none of the class's member functions take any other locks, or call functions that take any other locks? otherwise you could still get lock order violations?
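aj's point is the classic lock-order problem. A generic example (plain std::mutex, unrelated to the PR): if two functions take the same two locks in opposite orders, two threads can each end up holding one lock while waiting for the other, i.e. a deadlock. The debug machinery in Bitcoin Core's Mutex exists to catch exactly this inconsistency.

```cpp
#include <mutex>

std::mutex m1;
std::mutex m2;

void FuncA()
{
    std::lock_guard<std::mutex> l1(m1); // takes m1, then m2
    std::lock_guard<std::mutex> l2(m2);
}

void FuncB()
{
    std::lock_guard<std::mutex> l1(m2); // takes m2, then m1: inconsistent order
    std::lock_guard<std::mutex> l2(m1); // can deadlock against a thread in FuncA()
}
```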
<theStack> jnewbery: sure. basically the idea was to repeatedly send getcfcheckpt requests to a full node (as it has enough blocks to take a substantial amount of time, compared to e.g. a regtest node) and measure how long it would take -- with and without the cache
<theStack> jnewbery: yes the mentioned overhead could definitely distort the result; also the ping part of send_and_ping(...) is included in the time, maybe it would be better to directly wait for the cfcheckpt answer
<fjahr> testing this in isolation like this guarantees that the leveldb cache will be hit in the version where there is no in memory cache, right? I am not sure if that would always be the case for a node running in the network.
<jkczyz> theStack: did you notice much difference between the first and subsequent runs? One difference between the implementations is (3) needs to warm up the cache on the initial fetch
<theStack> jkczyz: not really, on the posted result the request for up to block 630000 (i.e. 630 checkpoint headers) initially takes 50.92ms and the average of 99 more subsequent requests is 50.8ms
<jnewbery> It's possible to modify (3) to automatically fill the cache on startup, but I didn't think it was worth adding the code just to make the first getcfcheckpt response faster
<jnewbery> theStack: I expect much of that 50ms is overhead. If you think about how the message gets sent from the python test framework to the node, and then added to the peer's receive buffer, and then the buffer processed on a message handler thread loop, that's got to be a lot of it.
<theStack> then i guess your suggested approach with logging directly in the C++ code itself is really the way to go; given that the time resolution in the log files is small enough
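A hedged sketch of that kind of direct instrumentation, timing the handler body with std::chrono (the HandleGetCFCheckPt function here is a placeholder for wherever the measurement would actually go):

```cpp
#include <chrono>
#include <cstdio>

void HandleGetCFCheckPt() {} // placeholder for the actual request handling

void TimedHandler()
{
    const auto start{std::chrono::steady_clock::now()};
    HandleGetCFCheckPt();
    const auto end{std::chrono::steady_clock::now()};

    const auto usecs{
        std::chrono::duration_cast<std::chrono::microseconds>(end - start).count()};
    // Microsecond resolution sidesteps the log-timestamp-granularity concern.
    std::printf("getcfcheckpt handled in %lld us\n", static_cast<long long>(usecs));
}
```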
<theStack> jnewbery: good idea. i was thinking about manually analyzing the log in worst case, but of course if that can be automated as well it'd be awesome
<theStack> jnewbery: also prefer (3) for its absolute simplicity! from an algorithmic perspective, the cache access in (1)/(2) happens in O(1) though, while in (3) it's O(log N), but considering the small amount of elements in the cache, this doesn't really matter
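For concreteness, the complexity difference theStack describes comes down to the container choice; with at most a few hundred checkpoint entries, either lookup is effectively instant:

```cpp
#include <array>
#include <map>
#include <vector>

using Hash256 = std::array<unsigned char, 32>;

// Height-indexed storage (approaches 1/2): by_height[height / 1000] is O(1).
std::vector<Hash256> by_height;

// Hash-keyed storage (approach 3, assuming an ordered map as the O(log N)
// remark implies): by_block_hash.find(hash) is O(log N).
std::map<Hash256, Hash256> by_block_hash;
```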
<michaelfolkson> Yeah 3 does seem better. Did you write it from scratch jnewbery or did you look at the first two and think there are better ways to do this?
<jnewbery> nehan: you're right that the cache is only populated if it's used in implementation (3). I think that's also true in implementation (2), but not (1), which will always fill the cache when the chain advances
<jnewbery> right. So we'd need to check where LookupFilterHeader() could be called from, which is net_processing (when a getcfcheckpt request is received) or rpc/blockchain (when the getblockfilter RPC is called)
<thomasb06> michaelfolkson: from BIP 157's Node Operation section: "Nodes SHOULD NOT generate filters dynamically on request, as malicious peers may be able to perform DoS attacks by requesting small filters derived from large blocks."
<michaelfolkson> thomasb06: I think that is explained in the preceding paragraph. Nodes should generate filters and persist them rather than generate them on request