The PR branch HEAD was 1ad8ea2b at the time of this review club meeting.
Notes
A bloom filter is a
probabilistic data structure. It supports two operations:
adding an element to the filter.
querying an element from the filter.
If an element has been previously added, then querying for the element will
return true. If an element has not been added, then querying for the
element may return true or false. In other words, querying may return a
false positive, but will never return a false negative.
See the wikipedia page for
how a bloom filter is implemented with hash functions onto a bitfield. Note
that the false positive rate depends on the size of the filter and the number
of hash functions.
BIP 37
introduced a new method for Simple Payment Verification
(SPV)
clients to use bloom filters to track transactions that affect their addresses.
BIP 37 was implemented in Bitcoin Core in PR
1795.
Using the P2P messages defined in BIP 37, an SPV client can request that a
full node send it transactions which match a bloom filter. The full node will
then relay unconfirmed transactions that match the filter, and the client can
request merkle
blocks,
which only contain the transactions that match the filter.
The SPV client chooses the bloom filter parameters (filter size, number of
hashes and a ‘tweak’ for the hashes) and sends them to the node in a
filterload message.
The original implementation contained a logic bug. If the client sent a
filterload message with a zero-sized filter, then the serving node could
later attempt a divide-by-zero and crash when querying an element from the
filter. See
CVE-2013-5700
for further details.
This bug was quietly fixed in PR
2914 without advertising the
reason. That fix added the isFull and isEmpty booleans, which have proven
to be confusing for developers.
This PR 18806 removes those
isFull and isEmpty booleans and adds a more straightforward fix for the
issue.
<jnewbery> This week we're covering quite a straightforward refactor PR. There have been a lot of new participants at these meetings the last few weeks, and recent meetings have gone quite deep into P2P and validation logic, so I though it'd be nice to look at something a bit simpler this week.
<lightlike> i wonder if a covert fix like this would still be possible today - someone surely would challenge the claim that this is a performance optimization and ask for benchmarks?!
<theStack> lightlike: interesting question, i also asked myself this quite a lot... i think the quality of reviews has increased drastically, looking at old PRs
<jnewbery> "BIP 37 did not specify a service bit for the bloom filter service, thus implicitly assuming that all nodes that serve peers data support it."
<jnewbery> ok, next question: Serving bloom filters is now disabled by default for full nodes. Why? What are some of the problems with BIP 37? What alternatives are there?
<theStack> one of the major advantages of server side filters is that they are deterministic, i.e. no need to maintain an individual state for each peer?
<sipa> theStack: no cpu cost per request, ability for client to compare filters from multiple nodes, ... possibility even to at some point soft fork it in so that you have an SPV-level proof of no censorship, ...
<jnewbery> ok, that was all the questions I had. I wanted this week's meeting to be a chance for people to ask more general questions about the review process, so hopefully some of you have questions.
<theStack> ad server side filters: seems to be a great idea and actually seems much more logical than client side filters; one drawback though seems to be that more data has to be transferred
<lightlike> andrewtoth: there is a specific one src/test/fuzz/bloom_filter.cpp - we reviewed one on process_messages that indirectly also creates the msgs for the bloom if you let it run for a while
<theStack> also i was very unsure if this would get concept acks, as bip37 is kind of anachronistic and according to gmaxwell there were discussions about removing bloom filter support completely years ago
<ccdle12> I was just wondering if CBloomFilter construction in net_processing by "vRecv >> filter" is constructed via `serialize.h` and is using those macros the preferred way to deserialize to objects in Bitcoin?
<theStack> jnewbery: agreed, this ctor also confused me quite a lot when i started digging into this code... a comment like "only used for tests" would make sense here
<michaelfolkson> Let's say an attacker had successfully pushed all v0.8 nodes off the network. Is there anything he/she could've done other than causing havoc? Target a specfic node with sybils when it comes back up?
<handed> michaelfolkson re: attacker, if the node that submits block headers to a pool is crashed then that could create a lower effective hashrate / hmm maybe a chain split?
<jonatack> thomasb06: have a deep look at the articles mentioned on the review club home page, and spend time watching interation on github and looking at the docs and code in the repository
<handed> michaelfolkson I'm thinking that if mining operator doesn't get a new block header from pool operator (e.g. they continue mining on old headers because their pool operator's node has crashed due to CVE) that they can create another valid block B2, A<-B1 A<-B2
<ariard> thomasb06: try to understand the code structure and how any changes may alter program behavior, and if this behavior is expected or not compare to what his claim by author
<handed> michaelfolkson it delays settlement for the network, seems to me that uncrashed nodes will have to wait and say "Hmm two valid, competing blocks exist. I don't know which one miners will continue to build on, I'll need to wait"
<nehan> thomasb06: i try to convince myself the code is correct. also think about how the change might impact performance. more/fewer disk reads? memory conserved?
<jonatack> another interesting debugging resource: some of achow101's twitch coding videos... the best parts imo are when he gets into the weeds, stuff is broken, and he has to fix it.
<jnewbery> When I started reviewing Bitcoin Core PRs, I did a _lot_ of manual testing. I wanted to see for myself how the behaviour changed. Even if there are automated tests, it's a good exercise to try to trigger the new behaviour yourself
<handed> sipa sure but as you point on occasionally, it's all about which block gets built on, not if your tx is in the tip (unless the point is, if the chain diverges under non adversarial txs, all txns could eventually be included)
<fjahr> thomasb06: I gave a talk on the functional tests (the ones in python) but it seems it's still not uploaded yet by Bitcoin Edge. At least the slides are there, maybe they are a little bit helpful https://telaviv2019.bitcoinedge.org/presentations
<michaelfolkson> sipa handed: Granted we want to stop havoc from happening. But it won't damage network once bug is addressed or cause people to lose money
<theStack> jonatack: awesome. i wanted to take a deeper look into lightning and in particular one of its implementations for a long time, maybe this is a good start
<jnewbery> I've opened the first PR here: https://github.com/bitcoin/bitcoin/pull/18877. Would anyone be interested in an extra review club session on that? It'd be slightly different from the normal session because it's explicitly to try to help get the PRs into a state ready for review.