The harness simulates P2P network behavior by having three peers send various messages (CMPCTBLOCK, BLOCKTXN, HEADERS, SENDCMPCT, TX) to the test node, triggering its compact block logic.
The goal is to find bugs and ensure assertions hold. See #33296 for an example of a bug found by this very fuzz test. Nice!
Compact Blocks Protocol Overview
BIP 152 introduced the compact block relay protocol, which reduces bandwidth and latency when propagating blocks by sending an encoded version of a block under the assumption that most of the transactions are already in our peer’s mempool.
Compact blocks consist of:
A header
Short transaction identifiers (6 bytes instead of 32 bytes)
Prefilled transactions
A vector where each entry is a differentially-encoded uint16_t index and the complete serialized transaction.
The current code only prefills the coinbase. However, the protocol allows for the inclusion of other transactions that we might expect our peer is missing. See this TODO.
The receiving node reconstructs the block using transactions from its mempool. If any transactions are missing, it requests them via a GETBLOCKTXN message.
There are two modes:
low-bandwidth: block sent only after peer requests it via INV/GETDATA
high-bandwidth: block sent to peers without request as soon as it is received, even before full validation is completed
Fuzz Testing
Fuzzing is a testing technique that provides quasi-random inputs to code to discover bugs and unexpected behavior. Fuzzers use feedback like code coverage as a guide for mutation of those inputs, allowing them to explore deeper paths within the target code.
See our fuzzing docs to start fuzzing with libFuzzer and AFL++.
Check out this PR’s test coverage (blockencodings.cpp and net_processing.cpp are probably most useful to look at). Walking through a coverage report to see which branches are hit or not hit is a good way to start evaluating a fuzz harness.
An effective fuzz test should be deterministic, to make bugs reproducible. If you’re able to get the AFL++ fuzzer running, you will see a stability metric. The goal is to get to 100%, perfect determinism! After gathering some inputs, you can also run the fuzz determinism script over the corpus to check for potential sources of instability.
The faster (more iterations per second) a fuzz harness is, the more quickly it will discover new code paths and potential bugs. You’ll see this metric as exec/s when fuzzing. Any operation that significantly slows down an iteration (like reading from and writing to disk) should be optimized or avoided if possible.
Fun Exercise
If you get the cmpctblock harness running, revert PR #33296 and see if you can reproduce the crash on this Assume statement.
Yeah I know, fuzzing is cool.
Questions
Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach? Were you able to get the fuzz test running?
Where in the codebase are the main compact block helpers and processing logic? Name some of the relevant classes and functions. (Hint: nothing like a little search for NetMsgType::CMPCTBLOCK to get you started)
The fuzz test sends SENDCMPCT messages with high_bandwidth randomly set. How many high bandwidth peers are allowed and does the fuzz harness test this limit? More generally, why would a peer choose to be high or low bandwidth?
Compare -testdatadir and the new -fuzzcopydatadir. Why is the latter useful for performance, and why isn’t a fresh TestingSetup that mines blocks on each iteration acceptable here?
Look at create_block in the harness. How many transactions do the generated blocks contain, and where do they come from? What compact block scenarios might be missed with only a few transactions in a block?
In commit 254e13cFinalizeHeader is moved to util.h so it can be used by any fuzz test. What is the purpose of this function and why doesn’t it slow down the fuzzer?
Commit ed813c4 sorts m_dirty_blockindex by block hash instead of pointer address. What non-determinism does this fix? The author notes this slows production code for no production benefit. Why can’t EnableFuzzDeterminism() be used here? How do you think this non-determinism should be best handled (if not the way the PR currently does)?
The harness adds prefilled transactions to a compact block beyond the coinbase. Walk through the index calculations for prefilling transactions at indices 1 and 3 in a 4-transaction block. Do you think this code could be simplified or cleaned up, and if so, what do you suggest?
<marcofleon> Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach? Were you able to get the fuzz test running?
<stringintech> while trying to get the fuzz test running on mac, i ran into a couple of compile-time and runtime issues. small write-up for the compile fix here: https://stringintech.github.io/blog/p/fuzzing-bitcoin-core-using-afl-on-apple-silicon . i will try to enhance/extend it once i wrap my head around the runtime issues (reported one on the PR page).
<danielabrozzoni> Concept ACK, I am still finishing my review :) I looked at commits one by one, trying to understand why each change was needed. I was able to run the fuzz tests.
<marcofleon> stringintech: nice, yeah I saw your comments. I still need to look into it more to understand but I do wonder why that wasn't showing up on linux
<marcofleon> Where in the codebase are the main compact block helpers and processing logic? Name some of the relevant classes and functions. (Hint: nothing like a little search for NetMsgType::CMPCTBLOCK to get you started)
<danielabrozzoni> I tried to see it crash, but couldn't reproduce! It might be that I haven't run the fuzzer for long enough, and my exec/sec are quite low
<monlovesmango> well I don't want to derail anything but usually with fuzz tests you see asserts everywhere, but here I didn't see any. so my question is where/how are we asserting that results are expected?
<stringintech> marcofleon: i could not unfortunately. blocked by the runtime issues. but had a question on this... how much the initial input we start with matters?
<marcofleon> just generally testing compact block logic. But that's good that you're thinking about that monlovesmango because yeah in general you want good oracles for fuzz tests
<stickies-v> marcofleon: `PeerManagerImpl::NewPoWValidBlock` creates a `NetMsgType::CMPCTBLOCK` message and through `m_connman.ForEachNode` sends it to our high-bandwidth peers
<dzxzg> The `PartiallyDownloadedBlock` class for all of the compact block construction object stuff, and a lot of the logic is inside of the `ProcessMessage` NetMsgType::CMPCTBLOCK case
<monlovesmango> it seemed like class CBlockHeaderAndShortTxIDs in blockencodings.h was putting together the compact block message. is that the question?
<danielabrozzoni> marcofleon: Ah, got it! I'll try to run it for longer, just for fun. Maybe I could also modify the harness so that it will get to the crash quicker, as an exercise
<stringintech> 1) SEND a peer a cmpct block in HIGH bandwidth mode: in `PeerManagerImpl::NewPoWValidBlock()` callback (triggered by `ChainstateManager::AcceptBlock()`) and in `PeerManagerImpl::SendMessages()` (in case not already sent the cmpct block in `PeerManagerImpl::NewPoWValidBlock()` to this peer)
<stringintech> 2) SEND a peer a cmpct block in LOW bandwidth mode: process peer cmpct block request in `PeerManagerImpl::ProcessGetBlockData()` and potentially read the block from disk and construct the cmpct block - if not already done - and send it; constructing the cmpct block happens through `CBlockHeaderAndShortTxIDs` constructor in blockencodings.cpp
<stringintech> 3) RECEIVE a cmpct block from a peer: in `PeerManagerImpl::ProcessMessage()` processing `NetMsgType::CMPCTBLOCK` msg type, we process the cmpct block sent to us and try to construct the full block using txns in our mempool (using `PartiallyDownloadedBlock::InitData()` in blockencodings.cpp) and potentially send a `NetMsgType::GETBLOCKTXN` msg to
<stringintech> the same peer for the txns we did not have where we can later process the response in `PeerManagerImpl::ProcessCompactBlockTxns()` and finally call `PartiallyDownloadedBlock::FillBlock()` to see if we have the full block now.
<stringintech> in (1) and (2) peer may respond with a `NetMsgType::GETBLOCKTXN` msg where we potentially send the requested txns of the block in `PeerManagerImpl::SendBlockTransactions()`
<danielabrozzoni> Also `MaybeSetPeerAsAnnouncingHeaderAndIDs` takes care of switching a peer to high bandwidth mode, and if we have too many high bandwidth peers, downgrading a differernt one to low bandwidth
<marcofleon> The fuzz test sends SENDCMPCT messages with high_bandwidth randomly set. How many high bandwidth peers are allowed and does the fuzz harness test this limit? More generally, why would a peer choose to be high or low bandwidth?
<stringintech> i found a limit on the other side: how many peers WE can select as HB which is “3” and it is enforced when PeerManagerImpl::BlockChecked() calls`PeerManagerImpl::MaybeSetPeerAsAnnouncingHeaderAndIDs()` (maintains 3 fastest peers as HB).
<dzxzg> A peer doesn't choose to be high or low bandwidth, we will ask all of our peers to be low bandwidth compact block peers initially, and then in MaybeSetPeerAsAnnouncing* we'll select high bandwidth peers and send them a SENDCMPCT asking them to be a high bandwidth peers
<dzxzg> If someone has asked you to be their high-bandwidth peer, you'll send them a CMPCTBLOCK message in `NewPoWValidBlock()` before you have even completetd block validation
<marcofleon> Compare -testdatadir and the new -fuzzcopydatadir. Why is the latter useful for performance, and why isn’t a fresh TestingSetup that mines blocks on each iteration acceptable here?
<stringintech> is this performance related (considering that mining should not take so long when in fuzzing mode) or that we want all runs to have the same initial state?
<stringintech> stickies-v: i think that's what we are doing know. mining blocks and store in a static dir in init() and then copy that static dir in each run.
<stickies-v> well so usually test suites have "fixtures" to do expensive one-off initialization that can be shared across tests, e.g. we have the same in our unit tests using boost
<marcofleon> Yeah I think so, usually the testing setup stuff would be done in the init as a one off. But for this test specifically, its done in the iteration which is unusual
<dzxzg> Maybe there is an obvious reason why this is dumb, but it really seems like there should be a way to have a memory-only chainstate object, and then we could just reset to that at the end of each fuzz epoch(not sure if epoch is the right word)
<marcofleon> My first question when I saw this test was why can't we do this in memory. And Niklas was saying its just the validation code doesn't allow it. We need to do some refactoring
<marcofleon> Look at create_block in the harness. How many transactions do the generated blocks contain, and where do they come from? What compact block scenarios might be missed with only a few transactions in a block?
<dzxzg> stringintech89: It would reduce coverage of this particular fuzz test, but that coverage is redundant I think this test should just fuzz compact block logic which doesn't do anything special with the disk, I think existing validation fuzzers should cover that stuff
<stringintech89> for the next question: max 3 txns: coinbase, at most one from mempool, at most one not in mempool. was wondering if including more than one from mempool (or more than one not in mempool) would cover different code paths and lead to more coverage.
<marcofleon> Commit ed813c4 sorts m_dirty_blockindex by block hash instead of pointer address. What non-determinism does this fix? The author notes this slows production code for no production benefit. Why can’t EnableFuzzDeterminism() be used here? How do you think this non-determinism should be best handled (if not the way the PR currently does)?
<dzxzg> the second half of question 5, this doesn't exercise some of the prefill index stuff, where prefills have an index counting from the last prefill
<stringintech89> and if we want to include the new sort in the set type itself, EnableFuzzDeterminism() does not work and we would need macros perhaps.
<marcofleon> Yeah basically the iteration order won't be deterministic with just a normal std::set which orders based on memory address. so it's changed to block hash to make it the same order every time
<stringintech89> but as marcofleon mentioned on the PR page, maybe we can include a runtime sorting logic just when fuzzing; then EnableFuzzDeterminism() can be used perhaps
<eugenesiegel> hi, sorry I thought this review club was at a different time. runtime sorting doesn't work because the std::sort call itself will be non-deterministic
<dzxzg> Just drawing it out a bit more to make sure I understand why we care if it's in the same order every run, we write from the set of dirty block indexes in the order of the set, and if we don't write blockindexes in the same order every time, we would have different behavior running the same seed two times?
<marcofleon> in gneral fuzz tests should be as deterministic as possible. But not sure if it's worth it in this case, because this non-determinism might not super impactful?
<eugenesiegel> dzxzg: we won't technically have different behavior since iirc the db is a key-value store. The issue is that the fuzzer will think it's gaining (or losing) coverage when nothing has actually happened
<eugenesiegel> sorry, I think my answer was confusing. if there is a different order written to leveldb, the internal MemTable is non-deterministic, but since these are key-value pairs it shouldn't? matter on-disk
<eugenesiegel> If we insert into `m_dirty_blockindex` without sorting on insert, then `m_dirty_blockindex` is in random order. this means the sort function will call whatever std::less function a variable number of times each run for the same input. I'm completely ok with removing the commit
<stringintech89> eugenesiegel: can you please elaborate a bit on this "the issue is that the fuzzer will think it's gaining (or losing) coverage when nothing has actually happened"
<eugenesiegel> stringintech89: let's say we removed the `m_dirty_blockindex` commit, meaning the harness is now non-deterministic. if the fuzzer is given an input that calls WriteBatchSync and modifies one bit of the input that's uninteresting (doesn't gain or lose coverage), then because the harness is non-deterministic, the fuzzer may actually see an increase
<stringintech> eugenesiegel: ahh thank you! i think i got it (though not so clear how fuzzer calculates the coverage). will read more on it and ask more questions if still not clear.
<eugenesiegel> stringintech: This might be a bit out-dated, but I found this AFL writeup very helpful when learning how coverage feedback with fuzzing worked: https://lcamtuf.coredump.cx/afl/technical_details.txt
<stringintech66> so in the example eugenesiegel explained, for the two given inputs that have the same coverage of the code paths we are interested in, fuzzer might see different coverage score since different amount of sorting might happen...