Fuzz: Add tx_pool fuzz target (tests)

https://github.com/bitcoin/bitcoin/pulls/21142

Host: MarcoFalke  -  PR author: MarcoFalke

The PR branch HEAD was bcf96cd at the time of this review club meeting.

Notes

  • Fuzzing is an effective way to find vulnerabilities in code before it is merged or released. A wiki page collects the issues fuzzing could identify.

  • Bitcoin Core has over 160 individual fuzz targets, but the code coverage of consensus, validation and P2P code paths is still low. For example MemPoolAccept::AcceptSingleTransaction never gets past PreChecks and exits early.

  • MemPoolAccept is used to accept (or reject) transactions from remote peers, the wallet and the RPC interface.

  • The existing process_message fuzz target can also test messages of type tx. It works by initializing a blockchain, then it lets the fuzz engine create a message (e.g. tx) from a peer to be processed.

  • The new tx_pool fuzz target aims to fuzz mempool acceptance more efficiently.

  • Refer to the fuzzing doc on how to run a fuzz engine on your machine.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. Why does the existing process_message_tx fuzz target perform so poorly when it comes to mempool acceptance?

  3. Why does the newly added tx_pool fuzz target achieve higher coverage in MemPoolAccept than the process_message_tx target?

  4. Is it expected to see more transactions rejected or accepted to the mempool in the tx_pool target? Why? You may collect evidence to support your answer by debugging the tx_pool target or by assuming all fields in the ConsumeTransaction helper are initialized by values picked to be uniformly random. Real fuzz engines do not pick values uniformly randomly, but this assumption is good enough for now.

  5. How does the tx_pool_standard fuzz target improve upon the tx_pool fuzz target even further?

  6. Do you have other ideas for improvement?

Meeting Log

  117:00 <MarcoFalke> #startmeeting
  217:00 <glozow> hi
  317:00 <amiti> hi
  417:00 <jnewbery> hi!
  517:00 <maqusat> hi
  617:00 <comment> hi
  717:00 <emzy> hi
  817:00 <OliP> Hi
  917:00 <michaelfolkson> hi
 1017:00 <AnthonyRonning> hi
 1117:00 <Keikun> Hi
 1217:00 <cguida1> hi
 1317:00 <MarcoFalke> as always, don't ask to ask, just ask ;)
 1417:00 <MarcoFalke> any first time reviewers today?
 1517:00 <Keikun> yep!
 1617:01 <glozow> it's my first time reviewing a fuzz pr :3
 1717:01 <MarcoFalke> Keikun: Welcome!
 1817:01 <MarcoFalke> glozow: Welcome to fuzzing! :)
 1917:01 <jarolrod_> hi
 2017:01 <sipa> hi
 2117:01 <jonatack> hi (and hi Keikun!)
 2217:02 <MarcoFalke> ok, let's get started...
 2317:02 <MarcoFalke> Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
 2417:02 <AnthonyRonning> tACK - code looks good to me & ran fuzzer locally overnight
 2517:02 <felixweis> hi
 2617:02 <emzy> Tested ACK
 2717:02 <michaelfolkson> n Sad times :(
 2817:02 <glozow> y, got the fuzz running and did a code review
 2917:02 <amiti> easy to give it a concept ACK, seems to clearly increase coverage :)
 3017:03 <jarolrod_> concept ACK, got it running, but don't know much about fuzzing yet :)
 3117:03 <maqusat> Concept ACK
 3217:03 <MarcoFalke> ok, before we jump into the fuzz target, let's cover another target really quick
 3317:03 <cguida1> didn't get to running it, still researching fuzzing
 3417:03 <MarcoFalke> Why does the existing process_message_tx fuzz target perform so poorly when it comes to mempool acceptance?
 3517:04 <MarcoFalke> cguida1: Did you have issues compiling? I'll be around after the meeting to troubleshoot a bit
 3617:04 <AnthonyRonning> mempool acceptance is pretty strict, it seems like the process_message fuzzing is random bytes of data in general.
 3717:05 <cguida1> MarcoFalke: unfortunately I was just busy with other stuff and didn't get to it :/
 3817:05 <MarcoFalke> AnthonyRonning: Correct
 3917:05 <glozow> yeah i imagine it'd take a lot of random tries to get a valid tx
 4017:05 <michaelfolkson> I saw glozow did the fuzzing on Docker, I've also had problems in the past with fuzzing on Mac
 4117:06 <MarcoFalke> Depending on the fuzz engine, it might even be practically impossible to get a valid tx
 4217:06 <AnthonyRonning> both add value though, I like that process_message does test random data because any peer could send us all that
 4317:06 <MarcoFalke> michaelfolkson: Yeah, the easiest way to get this running is to install Ubuntu, but we can cover compile issues after the meeting
 4417:07 <emzy> michaelfolkson: I use a linux box (old thin client) for the fuzzing. Don't like to boil my notebook.
 4517:07 <MarcoFalke> AnthonyRonning: Indeed. I see them as testing different code paths. process_message* is more general at the cost of missing detail
 4617:08 <MarcoFalke> To wrap up the question. For example predicting a correct prevout hash in the input has a probability of approx. 0
 4717:09 <MarcoFalke> So all messages in process_message are invalid transactions (or not even transactions)
 4817:09 <MarcoFalke> Why does the newly added tx_pool fuzz target achieve higher coverage in MemPoolAccept than the process_message_tx target?
 4917:09 <AnthonyRonning> `tx_pool` creates transactions based on an existing mempool in order to make sure the tx data isn’t so random.
 5017:09 <glozow> for starters, you can get further than missing-inputs
 5117:11 <glozow> it also provides a valid script, right? `P2WSH_OP_TRUE`s?
 5217:11 <MarcoFalke> For reference, the tx_pool target uses ConsumeTransaction: https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/util.cpp#L27
 5317:12 <AnthonyRonning> would it be benefitial to have an intentionally invalid script for the tx_pool tests?
 5417:13 <glozow> we have a script fuzzer, I think
 5517:13 <MarcoFalke> glozow: the tx_pool target consumes any script the tx_pool_standard target picks P2WSH_OP_TRUE (a standard script)
 5617:13 <cguida1> AnthonyRunning: I don't see why not
 5717:13 <cguida1> Oh dear, everything is bold now…
 5817:14 <MarcoFalke> (sorry, insert separator between "script the")
 5917:15 <MarcoFalke> Any questions about the tx_pool target before we move on?
 6017:16 <MarcoFalke> Is it expected to see more transactions rejected or accepted to the mempool in the tx_pool target? Why? You may collect evidence to support your answer by debugging the tx_pool target or by assuming all fields in the ConsumeTransaction helper are initialized by values picked to be uniformly random. Real fuzz engines do not pick values uniformly randomly, but this assumption is good enough for now.
 6117:16 <amiti> yes! at the end, why do we push_back txhash onto txids?
 6217:17 <MarcoFalke> amiti: txids keeps track of all txids to allow the fuzz engine to pick a valid prevout hash with propability larger than ~0
 6317:17 <MarcoFalke> https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/tx_pool.cpp#L223
 6417:17 <MarcoFalke> txids is also passed to the ConsumeTransaction helper
 6517:18 <MarcoFalke> https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/util.cpp#L38
 6617:18 <amiti> ah, I misread this and didn't realize its in the while loop. ok if it was a valid txn so use this as an option for the next round
 6717:18 <MarcoFalke> indeed
 6817:18 <amiti> clever :)
 6917:18 <AnthonyRonning> my guess was that tx_pool_standard would have more accepted transactions while tx_pool would have more rejected.
 7017:18 <AnthonyRonning> i wasn't sure how to debug to validate that idea
 7117:19 <MarcoFalke> Wild guesses are also allowed
 7217:19 <cguida1> I agree, I would guess more rejected, since tx_pool allows nonstandard txs? But just a guess. I imagine there are also ways to make sure most are accepted
 7317:20 <MarcoFalke> (The question is just about the ration of rejected/accepted within the tx_pool target)
 7417:23 <MarcoFalke> The answer to that question depends heavily on the fuzz engine. Modern fuzz engines can spend more time evolving fuzz inputs that cover rarely hit edges.
 7517:24 <MarcoFalke> One way to debug this would be to print out the mempool reject reason
 7617:24 <MarcoFalke> And then collect statistics which ones are the most common ones
 7717:25 <MarcoFalke> I found dust to be common and obviously an invalid sig
 7817:25 <glozow> ooo interesting
 7917:25 <emzy> More general question: How long should you run the fuzzing? What is a good sign to stop?
 8017:26 <glozow> any "mempool full"s ?
 8117:26 <glozow> i think that'd be the last possible failure
 8217:26 <MarcoFalke> glozow: Not yet, but I wasn't running very long
 8317:26 <AnthonyRonning> so just manually write a line in the code to log a message and that'll show up in the fuzz console output? or some other way with fuzzing libs? Not familar with fuzzing
 8417:27 <sipa> emzy: it's never ending
 8517:27 <MarcoFalke> emzy: That is still an open research question, but a good heuristic is to look whether the coverage metric increases
 8617:27 <sipa> for nontrivial tests there may be code paths that are only found after months or years of combined fuzzing time
 8717:28 <comment> emzy when one's tried every combination -- sun might go super nova before that though ;]
 8817:28 <emzy> MarcoFalke: so if it not increases anymore, better stop and change someting?
 8917:28 <MarcoFalke> emzy: The probability to find something decreases, but it won't be zero
 9017:29 <MarcoFalke> Unless the target is so trivial that all paths can be enumerated
 9117:29 <sipa> and presumably, the more interesting things are only found after long periods of time
 9217:29 <emzy> btw. I know it never stopps but there needs to be some practical messure to stop and move on.
 9317:29 <sipa> emzy: well, we in aggregate, won't stoo
 9417:30 <sipa> make sure you use the seeds in the qa repo, and contribute new ones back
 9517:30 <MarcoFalke> emzy: For a project that changes code every day, there is no "move on"
 9617:30 <AnthonyRonning> i wasn't exactly sure, but it seemed like the results from a `tx_pool_standard` actually changes the state of the mempool so consequtive runs even on the same input could produce different code paths?
 9717:30 <MarcoFalke> With changing code, the paths change and some inputs are invalidated and need to be "re-evolved"
 9817:30 <comment> +1
 9917:30 <emzy> MarcoFalke: so maybe if the code changed you restart the fuzzing with the new code? ;)
10017:31 <MarcoFalke> jup
10117:31 <MarcoFalke> Moving on, How does the tx_pool_standard fuzz target improve upon the tx_pool fuzz target even further?
10217:31 <glozow> how do we know when we've found a fuzz trophy? a crash?
10317:31 <MarcoFalke> glozow: The fuzz engine will tell you
10417:32 <MarcoFalke> afl++, and hongfuzz will tell you in their stats screen. libFuzzer will tell you by crashing
10517:32 <AnthonyRonning> `tx_pool_standard` will create transactions that are standard & based on a simulated mempool. This leads to better acceptance than creating transactions with completely random amounts, scripts, etc.
10617:33 <jonatack> FWIW I didn't see accepted = true yet in tx_pool after adding std::cout << "SUCCESS\n"; in the truthy case
10717:33 <jonatack> (I suppose it may improve with run time)
10817:34 <MarcoFalke> jonatack: How many CPU hours?
10917:34 <jonatack> a few minutes :p
11017:34 <MarcoFalke> leave a comment on the pr if you didn't find any within 2-8 hours or so
11117:35 <MarcoFalke> Can transaction still be rejected in the tx_pool_standard target?
11217:36 <MarcoFalke> (Question is made up just now, so don't look into your notes)
11317:37 <glozow> hm, I thought it would still be possible... nLockTime is random right?
11417:37 <AnthonyRonning> yeah I think so, things like locktime are still random
11517:37 <MarcoFalke> For reference we are looking at this part of the code now: https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/tx_pool.cpp#L109
11617:38 <glozow> I'm comparing `ConsumeTransaction` and the create transaction block in tx_pool_standard. It seems stricter but I don't think it necessarily results in a standard tx?
11717:38 <MarcoFalke> glozow: Indeed, the tx is not guaranteed to be standard. (only the scripts)
11817:39 <glozow> ah 💡 that's where the standard comes from
11917:40 <MarcoFalke> Also, the amounts still might be dust
12017:41 <MarcoFalke> So, to the last question in the script: Do you have other ideas for improvement?
12117:42 <glozow> RBF? :)
12217:42 <MarcoFalke> glozow: Good idea to test the rbf code paths
12317:42 <MarcoFalke> Was working on this just now, but fighting c++17 auto-template deduction guidelines :)
12417:42 <glozow> could also pre-populate the mempool to test rejections based on mempool limits
12517:43 <AnthonyRonning> is there a way to get a visual code coverage for the code paths that were tested? It may help me see what still isn't being reached.
12617:43 <MarcoFalke> Now is also a good time to ask any questions. I understand that the fuzz code is daunting (especially for newcomers)
12717:44 <MarcoFalke> AnthonyRonning: Jup, can be generated
12817:44 <MarcoFalke> Will look like this: (master) https://marcofalke.github.io/btc_cov/fuzz.coverage/index.html
12917:44 <glozow> I noticed the qa-assets repo is very very large
13017:44 <MarcoFalke> hey, it is only 3 GB
13117:44 <glozow> is it possible to condense the seeds or something?
13217:44 <AnthonyRonning> MarcoFalke: thanks! that's really useful
13317:44 <glozow> or some kind of more compact way?
13417:45 <sipa> glozow: yes, though compact is based on some measure of "redundant"
13517:45 <AnthonyRonning> have the seeds for `tx_pool` been added to qa-assets already?
13617:45 <MarcoFalke> I think there is a lot of redundancy in the inputs and git will compress them (at least as long as they are in the git pack)
13717:45 <glozow> AnthonyRonning: probably doesn't make sense until tx_pool is merged
13817:46 <sipa> but redundant w.r.t. what? your compilation options will influence what is considered interesting
13917:46 <AnthonyRonning> yeah good point. I wasn't sure how to see or check my seeds, thought it might had to been when passing in qa-assets data
14017:46 <MarcoFalke> AnthonyRonning: The target might change a bit once I add support for RBF, so we wouldn't want to add stale inputs to the repo
14117:46 <jnewbery> Is a git repo the best way to store the fuzz assets? Commit history don't seem very interesting/useful
14217:47 <AnthonyRonning> the only thing i really see at the end of my fuzzing runs are some `slow-unit-*` files, are those anything to be concerned about?
14317:47 <sipa> AnthonyRonning: how are you invoking it?
14417:47 <MarcoFalke> jnewbery: Sometimes inputs are deleted because they are no longer relevant for the master branch. But keeping a copy for release branches could make sense
14517:47 <AnthonyRonning> sipa: `FUZZ=tx_pool src/test/fuzz/fuzz -jobs=31`
14617:48 <MarcoFalke> AnthonyRonning: How many cores do you have?
14717:48 <sipa> AnthonyRonning: i think you need to specify a directory name where it'll save seeds
14817:48 <AnthonyRonning> 64
14917:48 <AnthonyRonning> sipa: ah okay cool!
15017:48 <AnthonyRonning> still need to play with fuzzing more, very facinating
15117:48 <MarcoFalke> slow-* means that the particular input took a long time
15217:48 <sipa> just put a directory name after the commamd
15317:48 <AnthonyRonning> sipa: thanks!
15417:49 <MarcoFalke> This could mean that the input takes a long time to parse and execute by the fuzz engine. Or it could mean there is a DoS vector
15517:50 <AnthonyRonning> MarcoFalke: oh okay, so maybe false positive, maybe DoS? Is it worth saving those off to investigate further?
15617:51 <glozow> ooh, how do we recover what the input was? I had a couple of those as well when i didn't specify directory
15717:51 <MarcoFalke> crash- and slow- are stored in the pwd
15817:51 <MarcoFalke> (also oom-*)
15917:52 <jnewbery> MarcoFalke: my point is that unlike the code repo, where history is important, cloning the entire history of the fuzz assets doesn't seem very useful
16017:52 <MarcoFalke> btw, this is how you generate the colored html output: https://github.com/MarcoFalke/btc_cov/blob/cd1b2a714aa99be3a9fd2bc68a2308c49f36fd76/.cirrus.yml#L43
16117:53 <MarcoFalke> jnewbery: It isn't useful for CI, which is why CI specifies --depth=1
16217:53 <MarcoFalke> Though it might be useful for a dev
16317:53 <jnewbery> ah, good tip!
16417:54 <MarcoFalke> Sorry, wrong line https://github.com/MarcoFalke/btc_cov/blob/cd1b2a714aa99be3a9fd2bc68a2308c49f36fd76/.cirrus.yml#L65
16517:55 <AnthonyRonning> MarcoFalke: thanks!
16617:55 <glozow> so adding `--enable-lcov --enable-lcov-branch-coverage` ?
16717:56 <maqusat> likely a noob question but why do tests use COINBASE_MATURITY constant?
16817:56 <MarcoFalke> jup
16917:56 <jonatack> maqusat: The block reward of coinbaseoutput.nValue (50) BTC/block matures after COINBASE_MATURITY (100) blocks
17017:57 <MarcoFalke> maqusat: Good question! Spending a coin before COINBASE_MATURITY confirmations is not allowed, so to add any transaction the the mempool the fuzz target needs to mine at least COINBASE_MATURITY+1 blocks
17117:57 <MarcoFalke> I think this one mines 2*COINBASE_MATURITY
17217:58 <glozow> and it's still possible to pull an outpoint that's premature right?
17317:58 <jonatack> this is why, in the functional test, you often see generate(101) in the test setup (COINBASE_MATURITY + 1)
17417:58 <MarcoFalke> As the setup is expensive, it is done one time before any fuzzing starts: https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/tx_pool.cpp#L20
17517:58 <jonatack> tests*
17617:58 <MarcoFalke> glozow: Only for the tx_pool target, IIRC
17717:59 <maqusat> does coinbase mean a special type of transaction that can't be spent without 100 confirmations?
17817:59 <glozow> ah right, `if (outpoints.size() >= COINBASE_MATURITY) break;`
17918:00 <MarcoFalke> maqusat: The coinbase transaction is the first transaction in the block and can't be spent before 100 confirmations
18018:00 <MarcoFalke> #endmeeting