Add tx_pool fuzz target (tests )
Mar 17, 2021
https://github.com/bitcoin/bitcoin/pull/21142
Host:
MarcoFalke
-
PR author: MarcoFalke
The PR branch HEAD was bcf96cd at the time of this review club meeting.
Notes
Fuzzing is an effective way to find vulnerabilities in code before it is
merged or released. A wiki
page
collects the issues fuzzing could identify.
Bitcoin Core has over
160 individual
fuzz targets, but the code coverage of consensus, validation and P2P code
paths is still
low . For
example MemPoolAccept::AcceptSingleTransaction
never gets past PreChecks
and exits early.
MemPoolAccept
is used to accept (or reject) transactions from remote peers, the wallet and the RPC interface.
The existing process_message
fuzz target can also test messages of type
tx
. It works by initializing a
blockchain ,
then it lets the fuzz engine create a message (e.g. tx
) from a peer to be
processed .
The new tx_pool
fuzz target aims to fuzz mempool acceptance more efficiently.
Refer to the fuzzing
doc on how to
run a fuzz engine on your machine.
Questions
Did you review the PR? Concept ACK, approach ACK, tested ACK, or
NACK ?
What was your review approach?
Why does the existing process_message_tx
fuzz target perform so poorly
when it comes to mempool acceptance?
Why does the newly added tx_pool
fuzz target achieve higher coverage in MemPoolAccept
than the process_message_tx
target?
Is it expected to see more transactions rejected or accepted to the mempool in the tx_pool
target? Why? You may collect evidence to support your answer by debugging the tx_pool
target or by assuming all fields in the ConsumeTransaction
helper are initialized by values picked to be uniformly random. Real fuzz engines do not pick values uniformly randomly, but this assumption is good enough for now.
How does the tx_pool_standard
fuzz target improve upon the tx_pool
fuzz target even further?
Do you have other ideas for improvement?
Meeting Log
1 17:00 <MarcoFalke> #startmeeting
9 17:00 <michaelfolkson> hi
10 17:00 <AnthonyRonning> hi
13 17:00 <MarcoFalke> as always, don't ask to ask, just ask ;)
14 17:00 <MarcoFalke> any first time reviewers today?
16 17:01 <glozow> it's my first time reviewing a fuzz pr :3
17 17:01 <MarcoFalke> Keikun: Welcome!
18 17:01 <MarcoFalke> glozow: Welcome to fuzzing! :)
21 17:01 <jonatack> hi (and hi Keikun!)
22 17:02 <MarcoFalke> ok, let's get started...
23 17:02 <MarcoFalke> Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
24 17:02 <AnthonyRonning> tACK - code looks good to me & ran fuzzer locally overnight
26 17:02 <emzy> Tested ACK
27 17:02 <michaelfolkson> n Sad times :(
28 17:02 <glozow> y, got the fuzz running and did a code review
29 17:02 <amiti> easy to give it a concept ACK, seems to clearly increase coverage :)
30 17:03 <jarolrod_> concept ACK, got it running, but don't know much about fuzzing yet :)
31 17:03 <maqusat> Concept ACK
32 17:03 <MarcoFalke> ok, before we jump into the fuzz target, let's cover another target really quick
33 17:03 <cguida1> didn't get to running it, still researching fuzzing
34 17:03 <MarcoFalke> Why does the existing process_message_tx fuzz target perform so poorly when it comes to mempool acceptance?
35 17:04 <MarcoFalke> cguida1: Did you have issues compiling? I'll be around after the meeting to troubleshoot a bit
36 17:04 <AnthonyRonning> mempool acceptance is pretty strict, it seems like the process_message fuzzing is random bytes of data in general.
37 17:05 <cguida1> MarcoFalke: unfortunately I was just busy with other stuff and didn't get to it :/
38 17:05 <MarcoFalke> AnthonyRonning: Correct
39 17:05 <glozow> yeah i imagine it'd take a lot of random tries to get a valid tx
40 17:05 <michaelfolkson> I saw glozow did the fuzzing on Docker, I've also had problems in the past with fuzzing on Mac
41 17:06 <MarcoFalke> Depending on the fuzz engine, it might even be practically impossible to get a valid tx
42 17:06 <AnthonyRonning> both add value though, I like that process_message does test random data because any peer could send us all that
43 17:06 <MarcoFalke> michaelfolkson: Yeah, the easiest way to get this running is to install Ubuntu, but we can cover compile issues after the meeting
44 17:07 <emzy> michaelfolkson: I use a linux box (old thin client) for the fuzzing. Don't like to boil my notebook.
45 17:07 <MarcoFalke> AnthonyRonning: Indeed. I see them as testing different code paths. process_message* is more general at the cost of missing detail
46 17:08 <MarcoFalke> To wrap up the question. For example predicting a correct prevout hash in the input has a probability of approx. 0
47 17:09 <MarcoFalke> So all messages in process_message are invalid transactions (or not even transactions)
48 17:09 <MarcoFalke> Why does the newly added tx_pool fuzz target achieve higher coverage in MemPoolAccept than the process_message_tx target?
49 17:09 <AnthonyRonning> `tx_pool` creates transactions based on an existing mempool in order to make sure the tx data isn’t so random.
50 17:09 <glozow> for starters, you can get further than missing-inputs
51 17:11 <glozow> it also provides a valid script, right? `P2WSH_OP_TRUE`s?
53 17:12 <AnthonyRonning> would it be benefitial to have an intentionally invalid script for the tx_pool tests?
54 17:13 <glozow> we have a script fuzzer, I think
55 17:13 <MarcoFalke> glozow: the tx_pool target consumes any script the tx_pool_standard target picks P2WSH_OP_TRUE (a standard script)
56 17:13 <cguida1> AnthonyRunning: I don't see why not
57 17:13 <cguida1> Oh dear, everything is bold now…
58 17:14 <MarcoFalke> (sorry, insert separator between "script the")
59 17:15 <MarcoFalke> Any questions about the tx_pool target before we move on?
60 17:16 <MarcoFalke> Is it expected to see more transactions rejected or accepted to the mempool in the tx_pool target? Why? You may collect evidence to support your answer by debugging the tx_pool target or by assuming all fields in the ConsumeTransaction helper are initialized by values picked to be uniformly random. Real fuzz engines do not pick values uniformly randomly, but this assumption is good enough for now.
61 17:16 <amiti> yes! at the end, why do we push_back txhash onto txids?
62 17:17 <MarcoFalke> amiti: txids keeps track of all txids to allow the fuzz engine to pick a valid prevout hash with propability larger than ~0
64 17:17 <MarcoFalke> txids is also passed to the ConsumeTransaction helper
66 17:18 <amiti> ah, I misread this and didn't realize its in the while loop. ok if it was a valid txn so use this as an option for the next round
67 17:18 <MarcoFalke> indeed
68 17:18 <amiti> clever :)
69 17:18 <AnthonyRonning> my guess was that tx_pool_standard would have more accepted transactions while tx_pool would have more rejected.
70 17:18 <AnthonyRonning> i wasn't sure how to debug to validate that idea
71 17:19 <MarcoFalke> Wild guesses are also allowed
72 17:19 <cguida1> I agree, I would guess more rejected, since tx_pool allows nonstandard txs? But just a guess. I imagine there are also ways to make sure most are accepted
73 17:20 <MarcoFalke> (The question is just about the ration of rejected/accepted within the tx_pool target)
74 17:23 <MarcoFalke> The answer to that question depends heavily on the fuzz engine. Modern fuzz engines can spend more time evolving fuzz inputs that cover rarely hit edges.
75 17:24 <MarcoFalke> One way to debug this would be to print out the mempool reject reason
76 17:24 <MarcoFalke> And then collect statistics which ones are the most common ones
77 17:25 <MarcoFalke> I found dust to be common and obviously an invalid sig
78 17:25 <glozow> ooo interesting
79 17:25 <emzy> More general question: How long should you run the fuzzing? What is a good sign to stop?
80 17:26 <glozow> any "mempool full"s ?
81 17:26 <glozow> i think that'd be the last possible failure
82 17:26 <MarcoFalke> glozow: Not yet, but I wasn't running very long
83 17:26 <AnthonyRonning> so just manually write a line in the code to log a message and that'll show up in the fuzz console output? or some other way with fuzzing libs? Not familar with fuzzing
84 17:27 <sipa> emzy: it's never ending
85 17:27 <MarcoFalke> emzy: That is still an open research question, but a good heuristic is to look whether the coverage metric increases
86 17:27 <sipa> for nontrivial tests there may be code paths that are only found after months or years of combined fuzzing time
87 17:28 <comment> emzy when one's tried every combination -- sun might go super nova before that though ;]
88 17:28 <emzy> MarcoFalke: so if it not increases anymore, better stop and change someting?
89 17:28 <MarcoFalke> emzy: The probability to find something decreases, but it won't be zero
90 17:29 <MarcoFalke> Unless the target is so trivial that all paths can be enumerated
91 17:29 <sipa> and presumably, the more interesting things are only found after long periods of time
92 17:29 <emzy> btw. I know it never stopps but there needs to be some practical messure to stop and move on.
93 17:29 <sipa> emzy: well, we in aggregate, won't stoo
94 17:30 <sipa> make sure you use the seeds in the qa repo, and contribute new ones back
95 17:30 <MarcoFalke> emzy: For a project that changes code every day, there is no "move on"
96 17:30 <AnthonyRonning> i wasn't exactly sure, but it seemed like the results from a `tx_pool_standard` actually changes the state of the mempool so consequtive runs even on the same input could produce different code paths?
97 17:30 <MarcoFalke> With changing code, the paths change and some inputs are invalidated and need to be "re-evolved"
99 17:30 <emzy> MarcoFalke: so maybe if the code changed you restart the fuzzing with the new code? ;)
100 17:31 <MarcoFalke> jup
101 17:31 <MarcoFalke> Moving on, How does the tx_pool_standard fuzz target improve upon the tx_pool fuzz target even further?
102 17:31 <glozow> how do we know when we've found a fuzz trophy? a crash?
103 17:31 <MarcoFalke> glozow: The fuzz engine will tell you
104 17:32 <MarcoFalke> afl++, and hongfuzz will tell you in their stats screen. libFuzzer will tell you by crashing
105 17:32 <AnthonyRonning> `tx_pool_standard` will create transactions that are standard & based on a simulated mempool. This leads to better acceptance than creating transactions with completely random amounts, scripts, etc.
106 17:33 <jonatack> FWIW I didn't see accepted = true yet in tx_pool after adding std::cout << "SUCCESS\n"; in the truthy case
107 17:33 <jonatack> (I suppose it may improve with run time)
108 17:34 <MarcoFalke> jonatack: How many CPU hours?
109 17:34 <jonatack> a few minutes :p
110 17:34 <MarcoFalke> leave a comment on the pr if you didn't find any within 2-8 hours or so
111 17:35 <MarcoFalke> Can transaction still be rejected in the tx_pool_standard target?
112 17:36 <MarcoFalke> (Question is made up just now, so don't look into your notes)
113 17:37 <glozow> hm, I thought it would still be possible... nLockTime is random right?
114 17:37 <AnthonyRonning> yeah I think so, things like locktime are still random
116 17:38 <glozow> I'm comparing `ConsumeTransaction` and the create transaction block in tx_pool_standard. It seems stricter but I don't think it necessarily results in a standard tx?
117 17:38 <MarcoFalke> glozow: Indeed, the tx is not guaranteed to be standard. (only the scripts)
118 17:39 <glozow> ah 💡 that's where the standard comes from
119 17:40 <MarcoFalke> Also, the amounts still might be dust
120 17:41 <MarcoFalke> So, to the last question in the script: Do you have other ideas for improvement?
121 17:42 <glozow> RBF? :)
122 17:42 <MarcoFalke> glozow: Good idea to test the rbf code paths
123 17:42 <MarcoFalke> Was working on this just now, but fighting c++17 auto-template deduction guidelines :)
124 17:42 <glozow> could also pre-populate the mempool to test rejections based on mempool limits
125 17:43 <AnthonyRonning> is there a way to get a visual code coverage for the code paths that were tested? It may help me see what still isn't being reached.
126 17:43 <MarcoFalke> Now is also a good time to ask any questions. I understand that the fuzz code is daunting (especially for newcomers)
127 17:44 <MarcoFalke> AnthonyRonning: Jup, can be generated
129 17:44 <glozow> I noticed the qa-assets repo is very very large
130 17:44 <MarcoFalke> hey, it is only 3 GB
131 17:44 <glozow> is it possible to condense the seeds or something?
132 17:44 <AnthonyRonning> MarcoFalke: thanks! that's really useful
133 17:44 <glozow> or some kind of more compact way?
134 17:45 <sipa> glozow: yes, though compact is based on some measure of "redundant"
135 17:45 <AnthonyRonning> have the seeds for `tx_pool` been added to qa-assets already?
136 17:45 <MarcoFalke> I think there is a lot of redundancy in the inputs and git will compress them (at least as long as they are in the git pack)
137 17:45 <glozow> AnthonyRonning: probably doesn't make sense until tx_pool is merged
138 17:46 <sipa> but redundant w.r.t. what? your compilation options will influence what is considered interesting
139 17:46 <AnthonyRonning> yeah good point. I wasn't sure how to see or check my seeds, thought it might had to been when passing in qa-assets data
140 17:46 <MarcoFalke> AnthonyRonning: The target might change a bit once I add support for RBF, so we wouldn't want to add stale inputs to the repo
141 17:46 <jnewbery> Is a git repo the best way to store the fuzz assets? Commit history don't seem very interesting/useful
142 17:47 <AnthonyRonning> the only thing i really see at the end of my fuzzing runs are some `slow-unit-*` files, are those anything to be concerned about?
143 17:47 <sipa> AnthonyRonning: how are you invoking it?
144 17:47 <MarcoFalke> jnewbery: Sometimes inputs are deleted because they are no longer relevant for the master branch. But keeping a copy for release branches could make sense
145 17:47 <AnthonyRonning> sipa: `FUZZ=tx_pool src/test/fuzz/fuzz -jobs=31`
146 17:48 <MarcoFalke> AnthonyRonning: How many cores do you have?
147 17:48 <sipa> AnthonyRonning: i think you need to specify a directory name where it'll save seeds
148 17:48 <AnthonyRonning> 64
149 17:48 <AnthonyRonning> sipa: ah okay cool!
150 17:48 <AnthonyRonning> still need to play with fuzzing more, very facinating
151 17:48 <MarcoFalke> slow-* means that the particular input took a long time
152 17:48 <sipa> just put a directory name after the commamd
153 17:48 <AnthonyRonning> sipa: thanks!
154 17:49 <MarcoFalke> This could mean that the input takes a long time to parse and execute by the fuzz engine. Or it could mean there is a DoS vector
155 17:50 <AnthonyRonning> MarcoFalke: oh okay, so maybe false positive, maybe DoS? Is it worth saving those off to investigate further?
156 17:51 <glozow> ooh, how do we recover what the input was? I had a couple of those as well when i didn't specify directory
157 17:51 <MarcoFalke> crash- and slow- are stored in the pwd
158 17:51 <MarcoFalke> (also oom-*)
159 17:52 <jnewbery> MarcoFalke: my point is that unlike the code repo, where history is important, cloning the entire history of the fuzz assets doesn't seem very useful
161 17:53 <MarcoFalke> jnewbery: It isn't useful for CI, which is why CI specifies --depth=1
162 17:53 <MarcoFalke> Though it might be useful for a dev
163 17:53 <jnewbery> ah, good tip!
165 17:55 <AnthonyRonning> MarcoFalke: thanks!
166 17:55 <glozow> so adding `--enable-lcov --enable-lcov-branch-coverage` ?
167 17:56 <maqusat> likely a noob question but why do tests use COINBASE_MATURITY constant?
168 17:56 <MarcoFalke> jup
169 17:56 <jonatack> maqusat: The block reward of coinbaseoutput.nValue (50) BTC/block matures after COINBASE_MATURITY (100) blocks
170 17:57 <MarcoFalke> maqusat: Good question! Spending a coin before COINBASE_MATURITY confirmations is not allowed, so to add any transaction the the mempool the fuzz target needs to mine at least COINBASE_MATURITY+1 blocks
171 17:57 <MarcoFalke> I think this one mines 2*COINBASE_MATURITY
172 17:58 <glozow> and it's still possible to pull an outpoint that's premature right?
173 17:58 <jonatack> this is why, in the functional test, you often see generate(101) in the test setup (COINBASE_MATURITY + 1)
175 17:58 <jonatack> tests*
176 17:58 <MarcoFalke> glozow: Only for the tx_pool target, IIRC
177 17:59 <maqusat> does coinbase mean a special type of transaction that can't be spent without 100 confirmations?
178 17:59 <glozow> ah right, `if (outpoints.size() >= COINBASE_MATURITY) break;`
179 18:00 <MarcoFalke> maqusat: The coinbase transaction is the first transaction in the block and can't be spent before 100 confirmations
180 18:00 <MarcoFalke> #endmeeting