Add tx_pool fuzz target (tests)

Mar 17, 2021

https://github.com/bitcoin/bitcoin/pull/21142

Host: MarcoFalke - PR author: MarcoFalke

The PR branch HEAD was bcf96cd at the time of this review club meeting.

Notes

Fuzzing is an effective way to find vulnerabilities in code before it is merged or released. A wiki page collects the issues fuzzing could identify.
Bitcoin Core has over 160 individual fuzz targets, but the code coverage of consensus, validation and P2P code paths is still low. For example MemPoolAccept::AcceptSingleTransaction never gets past PreChecks and exits early.
MemPoolAccept is used to accept (or reject) transactions from remote peers, the wallet and the RPC interface.
The existing process_message fuzz target can also test messages of type tx. It works by initializing a blockchain, then it lets the fuzz engine create a message (e.g. tx) from a peer to be processed.
The new tx_pool fuzz target aims to fuzz mempool acceptance more efficiently.
Refer to the fuzzing doc on how to run a fuzz engine on your machine.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
Why does the existing process_message_tx fuzz target perform so poorly when it comes to mempool acceptance?
Why does the newly added tx_pool fuzz target achieve higher coverage in MemPoolAccept than the process_message_tx target?
Is it expected to see more transactions rejected or accepted to the mempool in the tx_pool target? Why? You may collect evidence to support your answer by debugging the tx_pool target or by assuming all fields in the ConsumeTransaction helper are initialized by values picked to be uniformly random. Real fuzz engines do not pick values uniformly randomly, but this assumption is good enough for now.
How does the tx_pool_standard fuzz target improve upon the tx_pool fuzz target even further?
Do you have other ideas for improvement?

Meeting Log

17:00

<MarcoFalke> #startmeeting

17:00

<glozow> hi

17:00

<amiti> hi

17:00

<jnewbery> hi!

17:00

<maqusat> hi

17:00

<comment> hi

17:00

<emzy> hi

17:00

<OliP> Hi

17:00

<michaelfolkson> hi

17:00

<AnthonyRonning> hi

17:00

<Keikun> Hi

17:00

<cguida1> hi

17:00

<MarcoFalke> as always, don't ask to ask, just ask ;)

17:00

<MarcoFalke> any first time reviewers today?

17:00

<Keikun> yep!

17:01

<glozow> it's my first time reviewing a fuzz pr :3

17:01

<MarcoFalke> Keikun: Welcome!

17:01

<MarcoFalke> glozow: Welcome to fuzzing! :)

17:01

<jarolrod_> hi

17:01

<sipa> hi

17:01

<jonatack> hi (and hi Keikun!)

17:02

<MarcoFalke> ok, let's get started...

17:02

<MarcoFalke> Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

17:02

<AnthonyRonning> tACK - code looks good to me & ran fuzzer locally overnight

17:02

<felixweis> hi

17:02

<emzy> Tested ACK

17:02

<michaelfolkson> n Sad times :(

17:02

<glozow> y, got the fuzz running and did a code review

17:02

<amiti> easy to give it a concept ACK, seems to clearly increase coverage :)

17:03

<jarolrod_> concept ACK, got it running, but don't know much about fuzzing yet :)

17:03

<maqusat> Concept ACK

17:03

<MarcoFalke> ok, before we jump into the fuzz target, let's cover another target really quick

17:03

<cguida1> didn't get to running it, still researching fuzzing

17:03

<MarcoFalke> Why does the existing process_message_tx fuzz target perform so poorly when it comes to mempool acceptance?

17:04

<MarcoFalke> cguida1: Did you have issues compiling? I'll be around after the meeting to troubleshoot a bit

17:04

<AnthonyRonning> mempool acceptance is pretty strict, it seems like the process_message fuzzing is random bytes of data in general.

17:05

<cguida1> MarcoFalke: unfortunately I was just busy with other stuff and didn't get to it :/

17:05

<MarcoFalke> AnthonyRonning: Correct

17:05

<glozow> yeah i imagine it'd take a lot of random tries to get a valid tx

17:05

<michaelfolkson> I saw glozow did the fuzzing on Docker, I've also had problems in the past with fuzzing on Mac

17:06

<MarcoFalke> Depending on the fuzz engine, it might even be practically impossible to get a valid tx

17:06

<AnthonyRonning> both add value though, I like that process_message does test random data because any peer could send us all that

17:06

<MarcoFalke> michaelfolkson: Yeah, the easiest way to get this running is to install Ubuntu, but we can cover compile issues after the meeting

17:07

<emzy> michaelfolkson: I use a linux box (old thin client) for the fuzzing. Don't like to boil my notebook.

17:07

<MarcoFalke> AnthonyRonning: Indeed. I see them as testing different code paths. process_message* is more general at the cost of missing detail

17:08

<MarcoFalke> To wrap up the question. For example predicting a correct prevout hash in the input has a probability of approx. 0

17:09

<MarcoFalke> So all messages in process_message are invalid transactions (or not even transactions)

17:09

<MarcoFalke> Why does the newly added tx_pool fuzz target achieve higher coverage in MemPoolAccept than the process_message_tx target?

17:09

<AnthonyRonning> `tx_pool` creates transactions based on an existing mempool in order to make sure the tx data isn’t so random.

17:09

<glozow> for starters, you can get further than missing-inputs

17:11

<glozow> it also provides a valid script, right? `P2WSH_OP_TRUE`s?

17:11

<MarcoFalke> For reference, the tx_pool target uses ConsumeTransaction: https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/util.cpp#L27

17:12

<AnthonyRonning> would it be benefitial to have an intentionally invalid script for the tx_pool tests?

17:13

<glozow> we have a script fuzzer, I think

17:13

<MarcoFalke> glozow: the tx_pool target consumes any script the tx_pool_standard target picks P2WSH_OP_TRUE (a standard script)

17:13

<cguida1> AnthonyRunning: I don't see why not

17:13

<cguida1> Oh dear, everything is bold now…

17:14

<MarcoFalke> (sorry, insert separator between "script the")

17:15

<MarcoFalke> Any questions about the tx_pool target before we move on?

17:16

<MarcoFalke> Is it expected to see more transactions rejected or accepted to the mempool in the tx_pool target? Why? You may collect evidence to support your answer by debugging the tx_pool target or by assuming all fields in the ConsumeTransaction helper are initialized by values picked to be uniformly random. Real fuzz engines do not pick values uniformly randomly, but this assumption is good enough for now.

17:16

<amiti> yes! at the end, why do we push_back txhash onto txids?

17:17

<MarcoFalke> amiti: txids keeps track of all txids to allow the fuzz engine to pick a valid prevout hash with propability larger than ~0

17:17

<MarcoFalke> https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/tx_pool.cpp#L223

17:17

<MarcoFalke> txids is also passed to the ConsumeTransaction helper

17:18

<MarcoFalke> https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/util.cpp#L38

17:18

<amiti> ah, I misread this and didn't realize its in the while loop. ok if it was a valid txn so use this as an option for the next round

17:18

<MarcoFalke> indeed

17:18

<amiti> clever :)

17:18

<AnthonyRonning> my guess was that tx_pool_standard would have more accepted transactions while tx_pool would have more rejected.

17:18

<AnthonyRonning> i wasn't sure how to debug to validate that idea

17:19

<MarcoFalke> Wild guesses are also allowed

17:19

<cguida1> I agree, I would guess more rejected, since tx_pool allows nonstandard txs? But just a guess. I imagine there are also ways to make sure most are accepted

17:20

<MarcoFalke> (The question is just about the ration of rejected/accepted within the tx_pool target)

17:23

<MarcoFalke> The answer to that question depends heavily on the fuzz engine. Modern fuzz engines can spend more time evolving fuzz inputs that cover rarely hit edges.

17:24

<MarcoFalke> One way to debug this would be to print out the mempool reject reason

17:24

<MarcoFalke> And then collect statistics which ones are the most common ones

17:25

<MarcoFalke> I found dust to be common and obviously an invalid sig

17:25

<glozow> ooo interesting

17:25

<emzy> More general question: How long should you run the fuzzing? What is a good sign to stop?

17:26

<glozow> any "mempool full"s ?

17:26

<glozow> i think that'd be the last possible failure

17:26

<MarcoFalke> glozow: Not yet, but I wasn't running very long

17:26

<AnthonyRonning> so just manually write a line in the code to log a message and that'll show up in the fuzz console output? or some other way with fuzzing libs? Not familar with fuzzing

17:27

<sipa> emzy: it's never ending

17:27

<MarcoFalke> emzy: That is still an open research question, but a good heuristic is to look whether the coverage metric increases

17:27

<sipa> for nontrivial tests there may be code paths that are only found after months or years of combined fuzzing time

17:28

<comment> emzy when one's tried every combination -- sun might go super nova before that though ;]

17:28

<emzy> MarcoFalke: so if it not increases anymore, better stop and change someting?

17:28

<MarcoFalke> emzy: The probability to find something decreases, but it won't be zero

17:29

<MarcoFalke> Unless the target is so trivial that all paths can be enumerated

17:29

<sipa> and presumably, the more interesting things are only found after long periods of time

17:29

<emzy> btw. I know it never stopps but there needs to be some practical messure to stop and move on.

17:29

<sipa> emzy: well, we in aggregate, won't stoo

17:30

<sipa> make sure you use the seeds in the qa repo, and contribute new ones back

17:30

<MarcoFalke> emzy: For a project that changes code every day, there is no "move on"

17:30

<AnthonyRonning> i wasn't exactly sure, but it seemed like the results from a `tx_pool_standard` actually changes the state of the mempool so consequtive runs even on the same input could produce different code paths?

17:30

<MarcoFalke> With changing code, the paths change and some inputs are invalidated and need to be "re-evolved"

17:30

<comment> +1

17:30

<emzy> MarcoFalke: so maybe if the code changed you restart the fuzzing with the new code? ;)

100

17:31

<MarcoFalke> jup

101

17:31

<MarcoFalke> Moving on, How does the tx_pool_standard fuzz target improve upon the tx_pool fuzz target even further?

102

17:31

<glozow> how do we know when we've found a fuzz trophy? a crash?

103

17:31

<MarcoFalke> glozow: The fuzz engine will tell you

104

17:32

<MarcoFalke> afl++, and hongfuzz will tell you in their stats screen. libFuzzer will tell you by crashing

105

17:32

<AnthonyRonning> `tx_pool_standard` will create transactions that are standard & based on a simulated mempool. This leads to better acceptance than creating transactions with completely random amounts, scripts, etc.

106

17:33

<jonatack> FWIW I didn't see accepted = true yet in tx_pool after adding std::cout << "SUCCESS\n"; in the truthy case

107

17:33

<jonatack> (I suppose it may improve with run time)

108

17:34

<MarcoFalke> jonatack: How many CPU hours?

109

17:34

<jonatack> a few minutes :p

110

17:34

<MarcoFalke> leave a comment on the pr if you didn't find any within 2-8 hours or so

111

17:35

<MarcoFalke> Can transaction still be rejected in the tx_pool_standard target?

112

17:36

<MarcoFalke> (Question is made up just now, so don't look into your notes)

113

17:37

<glozow> hm, I thought it would still be possible... nLockTime is random right?

114

17:37

<AnthonyRonning> yeah I think so, things like locktime are still random

115

17:37

<MarcoFalke> For reference we are looking at this part of the code now: https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/tx_pool.cpp#L109

116

17:38

<glozow> I'm comparing `ConsumeTransaction` and the create transaction block in tx_pool_standard. It seems stricter but I don't think it necessarily results in a standard tx?

117

17:38

<MarcoFalke> glozow: Indeed, the tx is not guaranteed to be standard. (only the scripts)

118

17:39

<glozow> ah 💡 that's where the standard comes from

119

17:40

<MarcoFalke> Also, the amounts still might be dust

120

17:41

<MarcoFalke> So, to the last question in the script: Do you have other ideas for improvement?

121

17:42

<glozow> RBF? :)

122

17:42

<MarcoFalke> glozow: Good idea to test the rbf code paths

123

17:42

<MarcoFalke> Was working on this just now, but fighting c++17 auto-template deduction guidelines :)

124

17:42

<glozow> could also pre-populate the mempool to test rejections based on mempool limits

125

17:43

<AnthonyRonning> is there a way to get a visual code coverage for the code paths that were tested? It may help me see what still isn't being reached.

126

17:43

<MarcoFalke> Now is also a good time to ask any questions. I understand that the fuzz code is daunting (especially for newcomers)

127

17:44

<MarcoFalke> AnthonyRonning: Jup, can be generated

128

17:44

<MarcoFalke> Will look like this: (master) https://marcofalke.github.io/btc_cov/fuzz.coverage/index.html

129

17:44

<glozow> I noticed the qa-assets repo is very very large

130

17:44

<MarcoFalke> hey, it is only 3 GB

131

17:44

<glozow> is it possible to condense the seeds or something?

132

17:44

<AnthonyRonning> MarcoFalke: thanks! that's really useful

133

17:44

<glozow> or some kind of more compact way?

134

17:45

<sipa> glozow: yes, though compact is based on some measure of "redundant"

135

17:45

<AnthonyRonning> have the seeds for `tx_pool` been added to qa-assets already?

136

17:45

<MarcoFalke> I think there is a lot of redundancy in the inputs and git will compress them (at least as long as they are in the git pack)

137

17:45

<glozow> AnthonyRonning: probably doesn't make sense until tx_pool is merged

138

17:46

<sipa> but redundant w.r.t. what? your compilation options will influence what is considered interesting

139

17:46

<AnthonyRonning> yeah good point. I wasn't sure how to see or check my seeds, thought it might had to been when passing in qa-assets data

140

17:46

<MarcoFalke> AnthonyRonning: The target might change a bit once I add support for RBF, so we wouldn't want to add stale inputs to the repo

141

17:46

<jnewbery> Is a git repo the best way to store the fuzz assets? Commit history don't seem very interesting/useful

142

17:47

<AnthonyRonning> the only thing i really see at the end of my fuzzing runs are some `slow-unit-*` files, are those anything to be concerned about?

143

17:47

<sipa> AnthonyRonning: how are you invoking it?

144

17:47

<MarcoFalke> jnewbery: Sometimes inputs are deleted because they are no longer relevant for the master branch. But keeping a copy for release branches could make sense

145

17:47

<AnthonyRonning> sipa: `FUZZ=tx_pool src/test/fuzz/fuzz -jobs=31`

146

17:48

<MarcoFalke> AnthonyRonning: How many cores do you have?

147

17:48

<sipa> AnthonyRonning: i think you need to specify a directory name where it'll save seeds

148

17:48

<AnthonyRonning> 64

149

17:48

<AnthonyRonning> sipa: ah okay cool!

150

17:48

<AnthonyRonning> still need to play with fuzzing more, very facinating

151

17:48

<MarcoFalke> slow-* means that the particular input took a long time

152

17:48

<sipa> just put a directory name after the commamd

153

17:48

<AnthonyRonning> sipa: thanks!

154

17:49

<MarcoFalke> This could mean that the input takes a long time to parse and execute by the fuzz engine. Or it could mean there is a DoS vector

155

17:50

<AnthonyRonning> MarcoFalke: oh okay, so maybe false positive, maybe DoS? Is it worth saving those off to investigate further?

156

17:51

<glozow> ooh, how do we recover what the input was? I had a couple of those as well when i didn't specify directory

157

17:51

<MarcoFalke> crash- and slow- are stored in the pwd

158

17:51

<MarcoFalke> (also oom-*)

159

17:52

<jnewbery> MarcoFalke: my point is that unlike the code repo, where history is important, cloning the entire history of the fuzz assets doesn't seem very useful

160

17:52

<MarcoFalke> btw, this is how you generate the colored html output: https://github.com/MarcoFalke/btc_cov/blob/cd1b2a714aa99be3a9fd2bc68a2308c49f36fd76/.cirrus.yml#L43

161

17:53

<MarcoFalke> jnewbery: It isn't useful for CI, which is why CI specifies --depth=1

162

17:53

<MarcoFalke> Though it might be useful for a dev

163

17:53

<jnewbery> ah, good tip!

164

17:54

<MarcoFalke> Sorry, wrong line https://github.com/MarcoFalke/btc_cov/blob/cd1b2a714aa99be3a9fd2bc68a2308c49f36fd76/.cirrus.yml#L65

165

17:55

<AnthonyRonning> MarcoFalke: thanks!

166

17:55

<glozow> so adding `--enable-lcov --enable-lcov-branch-coverage` ?

167

17:56

<maqusat> likely a noob question but why do tests use COINBASE_MATURITY constant?

168

17:56

<MarcoFalke> jup

169

17:56

<jonatack> maqusat: The block reward of coinbaseoutput.nValue (50) BTC/block matures after COINBASE_MATURITY (100) blocks

170

17:57

<MarcoFalke> maqusat: Good question! Spending a coin before COINBASE_MATURITY confirmations is not allowed, so to add any transaction the the mempool the fuzz target needs to mine at least COINBASE_MATURITY+1 blocks

171

17:57

<MarcoFalke> I think this one mines 2*COINBASE_MATURITY

172

17:58

<glozow> and it's still possible to pull an outpoint that's premature right?

173

17:58

<jonatack> this is why, in the functional test, you often see generate(101) in the test setup (COINBASE_MATURITY + 1)

174

17:58

<MarcoFalke> As the setup is expensive, it is done one time before any fuzzing starts: https://github.com/bitcoin/bitcoin/blob/e4e253d73007e0b680d2a473327c6fd66de4d86c/src/test/fuzz/tx_pool.cpp#L20