Add make check-valgrind to run the unit tests under Valgrind (tests)

Dec 18, 2019

https://github.com/bitcoin/bitcoin/pull/17639

Host: jonatack - PR author: practicalswift

Notes

Definitions

Undefined Behavior (UB): there are no restrictions on the behavior of the program if certain rules of the language are violated. Examples of undefined behavior are uninitialized scalars, memory access outside of array bounds, signed integer overflow, null pointer dereference, access to an object through a pointer of a different type, etc. Compilers are not required to handle undefined behavior (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful.
Uninitialized variables are Undefined Behavior, a typical outcome of which is reading stale stack or register values. In the best case, it generates a pseudorandom result which in edge cases might be the expected result or fixed by the compiler. In the worst case, an exploit may be created that leaks secrets by allowing use of an attacker-controlled value.
MemorySanitizer (MSan) (wiki) is a tool that detects uninitialized reads. It consists of a compiler instrumentation module and a run-time library. Typical slowdown is 3x and memory usage is 2x. MSan is supported on Linux/NetBSD/FreeBSD and requires compiling and linking all the code in your program, including libraries it uses and the C++ standard library, with -fsanitize=memory -fPIE -pie.
Valgrind is a debugging and profiling tool suite to make programs faster and more correct. Its most popular tool, Memcheck, can detect memory-related errors common in C and C++ programs that can lead to crashes and unpredictable behavior. Here is a tutorial. Memcheck is not perfect: typical slowdown is 3-10x, memory usage is 2x, it can produce false positives (there are mechanisms for suppressing these; see the valgrind.supp file in Bitcoin Core), and it doesn’t currently detect out-of-range reads or writes to arrays allocated statically or on the stack.

Events

On 10 November 2019, bitcoinVBR opened PR 17433 “fix nMinerConfirmationWindow not initialized”. The PR was misunderstood as a style change and closed.
On 12 November, bitcoinVBR opened a similar change in PR 17448 “remove unused variable- consensus.nMinerConfirmationWindow”. Despite a bad merge by the author while making the PR, the issue had became clear.
On 14 November, a clean third version was merged, PR 17449 “fix uninitialized variable nMinerConfirmationWindow”. It was for this 2-line fix that the Bitcoin Core 0.19.0.1 patch release was made – an unusual event.
What was the issue? An uninitialized read in PR 16713, merged 27 September 2019, that got past review by more than a half dozen reviewers (and also your host today).
How did this flaw escape detection?
- The uninitialized read (of nMinerConfirmationWindow) took place on mainnet and testnet but not on regtest.
- Memory sanitizer was not enabled in the tests and CI.
- The compilers were “fixing” our code in both Clang and GCC with the Bitcoin Core default setting of -O2.
On 30 October, PR 15921 – a review club PR with quite a few ACKs – was merged which contained another uninitialized read. Bitcoin Core contributor practicalswift found the issue by compiling Bitcoin with -fsanitize=memory and with valgrind, which both caught the issue. It was fixed in PR 17624.
On 29 November, practicalswift opened PR 17633 to enable running the functional tests with valgrind memcheck, which was recently merged. Note that the valgrind --exit-on-first-error=yes option is available since version 3.14 on Linux. Valgrind support on macOS may be more limited. A PR is open to automate setting that exit flag when running functional tests with valgrind.
On 30 November, practicalswift opened today’s PR 17639 to enable running the unit tests with valgrind memcheck.

What can we do to mitigate uninitialized variables?

Recommended reading: the discussion in PR 17627 “Suppress false positive warning about uninitialized entropy buffers”.
Test/fuzz/review better… or just write perfect code.
- We do these things (test, fuzz, and review), but the recent issues show that it’s not enough.
- Attackers may find what programmers don’t think of, and what fuzzers don’t hit.
Compile with -Werror=uninitialized or -Wconditional-uninitialized and fix all uninitialized variables in the codebase.
- A fair amount of code to change.
- The value chosen to initialize doesn’t necessarily make sense.
- Code may not express intent anymore.
- Similarly, static analysis isn’t smart enough.
Use MemorySanitizer or valgrind.
- MSan requires compiling everything in the process, which is cumbersome.
- Can’t deploy in production.
- Several times slower and memory-hungry.
- Resource exhaustion can generate false positives.
Pre-initialize variables with dummy values, e.g. compile with Clang -ftrivial-auto-var-init=pattern.
- Raises testing sensitivity concerns.
- If a flaw is introduced, it may be undetectable by valgrind.
- Compilers can warn when they’re certain a value will be undefined, and pre-emptive dummy initialization suppresses those warnings.
- Valgrind has special macros that can be used to mark memory as undefined. It may be best if dummy initialization were to be always done via a macro that would allow disabling it for testing, or valgrind annotating it.

Sources/resources (credit to practicalswift for most of these)

RFC: Enabling commonly enabled compiler diagnostics.
Guide to setting up MSan for Bitcoin Core.
Reading list on the pros/cons of default initialization.
Video of JF Bastien talk “Mitigating Undefined Behavior”, LLVM Developers’ Meeting, December 11, 2019.
Blog post “Building better software with better tools: sanitizers versus valgrind”, May 2019.

Questions

Did you review the PR? Concept ACK, approach ACK, ACK <commit>, or NACK? Were you able to run the unit tests with valgrind? Don’t forget to put your PR review on GitHub or ask questions.
Have you tried reproducing and catching the uninitialized read errors that were found?
Which options do you use – or plan to begin using – to catch these?
What would be the best way to generalize detecting uninitialized variables reliably in Bitcoin Core?
Extra credit: can you improve the PR to run the tests in parallel in a way that prints generated suppressions to the user and does not potentially introduce false positives due to resource exhaustion?

Meeting Log

18:33

<jonatack> Hi everyone, we'll get started a little under a half hour

18:34

<jonatack> About the session notes today: if you have any corrections or improvements to suggest, please share!

18:35

<jonatack> I'd like the notes, questions, and discussion to be a resource that people can use or come back to on this subject.

19:00

<jonatack> #startmeeting

19:00

<MarcoFalke> hi

19:00

<fanquake> hi

19:00

<emzy> Hi

19:00

<jonatack> Hi all! Welcome to this week's episode of the Bitcoin Core PR Review club!

19:00

<amiti> hi

19:00

<jnewbery_6> hi

19:00

<jonatack> #topic Today we are looking at PR 17639, "Add make check-valgrind to run the unit tests under Valgrind (tests)".

19:00

<jonatack> We usually start Bitcoin Core IRC meetings with a 'hi' so it's clear who's at keyboard. Feel free to say hi, even if you arrive in the middle of the meeting!

19:00

<jkczyz> hi

19:00

<jonatack> Please jump in at any point with thoughts and questions. Don't be shy! This discussion is about your thoughts and input.

19:00

<jonatack> If you didn't have the chance to read the notes yet, I encourage you to do so.

19:00

<michaelfolkson> hi

19:00

<jonatack> I think they provide a good background summary for this discussion and for reviewing the PR.

19:00

<jonatack> For today's episode, I thought we could begin by talking about review of this PR.

19:00

<jonatack> And then springboard from the PR out to a larger discussion on preventing errors like this

19:01

<jonatack> both on an individual level of integrating helpful tools into our testing and reviewing workflow

19:01

<michaelfolkson> Excellent

19:01

<michaelfolkson> *notes

19:01

<jonatack> and on the broader level of the Bitcoin Core project itself

19:01

<kanzure> hi

19:01

<jonatack> which is certainly an issue that the maintainers are thinking about

19:01

<jonatack> That said, feel free to digress, comment, and ask questions at any time without waiting. Just jump in and go for it. The more the merrier!

19:01

<jonatack> michaelfolkson: ty

19:02

<jonatack> #topic Did you review the PR? Thoughts? Concept ACK, approach ACK, ACK <commit>, or NACK?

19:02

<jonatack> Were you able to run the unit tests with valgrind?

19:03

<jonatack> For me, they start but then seem stalled at "Running 387 test cases...", so i need to debug that

19:03

<michaelfolkson> Not on Mac OS. Getting set up on Ubuntu now

19:04

<amiti> I was v disappointed to find out that valgrind is incompatible with catalina

19:04

<jonatack> yes, macOS users get better clang and worse valgrinding, iiuc

19:04

<emzy> ACK. But I had to remove "--exit-on-first-error=yes --error-exitcode=1" from the makefile. Because my debian 10 has only valgrind-3.12.0.SVN

19:05

<jonatack> emzy: good point, that flag is v3.14 and later

19:05

<emzy> jonatack: right

19:06

<jonatack> i was surprised that the unit tests hang, because the functional tests run for me with the --valgrind flag (though they time out and i have to fiddle with the suppresions file)

19:06

<jonatack> suppressions*

19:06

<fanquake> emzy how are you installing it? Debian 10 has Valgrind 3.14: https://packages.debian.org/buster/valgrind

19:07

<jonatack> note that there is a good first issue opened by MarcoFalke to fix the functional test valgrind timeouts to make it more useable

19:08

<emzy> fanquake: my bad. I have Debian GNU/Linux 9.11 (stretch) on that machine... I have to update.

19:08

<MarcoFalke> Jup, it is for the ci/travis run. Looks like someone is working on it already: https://github.com/bitcoin/bitcoin/pull/17770

19:09

<andrewtoth_> hi

19:10

<jonatack> nice. the GFI tag seems to work well.

19:10

<jonatack> emzy: so today's PR worked for you after rm-ing that flag? How long did the unit tests take to run?

19:11

<emzy> jonatack: not shure. I will run it now, and time it.

19:12

<michaelfolkson> Will they be done by the end of this meeting?

19:12

<MarcoFalke> I'd say no

19:13

<jonatack> emzy: i'm interested, because reading the manual and searching online i saw varying perf figures: 3x, 10x, 30-40x, 50x, and up to 300x slowdowns

19:13

<jonatack> it's module-dependent, but here we're interested in the default memcheck perf, i think

19:14

<emzy> I think it was about 20 min. But I was not watching :)

19:14

<emzy> I was building without qt.

19:14

<jonatack> emzy: be sure to put your review on GH when you can.

19:15

<emzy> ok

19:15

<jonatack> topic Have you tried reproducing and catching the uninitialized read errors that were found?

19:15

<jonatack> #topic Have you tried reproducing and catching the uninitialized read errors that were found?

19:15

<emzy> no

19:17

<MarcoFalke> I think both of them need a node that has finished IBD

19:17

<jonatack> I did a few days ago while reviewing one of practicalswift's valgrind PRs. MarcoFalke: yes i was out of IBD.

19:18

<jonatack> Thanks to help from MarcoFalke and practitalswift. The main issue for me was the suppressions file needed additions.

19:20

<michaelfolkson> One of them needed booting up on mainnet to replicate I read but not full IBD. Made me feel bad about always running tests on regtest

19:20

<jonatack> Question: which of the tools mentioned in the notes do you like using for your own workflow, or plan to adopt?

19:22

<jonatack> e.g. MSan, valgrind, -Werror= / -Wconditional=, etc

19:22

<jonatack> each of them seems to have tradeoffs

19:23

<jonatack> MSan is cumbersome to set up and use AFAICT

19:23

<jonatack> Valgrind seems even slower than MSan

19:23

<michaelfolkson> MSan works on Mac?

19:24

<michaelfolkson> It appears no

19:24

<amiti> I tried using the memory sanitizer on mac & ran into issues as well

19:25

<jonatack> MSan compat: in https://clang.llvm.org/docs/MemorySanitizer.html it says Linux/NetBSD/FreeBSD

19:25

<jonatack> I'm learning here too

19:26

<jonatack> amiti: can you describe the issues you encountered?

19:26

<michaelfolkson> Regardless these tools are intense. Are we moving to a world with two tiers of testers? Most that don't run these tools and a small minority that do?

19:26

<amiti> I'm trying to remember ...

19:27

<jonatack> I have not yet tried MSan, only valgrind and various clang compile flags

19:27

<emzy> would it be enough to run Valgrind on one OS. Or have you also check on others?

19:28

<jonatack> practicalswift did a tutorial here on installing MSan in a Bitcoin Core issue: https://github.com/bitcoin/bitcoin/issues/17620#issuecomment-559156227

19:28

<jonatack> he walked provoostenator through it to help get him set up

19:29

<michaelfolkson> Or we try to identify which tests should be run with Valgrind, MSan and which don't need to be...

19:29

<MarcoFalke> emzy: In theory it would have to be run on each shipped binary (thus each OS), because slightly different code is compiled for each OS

19:29

<MarcoFalke> michaelfolkson: All tests should be run in valgrind, msan, ... etc :)

19:30

<emzy> ok makes sense. could be that there are OS specific code segments.

19:31

<michaelfolkson> MarcoFalke: But there could be two tiers of testers? A non-Valgrind/MSan tested ACK and a Valgrind tested ACK?

19:31

<MarcoFalke> emzy: Jup

19:31

<jonatack> MarcoFalke: how feasible does it seem to have CI run these (assuming the CI provider works properly...)

19:32

<MarcoFalke> jonatack: It already does :)

19:32

<jonatack> oh?

19:32

<emzy> nice

19:32

<jonatack> Things move fast on Bitcoin Core :)

19:32

<MarcoFalke> Heh, this is on current master: https://travis-ci.org/bitcoin/bitcoin/jobs/626430212

19:32

<michaelfolkson> Ah

19:33

<MarcoFalke> It wraps all binaries in valgrind (see the wrap-valgrind step and https://github.com/bitcoin/bitcoin/blob/master/ci/test/wrap-valgrind.sh )

19:34

<MarcoFalke> Then runs the tests as normal (make check && ./test/functional/test_runner.py)

19:34

<MarcoFalke> Well, not all functional tests

19:34

<MarcoFalke> See https://github.com/bitcoin/bitcoin/pull/17770

19:35

<emzy> "make check-valgrind" took 21 minutes for me. "./configure --enable-hardening --without-gui --disable-wallet"

19:36

<jonatack> emzy: that's reasonable, something i could run on critical PRs

19:37

<emzy> I will try again on Debian 10. Maybe the newer valgrind is slower.

19:38

<jonatack> MarcoFalke: Is there a CI build that runs memory checks than valgrind? MSan, Werror, etc?

100

19:38

<jonatack> other than*

101

19:38

<MarcoFalke> not yet

102

19:38

<jonatack> What plans?

103

19:39

<MarcoFalke> I presume the -Werror=uninitialized had a lot of false positives?

104

19:39

<MarcoFalke> MSan is a subset of what valgrind does, so it doesn't seem worth it

105

19:40

<MarcoFalke> Source: https://github.com/google/sanitizers/wiki/MemorySanitizer (search for "subset")

106

19:41

<jonatack> This article suggests some advantages of sanitizers over valgrind https://lemire.me/blog/2019/05/16/building-better-software-with-better-tools-sanitizers-versus-valgrind/

107

19:42

<jonatack> MarcoFalke's link: "MSan implements a subset of functionality found in Valgrind (Memcheck tool). It is significantly faster than Memcheck (TODO:benchmark)."

108

19:43

<jonatack> #topic Is it feasible to run the unit tests in parallel with valgrind?

109

19:43

<MarcoFalke> I mean I won't object someone contributing a ci run that builds with msan

110

19:43

<MarcoFalke> jonatack: Yes, travis does it

111

19:44

<MarcoFalke> See my link above

112

19:45

<MarcoFalke> Takes about 30 minutes (with wallet unit tests, etc), but it works

113

19:45

<jonatack> I recall the Valgrind manual has a section about running in parallel

114

19:45

<jonatack> MarcoFalke: great!

115

19:45

<fanquake> 27m26.144s to run make check-valgrind inside a debian container on my machine

116

19:46

<fanquake> That's with minimum depends, so skipping a bunch of tests

117

19:47

<jonatack> I'd like to get in the habit of running individual unit tests with valgrind. For example, after working on a new test, doing a final valgrind check before pushing.

118

19:50

<michaelfolkson> I have a question on initialization. I'm trying to reconcile the advice from the video, basically "always initialize local variables" and Greg's "No constant is an acceptable random value"

119

19:51

<michaelfolkson> Greg's advice is Bitcoin specific right? Because a constant would be more dangerous than an uninitialized variable?

120

19:51

<jonatack> MarcoFalke: I don't recall false positives with the (few) times I tried -Werror but it was with the --enable-werror flag, which might be different?

121

19:52

<jonatack> michaelfolkson: good question

122

19:52

<MarcoFalke> michaelfolkson: How is valgrind supposed to find the uninitizlied variable when it was initialized by you?

123

19:53

<MarcoFalke> And how do you know what value to inizialize to when the value is supposed to be passed in or parsed from somwhere else?

124

19:54

<michaelfolkson> MarcoFalke: It isn't. But that's not the only reason we're using Valgrind?

125

19:54

<jonatack> michaelfolkson: as you probably remarked, i recapped gregory's thoughts in the notes about pre-initialising with dummy values, but i need to re-watch the video with those caveats in mind

126

19:54

<michaelfolkson> Greg's comment here: https://github.com/bitcoin/bitcoin/pull/17627#issuecomment-562387426

127

19:55

<michaelfolkson> And the video I'm referring to is here: https://www.youtube.com/watch?v=I-XUHPimq3o

128

19:55

<michaelfolkson> From Jon's notes

129

19:55

<MarcoFalke> michaelfolkson: no, we use valgrind for other checks as well (e.g. use-after-free, I believe)