Undefined Behavior (UB):
there are no restrictions on the behavior of the program if certain rules of
the language are violated. Examples of undefined behavior are uninitialized
scalars, memory access outside of array bounds, signed integer overflow, null
pointer dereference, access to an object through a pointer of a different
type, etc. Compilers are not required to handle undefined behavior (although
many simple situations are diagnosed), and the compiled program is not
required to do anything meaningful.
Uninitialized variables are Undefined Behavior, a typical outcome of which
is reading stale stack or register values. In the best case, it generates a
pseudorandom result which in edge cases might be the expected result or fixed
by the compiler. In the worst case, an exploit may be created that leaks
secrets by allowing use of an attacker-controlled value.
MemorySanitizer (MSan)
(wiki) is a tool
that detects uninitialized reads. It consists of a compiler instrumentation
module and a run-time library. Typical slowdown is 3x and memory usage is
2x. MSan is supported on Linux/NetBSD/FreeBSD and requires compiling and
linking all the code in your program, including libraries it uses and the C++
standard library, with -fsanitize=memory -fPIE -pie.
Valgrind is a debugging and profiling tool suite to
make programs faster and more correct. Its most popular tool, Memcheck, can
detect memory-related errors common in C and C++ programs that can lead to
crashes and unpredictable behavior. Here is a
tutorial. Memcheck is
not perfect: typical slowdown is 3-10x, memory usage is 2x, it can produce
false positives (there are
mechanisms
for suppressing these; see the
valgrind.supp
file in Bitcoin Core), and it doesn’t currently detect out-of-range reads or
writes to arrays allocated statically or on the stack.
Events
On 10 November 2019, bitcoinVBR opened PR
17433 “fix
nMinerConfirmationWindow not initialized”. The PR was misunderstood as a style
change and closed.
On 12 November, bitcoinVBR opened a similar change in PR
17448 “remove unused variable-
consensus.nMinerConfirmationWindow”. Despite a bad merge by the author while
making the PR, the issue had became clear.
On 14 November, a clean third version was merged, PR
17449 “fix uninitialized
variable nMinerConfirmationWindow”. It was for this 2-line fix that the Bitcoin
Core 0.19.0.1 patch release
was made – an unusual event.
What was the issue? An uninitialized read in PR
16713, merged 27 September 2019,
that got past review by more than a half dozen reviewers (and also your
host
today).
On 30 October, PR 15921 – a
review club PR with quite a few ACKs – was
merged which contained another uninitialized read. Bitcoin Core contributor
practicalswift found the issue by
compiling Bitcoin with -fsanitize=memory and with valgrind, which both caught
the issue. It was fixed in PR
17624.
On 29 November, practicalswift opened PR
17633 to enable running the
functional tests with valgrind memcheck, which was recently merged. Note that
the valgrind --exit-on-first-error=yes option is available since version 3.14
on Linux. Valgrind support on macOS may be more limited. A PR is
open to automate setting that
exit flag when running functional tests with valgrind.
On 30 November, practicalswift opened today’s PR
17639 to enable running the unit
tests with valgrind memcheck.
What can we do to mitigate uninitialized variables?
Recommended reading: the discussion in PR
17627 “Suppress false positive
warning about uninitialized entropy buffers”.
Test/fuzz/review better… or just write perfect code.
We do these things (test, fuzz, and review), but the recent issues show that
it’s not enough.
Attackers may find what programmers don’t think of, and what fuzzers don’t
hit.
Compile with -Werror=uninitialized or -Wconditional-uninitialized and fix
all uninitialized variables in the codebase.
A fair amount of code to change.
The value chosen to initialize doesn’t necessarily make sense.
Code may not express intent anymore.
Similarly, static analysis isn’t smart enough.
Use MemorySanitizer or valgrind.
MSan requires compiling everything in the process, which is cumbersome.
Can’t deploy in production.
Several times slower and memory-hungry.
Resource exhaustion can generate false positives.
Pre-initialize variables with dummy values, e.g. compile with Clang -ftrivial-auto-var-init=pattern.
If a flaw is introduced, it may be undetectable by valgrind.
Compilers can warn when they’re certain a value will be
undefined, and pre-emptive dummy initialization suppresses those warnings.
Valgrind has special macros that can be used to mark memory as undefined. It
may be best if dummy initialization were to be always done via a macro that
would allow disabling it for testing, or valgrind annotating it.
Sources/resources (credit to practicalswift for most of these)
<jonatack> We usually start Bitcoin Core IRC meetings with a 'hi' so it's clear who's at keyboard. Feel free to say hi, even if you arrive in the middle of the meeting!
<jonatack> i was surprised that the unit tests hang, because the functional tests run for me with the --valgrind flag (though they time out and i have to fiddle with the suppresions file)
<jonatack> emzy: i'm interested, because reading the manual and searching online i saw varying perf figures: 3x, 10x, 30-40x, 50x, and up to 300x slowdowns
<michaelfolkson> Regardless these tools are intense. Are we moving to a world with two tiers of testers? Most that don't run these tools and a small minority that do?
<jonatack> MarcoFalke's link: "MSan implements a subset of functionality found in Valgrind (Memcheck tool). It is significantly faster than Memcheck (TODO:benchmark)."
<jonatack> I'd like to get in the habit of running individual unit tests with valgrind. For example, after working on a new test, doing a final valgrind check before pushing.
<michaelfolkson> I have a question on initialization. I'm trying to reconcile the advice from the video, basically "always initialize local variables" and Greg's "No constant is an acceptable random value"
<jonatack> MarcoFalke: I don't recall false positives with the (few) times I tried -Werror but it was with the --enable-werror flag, which might be different?
<jonatack> michaelfolkson: as you probably remarked, i recapped gregory's thoughts in the notes about pre-initialising with dummy values, but i need to re-watch the video with those caveats in mind