The PR branch HEAD was a076eb6 at the time of this review club meeting.
Notes
Context
User-Space, Statically Defined Tracing (USDT) allows peeking into runtime
internals at statically defined tracepoints of user-space applications
(Bitcoin Core in our case).
Build support and TRACEx macros based on systemtap's sys/sdt.h for USDT were
merged in PR19866.
We can hook into the tracepoints with tracing scripts via the Linux kernel.
If we don't hook into the tracepoints, then they are NOPs and have little to
no performance impact.
The tracepoints can pass data back to the tracing script, which contains the
tracing logic, e.g. to collect statistics, print or visualize data, give
alerts, etc.
Tracepoints need to be somewhat generic to allow for reusability in different
tracing scripts, but they also need a clear use case. There is no need to
plaster the code with (unused) tracepoints.
There are currently two main tools for writing USDT scripts, bpftrace and bcc,
and other tools are under development.
Hooking into these tracepoints works via a technology called
eBPF. Think of it as a small virtual machine (VM) in your
Linux kernel where you can run sandboxed eBPF programs (even if there is a
problem with your eBPF program, it can't crash or otherwise harm your kernel).
The tracing scripts compile and then load eBPF bytecode into this VM. When
attached to a tracepoint, the eBPF program is called with the arguments passed
to the tracepoint.
Based on your use case, the eBPF program can, for example, filter the data or
pass it along to the tracing logic in the tracing script. The eBPF VM is quite
limited. For example, it has a stack size of 512 bytes.
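To make the 512-byte stack limit concrete, here is a minimal sketch that uses Python's ctypes to measure two hypothetical event structs of the kind an eBPF program might fill in. The struct names, fields, and sizes are assumptions for illustration and are not taken from the PR. The real limit applies to the whole eBPF stack frame, so a struct that passes this check can still be too large in practice.

```python
import ctypes

EBPF_STACK_LIMIT = 512  # bytes, the stack size mentioned in the notes


class SmallEvent(ctypes.Structure):
    # Hypothetical event with a few fixed-size fields.
    _fields_ = [
        ("peer_id", ctypes.c_uint64),
        ("msg_type", ctypes.c_char * 20),
        ("msg_size", ctypes.c_uint64),
    ]


class LargeEvent(ctypes.Structure):
    # Same hypothetical event, plus a 4096-byte buffer for message bytes.
    _fields_ = [
        ("peer_id", ctypes.c_uint64),
        ("msg_type", ctypes.c_char * 20),
        ("msg_size", ctypes.c_uint64),
        ("msg", ctypes.c_ubyte * 4096),
    ]


def fits_on_ebpf_stack(struct_type, limit=EBPF_STACK_LIMIT):
    """Return True if the struct alone is no larger than the stack limit."""
    return ctypes.sizeof(struct_type) <= limit


for struct_type in (SmallEvent, LargeEvent):
    size = ctypes.sizeof(struct_type)
    verdict = "fits" if fits_on_ebpf_stack(struct_type) else "does not fit"
    print(f"{struct_type.__name__}: {size} bytes, {verdict}")
```

A struct like the second one cannot live on the eBPF stack. A common workaround is to keep large buffers in an eBPF map instead.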
                ┌──────────────────┐            ┌──────────────┐
                │ tracing script   │            │ bitcoind     │
                │==================│      2.    │==============│
                │  eBPF  │ tracing │      hooks │              │
                │  code  │ logic   │      into┌─┤►tracepoint 1─┼───┐ 3.
                └────┬───┴──▲──────┘          ├─┤►tracepoint 2 │   │ pass args
            1.       │      │ 4.              │ │ ...          │   │ to eBPF
   User     compiles │      │ pass data to    │ └──────────────┘   │ program
   Space    & loads  │      │ tracing script  │                    │
   ──────────────────┼──────┼─────────────────┼────────────────────┼───
   Kernel            │      │                 │                    │
   Space        ┌──┬─▼──────┴─────────────────┴────────────┐       │
                │  │  eBPF program                         │◄──────┘
                │  └───────────────────────────────────────┤
                │ eBPF kernel Virtual Machine (sandboxed)  │
                └──────────────────────────────────────────┘
1. The tracing script compiles the eBPF code and loads the eBPF program into a kernel VM
2. The eBPF program hooks into one or more tracepoints
3. When the tracepoint is called, the arguments are passed to the eBPF program
4. The eBPF program processes the arguments and returns data to the tracing script
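The four steps can be sketched as a bcc tracing script. This is a minimal sketch, not one of the PR's example scripts. The probe name `example_tracepoint`, the `event_t` struct, and the assumption that the tracepoint's first argument is an integer of at most 64 bits are all hypothetical. Use the tracepoint names and argument layouts documented in doc/tracing.md instead. Running it requires bcc and root privileges, so the bcc import happens inside `run()`.

```python
# eBPF C code, compiled by bcc at runtime. The struct and the argument
# layout are assumptions for illustration.
PROGRAM = r"""
#include <uapi/linux/ptrace.h>

struct event_t {
    u64 first_arg;
};

BPF_PERF_OUTPUT(events);

int trace_tracepoint(struct pt_regs *ctx) {
    struct event_t event = {};
    // Step 3: read the first argument passed to the tracepoint.
    bpf_usdt_readarg(1, ctx, &event.first_arg);
    // Step 4: pass the data to the tracing logic in user space.
    events.perf_submit(ctx, &event, sizeof(event));
    return 0;
}
"""


def run(bitcoind_path, probe="example_tracepoint"):
    """Attach to a (hypothetical) tracepoint and print its first argument."""
    from bcc import BPF, USDT  # imported lazily: needs bcc installed

    # Step 2: choose the tracepoint to hook and the eBPF function to call.
    usdt = USDT(path=bitcoind_path)
    usdt.enable_probe(probe=probe, fn_name="trace_tracepoint")

    # Step 1: compile the eBPF code and load it into the kernel VM.
    bpf = BPF(text=PROGRAM, usdt_contexts=[usdt])

    def handle_event(cpu, data, size):
        # Tracing logic: here we only print what the eBPF program sent.
        event = bpf["events"].event(data)
        print(f"tracepoint fired, first argument = {event.first_arg}")

    bpf["events"].open_perf_buffer(handle_event)
    try:
        while True:
            bpf.perf_buffer_poll()
    except KeyboardInterrupt:
        pass
```

Calling `run("/path/to/bitcoind", probe=...)` as root with a real tracepoint name would start the loop. The path here is a placeholder.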
The PR includes examples and documentation on how to run them.
For building Bitcoin Core with USDT support, you need the sys/sdt.h headers
(when present, USDT support is automatically compiled in). On Debian-like
systems you can install the package systemtap-sdt-dev (this is not yet
documented in the PR).
As an exercise for reviewers: You can try to build Bitcoin Core with USDT
support, list the available tracepoints (see doc/tracing.md), try out the
example scripts (see contrib/tracing.md), and even add a custom tracepoint
and tracing script.
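For the "list the available tracepoints" part of the exercise, doc/tracing.md is the reference. As a complementary sketch: sys/sdt.h records each tracepoint as a `stapsdt` ELF note, so the output of `readelf -n` can be parsed for `Provider:` and `Name:` lines. The sample text below is illustrative, with made-up provider and tracepoint names, and is not real output from a bitcoind binary.

```python
import subprocess


def parse_stapsdt_notes(readelf_output):
    """Return (provider, name) pairs found in `readelf -n` output."""
    probes = []
    provider = None
    for line in readelf_output.splitlines():
        line = line.strip()
        if line.startswith("Provider:"):
            provider = line.split(":", 1)[1].strip()
        elif line.startswith("Name:") and provider is not None:
            probes.append((provider, line.split(":", 1)[1].strip()))
            provider = None
    return probes


def list_tracepoints(binary_path):
    """Run `readelf -n` on a binary and return its USDT tracepoints."""
    result = subprocess.run(
        ["readelf", "-n", binary_path],
        check=True, capture_output=True, text=True,
    )
    return parse_stapsdt_notes(result.stdout)


# Illustrative sample only; names and addresses are invented.
SAMPLE = """\
  stapsdt              0x0000005d       NT_STAPSDT (SystemTap probe descriptors)
    Provider: example_provider
    Name: example_tracepoint
    Location: 0x0000000000001000, Base: 0x0000000000002000, Semaphore: 0x0000000000000000
    Arguments: -8@%rax
"""

for provider, name in parse_stapsdt_notes(SAMPLE):
    print(f"{provider}:{name}")
```

An empty result from `list_tracepoints()` on your own build would suggest that the sys/sdt.h headers were not found at build time.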
Questions
What is the difference between USDT and using an uprobe to trace Bitcoin
Core? What do they have in common? Why do we add tracepoints if we can just
use uprobes? (Hint: see this comment and the following ones in PR19866.)
Why shouldn't we do any "expensive" operations just to pass extra data into
tracepoints? What are examples of such expensive operations?
Why are root privileges required to run tracing scripts?
What is eBPF (aka BPF), and how do we utilize it for USDT?
For debugging and monitoring of the peer-to-peer code, it can be useful to
log the raw P2P message bytes. Is this possible with USDT and eBPF? What are
limiting factors? Why?
Discussion: Should USDT be supported in Bitcoin Core release builds?
Discussion: Do you have ideas for places where static tracepoints make sense?
See issue #20981 for inspiration.
Discussion: Can the tracepoints be automatically tested? Could they even help
in functional testing?
<b10c> Also interesting for me: Who was able to build bitcoind with USDT support, listed tracepoints, ran an example, and who experimented with their own tracepoints and scripts?
<b10c> Can someone explain the difference between USDT and using an uprobe to trace Bitcoin Core? What do they have in common? Why do we add tracepoints if we can just use uprobes?
<b10c> and static tracepoints allow us to write scripts that will (hopefully) still work in 6 months as we are targeting a semi stable tracepoint API. Functions likely change over time
<b10c> next Q: When adding tracepoints, why shouldn't we do any "expensive" operations just to pass extra data into tracepoints? What are examples of such expensive operations?
<michaelfolkson> It gets grey between tracing, logging and debugging for me. I would say acquiring locks that weren't acquired in the original code would be debugging?
<lightlike> I don't understand this completely: Wouldn't the code in the TRACE6 parts be ignored by the compiler if the user is not interested in tracepoints and doesn't activate them?
<LarryRuane> It's probably okay, for this purpose, to read variables that are normally lock-protected, right? You might get inconsistent data, but that's probably acceptable here. (?)
<b10c> lightlike: if you compile a bitcoind without USDT support, then nothing is different. If you compile with USDT support and don't hook into it then there is an extra NOP
<jb55> sipa: yes with bcc you can do the formatting within the kernel/ebpf vm, but it's extra work and you can't use simple tools like bpftrace (but they could add that feature over time)
<jb55> there are more than just USDTs, you can dynamically trace any function within the codebase with uprobes (function enter) and uretprobes (function return). really handy for tracing executing codepaths.
<b10c> Some of you might work on P2P code and want to use USDT to debug a new P2P message: Can USDT and eBPF be used to e.g. log the raw P2P message bytes?
<michaelfolkson> So for that reason and it not being on MacOS (and possibly other reasons too) P2P message logging would still be used e.g. https://bitcoincore.reviews/19509
<b10c> jb55 yes. IIRC you wouldn't be able to e.g. print the pointer data in bpftrace scripts as the printf string itself is limited to something <512 byte
<jb55> glozow: yes, that's one of the connectblock examples. but keep in mind plugging in a tracepoint does have performance implications. but if you're fine with comparing differences between traced IBDs it works great.
<b10c> the contrib/tracing/log_raw_p2p_msgs.py example logs raw P2P messages as hex and prints a warning if the message was larger than 32kb and might be cut off
<jb55> the connectblock one was just me wanting to time IBDs more accurately. The p2p was one of the motivating reasons for me to add eBPF support, to potentially avoid ad-hoc logging everywhere (even if that's not possible right now for portability reasons)
<jb55> some others that were suggested: more accurate coincache memory/perf tracking. any others that ya'll think of that might be useful, feel free to suggest!
<jb55> laanwj talked about looking at the traces added to jlopp/statoshi to get some ideas, if we could hook prometheus into any core node at runtime, that would be dope.
<jnewbery> I may be wrong, but I see eBPF being most useful for understanding global application performance. Logging/message dumping still has a place for understanding message flows for particular peers, etc.
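The truncation behaviour b10c describes for contrib/tracing/log_raw_p2p_msgs.py can be sketched in plain Python. This is not the script's code. The constant is an assumption derived from the "32kb" mentioned in the log (the script's exact limit may differ), and `format_raw_message` is a hypothetical helper that stands in for the user-space side of the tracing script.

```python
# Assumption: "32kb" from the meeting log, read as 32 * 1024 bytes.
MAX_MSG_DATA_LENGTH = 32 * 1024


def format_raw_message(msg_size, captured):
    """Hex-encode captured message bytes and flag possible truncation.

    msg_size is the full message size reported by the tracepoint;
    captured holds the bytes the eBPF program was able to copy.
    """
    captured = captured[:MAX_MSG_DATA_LENGTH]
    hex_data = captured.hex()
    warning = None
    if msg_size > MAX_MSG_DATA_LENGTH:
        warning = (
            f"Warning: message of {msg_size} bytes exceeds "
            f"{MAX_MSG_DATA_LENGTH} bytes and might be cut off"
        )
    return hex_data, warning


hex_data, warning = format_raw_message(4, b"\xde\xad\xbe\xef")
print(hex_data, warning)
```

The point of the warning is that the reported size can still be passed as a small integer argument even when the message bytes themselves do not fit in the buffer available to the eBPF program.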