User-Space, Statically Defined Tracing (USDT) allows observability into runtime internals at
statically defined tracepoints. We have discussed USDTs at previous PR Review Club meetings,
#22006 and #23724.
Coin Selection is the process of selecting UTXOs (“coins”) from a wallet’s UTXO pool in order to
fund a transaction’s payment(s). We have discussed coin selection at previous PR Review Club
meetings, including #22009, #17526 and #17331.
PR #24644 adds tracepoints to the wallet’s coin selection code.
<svav> Once a tracepoint is reached, it can pass data about process internals to a userspace script for further processing. This is great for observability and allows for debugging, testing, and monitoring.
<larryruane> The other difference I think of with tracepoints is that, with logging, you can later process the log file (to summarize what's in it), but there could be a huge amount of disk space consumed ... with tracepoints, you can sort of "compress" the information on the fly
<sipa> I think another important difference is that logging is an actual action, whereas tracepoints are just hooks that an external process can plug into. A tracepoint by itself does nothing unless something uses it.
<larryruane> i have a really basic question, when a thread hits an active tracepoint, does the thread suspend until the data is received by the tracing script? Or is there a memory queue of tracing events so the thread can continue asynchronously?
<b10c> larryruane: in assembly, the tracepoint is a literal NOP (no operation). If we tell the kernel to hook into the tracepoint, it executes a small eBPF bytecode program e.g. adding data to a eBPF map where it can be read asynchronously from a userspace program
<Murch> glozow: The tracepoints collect information on the algorithms that produced the input selection, the total amount selected, the waste score, some details on fees, change output position, and whether the solution is `avoid_partial_unspents` compliant
<Murch> While it's easy to evaluate a coin selection algorithm on a single situation (UTXO pool and selection target), the overall problem we're interested in is the emergent behavior of various algorithms over longer scenarios of payment sequences and feerates.
<achow101> pop: the tracepoints let us do simulations. prior to tracepoints, these simulations required an additional patch which adds a couple of globals and RPCs that let us measure some things. but these could not be upstreamed so had to be maintained separately
<pop> Murch: So without net tracepoints to track coinselection metrics over longer periods and map the interactions between multiple algorithms, there is no way to evaluate the emergent behavior of coin selection?
<Murch> pop: Previously we had either created separate simulation frameworks or modified a copy of Bitcoin Core to add the corresponding logging. Having the tracepoints allows us to keep the log generation and processing in a separate project which makes it easier to apply the tracing to many different states of the codebase
<b10c> PaperSword: this guidance is relevant for people who run a bitcoind with tracing support, but don't hook into the tracepoints. Release builds have tracepoint support, so we assume that's the case for a majority of our users.
<larryruane> so do we ask important bitcoind users, such as exchanges, to enable tracepoints to learn about real-world behavior? (and send us results) Or is the intention that tracepoints are only for us dev types?
<PaperSword> glozow: I was already able to take a look at the metric, but spent a lot of time trying to get the tests in this PR to pass on RHEL linux. I was unable to successfully run the tests from this PR.
<b10c> theStack: It makes sense to have the tracepoints in release builds to allow people to trace their production setups. Switching binaries is often not something you want to do if you're trying to debug a problem.
<Murch> larryruane: We currently have three datasets, an online gambling service's payment sequence, a merchant's inbound payments, and another service's payments. Still looking for something representative of individual users
<Murch> Yeah, maybe to make it clear, this is not a telemetry function, it's just for users to hook into stuff running on their own computer. Our simulation scenarios are merely lists of the incoming and outgoing amounts they've processed without additional information (and slightly fuzzified amounts for privacy)
<achow101> a1ph4byte: we want to have our automatic coin selection behave in a smart way, e.g. reduce fees when feerates are high, consolidate more when feerates are low. a simplistic algorithm can result in unexpected or undesired behavior
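The "waste score" Murch mentions is one way to express the behavior achow101 describes here. The following is a minimal sketch, assuming the commonly described form of the metric: each input is charged the difference between the current feerate and a long-term feerate estimate, plus either the cost of a change output or the excess paid to fees. The function name, parameter names, and the 68 vbyte input size are illustrative assumptions, not Bitcoin Core's actual code.

```python
def selection_waste(input_vsizes, feerate, long_term_feerate,
                    change_cost=0, excess=0):
    """Illustrative waste score for one input selection (assumed form).

    input_vsizes      -- virtual sizes (vbytes) of the selected inputs
    feerate           -- current feerate in sat/vB
    long_term_feerate -- estimate of a typical future feerate in sat/vB
    change_cost       -- cost of creating and later spending change (sats);
                         0 means the selection has no change output
    excess            -- amount overpaid to fees when there is no change
    """
    # Spending inputs now instead of later costs (or saves) this much.
    timing_cost = sum(input_vsizes) * (feerate - long_term_feerate)
    # A selection either creates change or drops the excess to fees.
    return timing_cost + (change_cost if change_cost else excess)


# Hypothetical numbers: 68 vbyte inputs, long-term feerate of 10 sat/vB.
high_two = selection_waste([68, 68], feerate=50, long_term_feerate=10)
low_two = selection_waste([68, 68], feerate=2, long_term_feerate=10)
low_three = selection_waste([68, 68, 68], feerate=2, long_term_feerate=10)
print(high_two, low_two, low_three)
```

Under these assumptions, extra inputs raise the score when the current feerate is above the long-term estimate and lower it when the feerate is below, which matches "reduce fees when feerates are high, consolidate more when feerates are low".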
<pop> achow101: Murch: So the important thing is to simply have data that has a close relationship with actual usage. The specific relationship between payment and feerate isn't critical at this point in coin selection algorithm development?
<theStack> PaperSword: indeed, needing to run as root seems to be quite a drawback (though i think someone mentioned at #bitcoin-core-dev recently that laanwj managed to get them to run without root... maybe someone knows more details)
<b10c> theStack: currently we only make data available via tracepoints that's already present in the function they are called in. that means, transactions and blocks might need additional serialization which could be "expensive"
<glozow> PaperSword: yeah, C-strings don't know their own length so you just need to look for \0 when parsing. I don't know the answer to "why does it need to be a C-string," I assumed we have to use primitive data types or something
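The parsing glozow describes can be sketched as follows. This is a minimal illustration, assuming the tracing script receives the string as a fixed-size byte buffer; the helper name `read_c_string` and the sample buffer are hypothetical, not part of any tracing library.

```python
def read_c_string(buf: bytes, encoding: str = "utf-8") -> str:
    """Return the text before the first NUL byte in buf."""
    end = buf.find(b"\x00")
    if end == -1:
        # No terminator in the buffer: fall back to the whole buffer.
        end = len(buf)
    # Bytes after the terminator are leftover buffer contents; ignore them.
    return buf[:end].decode(encoding, errors="replace")


# Hypothetical buffer: a wallet name, the terminator, then stale bytes.
raw = b"wallet1\x00" + b"\xaa" * 8
print(read_c_string(raw))
```

Everything after the first `\0` is ignored, so stale bytes left in the buffer do not leak into the parsed value.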
<Murch> "Previously we had either created separate simulation frameworks or modified a copy of Bitcoin Core to add the corresponding logging. Having the tracepoints allows us to keep the log generation and processing in a separate project which makes it easier to apply the tracing to many different states of the codebase"
<theStack> sipa: there are fixed size array parameters possible though, isn't it? like "void foo(int bla)"... at least i vaguely remember that i used this years ago, maybe it was non-standard though
<glozow> Thanks all for coming! Sorry if you were expecting us to stick to the questions, I personally am very happy we didn't. Feel free to ask if you want answers to the questions, I'll be around for a while.
<achow101> pop: the avoid partial spends feature groups together all UTXOs for the same address and treats them as a single UTXO during coin selection. This means that all UTXOs for the same address are all spent at the same time.
<achow101> pop: no, private keys are never revealed. This is just for privacy. It means that reused addresses won't be mixed with other transactions and thus reveal which UTXOs belong to the same person
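The grouping achow101 describes for avoid partial spends can be illustrated with a small sketch. It assumes a simplified UTXO record and leaves out details of the real wallet logic; the `Utxo` type, the function names, and the sample pool are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Utxo(NamedTuple):
    txid: str
    vout: int
    address: str
    amount: int  # satoshis


def group_by_address(utxos):
    """Bucket UTXOs so that each address forms one group."""
    groups = defaultdict(list)
    for utxo in utxos:
        groups[utxo.address].append(utxo)
    return dict(groups)


def group_totals(utxos):
    """Total value per address group, i.e. what selection would see as one 'coin'."""
    return {addr: sum(u.amount for u in group)
            for addr, group in group_by_address(utxos).items()}


# Hypothetical pool: addrA was reused and received two payments.
pool = [
    Utxo("aa", 0, "addrA", 100_000),
    Utxo("bb", 1, "addrA", 50_000),
    Utxo("cc", 0, "addrB", 70_000),
]
print(group_totals(pool))
```

Selecting the `addrA` group spends both of its UTXOs in the same transaction, so no later transaction has to combine leftover `addrA` coins with unrelated ones.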