User-Space, Statically Defined Tracing (USDT) allows observability into runtime internals at
statically defined tracepoints. We have discussed USDTs at previous PR Review Club meetings,
#22006 and #23724.
Coin Selection is the process of selecting UTXOs (“coins”) from a wallet’s UTXO pool in order to
fund a transaction’s payment(s). We have discussed coin selection at previous PR Review Club
meetings, including #22009, #17526 and #17331.
PR #24644 adds tracepoints to the wallet’s coin
selection code.
<svav> Once a tracepoint is reached, it can pass data about process internals to a userspace script for further processing. This is great for observability and allows for debugging, testing, and monitoring.
<larryruane> The other difference I think of with tracepoints is that, with logging, you can later process the log file (to summarize what's in it), but there could be a huge amount of disk space consumed ... with tracepoints, you can sort of "compress" the information on the fly
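Here is a small Python sketch of larryruane's "compress on the fly" point. It uses made-up events and aggregates them while tracing instead of writing every event to a log. In a real setup the aggregation would usually live in an eBPF map or a bpftrace aggregation. The `(algorithm, fee)` event shape is a hypothetical assumption, not the PR's tracepoint format.

```python
from collections import Counter

# Hypothetical events: (algorithm, fee in sats), one per coin selection call.
events = [("bnb", 200), ("knapsack", 350), ("bnb", 180), ("srd", 400)]

def summarize(stream):
    """Aggregate as events arrive: memory grows with the number of distinct
    algorithms, not with the number of events (unlike a log file)."""
    counts = Counter()
    fees = Counter()
    for algo, fee in stream:
        counts[algo] += 1
        fees[algo] += fee
    return counts, fees

counts, fees = summarize(events)
print(dict(counts))  # {'bnb': 2, 'knapsack': 1, 'srd': 1}
print(dict(fees))    # {'bnb': 380, 'knapsack': 350, 'srd': 400}
```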
<svav> a1ph4byte Coin Selection - The process of selecting UTXOs (“coins”) from a wallet’s UTXO pool in order to fund a transaction’s payment(s).
<b10c> we can still parse log messages, but the contents might change over time and we might end up e.g. printing a hash as hex and then parsing it back in which isn't efficient
<sipa> I think another important difference is that logging is an actual action, where tracepoints are just hooks that an external process can plug into. A tracepoint on itself does nothing unless something uses it.
<larryruane> i have a really basic question, when a thread hits an active tracepoint, does the thread suspend until the data is received by the tracing script? Or is there a memory queue of tracing events so the thread can continue asynchronously?
<b10c> larryruane: in assembly, the tracepoint is a literal NOP (no operation). If we tell the kernel to hook into the tracepoint, it executes a small eBPF bytecode program e.g. adding data to an eBPF map where it can be read asynchronously from a userspace program
<Murch> glozow: The tracepoints collect information on the algorithms that produced the input selection, the total amount selected, the waste score, some details on fees, change output position, and whether the solution is `avoid_partial_unspents` compliant
<sipa> b10c: But execution of bitcoind is suspended while the kernel executes the eBPF program - it's the reading out of the results that is done asynchronously?
<Murch> While it's easy to evaluate a coin selection algorithm on a single situation (UTXO pool and selection target), the overall problem we're interested in is the emergent behavior of various algorithms over longer scenarios of payment sequences and feerates.
<b10c> larryruane: yes, I think of it similar to breakpoints in a debugger on the NOP, the positions of these NOPs are written into an ELF note of the bitcoind binary
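This toy Python model shows the behavior b10c and sipa describe. An unattached hook does nothing. An attached probe runs synchronously in the caller's thread but only does cheap work (here, enqueueing). A separate reader drains the results asynchronously. The `Tracepoint` class and the queue are hypothetical stand-ins. Real USDT tracepoints are NOP instructions that the kernel hooks via uprobes, not Python objects.

```python
import queue
import threading

class Tracepoint:
    """Toy model (hypothetical): fire() is a no-op until a probe is attached;
    an attached probe runs synchronously, so the caller waits for it."""
    def __init__(self):
        self.probe = None

    def fire(self, *args):
        if self.probe is not None:
            self.probe(*args)  # caller is paused only for this short call

events = queue.Queue()  # stands in for an eBPF map / ring buffer
tp = Tracepoint()

tp.fire("no probe attached")  # does nothing
tp.probe = events.put         # "attach": the probe just enqueues the data
tp.fire(("selected_coins", 1))
tp.fire(("selected_coins", 2))

received = []

def reader():
    # The userspace script drains the queue independently of the traced code.
    while True:
        item = events.get()
        if item is None:  # sentinel: stop reading
            break
        received.append(item)

t = threading.Thread(target=reader)
t.start()
events.put(None)
t.join()
print(received)  # [('selected_coins', 1), ('selected_coins', 2)]
```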
<theStack> PaperSword: i'd assume that if we compile without eBPF support, there are not even NOPs, because the TRACE... defines are replaced by empty strings
<Murch> The tracepoints allow us to observe how the UTXO pool evolves over time and to assess the overall fee expenditures as well as the individual outcomes of each payment
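A minimal sketch of summarizing a whole scenario from per-selection events. It assumes each event carries the kinds of fields Murch lists: algorithm, selected amount, waste, fee, change position, and APS flag. The `SelectionEvent` field names and the numbers are hypothetical, not the PR's actual tracepoint arguments.

```python
from dataclasses import dataclass

@dataclass
class SelectionEvent:
    # Hypothetical fields mirroring the kinds of data listed above.
    algorithm: str
    selected_value: int  # sats
    waste: int           # sats
    fee: int             # sats
    change_pos: int      # -1 means no change output
    aps: bool            # solution avoids partial spends

def scenario_summary(events):
    """Totals over a whole payment sequence rather than a single selection."""
    return {
        "payments": len(events),
        "total_fee": sum(e.fee for e in events),
        "changeless": sum(1 for e in events if e.change_pos == -1),
        "total_waste": sum(e.waste for e in events),
    }

scenario = [
    SelectionEvent("bnb", 50_000, 120, 1_410, -1, False),
    SelectionEvent("knapsack", 80_000, 900, 2_260, 1, False),
    SelectionEvent("srd", 120_000, 1_500, 2_820, 0, True),
]
print(scenario_summary(scenario))
# {'payments': 3, 'total_fee': 6490, 'changeless': 1, 'total_waste': 2520}
```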
<achow101> pop: the tracepoints let us do simulations. prior to tracepoints, these simulations required an additional patch which adds a couple of globals and RPCs that let us measure some things. but these could not be upstreamed so had to be maintained separately
<pop> Murch: So without new tracepoints to track coin selection metrics over longer periods and map the interactions between multiple algorithms, there is no way to evaluate the emergent behavior of coin selection?
<sipa> pop: Sure there is, but you couldn't do it with an unmodified bitcoind. Tracepoints provide a way for profiling software to hook into bitcoind, unmodified.
<b10c> PaperSword: when compiled without tracing support, there's nothing tracing related in the code. The TRACEx macros are empty if tracing is disabled
<Murch> pop: Previously we had either created separate simulation frameworks or modified a copy of Bitcoin Core to add the corresponding logging. Having the tracepoints allows us to keep the log generation and processing in a separate project which makes it easier to apply the tracing to many different states of the codebase
<b10c> PaperSword: this guidance is relevant for people who run a bitcoind with tracing support, but don't hook into the tracepoints. Release builds have tracepoint support, so we assume that's the case for a majority of our users.
<sipa> Ok, sure, it takes up source code space, but any alternative that doesn't have that will just not have any logging/tracing/observing of the relevant metric at all.
<theStack> b10c: "Release builds have tracepoints support" oh that's interesting, i would have guessed that they are only useful for developers and are disabled for releases
<larryruane> so do we ask important bitcoind users, such as exchanges, to enable tracepoints to learn about real-world behavior? (and send us results) Or is the intention that tracepoints are only for us dev types?
<PaperSword> glozow: I was already able to take a look at the metric, but spent a lot of time trying to get the tests in this PR to pass on RHEL Linux. I was unable to successfully run the tests from this PR.
<b10c> theStack: It makes sense to have the tracepoints in release builds to allow people to trace their production setups. Switching binaries is often not something you want to do if you're trying to debug a problem.
<Murch> larryruane: We currently have three datasets: an online gambling service's payment sequence, a merchant's inbound payments, and another service's payments. Still looking for something representative of individual users
<larryruane> some other projects (storage systems) have this "phone home" idea, but it was always a huge privacy concern ... I can understand why we can't / shouldn't do anything like that!
<Murch> Yeah, maybe to make it clear, this is not a telemetry function, it's just for users to hook into stuff running on their own computer. Our simulation scenarios are merely lists of the incoming and outgoing amounts they've processed without additional information (and slightly fuzzified amounts for privacy)
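Murch doesn't say how the amounts were fuzzified. One possible approach, shown here purely as a hypothetical sketch, is to scale each amount by a random factor within a few percent and round to whole satoshis. `fuzz_amount` and the 5% spread are assumptions, not the method used for these datasets.

```python
import random

def fuzz_amount(amount_sats: int, rng: random.Random, spread: float = 0.05) -> int:
    """Hypothetical fuzzing: scale |amount| by a random factor in
    [1 - spread, 1 + spread], round to whole sats, and keep the sign
    (positive = incoming, negative = outgoing)."""
    sign = 1 if amount_sats >= 0 else -1
    factor = rng.uniform(1 - spread, 1 + spread)
    return sign * max(1, round(abs(amount_sats) * factor))

rng = random.Random(42)  # fixed seed so the fuzzed scenario is reproducible
payments = [100_000, -25_000, 4_200_000]
fuzzed = [fuzz_amount(a, rng) for a in payments]
print(fuzzed)
```

The sign is preserved so the scenario keeps its sequence of incoming and outgoing payments. Only the exact amounts become blurred.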
<pop> achow101: it's hard to imagine a dataset of transactions that wouldn't be personally identifiable, especially if those transactions have made it into blocks
<theStack> could the tracepoints possibly also serve as a replacement for the zmq notifications in the long term? (don't know too much about either of the two areas)
<achow101> a1ph4byte: we want to have our automatic coin selection behave in a smart way, e.g. reduce fees when feerates are high, consolidate more when feerates are low. a simplistic algorithm can result in unexpected or undesired behavior
<Murch> pop: We run the data as a benchmark against different coin selection improvements to compare which perform better on the scenario. So the exact amounts aren't that important
<glozow> a1ph4byte: you might find useful information in these notes https://bitcoincore.reviews/22009. FIFO would be expensive and leak information about the wallet, etc.
<achow101> a1ph4byte: there are also privacy considerations, and maintaining a usable utxo pool for the wallet (e.g. not producing sand (near-dust) outputs)
<pop> achow101: Murch: So the important thing is to simply have data that has a close relationship with actual usage. The specific relationship between payment and feerate isn't critical at this point in coin selection algorithm development?
<theStack> PaperSword: indeed, needing to run as root seems to be quite a drawback (though i think someone mentioned in #bitcoin-core-dev recently that laanwj managed to get them to run without root... maybe someone knows more details)
<b10c> theStack: currently we only make data available via tracepoints that's already present in the function they are called in. that means, transactions and blocks might need additional serialization which could be "expensive"
<Murch> He suggested that we might introduce additional metrics and use scores from all of them to pick the best selection result rather than just the waste metric
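For reference, here is a Python sketch of the waste metric as we understand Bitcoin Core's `GetSelectionWaste`. Each input contributes the difference between its fee at the current feerate and its fee at the long-term feerate. The result then adds either the cost of change or, for changeless solutions, the excess over the target. `Coin`, `selection_waste` and the example values are illustrative assumptions. `target` is taken to already include the non-input fees.

```python
from dataclasses import dataclass

@dataclass
class Coin:
    value: int        # sats
    input_vsize: int  # vbytes needed to spend this coin

def selection_waste(coins, target, feerate, long_term_feerate, change_cost=0):
    """Waste per our reading of GetSelectionWaste (feerates in sat/vB)."""
    waste = 0
    selected_effective_value = 0
    for c in coins:
        fee_now = c.input_vsize * feerate
        fee_long_term = c.input_vsize * long_term_feerate
        waste += fee_now - fee_long_term  # paying more (or less) than "usual"
        selected_effective_value += c.value - fee_now
    if change_cost:
        waste += change_cost  # cost of creating and later spending change
    else:
        waste += selected_effective_value - target  # excess dropped to fees
    return waste

coins = [Coin(60_000, 68), Coin(50_000, 68)]
print(selection_waste(coins, 107_000, 20, 10))                     # 1640
print(selection_waste(coins, 107_000, 20, 10, change_cost=2_000))  # 3360
print(selection_waste(coins, 109_000, 5, 10))                      # -360
```

At 20 sat/vB against a 10 sat/vB long-term rate, the two inputs add 1,360 sats of waste. At 5 sat/vB the same inputs make the waste negative, which is how the metric favors consolidating when feerates are low.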
<glozow> Yeah, you might want a "privacy score" encapsulating whether you have a change output, how many inputs you're pulling together of what output types, etc
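One way such a combined score could look, sketched in Python, is waste plus a weighted privacy penalty. The penalty terms and the weight are hypothetical. They only illustrate ranking candidate selections on more than the waste metric.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    waste: int              # sats, lower is better
    has_change: bool
    num_inputs: int
    input_types: frozenset  # e.g. frozenset({"p2wpkh"})

def privacy_penalty(c):
    # Every term here is a hypothetical, illustrative choice.
    penalty = 1 if c.has_change else 0             # change can be linked back
    penalty += max(0, c.num_inputs - 1)            # inputs clustered together
    penalty += 2 * max(0, len(c.input_types) - 1)  # mixed output types stand out
    return penalty

def pick_best(candidates, privacy_weight=100):
    # Combined score: waste plus a weighted privacy penalty.
    return min(candidates, key=lambda c: c.waste + privacy_weight * privacy_penalty(c))

a = Candidate("A", 300, True, 3, frozenset({"p2wpkh", "p2tr"}))
b = Candidate("B", 600, False, 1, frozenset({"p2wpkh"}))
print(pick_best([a, b]).name)                    # B (score 600 vs 800)
print(pick_best([a, b], privacy_weight=0).name)  # A (waste alone)
```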
<glozow> PaperSword: yeah, C-strings don't know their own length so you just need to look for \0 when parsing. I don't know the answer to "why does it need to be a C-string," I assumed we have to use primitive data types or something
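In the eBPF scripts discussed here, a string argument is usually copied into a fixed-size buffer. The reading side therefore has to stop at the first `\0` itself. A minimal Python sketch, assuming a hypothetical 16-byte buffer holding a wallet name followed by leftover bytes:

```python
def decode_c_string(buf: bytes) -> str:
    """Read a NUL-terminated string out of a fixed-size char buffer."""
    end = buf.find(b"\x00")
    if end == -1:
        end = len(buf)  # no terminator inside the buffer: take all of it
    return buf[:end].decode("utf-8", errors="replace")

raw = b"default_wallet\x00\xff"  # 16 bytes: name, terminator, junk
print(decode_c_string(raw))      # default_wallet
```

Anything after the terminator is ignored, so stale bytes left in the buffer never leak into the decoded name.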
<Murch> "Previously we had either created separate simulation frameworks or modified a copy of Bitcoin Core to add the corresponding logging. Having the tracepoints allows us to keep the log generation and processing in a separate project which makes it easier to apply the tracing to many different states of the codebase"
<sipa> sizeof(char[]) gives the length of the array and &(char[]) returns a pointer to the first element of the array (so, itself). For all other purposes, a char[] just degenerates into a char*
<theStack> sipa: fixed-size array parameters are possible though, aren't they? like "void foo(int bla[5])"... at least i vaguely remember that i used this years ago, maybe it was non-standard though
<glozow> Thanks all for coming! Sorry if you were expecting us to stick to the questions, I personally am very happy we didn't. Feel free to ask if you want answers to the questions, I'll be around for a while.
<pop> Would anyone be willing to break this one down? 6. What’s the difference between the two calls to CreateTransactionInternal? What does it mean to Avoid Partial Spends?
<achow101> pop: the avoid partial spends feature groups together all UTXOs for the same address and treats them as a single UTXO during coin selection. This means that all UTXOs for the same address are all spent at the same time.
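A simplified Python sketch of the grouping achow101 describes. UTXOs paying the same address are merged into one group, and coin selection treats that group as a single coin. The addresses are placeholders. The real implementation has more details, for example a cap on how many UTXOs go into one group.

```python
from collections import defaultdict

def group_by_address(utxos):
    """Simplified avoid-partial-spends grouping: every UTXO paying the same
    address goes into one group that is selected (and spent) as a unit."""
    groups = defaultdict(list)
    for address, value in utxos:
        groups[address].append(value)
    return {addr: sum(values) for addr, values in groups.items()}

# Placeholder addresses; addr_A was reused and received two payments.
utxos = [("addr_A", 10_000), ("addr_B", 25_000), ("addr_A", 7_000)]
print(group_by_address(utxos))  # {'addr_A': 17000, 'addr_B': 25000}
```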
<theStack> sipa: i was hoping that the compiler would at least warn if you'd for example access c[5] in func, but apparently it doesn't; so it really treats it just as a pointer and that's it
<achow101> if avoid partial spends is off (it is off by default), we would do a CreateTransactionInternal without it, then do it again with APS on. Then we choose the "better" of the two solutions
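The "better of the two" choice can be sketched as a fee comparison. As we understand it, Bitcoin Core prefers the APS result when its fee is no more than the normal result's fee plus the `-maxapsfee` allowance (default 0). Treat `choose_result` and its parameters as a hypothetical simplification.

```python
def choose_result(normal_fee, aps_fee, max_aps_fee=0):
    """Hypothetical simplification of picking between the two
    CreateTransactionInternal results. aps_fee is None if the APS attempt
    failed; all fees are in sats."""
    if aps_fee is not None and aps_fee <= normal_fee + max_aps_fee:
        return "aps"
    return "normal"

print(choose_result(1_000, None))                    # normal
print(choose_result(1_000, 1_000))                   # aps
print(choose_result(1_000, 1_200))                   # normal
print(choose_result(1_000, 1_200, max_aps_fee=500))  # aps
```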
<pop> achow101: Is this because, once you spend a single UTXO from the address you will have revealed the private key for that address, rendering all of the associated UTXOs spendable by anyone?
<achow101> pop: no, private keys are never revealed. This is just for privacy. It means that reused addresses won't be mixed with other transactions and thus reveal which UTXOs belong to the same person