The PR branch HEAD was d29f2fd at the time of this review club meeting.
Notes
The libbitcoinkernel project is an effort to
decouple Bitcoin Core’s consensus engine from other non-consensus modules (such as the various
indices) in the codebase. Previously, we have covered libbitcoinkernel-related PRs #24410
and #20158.
ArgsManager
is a Bitcoin Core-specific data structure responsible for handling configuration options. It
returns help strings for configuration options, parses user input, and stores the configured values.
While this functionality is helpful for users that want to customize their nodes, not every node
participating in the network needs it.
Script (specifically, signature) verification is the most computationally expensive part of
validating a transaction. The validation caches,
g_signatureCache and
g_scriptExecutionCache,
speed up block validation performance by caching successful signatures and scripts, respectively,
validated when transactions were submitted to the node’s mempool.
The debug-only -maxsigcachesize option limits the aggregate memory usage of both caches.
The caches are cuckoo caches, introduced in
PR #8895. Also see
Cuckoo hash tables if you are interested.
You do not need to fully understand cuckoo cache and validation cache usage to review this PR,
but feel free to take the scenic route if it piques your interest. Also, looking at the
implementation of CheckInputScripts() may help clarify what is going on in
txvalidationcache_tests.
PR #25527 is one of a series of PRs that
decouples ArgsManager from kernel-related code (see also
#25290 and
#25487). Specifically, it removes uses of
ArgsManager when initializing the signature and script caches. It also removes a limit on
signature cache sizes (MAX_MAX_SIG_CACHE_SIZE), patches a uint32 overflow, and other cleanups.
In your own words, what does the ArgsManager do? Why or why not should it belong in src/kernel
vs src/node?
In your own words, what are the validation caches? Why would they belong in src/kernel vs src/node?
The first
commit
removes the call to InitScriptExecutionCache() in txvalidationcache_tests.cpp/checkinputs_test.
How can you verify that this call was unnecessary?
(Hint a) What do setting cacheSigStore and cacheFullScriptStore do in CheckInputScripts()?
(Hint b) What needs to be called/initialized for the test to use the script cache?
(Hint c) The checkinputs_test test case uses Dersig100Setup. How can you check if this has a script cache setup?
This sixth
commit
changes the type of signature_cache_bytes and script_execution_cache_bytes from int64_t to
size_t. What is the difference between int64_t, uint64_t, and size_t, and why should
a size_t hold these values?
What is the maximum value for ValidationCacheSizes::signature_cache_bytes after this
diff?
Is this a behavior change? Why or why not is the code correct?
Quiz: A mainnet node receives a new block at height 710,000 containing a few transactions. Which
of the following transactions could hit its script cache, assuming nothing has been evicted?
(Note: taproot activated at height 709,632)
(A) the coinbase transaction
(B) a transaction the node has never seen before
(C) a transaction with the same txid but different witness as a transaction in its mempool
(D) a transaction that was in its mempool but replaced by another
(E) a transaction with no taproot inputs or outputs accepted to the mempool at height 709,000
(F) All of the above
(G) None of the above
<glozow> This PR has a pretty small diff and doesn't change behavior, but there are a lot of details to pay attention to. You shouldn't be ACKing a PR just because you ran the tests and they passed :) The goal today is to "learn how to review better" by zooming in and asking ourselves specific questions to convince ourselves this PR is correct
<glozow> This review club is intended for beginners, so always feel free to ask questions :) don't worry about questions being too basic or off topic, the goal is to learn.
<stickies-v> this PR removes the direct dependency in consensus code on the global gArgs ArgsManager that (amongst others) allows the user to define the max cache size for script and sig verification
<larryruane> Amirreza: (if I could take a try at answering) It's very important for the nodes on the bitcoin network to all agree on whether a particular block or a particular transaction is VALID or not ... so we don't get a chain-split! So consensus code is any code that contributes to that determination (valid or not), hope that helps
<stickies-v> Amirreza: I wouldn't dare give a comprehensive answer, but my take is - code that's responsible for validating the consensus rules, e.g. transaction and block validation rules. The code that, if different clients have different logic, would cause the network to split
<hernanmarino> It s a data structure for handling command line arguments. If the objective is to decouple funtionality and isolate it in a core consensus modules, command line should definitely be far from it.
<larryruane> If src/kernel looks at config settings, and since nodes can easily have different config settings, we thereby risk consensus failure (disagreement)
<larryruane> glozow: "validation caches ... Why belong in src/kernel?" I'm unclear on that. A "cache", in normal usage, means just a performance optimization (the more stuff you cache, the quicker you can get to it) ... so why are those caches related to consensus?
<stickies-v> Amirreza: others can answer this better, but at the moment the consensus code is in quite a few places, which is kind of the point of libbitcoinkernel (and libbitcoinconsensus). validation.cpp has a lot of the consensus code
<larryruane> Amazing, so bitcoin core is designed so that, IF there's a (completely unknown!) bug, we can guard against it causing a disaster by careful design
<glozow> Let's start looking at the commits. The first commit removes the call to `InitScriptExecutionCache()` in checkinputs_test. How can you verify that this call was unnecessary?
<stickies-v> Amirreza: no I don't think so, consensus-critical is commonly referred to something that's critical because it's consensus. Anything that can affect consensus is critical
<hernanmarino> glozow: Because they were already initialized. Dersig100Setup indirectly calls code from BasicTestingSetup which Initializes the cache in an assert
<glozow> Amirreza: not sure about semantics, but we're basically referring to "consensus" as consensus rules themselves, e.g. signature verification. and we're referring to signature caching as "consensus-critical" functionality because, if we have an invalid signature cached, our node is no longer enforcing consensus rules.
<larryruane> glozow: ".. unnecessary?" -- okay, this is kind of cheating, but I would run the test in the debugger, set a BP on that function, and verify that it gets called before the call that's being removed
<glozow> The second commit returns approximate total size from `InitSignatureCache()` and `InitScriptExecutionCache`. It also adds the `[[nodiscard]]` attribute to both functions. What does `[[nodiscard]]` do, and why or why not is it appropriate to add here?
<stickies-v> the compiler will raise a warning if a nodiscard type/return value is ignored/not used. it is appropriate to add here because previously `InitS*Cache()` returned `void`, but now it returns a `bool` that we *need* to check to ensure the cache was actually initialized.
<glozow> adam2k: yes, remember this is checked at compile time. It's to help make sure the developer doesn't forget to add a check that the initialization succeeded. Though not foolproof of course
<stickies-v> hmm glozow from how you phrase it, maybe the [[nodiscard]] is only really necessary after the uint32 overflow commit, because then InitScrtipExecutionCache can actually return false too?
<glozow> well it made sense when `-maxsigcachesize` was changed from number of entries to number of mebibytes. At the time, it would have made sense in case somebody's test had it set to 100,000 entries, and this change meant the config = 100GiB sigcache.
<glozow> adam2k: yeah, this is a debug-only option, so such a mistake wouldn't be too bad. also it's been 7 years so it's unlikely that somebody hasn't updated their settings yet
<stickies-v> (and not re my previous comment: this `MAX_MAX_SIG_CACHE_SIZE` did not actually prevent the overflow, but a lower value for it could have, I was just talking about a max value in general)
<larryruane> so if cache is now based on memory, does that mean that different hardware platforms or different compilers could result in nodes caching different numbers of entries? didn't we want to avoid that?
<stickies-v> larryruane: but I think it's unreasonable to ask users how many elements they want to cache, because they'd have no idea how much memory is required for that?
<glozow> Ok well that was lucky for us that Carl wrote the explanation for `MAX_MAX_SIG_CACHE_SIZE` and linked to the code in the commit message! If he hadn't done this, what tools could you have used for "code archeology" to understand the background of why some code exists?
<stickies-v> and you can also search https://github.com/bitcoin/bitcoin/pulls with `commit:<commit_sha>` to show the PR that contained the commit, and read the discussion there
<dongcarlimposter> I used `git blame` for a long time, but GitHub's blame is really good now, and very convenient, there's a "View blame prior to this change" button that removes the need to blame over and over again
<stickies-v> michaelfolkson: if the cache incorrectly says that a certain sig/script is in there, then we're just going to assume it's valid and not reevaluate it again
<sipa> michaelfolkson: We're not trying to reason about what a bug would look like, or even what kind of bugs are likely. It's just a fact that if the cache doesn't function correctly, it may affect consensus.
<larryruane> the problem with `git blame` is it only shows the most recent change to each line that's current in the file (I think that's why people end up doing repeated git blames)
<glozow> Before we run out of time. The last commit changes the type of signature_cache_bytes and script_execution_cache_bytes from int64_t to size_t. What is the difference between int64_t, uint64_t, and size_t, and why should a size_t hold these values?
<michaelfolkson> sipa: stickies-v example of cache misreporting what is in the cache is a possible example of a cache bug? I'd have thought something stored in the cache wouldn't change while in the cache
<larryruane> A mainnet node receives a new block at height 710,000 containing a few transactions. Which of the following transactions could hit its script cache, assuming nothing has been evicted? (Note: taproot activated at height 709,632)
<michaelfolkson> sipa: Sorry to belabor point, I am reading what you've said. I'm just trying to think how one can analyze these things when one isn't sure if something could affect consensus or not
<glozow> It's not E, the script cache includes the script verification flags. after 709632, you'd look for an entry with SCRIPT_VERIFY_TAPROOT and wouldn't find one
<sipa> michaelfolkson: The only criterion is: the consensus code depends on the cache. Thus if the cache reports the wrong thing, conensus will be affected.
<michaelfolkson> Any calls/interactions between the wallet/GUI and libbitcoinkernel would also be considered consensus right? In theory wallet code could be written maliciously to change some consensus state in libbitcoinkernel.
<sipa> It's definitely the case that all of it could, in theory, if permitted to run arbitrary malicious code, affect consensus outcome. But it's also true that some things are orders of magnitude more risky in that regard than others.
<sipa> So generally the idea is to look at code/dataflow dependencies: does consensus either directly or indirectly change behavior based on values returned from the code under consideration.
<sipa> While consensus does send signals which trigger callbacks in the wallet code (e.g. to inform the wallet about new transactions that arrived), those don't return anything that validation code then uses. So we say the wallet is not consensus critical.
<sipa> If the wallet were actually malicious, it could just go modify values in memory related to consensus data structures... but that would need extremely suspicious code.
<michaelfolkson> I guess I was putting the cache example in the very low risk bucket (needs suspicious code). But consensus does need caching and it doesn't need a wallet. So it is more of a consensus dependency argument rather than a risk argument
<sipa> Like, just literally inserting a single "!" somewhere in the cache code in its lookup or insertion code, would mean consensus is immediately affected. That's very easy to overlook.
<sipa> Say, if the wallet contained a call to CBlockIndex::RaiseValidity, e.g., that would be extremely worrisome - the wallet shouldn't be doing something like that.