The PR branch HEAD was 09daa42b at the time of this review club meeting.
Notes
This is a draft PR which removes the g_chainman global variable and
::Chain{,state}Active() functions in favor of having callers either use or
obtain local references to the relevant consensus-related object
(ChainstateManager, CChainState, CChain, BlockManager, etc.).
Prior to this PR, there were numerous references to the g_chainman global
variable and ::Chain{,state}Active() functions, some of them even in
contexts where a local ChainstateManager reference existed.
This PR is a first step towards encapsulating our consensus engine, which
allows for:
Clearer visibility into what currently lies in consensus codepaths and
what depends on our consensus engine
Coalescing duplicate consensus initialization codepaths, mitigating
against bugs that arise out of test/non-test initialization
inconsistencies
How did the PRs linked in the historical context notes change the
organization of validation.cpp? What kind of state/logic now belongs in
each consensus-related object (ChainstateManager, CChainState,
BlockManager)?
Where and how are the aforementioned consensus-related objects initialized?
What are the advantages and disadvantages of using global variables and
functions?
Why are the first few fix commits in the branch necessary? Why does the
previous code work prior to de-globalizing g_chainman and why doesn’t it
work now without the fix?
<dongcarl> Next Q: How did the PRs linked in the historical context notes change the organization of validation.cpp? What kind of state/logic now belongs in each consensus-related object (ChainstateManager, CChainState, BlockManager)?
<troygiorshev> jamesob: yeah I think! Everything to do with, say, connecting blocks is part of CChainState? Might be CChainstate through ChainstateManager
<jamesob> at the moment, there's only one chainstate running around. But eventually there may be multiple. In any case, at the moment ChainstateManager is responsible for constructing chainstates, which involves e.g. injecting a BlockManager instance
<dongcarl> elle michaelfolkson troygiorshev: True as well. A very rough approximation I have in my head: BlockManager = the blockchain/metadata, CChainState = everything related to current tip (UTXO, etc)
<jamesob> michaelfolkson: in concept, you could have n chainstates and validate n separate regions of the chain historically, simultaneously (maybe with something like utreexo) but that's way off
<jamesob> troygiorshev: if I had to guess, probably none - it would be some functional layer that accesses chainstate objects through the chainmanager... but defer to dongcarl there
<dongcarl> troygiorshev: I don't like to speculate about eventual APIs, because we're sooooo far from that right now. BUT I think a good intermediary goal is to encapsulate ChainstateManager and its dependencies first!
<dongcarl> Alright, feel free to keep discussing but here's the next topic: Where and how are the aforementioned consensus-related objects initialized?
<michaelfolkson> washroom: The start of the statement is right. But the assumeutxo project is validating from genesis in the background while assuming the tip is ok
<@jnewbery> Construction means something very specific in C++. It's the operations that get called when a new object gets instantiated. Those operations include the initialization list, the constructor functions of any base classes and the constructor function of the class itself
<@jnewbery> often, initialization happens during construction, but if initialization requires some context, then obviously we can't initialize the object during construction if it's a global
<dongcarl> Exactly what jnewbery said! If a variable is a global, and it needs some context to get initialized to steady state, then we need a two-step initialization. See: InitializeChainstate
<@jnewbery> sipa: does this: https://stackoverflow.com/a/1271692/933705 only apply to objects with internal linkage? ie if they're in the global namespace then the have to be constructed before main?
<sipa> jnewbery: c++11 added a guarantee that local static variables are only initialized once, even when the function is called simultaneously from multiple threads for the first time
<dongcarl> I'm not sure there's a reason why it's done the way it is right now. However, what I think would be a nice goal to have: an initialization codepath that's used by all binaries with the right toggles.
<@jnewbery> Take a look at the logic inside init.cpp. A lot of the code there is actually initializing the individual sub-components. We need to do the same setup inside our unit tests, otherwise we're not actually testing the code paths that run in production
<@jnewbery> I think it'd be nice if as much of the initialization of the subcomponents was done within those subcomponents, perhaps inside the constructor function. That way the unit tests could just instantiate that subcomponent and know that it's testing what will actually run on the live system
<MarcoFalke> advantage of globals would be less verbosity in code and if there can only be one instance of an object, passing it around explicitly doesn't help code clarity
<elle> dongcarl: advantages: easy access from all over. disadvantages: easy access from all over. Plus: hard to mock for tests. Also: not always known if the object is yet initialised (in its steady state)
<dongcarl> dappdever: Since initialization is mostly a one-and-done affair, the performance impact is not something we'd be too concerned about. I also doubt there's any performance gains to be made from being global.
<dongcarl> Why are the first few fix commits in the branch necessary? Why does the previous code work prior to de-globalizing g_chainman and why doesn’t it work now without the fix?
<nehan> jnewbery: one could think of it as unit testing the initialization function. i'm not sure it's in any way bad to test the same code path used in actual use, if possible.
<@jnewbery> nehan: yes, a unit test that tests initialization is fine. But you shouldn't need to initialize the whole system if you just want to test one or a few subcomponent
<dappdever> @dongcarl are the tests using the same instance of m_node across tests, whereas before each tests was creating the chain from a new node instance?
<dongcarl> dappdever: Close! It does have something to do with m_node. It's mostly because after this PR, the code will now reach for the chainman inside the local context instead of g_chainman. So it's important that the local node context contains the chainman we want to work with instead of not having one!