The PR branch HEAD was b8ea88926c at the time of this review club meeting.
Notes
init.cpp
is responsible for the startup sequence for bitcoind and bitcoin-qt. It
parses and sanitizes configuration options, initializes the various
components, starts threads, and so on.
For historic reasons, the initialization code is complex and difficult to work
with - the file is very long (1861 lines prior to this PR), the individual
functions within the file are long (for example,
AppInitMain
is 750 lines) and thereās global state being read and written.
The code is also brittle - since various components and global state
depend on each other, even innocuous-looking changes that re-order events
in minor ways can cause subtle changes in behaviour. This has been the
source of several bugs in the past.
To make matters worse, the code in init.cpp is difficult to unit test.
Since itās tightly coupled with lots of other parts of the codebase, itās
difficult to isolate and test thoroughly. The unit tests have their own
code for initializing their test setup, so the code is not shared between
bitcoind initialization code and unit test initialization.
PR 23280 is an attempt to
unify some of the initialization logic for the validation state. If that
code is shared between bitcoind and the unit tests, then itāll be better
tested and less likely to be broken by future changes.
This PR distinguishes between āsoft failuresā and āhard failuresā when
loading the chainstate. What is the difference between those two?
The first commit (init: Extract chainstate loading sequence) touches a
large number of lines and extracts logic from AppInitMain() into a
dedicated LoadChainstateSequence() function. How did you satisfy
yourself that this commit didnāt introduce any behaviour changes?
Commit init/chainstate: Decouple from stringy errors changes the return
type of LoadChainstateSequence() from a boolean to an (optional) enum.
How did you satisfy yourself that the commit didnāt introduce any
behaviour changes?
In the same commit, why are the failed_chainstate_init and
failed_verification local variables removed? What were they used for
previously? Why are they no longer required?
In the same commit, fLoaded is no longer passed into
LoadChainstateSequence() (as a reference), and is now set in the outer
AppInitMain() function. Why is this possible?
In commit init/chainstate: Remove do/while loop, the do/while loop is
removed from LoadChainstateSequence(). Why is this possible? What was
it being used for prior to this PR?
Commit init/chainstate: Decouple from concept of uiInterface adds this
code to the call site for LoadChainstateSequence():
What is that syntax? What is being passed in to the function?
Commit validation: Call NotifyBlockTip in CCS::LoadChainTip removes a
call to RPCNotifyBlockChange() and adds a call to
uiInterface.NotifyBlockTip(). How are those two functions related?
<jnewbery> stickies-v: I agree. AppInitMain() is a beast, so pulling out logical units of the code seem like an improvement, especially if it means improving the test quality.
<jnewbery> ok, let's get into the code. This PR distinguishes between āsoft failuresā and āhard failuresā when loading the chainstate. What is the difference between those two?
<stickies-v> I think to add, from my understanding the diff between soft/hard used to be a boolean true/false response, whereas nowadays this difference is not as binary anymore since we return a ChainstateLoadingError?
<lightlike> the distinction is a bit unclear though imo. "ERROR_BLOCK_FROM_FUTURE" is treated as a soft failure, but won't be healed by reindexing, instead of fixing your time
<michaelfolkson> A reindexing of the blockchain. So we are talking using block data we already have available to us to create a UTXO set....? I'm not actually sure
<jnewbery> lightlike: good point. Perhaps a better definition is that if it's a hard failure then we can never rebuild our chain state with the data in our datadir. If it's a soft failure, we may be able to recover with the data in our datadir. Is that more accurate?
<jnewbery> next question: The first commit (init: Extract chainstate loading sequence) touches a large number of lines and extracts logic from AppInitMain() into a dedicated LoadChainstateSequence() function. How did you satisfy yourself that this commit didnāt introduce any behaviour changes?
<jnewbery> seaona: take a look at the first commit (init: Extract chainstate loading sequence), and the commit log (Currently, LoadChainstateSequence returns a bool which: - if false - Definitely a "Hard failure"). There's only one place in the function that returns false, and it's the "If the loaded chain has a wrong genesis, bail out immediately" part
<jnewbery> lightlike: I agree. `--color-moved=dimmed_zebra --color-moved-ws=allow-indentation-change` are very helpful flags for reviewing these mostly move-only commits. Kudos to dongcarl for including that review tip in the commit log.
<jnewbery> I'd say that in the absence of test coverage, we need to be extra careful in our code review. Here, the commit is moving code into a separate function. We can satisfy ourselves that it's basically move only by using that `git diff` command, and the arguments are almost all (mutable) references, so there shouldn't be any difference between changing those variables inside the function from
<lightlike> Kaizen_Kintsugi: you do "git diff <old commit hash> <new commit hash> --color-moved=dimmed_zebra --color-moved-ws=allow-indentation-change" and everything that is just moved is marked in gray, so you see the differences more easily.
<stickies-v> lightlike Kaizen_Kintsugi you can also use a slightly shorter command with git show: "git show <commit hash> --color-moved=dimmed_zebra --color-moved-ws=allow-indentation-change"
<jnewbery> if you're reviewing refactors, you need to be on the lookout for even tiny changes like that if you want to satisfy yourself that there aren't behaviour changes
<jnewbery> glozow: very nice! Yes, this is the captures for the lambda function. Previously it wasn't capturing anything, and now it's capturing local variables by reference
<pg156> jnewbery: "be on the lookout for even tiny changes" means looking at and reasoning about the code side by side? anything else? (as you also said "this area of the code is not well covered by unit tests")
<jnewbery> yes, these things called uiInterface are actually different things! In the code before the commit, uiInterface is a global variable. After the commit, uiInterface is a local variable in the new LoadChainstateSequence() function
<jnewbery> right, so this function has a local variable called uiInterface, which is a reference to the global variable uiInterface. Having multiple variables named the same way is called "shadowing" and it's what Cory is talking about here: https://github.com/bitcoin/bitcoin/pull/23280#discussion_r743923908.
<jnewbery> an alternative approach would be to have a commit before this one that creates a reference to uiInterface in AppInitMain() called ui_interface or whatever, and uses that in the code that's about to be moved, then moves it with that name
<jnewbery> Commit init/chainstate: Decouple from stringy errors changes the return type of LoadChainstateSequence() from a boolean to an (optional) enum. How did you satisfy yourself that the commit didnāt introduce any behaviour changes?
<pg156> The commit log addresses the question. I haven't completed this. But as a reviewer, I feel I need to enumerate all possible outcomes where `LoadChainstateSequence` is called, to compare before and after.
<jnewbery> pg156: indeed, you need to carefully look at all of the places where we exit due to an error, and verify that we're doing the same thing before and after
<lightlike> actually an InitError seemed to have gotten lost in that commit, but there was already a review comment on that (as for everything else I noticed during review)
<jnewbery> I think the key observation is that the break statements become return statements that return error codes, and then the caller handles those error codes
<jnewbery> Kaizen_Kintsugi: exactly! The do-while is just there so it's possible to break out of that block of code. It's a very ugly way of replicating what can be done with a function and return statements
<jnewbery> In the same commit, why are the failed_chainstate_init and failed_verification local variables removed? What were they used for previously? Why are they no longer required?
<pg156> - They were used previously to break out of the chainstate activation do-while loop to return true (more specifically indicating a soft failure, as it is neither "Success" nor "Shutdown requested").
<pg156> - before the change: the effect of calling `LoadChainstateSequence` is captured by the function return value together with mutable function parameters (e.g. `fLoaded`, `strLoadError`). E.g. when db fails to upgrade, `failed_chainstate_init` is necessary to break out of the do-while loop, and set return value to true outside the loop.
<pg156> - after the change, e.g. when db fails to upgrade, the function can directly return an error status value `ERROR_CHAINSTATE_UPGRADE_FAILED` capturing the effect.