Stricter internal handling of invalid blocks (validation)

https://github.com/bitcoin/bitcoin/pull/31405

Host: stickies-v  -  PR author: mzumsande

The PR branch HEAD was 4ba2e480ffa0b77113953bee4ff5c9349e277e7e at the time of this review club meeting.

Notes

  • The BlockManager::m_block_index is a map which is used to keep track of which block headers exist, how they interconnect, and where on disk the block data (if any) is stored 1. Generally speaking, it is updated whenever a new valid header with sufficient Proof-of-Work (PoW) is received. It contains entries for blocks in the current most-PoW chain, alternative chains, and even invalid blocks.

  • The CBlockIndex objects in this map can be considered the nodes in a tree shaped structure with the genesis block at its root. By definition, each block can only point to a single predecessor, but multiple blocks can point to the same predecessor. Of course, in a single chain, this tree is pruned so that each block will never have more than one block pointing to it. The entire tree structure is kept to enable efficiently handling chain reorgs.

  • Block validation is a process that consists of multiple steps, gradually bumping up the CBlockIndex’s nStatus from its default-initialized BLOCK_VALID_UNKNOWN to BLOCK_VALID_SCRIPTS. The validation flow which is relevant to this PR can be summarised as:
    • When the header is successfully validated with AcceptBlockHeader(), it is added to m_block_index, and generally never removed from it anymore, even if the block fails a subsequent validation step.
    • When the partial block validation in AcceptBlock() succeeds, the block is persisted to disk. This validation is partial, because full validation requires the predecessor block’s data, which the node may not have received yet.
    • As soon as a block header becomes eligible to be part of the fully validated current chain (i.e. it is part of the most-PoW chain, and all data for the block’s predecessors is available), ActivateBestChain() will attempt to connect the block. If this final validation step succeeds, the block is connected and its nStatus is raised to the ultimate BLOCK_VALID_SCRIPTS.
  • CBlockIndex holds a pprev pointer to its predecessor which makes it trivial to iterate backwards (towards the genesis block) over the block tree. Iterating forwards (towards the chain tip) is not possible, because a block can have multiple successors pointing to it. The lack of forward iteration makes operations such as finding a new best block header expensive, because we have to iterate over the entire m_block_index block tree again.

  • ChainstateManager::m_best_header and ActiveChain.Tip() are both CBlockIndex pointers. During normal operation, they’ll often converge to the same block with the highest PoW. However, it is crucial to distinguish their meaning and use. ActiveChain.Tip() returns the CBlockIndex* of the fully validated and connected block that is the tip of the currently active chain. m_best_header points to the header with the most PoW that the node knows about. It represents a node’s view of what the most-PoW chain might look like, but it is possible that validation won’t allow it to progress there. As such, m_best_header is just a hint. The distinction is especially visible during IBD, when ActiveChain().Tip() progresses towards m_best_header, but can only be equal to it when IBD is finished.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. Which purpose(s) does ChainstateManager::m_best_header serve?

  3. Prior to this PR, which of these statements are true, if any?
    • A) a CBlockIndex with an INVALID predecessor will ALWAYS have a BLOCK_FAILED_CHILD nStatus
    • B) a CBlockIndex with only VALID predecessors will NEVER have a BLOCK_FAILED_CHILD nStatus
  4. One of the goals of this PR is to ensure m_best_header, and the nStatus of successors of an invalid block are always correctly set. Which functions are directly responsible for updating these values?

  5. For which scenario(s), if any, does commit validation: call InvalidChainfound also from AcceptBlock introduce behaviour change?

  6. Most of the logic in commit validation: in invalidateblock, calculate m_best_header right away implements finding the new best header. What prevents us from just using RecalculateBestHeader() here?

  7. How many times does Chainstate::InvalidateBlock() trigger the m_best_header recalculation, 1) before this PR and 2) after this PR?

  8. In validation: in invalidateblock, mark children as invalid right awaycand_invalid_descendants is implemented as a std::multimap<const CBlockIndex*, CBlockIndex*>. Can you think of any other type(s) that would work well in this context?

  9. Would we still need the cand_invalid_descendants cache if we were able to iterate forwards (i.e. away from the genesis block) over the block tree? What would be the pros and cons of such an approach, compared to the one taken in this PR?

Meeting Log

  117:00 <stickies-v> #startmeeting
  217:00 <marcofleon> hi
  317:00 <stringintech> Hi!
  417:00 <lightlike> hi
  517:00 <Devgitotox> hi
  617:00 <strat___> hi
  717:00 <monlovesmango> hello
  817:01 <stickies-v> hi everyone, thanks for a lot for joining in on this validation review club
  917:01 <stickies-v> and great to have author lightlike here too!
 1017:01 <marcofleon> woo!
 1117:02 <stickies-v> today we're looking at #31405, with notes and questions available at https://bitcoincore.reviews/31405
 1217:02 <stickies-v> anyone joining us for the first time today? even if you're just lurking, feel free to say hi!
 1317:02 <Devgitotox> me
 1417:02 <Devgitotox> first time
 1517:02 <davesoma> me too
 1617:02 <monlovesmango> welcome :)
 1717:03 <stickies-v> nice one, glad you found your way Devgitotox and davesoma . the conversation is async, so feel free to pop your questions at any time!
 1817:03 <stickies-v> who got the chance to review the PR or read the notes? (y/n)
 1917:03 <Emc99> I did
 2017:04 <strat___> yes
 2117:04 <marcofleon> yes, light review
 2217:04 <Jelle51> I didnt. It's my first time around
 2317:04 <monlovesmango> read the notes but only lite pr review
 2417:05 <stickies-v> cool, welcome to review club Jelle51 ! feel free to lurk or chime in whenever you want to
 2517:05 <stringintech> I looked at the changes but found out about the notes not soon enough. (Not sure I missed them or they were not uploaded when I looked at the PR)
 2617:05 <Jelle51> Thanks
 2717:05 <stickies-v> strat___ already left a nice ACK on the PR, but for everyone else able to review: would you give it a Concept ACK, Approach ACK, Tested ACK, or NACK? what was your review approach?
 2817:06 <stickies-v> stringintech: that's my bad, I only finished them Monday evening despite our week-in-advance hosting guideline :( sorry!
 2917:07 <marcofleon> Concept ACK I'd say. I think makes sense to have these values be set correctly instead of estimated. The PR descriptions makes a good case for it
 3017:07 <stringintech> :stickies-v No worries! actually it turned out to be kind of better for me cause I had to think harder I guess :))
 3117:07 <glozow> hi
 3217:07 <marcofleon> but personally would have to give more time to understanding about the surrounding validation code in general
 3317:08 <stickies-v> marcofleon: yeah! the rationale seems pretty straightforward, but assessing the actual code changes is a bit more involved
 3417:08 <stickies-v> i think we can definitely talk more about rationale and general understanding in this PR, i too have quite a bit of outstanding questions so hopefully we can all learn together
 3517:09 <stickies-v> let's get started with the first conceptual questions
 3617:09 <stickies-v> 1. Which purpose(s) does `ChainstateManager::m_best_header` serve?
 3717:09 <stringintech> Concept ACK. Almost everything was new to me though so I had to spend some time to grasp the context a bit. I also wrote some small integration test (python test framework) to understand the behaviour in some cases.
 3817:11 <monlovesmango> tracks the header with the most work that may or may not be invalid (since we may not have all the blocks to validate yet)
 3917:11 <marcofleon> m_best_header points to the most PoW chain, even if it hasn't been validated yet
 4017:11 <stickies-v> oh that's a great approach stringintech, perhaps they're worth sharing on the PR too for other people's understanding and/or adding them to the codebase?
 4117:11 <strat___> it's the best block header we've seen so far (not known to be invalid)
 4217:11 <catnip>  @xgclass introduced to manage one or more CChainState objects, which track the state of the blockchain (e.g., the UTXO set and the active chain of blocks
 4317:12 <stringintech> :stickies-v I doubt they could be added to the codebase (not clean/comprehensive enough) but I will definitely check if I can share them as learning material on the PR page.
 4417:13 <stickies-v> monlovesmango: marcofleon strat___ yes! with the nuance of "not known to be invalid" as strat___ correctly pointed out
 4517:14 <stickies-v> and what are some important use cases for this member? i.e., if it's not always correct, why do we even have it in the first place?
 4617:14 <lightlike> I think of it as the header for the block we would like our chain to move to (by downloading missing block data, doing validation etc.) - which may work out or not.
 4717:14 <monlovesmango> IBD is the main one right?
 4817:14 <stickies-v> catnip: i think you might be talking about the ChainstateManager more generally, instead of just its `m_best_header` member?
 4917:15 <stringintech> stickies-v: I guess it helps us to see where we wanna go / what blocks to sync next
 5017:15 <stringintech> for the best/active chain
 5117:15 <strat___> m_best_header is displayed by getblockchaininfo()['bestblockhash']
 5217:15 <strat___> active chain tip is displayed by getbestblockhash
 5317:15 <strat___> (the PR notes explains the difference well but just wanted to mention a way to see the effects!)
 5417:16 <stickies-v> lightlike: yeah, kinda like a north start (that might turn out to have died millions of years ago when the light/block data finally reaches us)
 5517:16 <marcofleon> very poetic
 5617:16 <monlovesmango> good analogy hahah
 5717:17 <stickies-v> stringintech: stringintech: yep! there are some other use cases too
 5817:18 <stickies-v> for example, it also seems like it's used as a proxy for current time, e.g. to determine if a block is old enough to qualify as historic (which in turn affects how much we'll serve it to peers etc)
 5917:19 <stringintech> Interesting!
 6017:19 <lightlike> monlovesmango: You could imagine that (and I think it could've been implemented that way), but I if I remember correctly I think It's actually not being used during IBD to determine which blocks to download next - but it is for related things such as whether to apply -assumevalid (i.e. skip script validation) or not.
 6117:19 <stickies-v> as lightlike pointed out on a previous PR, bluematt also listed out most use cases on a ~6yo PR: https://github.com/bitcoin/bitcoin/pull/16974 - that's pretty interesting to read!
 6217:20 <strat___> where is the proxy for current time part?
 6317:20 <monlovesmango> lightlike: interesting thank you
 6417:21 <stickies-v> lightlike: it is used when we receive unconnecting headers (in `HandleUnconnectingHeaders()`), to determine which headers to request from our peer, but I'm not sure if that function is actually reached in IBD? didn't look into that
 6517:22 <stickies-v> strat___: in `BlockRequestAllowed()` and `ProcessGetBlockData()`, for example
 6617:22 <lightlike> stickies-v: good point, yes, that should be reachable in IBD.
 6717:23 <stickies-v> I'm going to launch the next question already, but as always, feel free to continue the discussion on previous topics if there's anything else you find
 6817:23 <stickies-v> 2. Prior to this PR, which of these statements are true, if any?
 6917:23 <stickies-v> A) a `CBlockIndex` with an INVALID predecessor will ALWAYS have a `BLOCK_FAILED_CHILD` `nStatus`
 7017:24 <stickies-v> B) a `CBlockIndex` with only VALID predecessors will NEVER have a `BLOCK_FAILED_CHILD` `nStatus`
 7117:25 <marcofleon> I wanna say false for A but not entirely sure. I think in AcceptBlockHeader we do walk back along the block tree and mark blocks with `BLOCK_FAILED_CHILD`?
 7217:25 <monlovesmango> I think B is true
 7317:25 <stringintech> A- false (the PR fixes that)
 7417:25 <stringintech> B- true (not sure though; can we change our mind on a previously invalid parent?)
 7517:26 <catnip> Orphans?
 7617:26 <marcofleon> but yeah the PR is addressing A it seems. Where in the code can it be missed?
 7717:26 <catnip> @stickies-v orphans?
 7817:26 <stickies-v> stringintech: there's a "reconsiderblock" RPC which serves as a counterpart to "invalidateblock"
 7917:26 <marcofleon> BLOCK_MUTATED maybe?
 8017:27 <stickies-v> and stringintech: yes you're correct about A), that's one of the goals of this PR
 8117:28 <strat___> A is false - before this PR - only in the active chain we were supposed to mark children as BLOCK_FAILED_CHILD (there is an incorrect traversal
 8217:28 <strat___> which #31835 fixes). other blocks in the block index which descend from BLOCK_FAILED_VALID were not marked as invalid. and this PR fixes that.
 8317:29 <stringintech> :stickies-v Oh. I have not seen "reconsiderblock". But perhaps if we reconsider a parent block (and it becomes valid) its descendant could become valid too.
 8417:29 <lightlike> "invalidateblock" and "reconsiderblock" make these kind of changes more complicated, because they are kind of artificial: Normally, a block that was once completely validated will be valid forever, "invalidateblock" changes that assumption it's basically the user overruling: "this block is invalid because I say so".
 8517:29 <stickies-v> monlovesmango: what about this scenario: B is a child of A, and B is in the active chain when A gets invalidated. Then, the node receives a B' (with higher cumulative work than B), also building on A. Then we make A valid again with `reconsiderblock()`. Does it still hold?
 8617:30 <strat___> :lightlike +1, B is true in world without invalidateblock, reconsiderblock powers. but false if we have the powers :)
 8717:32 <stickies-v> strat___: lightlike: well, actually, I think B holds true even with `reconsiderblock` (and the scenario I outlined), because `reconsiderblock` does nicely walk all the descendants of the reconsider block
 8817:33 <strat___> https://usercontent.irccloud-cdn.com/file/fh2i10iu/pic.png
 8917:34 <stickies-v> so my understanding is A) is clearly false (as per the PR), but i've not found a scenario to falsify B)
 9017:34 <stickies-v> strat___: if you reconsiderblock @height=2, how would that suddenly make it valid if its parent is BLOCK_FAILED_VALID?
 9117:34 <marcofleon> strat___: that's a nice diagram
 9217:34 <strat___> stickies-v: is the scenario in the pic what you described or is it another scenario?
 9317:35 <lightlike> A used to be true in even more cases, my previous PR from last summer https://github.com/bitcoin/bitcoin/pull/30666 already fixed some of them (but not all) - so this PR fixes the remaining ones so that we can add the assumptions to CheckBlockIndex().
 9417:35 <stickies-v> strat___: almost - in my scenario we reconsiderblock @height=1
 9517:35 <monlovesmango> stickies-v: Interesting scenario. if B is immediately invalidated then maybe (but i think this PR is what would guarantee B is immediately invalidated). B' should be reconsidered when reconsidering A..?
 9617:35 <stickies-v> (lightlike: did yo mean "false" in even more cases?)
 9717:35 <strat___> stickies-v: because reconsiderblock traverses all ancestors and descendants in a linear fashion. but not the entire block index.
 9817:35 <lightlike> stickies-v: oops, yes!
 9917:36 <stringintech> :stickies-v Regarding your the last scenario, you mean that invalidated blocks through invalidate RPC may not remain invalid forever? (makes sense to me just wanted to make sure)
10017:36 <monlovesmango> prior to this pr, when A gets invalidated would B also immediately be invalidated?
10117:36 <stickies-v> strat___: mmm, I think it does traverse the entire block index? https://github.com/bitcoin/bitcoin/blob/bd0ee07310c3dcdd08633c69eac330e2e567b235/src/validation.cpp#L3841
10217:37 <strat___> but there's an if guard which checks if you are a descendant and only then you enter the BLOCK_FAILED_VALID clearing part
10317:37 <stickies-v> monlovesmango: yes indeed, and B` would be reconsidered! but the "gotcha" here could be that B is now no longer in the most-pow chain, so it wouldn't get reconnected, so we need to check that there's some other process to update its validity flags
10417:37 <catnip> stickies-v  would that mean a chain reorg?
10517:38 <strat___> similarly there's an if condition few lines below which checks if you're an ancestor and then clears BLOCK_FAILED_MASK
10617:41 <stickies-v> yes sorry you're right we only consider ancestors and descendants, i meant that we traverse the entire block index to find them but that's not super relevant
10717:41 <monlovesmango> yes I suppose so! just to check my understanding, prior to this PR would B get stuck with a valid status (since child blocks aren't invalidated immediately)?
10817:42 <stickies-v> monlovesmango: that's a good question!
10917:43 <stickies-v> my understanding is that yes it would, until it's fixed in https://github.com/bitcoin/bitcoin/pull/31405/commits/e32df45a62e6999b12d035758c9c6bd4994ea682
11017:43 <stickies-v> 3. One of the goals of this PR is to ensure `m_best_header`, and the `nStatus` of successors of an invalid block are always correctly set. Which functions are directly responsible for updating these values?
11117:44 <stickies-v> catnip: would what mean a chain reorg, exactly?
11217:45 <stringintech> SetBlockFailureFlags (for nStatus), RecalculateBestHeader (for best header), and InvalidateBlock (both)? (if we are considering those who change the values directly)
11317:46 <stickies-v> stringintech: exactly! `InvalidateBlock`implements its own best header calculation, which is a good segue into the next question
11417:46 <stickies-v> 5. Most of the logic in commit [validation: in invalidateblock, calculate m_best_header right away](https://github.com/bitcoin-core-review-club/bitcoin/commit/4100495125e9a06b2403f7520fae9f45c3fd9e4c) implements finding the new best header. What prevents us from just using `RecalculateBestHeader()` here?
11517:47 <stringintech> You mean just replacing the PR logic with RecalculateBestHeader or call RecalculateBestHeader after disconnecting all the blocks?
11617:48 <marcofleon> is it because we want to avoid traversing the block index?
11717:49 <marcofleon> so we use `best_header_blocks_by_work` instead?
11817:49 <stickies-v> stringintech: sorry, I mean pretty much replacing ~these lines (https://github.com/bitcoin/bitcoin/pull/31405/commits/4100495125e9a06b2403f7520fae9f45c3fd9e4c#diff-97c3a52bc5fad452d82670a7fd291800bae20c7bc35bb82686c2c0a4ea7b5b98R3777-R3788) with a `RecalculateBestHeader()` call
11917:49 <catnip> m_best_header` points to a valid tip and goes back to genesis?
12017:49 <monlovesmango> is it the lock on cs_main?
12117:50 <stringintech> Ahaa, so as marcofleon: already mentioned, it wouldn't make sense to traverse the whole block index with each block-disconnect
12217:50 <lightlike> stringintech: these are the non-normal cases for invalid block edge cases. I'd say the most important spot where m_best_header is set is AddToBlockIndex() in blockmanager when a new block index is received.
12317:51 <lightlike> (referring to question 3 still)
12417:51 <stickies-v> catnip: there are no guarantees that m_best_header is a valid block, it's a best effort attempt, but yes you can traverse back to genesis from there
12517:52 <stickies-v> monlovesmango: can you elaborate? do you mean one approach is holding a lock while the other isn't?
12617:52 <catnip> race conditions on csmain with lock?
12717:52 <stringintech> lightlike: Oh right. I will look into it. Thanks.
12817:53 <monlovesmango> I misread the RecalculateBestHeader function, it only asserting that lock is held, not trying to create a new lock
12917:54 <stickies-v> marcofleon: stringintech: yeah , my understanding for Q5 is that it's an optimization to avoid unnecessarily traversing `m_block_index` - lightlike do you have anything to add there? we talked about this offline a bit
13017:54 <stickies-v> catnip: what do you mean?
13117:55 <stickies-v> we're almost out of time - quick poll - would people prefer we cover Q7 or Q8? otherwise i pick
13217:56 <lightlike> stickies-v: yes, exactly. The idea is to traverse m_block_index only once, not multiple times (imagine someone invalidateblock for a lot of blocks, which is already quite slow)
13317:56 <strat___> (old question - understood the scenario now) catnip: stickies-v: A - B' becomes the active chain now. maybe that's the reorg you're referring to. https://usercontent.irccloud-cdn.com/file/u0RCF8KU/pic-2.png
13417:56 <monlovesmango> you choose
13517:56 <marcofleon> stickies-v: you pick
13617:56 <stickies-v> 8. Would we still need the `cand_invalid_descendants` cache if we were able to iterate forwards (i.e. away from the genesis block) over the block tree? What would be the pros and cons of such an approach, compared to the one taken in this PR?
13717:58 <stickies-v> strat___: yes!! thank you for making a diagram, that's so much more helpful. So my concern (which turned out to be unfounded) was that because B was no longer in active chain, it wouldn't get connected again, and so its status might not get updated (but `reconnectblock` does indeed handle it well)
13817:58 <stringintech> We wouldn't need it if we had pointers to all children of a block I guess (which is impractical to maintain).
13917:58 <catnip> temp cache for invalid descendants
14017:59 <monlovesmango> I don't think so. the cons would be there would be a whole new data structure to maintain on CBlockIndex (and it would be more complex than the pprev pointer as you can multiple decendants)
14117:59 <stickies-v> stringintech: what makes it impractical?
14218:00 <stringintech> maybe complexity in the design? and always maintaining a correct list of children?
14318:00 <stickies-v> monlovesmango: it would just be a simpler container of pointers, like a `std::vector<CBlockIndex*>`. and since we never remove items from `m_block_index`, we'd only have to add a pointer to a predecessor once, and then never look at it again
14418:00 <stringintech> I was also gonna say storage... but maybe not :)
14518:00 <marcofleon> we'd have to be able to point to multiple children
14618:00 <stickies-v> s/it would/it could
14718:01 <lightlike> also all block indexes are constantly held in memory, so it adds up the longer the chain gets (testnet 3 has 4M blocks now...).
14818:01 <stickies-v> stringintech: we wouldn't have to persist this to storage, can be calculated whenever we load from disk
14918:02 <monlovesmango> the pro might be that it could be useful for other things
15018:02 <stringintech> :stickies-v Oh; You are right.
15118:02 <monlovesmango> stickies-v: you make it sound simple haha. but agree.
15218:02 <stickies-v> lightlike: yeah , it would mean a lot of additional vectors and pointers in memory
15318:03 <catnip> stickies-v  Cons: memory usage ? Redundant trees walking through?
15418:03 <monlovesmango> I guess my assumption was that this is unwanted overhead which is why it hasn't been done
15518:03 <stickies-v> catnip: yes! iterating backwards is trivial, because each block has 1 parent. But iterating forward, even if we explicitly link the block indexes, would still be non-trivial because we have to cover all branches
15618:04 <stickies-v> there aren't really any performance implications, because this is a pretty rare code path
15718:04 <stickies-v> and i'm not sure it would meaningfully simplify the code (but i'd love to be proven wrong if anyone wants to prototype it)
15818:05 <stickies-v> monlovesmango: yes to both your points!
15918:05 <stickies-v> alright, went a bit over time already, thank you everyone for your participation on today's review club, and lightlike for making variables less wrong, yay!
16018:05 <catnip> stickies-v -_-
16118:05 <marcofleon> thanks stickies, good stuff
16218:06 <lightlike> thank you stickies-v!
16318:06 <monlovesmango> thank you stickies-v! great review session :)
16418:06 <catnip> Merci
16518:06 <stringintech> Thank you all!
16618:06 <stickies-v> i'll be on here for another ~15 minutes, if anyone has general conceptual questions on the block validation touched by this PR
16718:06 <strat___> thank you stickies-v for hosting!
16818:06 <stickies-v> so feel free to shoot questions, concerns, ideas, if you still have them
16918:06 <stickies-v> #endmeeting

Footnotes

  1. https://bitcoin.stackexchange.com/a/51026/129640 â†©