Remove dead CheckForkWarningConditionsOnNewFork (consensus)

https://github.com/bitcoin/bitcoin/pull/19905

Host: MarcoFalke  -  PR author: MarcoFalke

The PR branch HEAD was fa7eed5b at the time of this review club meeting.

Notes

  • Following an accidental chain split in 2013 (documented in BIP 50), a fork warning system was added in PR 2658.

  • The feature was originally tested via the large reorg test (today found in test/functional/feature_block.py). However it is no longer possible to trigger the warning due to changes in the validation logic.

  • The validation logic to validate and potentially connect a new block is handled in ProcessNewBlock(), which can be called from ProcessMessage() in src/net_processing.cpp or from the submitblock RPC. ProcessNewBlock() runs basic validation sanity checks and stores the block to disk. If the initial checks pass, it calls ActivateBestChain() to (try to) connect the block to the best chain.

Questions

  1. Did you review the PRs? Concept ACK, approach ACK, ACK, or NACK?

  2. What are some examples of when a chain split can happen? Which ones should users be warned about?

  3. ActivateBestChain() consists of two nested loops. What is the condition for each loop to terminate?

  4. The inner loop calls ActivateBestChainStep(), which disconnects blocks (in case of a reorg) and connects a batch of up to 32 blocks toward the new tip. When an invalid block is found, CheckForkWarningConditionsOnNewFork() is called. Which block from the batch is passed to CheckForkWarningConditionsOnNewFork()?

  5. ActivateBestChainStep() will also call m_chain.SetTip(), which updates the global reference to the active chain tip (::ChainActive().Tip()). What is the maximum block height difference between pindexNewForkTip and pfork?

  6. Is it possible to hit the condition that updates pindexBestForkTip?

  7. What are your thoughts on the approach of the pull request? Should the bug in the warning system be fixed or should the code be removed? What alternative places/ways to implement the warning logic can you think of?

Meeting Log

  117:00 <jnewbery> #startmeeting
  217:00 <jnewbery> hi folks. Welcome to review club!
  317:00 <emzy> hi
  417:00 <stacie> hi
  517:00 <sift> hi
  617:00 <murch> hi
  717:00 <andozw> hi
  817:00 <jnewbery> feel free to say hi to let everyone know you're here
  917:00 <fjahr> hi
 1017:00 <lightlike> hi
 1117:00 <glozow> hi
 1217:00 <carlakc> hi
 1317:00 <dongcarl> hi
 1417:00 <michaelfolkson> hi
 1517:00 <jnewbery> anyone here for the first time?
 1617:01 <andozw> :waves: :)
 1717:01 <troygiorshev> hi
 1817:01 <fi3> hi
 1917:01 <jnewbery> welcome andozw! Thanks for joining us :)
 2017:01 <MarcoFalke> hi
 2117:01 <andozw> thank you! excited to learn
 2217:01 <nehan> hi
 2317:01 <jnewbery> Notes and questions are in the normal place: https://bitcoincore.reviews/19905
 2417:02 <MarcoFalke> Reminder don't ask to ask, just ask :)
 2517:02 <jnewbery> MarcoFalke wrote the notes and questions and is hosting today, so I'll pass over to him. Thanks Marco!
 2617:02 <sergei-t> hi
 2717:02 <MarcoFalke> ok, let's get started. Who reviewed the pull?
 2817:02 <jonatack> hi
 2917:02 <jnewbery> y
 3017:02 <stacie> y
 3117:02 <sergei-t> jnewbery: I'm for the first time. Totally unprepared, will probably just lurk.
 3217:02 <nehan> y
 3317:03 <troygiorshev> n (only a little)
 3417:03 <MarcoFalke> sergei-t: Welcome
 3517:03 <dongcarl> y
 3617:03 <fi3> n
 3717:03 <lightlike> 0.5
 3817:03 <jnewbery> sergei-t: that's fine. Lurkers are welcome too!
 3917:03 <carlakc> I went through it, but still getting a handle on cpp so very high level
 4017:03 <carlakc> will likely also lurk :)
 4117:04 <fjahr> wip
 4217:04 <felixweis> hi
 4317:04 <emzy> n
 4417:04 <MarcoFalke> Today we are talking about chain splits. What are some examples of when a chain split can happen? Which ones should users be warned about?
 4517:04 <felixweis> 2013 types of chain splits
 4617:05 <michaelfolkson> Name all 2013 please
 4717:05 <nehan> if the network is partitioned
 4817:05 <emzy> There was the problem with 32 and 64 bit systems. Remember not more about it.
 4917:06 <sipa> emzy: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-July/009697.html
 5017:06 <MarcoFalke> Which ones should the user *not* be warned about?
 5117:06 <stacie> BIP-50 describes a chain split as an unintended result of switching from Berkely DB to Level DB. Level DB was able to handle blocks with more tx inputs
 5217:06 <nehan> or if there's a soft fork
 5317:06 <michaelfolkson> I guess there are bugs causing network chain splits. Natural re-orgs would be considered temporary chain splits?
 5417:06 <lightlike> one harmless example would just two blocks found at almost the same time by chance.
 5517:06 <nehan> a user probably shouldn't be notified if it's very small, or if they're still doing IBD
 5617:06 <fjahr> Competing blocks mined by different miners competing for the longest chain.
 5717:06 <felixweis> few blog reorgs. 1 stale block happens regularly
 5817:07 <pinheadmz> chainsplits are worrisome after a soft fork like strict DER - miners were not fully validating!
 5917:07 <MarcoFalke> Good points so far
 6017:07 <sift> pinheadmz +1
 6117:08 <MarcoFalke> About expected stale blocks, users shouldn't be warned, but if there is a disagreement about consensus rules, they should be warned.
 6217:08 <MarcoFalke> Let's got to the next question.
 6317:08 <MarcoFalke> ActivateBestChain() consists of two nested loops. What is the condition for each loop to terminate?
 6417:08 <sift> consensus-equivalent, block-production equivalent chain split would be worrying
 6517:09 <nehan> i think outer loop: caught up to tip of most-work chain we've seen
 6617:10 <MarcoFalke> btw, this is the function for those lurking: https://github.com/bitcoin/bitcoin/blob/ed9f5477502e5856eba742656e8fc0fcf7bb053b/src/validation.cpp#L2870
 6717:10 <stacie> I noticed CBlockIndexWorkComparator returns true if the block passed in as the second parameter is the “better” block (based on the following criteria, in order: if it has more work, was received earlier (nSequenceId), or has a lower pointer address). My question is, is that how comparators typically work in C++? Return true if the second parameter is “better”?
 6817:10 <MarcoFalke> nehan: right
 6917:11 <lightlike> inner loop: our current tip (after connecting some blocks) has more work than the one we started this loop with
 7017:11 <MarcoFalke> lightlike: correct
 7117:12 <sipa> stacie: comparators' contract is to return true for "less than"
 7217:12 <sipa> e.g. std::set<int> is the same as std::set<int, std::less<int>>
 7317:13 <MarcoFalke> The inner loop calls ActivateBestChainStep(), which disconnects blocks (in case of a reorg) and connects a batch of up to 32 blocks toward the new tip. When an invalid block is found, CheckForkWarningConditionsOnNewFork() is called. Which block from the batch is passed to CheckForkWarningConditionsOnNewFork()?
 7417:13 <sipa> where std::less<int> is an object that exposes an operator()(int a, int b) { return a < b; }
 7517:13 <jnewbery> Compare function: https://en.cppreference.com/w/cpp/named_req/Compare
 7617:14 <stacie> ahh! thanks sipa & jnewbery!
 7717:14 <MarcoFalke> again for lurkers: https://github.com/bitcoin/bitcoin/blob/ed9f5477502e5856eba742656e8fc0fcf7bb053b/src/validation.cpp#L2744
 7817:16 <stacie> the first invalid block from the batch (the one with the lowest height) is the one passed to CheckForkWarningConditionsOnNewFork()
 7917:17 <MarcoFalke> stacie: correct
 8017:17 <MarcoFalke> ActivateBestChainStep() will also call m_chain.SetTip(), which updates the global reference to the active chain tip (::ChainActive().Tip()). What is the maximum block height difference between pindexNewForkTip and pfork?
 8117:18 <MarcoFalke> (This question is a bit more tricky, so I won't accept an answer that is just a single number ;)
 8217:19 <MarcoFalke> for lurkers, we just jumped into a new function: https://github.com/bitcoin/bitcoin/blob/ed9f5477502e5856eba742656e8fc0fcf7bb053b/src/validation.cpp#L1378
 8317:20 <MarcoFalke> reminder that pindexNewForkTip is "the first invalid block from the batch (the one with the lowest height)" (correct answer from stacie)
 8417:21 <michaelfolkson> A tip within 72 blocks but a fork that is at least 7 blocks. What does that mean?
 8517:22 <jnewbery> MarcoFalke: are you asking for the difference in height between pindexNewForkTip and plonger?
 8617:22 <jnewbery> (pfork is set to pindexNewForkTip)
 8717:22 <MarcoFalke> jnewbery: pfork, when it has been assigned it's final value
 8817:23 <MarcoFalke> pfork is initialized to pindexNewForkTip, and then "walked back" to determine the base of the fork
 8917:24 <fjahr> Should be just 1 because if its only the first block in the fork that is passed to the function
 9017:24 <murch> That comment is odd. Was that second sentence inserted into the middle of the big sentence?
 9117:24 <jnewbery> I think pindexNewForkTip is either on the active chain, or it's the child of a block on the active chain
 9217:25 <MarcoFalke> michaelfolkson: The code is more clear on that. It means the fork tip is within 72 blocks of our current tip, but the fork is at least 7 blocks "heavy" (with pow)
 9317:26 <MarcoFalke> fjahr: Correct
 9417:26 <murch> Given that different difficulties can only appear across diff adjustments, and the adjustment is limited to a factor of four, why a factor of 10?
 9517:26 <MarcoFalke> jnewbery: Also correct. If it is on the active chain, the difference is 0. If not, it is 1
 9617:27 <MarcoFalke> murch: Are you asking about the "10%"?
 9717:27 <jnewbery> just to clarify a point above. The block that's passed to CheckForkWarningConditionsOnNewFork() isn't necessarily an invalid block
 9817:27 <sift> MarcoFalke "but the fork is at least 7 blocks "heavy" (with pow)" 7 blocks worth of PoW weight...relative to what?
 9917:28 <MarcoFalke> sift: Measured relative to the pow of the fork base block
10017:28 <stacie_> For the earlier question about the inner loop in ActivateBestChain(). terminating, I see the line where the conditions are defined, https://github.com/bitcoin/bitcoin/blob/ed9f5477502e5856eba742656e8fc0fcf7bb053b/src/validation.cpp#L2932 but I can't find where starting_tip is updated. I'm guessing it might have something to do with the fact that it's a pointer? (my cpp is rusty :) )
10117:28 <sift> MarcoFalke e.g. 7 blocks worth of PoW weight at the divergence point?
10217:28 <jnewbery> imagine we've tried to connect 32 blocks in a loop in ActivateBestChainStep() and the 10th block is invalid. The pindex that we pass to CheckForkWarningConditionsOnNewFork() will be the first block in that batch of 32, which is a valid block
10317:28 <MarcoFalke> sift: yes, that's how I see it
10417:29 <sift> ty
10517:29 <sipa> stacie_: https://github.com/bitcoin/bitcoin/blob/ed9f5477502e5856eba742656e8fc0fcf7bb053b/src/validation.cpp#L2898
10617:29 * sift thinks about a miner, under producing blocks to drop that 7 block pow weight and then turning on hash rate
10717:30 <MarcoFalke> jnewbery: thanks for the extended explanation :)
10817:30 <MarcoFalke> jnewbery: To round up, can you also explain why the diff wouln't be negative in case the passed block is valid?
10917:31 <stacie_> sipa - thanks. so every time the loop progresses, the m_chain.Tip() call will result in a new value, right?
11017:31 <murch> 7 blocks worth of weight, but 72 blocks long seems impossible, because diff could only change minusculy between two competing tips
11117:31 <MarcoFalke> murch: not "72 blocks long", but "withing 72 blocks of our tip"
11217:31 <sipa> stacie_: starting_tip doesn't change
11317:31 <murch> But I think I haven't really grokked what the 72 and 7 blocks are
11417:32 <jnewbery> MarcoFalke: because plonger will walk back until it's the same block as pindexNewForkTip
11517:32 <sipa> but m_chain.Tip will
11617:32 <murch> oh, you mean it could include a stale tip forking off 30 blocks ago?
11717:32 <MarcoFalke> jnewbery: Correct, thx.
11817:32 <troygiorshev> stacie_: if I'm seeing it right, it's line 2547 that makes the change to m_chain.Tip
11917:32 <MarcoFalke> murch: Yes, that was the design goal (I think)
12017:33 <lightlike> stacie_: I think that starting_tip is not updated once defined (it is the reference we compare to), but m_chain.Tip() ist updated inside of ConnectTip(), which is called from ActivateBestChainStep
12117:33 <troygiorshev> stacie_: i take it back, it's what lightlike said :)
12217:34 <michaelfolkson> murch: I am unsure why you need a warning about a stale tip forking off 30 blocks ago. Who cares?
12317:34 <murch> Well, would still indicate a consensus failure
12417:35 <murch> someone got forked off and recognized an ancestor of your chaintip as invalid
12517:35 <MarcoFalke> michaelfolkson: For example if 40% of miners are building on a different chain, that is 30 blocks back
12617:35 <michaelfolkson> Oh ok, that makes sense
12717:37 <MarcoFalke> Is it possible to hit the condition that updates pindexBestForkTip?
12817:37 <stacie_> catching up now - thank you sipa, trygiorshev and lightlike! That explains it. I didn't dive into the methods that ActivateBestChainStep calls but it makes sense now!
12917:37 <stacie_> *troygiorshev
13017:38 <elle> MarcoFalke: pIndexNewForkTip never gets set because it is only set if pfork is set...which is only set if pIndexNewForkTip is set
13117:40 <elle> oh sorry , looking at wrong variable
13217:40 <MarcoFalke> To recap: pfork is set, but the height difference between pindexNewForkTip and pfork is at most 1
13317:41 <jnewbery> We need to look at this condition: pindexNewForkTip->nChainWork - pfork->nChainWork > (GetBlockProof(*pfork) * 7)
13417:42 <MarcoFalke> Jup, we are in CheckForkWarningConditionsOnNewFork, which is supposed to update the pindexBest* variables
13517:43 <glozow> is pfork the base of the fork there? sorry very basic question
13617:43 <MarcoFalke> glozow: Correct
13717:43 <MarcoFalke> pfork is the assumed base of the fork relative to our current chain tip
13817:43 <glozow> so the condition jnewbery is talking about = there's 7 blocks worth of work on the fork?
13917:44 <glozow> 7 blocks of work on 1 block? 🤔
14017:44 <MarcoFalke> glozow: Yes. This is one of the conditions that need to be fulfilled to update the pindexBest* variables
14117:45 <michaelfolkson> 7 blocks of work on 7 blocks
14217:45 <fjahr> michaelfolkson: but the condition would only be fulfilled if you would have 7 blocks of work in 1 block
14317:46 <jnewbery> so we've established that pindexNewForkTip and pfork can be at most one block apart, and the condition is that there's at least 7 block's worth of difference in their total work. Is that possible?
14417:46 <fjahr> that's the thing why it can't be hit or only in theory
14517:46 <glozow> seems like the answer to Marco's question is ~no
14617:47 <MarcoFalke> Correct, this can not be hit
14717:47 <MarcoFalke> Extra question: if the we passed the highest block of the to-be-connected chain instead, what would the height difference between pfork and pindexNewForkTip be?
14817:48 <jnewbery> the to-be-connected chain is 32 blocks
14917:48 <fjahr> 32
15017:48 <jnewbery> *up to 32 blocks, sorry
15117:48 <MarcoFalke> up to 32 or always 32?
15217:48 <MarcoFalke> jnewbery: Heh, nice
15317:49 <MarcoFalke> That the lowest hight block is passed is an oversight/bug, but can the fork detection logic be fixed by passing the highest block?
15417:49 <murch> jnewbery: no, because two competing blocks can only differ in difficulty in forks across a difficulty adjustment and a factor 7 won't be possible there
15517:50 <MarcoFalke> murch: Even with a difficulty adjustment this should not be possible
15617:50 <sift> 10 minutes left
15717:50 <murch> yes
15817:50 <MarcoFalke> I think those are limited to a factor of 4
15917:50 <murch> Well, one could go down, the other up
16017:51 <murch> but the time difference wouldn't be that big in practice
16117:51 <MarcoFalke> What are your thoughts on the approach of the pull request? Should the bug in the warning system be fixed or should the code be removed? What alternative places/ways to implement the warning logic can you think of?
16217:52 <michaelfolkson> I have a couple of unrelated questions. What tools are available to the user to react to the warning? Presumably if they became aware of a UASF like scenario they could join it
16317:52 <michaelfolkson> And then where is the warning test in test/functional/feature_block.py? I couldn't find it with a quick search
16417:53 <MarcoFalke> michaelfolkson: Good q. If I was an exchange and I'd see a warning, I'd shut down all nodes for now
16517:53 <MarcoFalke> michaelfolkson: It is the "re-org" test (last test which creates more than 1000 blocks)
16617:53 <jnewbery> MarcoFalke: I think that removing dead code and adding working code can be seen as two separate questions. I don't think "You should fix it" is a reasonable response to someone removing broken code.
16717:54 <MarcoFalke> jnewbery: I tend to agree
16817:54 <jnewbery> if it's broken and has been for many releases, then removing the dead code so the code reflects reality is a quality improvement in itself
16917:54 <jnewbery> if someone then wants to implement a working warning system, that should be judged on its own merits
17017:54 <troygiorshev> Might this be a similar situation to the old "alert notify" system? Where (even if it was fixed) it's simply outdated?
17117:54 <sift> seems like priority should be soundness first, performance second
17217:55 <michaelfolkson> In response to your questions MarcoFalke you say in the PR comment "add a fork detection somewhere outside of the ActivateBestChainStep logic (maybe net_processing)."
17317:55 <murch> Anything that is an obvious improvement has merit, and removing dead code is an obvious improvement
17417:55 <MarcoFalke> michaelfolkson: Yes, I think that the internal validation logic is the wrong place to put the fork detection algorithm
17517:56 <michaelfolkson> I don't think anything has changed re whether the warning is needed or not. If it was needed back then it is needed today too
17617:56 <murch> Upping the bar just leads to abandoned PRs and frustrated would be contributors
17717:57 <fjahr> It doesn't seem like the ideal way to solve this so removing and (possibly) reimplementing seems better than fixing which would make the PR more complicated. But I don't have great ideas how it should be done.
17817:57 <michaelfolkson> But yeah if it doesn't work and it is in the wrong place it should be removed
17917:57 <michaelfolkson> Ideally there would be a follow up PR reimplementing it :)
18017:57 <MarcoFalke> A warning system is very much needed
18117:58 <jnewbery> michaelfolkson: you're very welcome to open that PR!
18217:58 <emzy> Could be first an external one.
18317:58 <michaelfolkson> Haha
18417:58 <emzy> To figure out what's the best parameters.
18517:58 <fjahr> michaelfolkson: just change the 7 to 1 and reuse the rest of the code :P
18617:59 <sift> The_scream.jpg
18717:59 <MarcoFalke> Any last minute questions/suggestions?
18818:00 <MarcoFalke> #endmeeting
18918:00 <MarcoFalke> Thanks everyone!
19018:00 <troygiorshev> thanks MarcoFalke!
19118:00 <sift> thanks MarcoFalke !
19218:00 <murch> Thanks
19318:00 <emzy> Thank you MarcoFalke!
19418:00 <stacie_> Thank you MarcoFalke!
19518:00 <jnewbery> I tend to think that if it wasn't working, then ripping it out and starting from scratch is safer. It shouldn't be the case, but tweaking code that wasn't working would probably receive less scrutiny in review than implementing something new
19618:00 <felixweis> Thanks MarcoFalke!
19718:00 <sergei-t> thanks!
19818:00 <lightlike> thanks!
19918:00 <carlakc> thanks for the lurker links marco!
20018:00 <fjahr> thanks MarcoFalke
20118:00 <jnewbery> Thanks MarcoFalke!
20218:01 <dongcarl> thanks MarcoFalke!
20318:01 <michaelfolkson> Thanks MarcoFalke, very interesting
20418:01 <MarcoFalke> jnewbery: If the fix was trivial, I might have fixed it. But I don't even see a way to fix this without rewriting the validation core
20518:01 <MarcoFalke> logic
20618:02 <michaelfolkson> I guess in terms of review the most important thing is to check that the code being ripped out isn't used elsewhere. Just for the warning
20718:02 <jnewbery> Well done everyone. Today was a difficult PR to review. We went really deep into the consensus code. If you managed to follow along for the whole meeting you can give yourself a pat on the back :)
20818:02 <jnewbery> we'll do something a bit easier next week
20918:04 <fjahr> BTW, that the code discussed is not hit by any of the tests can also be seen in the coverage analysis: https://marcofalke.github.io/btc_cov/total.coverage/src/validation.cpp.gcov.html obviously also provided by MarcoFalke
21018:04 <MarcoFalke> fjahr: lol, I initially spent a day trying to write a test for it
21118:05 <MarcoFalke> writing tests for uncovered code is a good way to find dead and buggy code
21218:05 <michaelfolkson> What are you using to identify code that can be safely be ripped out? Just eyes and something like ctags?
21318:05 <fjahr> :)
21418:05 <MarcoFalke> michaelfolkson: writing tests. I haven't used ctags yet
21518:06 <MarcoFalke> I only use vanilla git and vim (and firefox for GitHub)
21618:06 <michaelfolkson> OK cool, thanks