Return the AcceptBlock CValidationState directly in ProcessNewBlock (validation)

https://github.com/bitcoin/bitcoin/pulls/16279

Host: jnewbery  -  PR author: TheBlueMatt

Notes

  • ProcessNewBlock() is net processing’s entry point into validation, called whenever net processing receives a new block. It can be called in ProcessMessage() when a BLOCK, CMPCTBLOCK or BLOCKTXN message is received and net processing is ready to hand a block to the validation layer.
  • ProcessNewBlock() calls CheckBlock() to carry out non-contextual validation tests on the block. If those checks pass, then ProcessNewBlock() calls AcceptBlock(), which does more checks on the block and then stores it to disk.
  • If AcceptBlock() passes and the block is valid, then ActivateBestChain() is called to update the node’s view of the blockchain. If the new block has more work than the node’s current tip, then ActivateBestChain will try to connect it to the chain and update the UTXO set. This can fail, for example if a transaction in the block spends non-existent coins.
  • If a peer is sending us invalid blocks, then depending on the invalidity reason, we may wish to disconnect that peer.
  • If either CheckBlock() or AcceptBlock() fails, then the failure reason will be saved in a CValidationState object and returned to ProcessNewBlock().
  • Prior to this PR, ProcessNewBlock() would then call the BlockChecked method in the CValidationInterface. net processing would receive the BlockChecked callback and then potentially disconnect the peer.
  • This PR changes ProcessNewBlock so that if CheckBlock() or AcceptBlock() fail, then it will return the CValidationState directly to the caller, which can then decide whether to ban the peer.
  • This PR is a step towards PR 16175, which makes ProcessNewBlock() asynchronous, and would allow parallelization between net processing and validation.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. What steps did you take, beyond reading the code?

  3. Did anything about the way new blocks are checked and accepted surprise you? Do you have any questions about the series of steps to add a block to the block chain?

  4. The author describes this PR as “a first step towards a series of cleanups leading to a (more-complete) #16175. It’s just two rather-nice (IMO) cleanups.” Do you agree? Is this just a cleanup PR or are there behaviour changes?

  5. Are there any other CValidationInterface clients that use the BlockChecked callback?

  6. The other place that the BlockChecked method is called is in ConnectTip() in ActivateBestChain(). Why does that need to be a callback? Why does net processing need to keep a mapBlockSource map from the block to the peer that provided it?

Meeting Log

  113:00 <jnewbery> hi
  213:00 <zenogais> hi
  313:00 <fox2p> hey
  413:00 <ariard> hi
  513:00 <amiti> hi
  613:01 <jnewbery> what did everyone think of this week's PR?
  713:01 <sebastianvstaa> hi
  813:01 <ajonas> hi
  913:01 <zenogais> Was interesting, small change that interacts with a lot of crucial systems.
 1013:01 <jnewbery> zenogais: yes! I agree
 1113:02 <jnewbery> very little functional change, but there's lots to dig into
 1213:02 <jonatack_> hi
 1313:02 <zenogais> Definitely caused me to explore parts of the codebase I'd never seen before.
 1413:02 <amiti> yeah agreed, not a ton changed but I learned/explored a lot to get the context
 1513:02 <jnewbery> we've touched on the CValidationInterface in a few previous reviews, but I think this is the first time we've looked at changes in validation.cpp
 1613:02 <pinheadmz> hi !
 1713:03 <amiti> very context heavy, as many of the conversations revealed
 1813:03 <jnewbery> So I think the first thing to do for this one is understand what's going on with the CValidationInterface. It's declared here: https://github.com/jnewbery/bitcoin/blob/2ec121f09d8f7117fc9a8f830a7242f9a3602b78/src/validationinterface.h#L71
 1913:03 <jnewbery> that's where the useful comments are
 2013:04 <jnewbery> you can see that there are 8 methods in the interface. Most of them are asynchronous, but a couple are synchronous. Can you see which ones?
 2113:05 <pinheadmz> well, like RegisterBackgroundSignalScheduler takes a callback as an arg?
 2213:06 <pinheadmz> so id guess that is async
 2313:06 <jnewbery> RegisterBackgroundSignalScheduler is adding the scheduler thread. I think it's only called at startup/initialization
 2413:06 <jnewbery> The methods are in the CValidationInterface() class
 2513:07 <jnewbery> UpdatedBlockTip, TransactionAddedToMempool, etc
 2613:07 <jnewbery> It might be easier if you look at where those methods are defined: https://github.com/jnewbery/bitcoin/blob/2ec121f09d8f7117fc9a8f830a7242f9a3602b78/src/validationinterface.cpp#L131
 2713:07 <ariard> ChainStateFlushed and BlockChecked
 2813:08 <ariard> and NewPowValidBlock?
 2913:08 <fox2p> and NewPoWValidBlock?
 3013:08 <fox2p> yeah
 3113:08 <zenogais> So anything that doesn't call `AddToProcessQueue`?
 3213:08 <jnewbery> zenogais: correct
 3313:08 <ajonas> so not ChainStateFlushed then
 3413:09 <jnewbery> BlockChecked and NewPOWValidBlock are synchronous
 3513:09 <ariard> No ChainStateFlushed sorry
 3613:09 <jnewbery> you can see those two methods are making direct function calls. The other methods are all adding a lambda function to the queue
 3713:10 <jnewbery> the scheduler thread will come and service those functions in the background at some point.
 3813:10 <jnewbery> ok, so with that background, let's move onto the PR
 3913:10 <jnewbery> describe briefly what the PR is doing
 4013:11 <jnewbery> all of you... GO!
 4113:11 <pinheadmz> ha!
 4213:12 <pinheadmz> adds a new arg to ProcessNewBlock, a CValidationState
 4313:12 <pinheadmz> so every time you call PNB, you have to init a state first, pass it in, thenafter PNB returns, you can examine it for isValid()
 4413:12 <jnewbery> pinheadmz: right, and what's that CValidationState argument used for
 4513:12 <jnewbery> pinheadmz: yes
 4613:12 <pinheadmz> seems to just return validation errors and state
 4713:12 <pinheadmz> i was hoping the util tests would have a bad block in there and i could see a "bad" response
 4813:13 <pinheadmz> but I think all the tests return boring valid states
 4913:13 <zenogais> Validation state is used for DoS checks
 5013:13 <jnewbery> zenogais: correct. And where do we do those DoS checks?
 5113:13 <jnewbery> Which function handles invalid blocks?
 5213:14 <pinheadmz> is that checkblock and contextualcheckblock ?
 5313:14 <zenogais> InvalidBlockFound
 5413:14 <zenogais> and some others, let me check
 5513:15 <jnewbery> pinheadmz: I'm thinking more about on the net processing layer. Which function sees that we've received an invalid block and then handles whether to punish peers?
 5613:15 <zenogais> Ah and BlockChecked
 5713:15 <pinheadmz> BlockChecked()
 5813:15 <jnewbery> exactly
 5913:15 <pinheadmz> funny to name a function past-tense but ok i get it
 6013:16 <jnewbery> pinheadmz: the function is an implementation of that CValidationInterface method
 6113:16 <zenogais> which calls MaybePunishNode with the state given
 6213:16 <jnewbery> so that method is telling subscribers that 'a block has been checked'
 6313:17 <pinheadmz> 👍
 6413:17 <jnewbery> You can see here (in master) where the function is defined: https://github.com/bitcoin/bitcoin/blob/c34b88620dc8435b83e6744895f2ecd3c9ec8de7/src/net_processing.cpp#L1231
 6513:18 <jnewbery> It's a member function of the PeerLogicValidation class
 6613:19 <jnewbery> and you can see where that class is declared that it's inheriting from CValidationInterface: https://github.com/bitcoin/bitcoin/blob/c34b88620dc8435b83e6744895f2ecd3c9ec8de7/src/net_processing.h#L22
 6713:19 <jnewbery> so prior to this PR, we had net processing calling into validation (using ProcessNewBlock()), which then called directly into net processing (using the BlockChecked() validation interface method, which is a direct function call)
 6813:20 <jnewbery> so net processing -> validation -> net processing
 6913:20 <jnewbery> does that make sense? any questions about that?
 7013:20 <zenogais> Yep makes sense
 7113:20 <pinheadmz> cool.
 7213:20 <zenogais> Looks like it's registered here in init.cpp: https://github.com/bitcoin/bitcoin/blob/master/src/init.cpp#L1323
 7313:21 <jnewbery> zenogais: exactly. Any client that wants to register with the validation interface needs to call that RegisterValidationInterface() function
 7413:22 <jnewbery> and how does this PR change that?
 7513:22 <jnewbery> (the net processing -> validation -> net processing stack)
 7613:23 <zenogais> Adds CConnman* as an argument
 7713:23 <pinheadmz> oh yeah i was a bit confused - what would be a 'client' in this sense?
 7813:23 <pinheadmz> are we actually talking about a process outside bitcoind? like ZMQ notificaitons ?
 7913:24 <jnewbery> pinheadmz: no, the clients are the components or classes that register with the validation interface. They're all internal to bitcoind
 8013:24 <ariard> like the index or wallet
 8113:24 <jnewbery> ariard: exactly
 8213:25 <jnewbery> other clients are the ZMQ component, which receives notifcations over the validation interface and then sends ZMQ notifications out
 8313:25 <ariard> and mining stack IIRC
 8413:25 <jnewbery> and PeerLogicValidation, which is net processing
 8513:26 <jnewbery> ariard: right. That uses the validation interface to check that a submitted block was valid
 8613:26 <zenogais> Adds an argument to ProcessNewBlock as well to bubble up the CValidationState
 8713:26 <jnewbery> I think that's all of them. Just look for classes which inherit CValidationInterface
 8813:26 <jnewbery> and then call RegisterValidationInterface()
 8913:27 <jnewbery> zenogais: right, so ProcessNewBlock is no longer calling BlockChecked directly
 9013:27 <jnewbery> which means validation is not making a direct functional call into net processing
 9113:28 <jnewbery> why is that important/interesting here?
 9213:28 <zenogais> Breaks the dependency there
 9313:29 <pinheadmz> The PR emphasizes pushing validation into a backgroun thread there as well
 9413:29 <pinheadmz> does it allow the process to be async then?
 9513:29 <amiti> easier to make async if its just a message being passed from net processing -> validation ?
 9613:29 <jnewbery> great! Yes, that's the motivation for this PR
 9713:30 <jnewbery> PR 16175 (WIP, currently closed) is the PR to make ProcessNewBlock async: https://github.com/bitcoin/bitcoin/pull/16175
 9813:31 <jnewbery> really, I think we want net processing to just be passing blocks/transactions up to validation, and being informed asynchronously of any validation events
 9913:31 <jnewbery> validation shouldn't be calling functions directly in net processing
10013:32 <jnewbery> any thoughts about that? Shall we move onto the questions: https://bitcoincore.reviews/16279.html#questions
10113:32 <amiti> can you hash that out a bit further?
10213:32 <amiti> why not?
10313:32 <amiti> it mostly makes sense to me, but not entirely
10413:32 <zenogais> My guess would be cleaner separation of concerns, and ability to run more work in parallel.
10513:33 <jnewbery> zenogais: yeah, that's it
10613:34 <jnewbery> if we want net processing and validation to run in parallel, we want net processing to be simply handing messages to validation and not have net processing and validation calling into each other
10713:35 <ariard> we would be able to check validity of blocks in parallel
10813:35 <amiti> ok ya. thanks
10913:35 <zenogais> Also just cleaner to understand if its strict message passing rather than nested levels of function calls between components.
11013:36 <jonatack> Yes, while avoiding reaching into internals across layers.
11113:36 <jnewbery> Perhaps we can look at PR 16175 in a future review club meeting if people are interested
11213:36 <jnewbery> to see where these changes are heading
11313:36 <ariard> have a look on this https://bitcoincore.org/en/meetings/2018/05/03/
11413:37 <ariard> about #12934, the OG refactoring PR
11513:37 <jnewbery> ariard: thanks! Nice find
11613:37 <jnewbery> ok, onto questions
11713:37 <jnewbery> What steps did you take, beyond reading the code?
11813:38 <jnewbery> Did you test this? If so, how?
11913:38 <pinheadmz> ran the tests, tried adding extra logging with printf to see what happens to dos_state
12013:38 <zenogais> I ran all unit and functional tests and reviewed the existing DoS tests to look for blind spots.
12113:38 <jnewbery> pinheadmz: zenogais: that's great!
12213:38 <zenogais> It seems like most of what is easy to test is tested already
12313:38 <jnewbery> did you find anything interesting?
12413:39 <pinheadmz> not really, the validation_block_tests dont seem to test bad blocks
12513:40 <pinheadmz> probably could have been better for me to log in CValidationState and then run the python tests
12613:40 <jnewbery> interesting. I don't think I've looked at that file. A lot of the block failure modes are tested in the feature_blocks.py functional test
12713:40 <pinheadmz> yeah, but the PR didn't touch those :-)
12813:41 <jnewbery> You all should go back and ACK https://bitcoincore.reviews/16688.html so we have better logging in the CValidationInterface :)
12913:41 <zenogais> One interesting thing I found was `CheckForkWarningConditions`. Seems untested, but interesting system to have in place.
13013:41 <jnewbery> ok, next question: Did anything about the way new blocks are checked and accepted surprise you? Do you have any questions about the series of steps to add a block to the block chain?
13113:42 <pinheadmz> (heh - the link on that 16688.html page goes to the wrong PR...)
13213:43 <jnewbery> pinheadmz: oops. Thanks - I'll fix after this meeting
13313:43 <ariard> have a look on BlockStatus in src/chain.h
13413:44 <ariard> we have different state according the validation work which has been done on a given Block
13513:44 <zenogais> ah yeah, the BlockStatus stuff was interesting
13613:44 <zenogais> still don't fully understandit
13713:45 <jnewbery> Yeah, I went digging into that because of this change: https://github.com/bitcoin/bitcoin/pull/16279/commits/c16e139246f215965bb572da1a11382b4f61d957
13813:45 <jnewbery> (which Matt moved into its own commit today)
13913:47 <jnewbery> BlockStatus is kind of weird. It seems to be partly treated as an enum and partly as a bitfield
14013:48 <ariard> jnewbery: where it's treated as a bitfield?
14113:48 <jnewbery> everything above the bottom three bits are treated as a bitfield, no?
14213:48 <jnewbery> BLOCK_HAVE_DATA = 8, //!< full block available in blk*.dat
14313:48 <jnewbery> BLOCK_HAVE_UNDO = 16, //!< undo data available in rev*.dat
14413:49 <jnewbery> etc
14513:49 <zenogais> The bottom three bits seem to be treated as a bitfield
14613:49 <zenogais> BLOCK_VALID_MASK ors some of those bits together AFAICT
14713:49 <ariard> Oh I see what you mean
14813:49 <zenogais> it's odd to or BLOCK_VALID_TRANSACTIONS w/ BLOCK_VALID_RESERVED and BLOCK_VALID_TREE since some of them flip the same bits on
14913:50 <jnewbery> the bottom three bits are used to store the validity
15013:50 <jnewbery> the fourth bit is whether the block is available in a blk*.dat file
15113:51 <ariard> well a OR of enum is still an enum
15213:51 <jnewbery> BLOCK_VALID_MASK could just be defined as 7 (or 0b11100000...)
15313:51 <jnewbery> ok, we've got 10 minutes left. Let's keep moving through the questions
15413:51 <jnewbery> The author describes this PR as “a first step towards a series of cleanups leading to a (more-complete) #16175. It’s just two rather-nice (IMO) cleanups.” Do you agree? Is this just a cleanup PR or are there behaviour changes?
15513:52 <ajonas> you definitely don't agree in https://github.com/bitcoin/bitcoin/pull/16279/files#r333115791
15613:52 <jnewbery> ajonas: :)
15713:52 <ariard> I agree there is a confusion between validity of block and its availability as data in BlockStatus
15813:52 <zenogais> Hard to draw the line between cleanup and behavior change. I think some of the stuff pointed out with BlockStatus by jnewbery leads me to believe there's a behavior change.
15913:53 <jnewbery> yeah, I found that change very difficult to review.
16013:54 <jnewbery> before I put my ACK on a PR I want to understand exactly how behaviour might change, especially in net processing and validation where very tiny changes can have surprising and dangerous effects
16113:55 <jnewbery> changing from `if (pindex->IsValid(BLOCK_VALID_TRANSACTIONS))` to `if (dos_state.IsValid())` seems reasonable, but it sent me off in lots of different directions trying to work out when BlockStatus is changed, what MarkBlockAsReceived() is doing exactly, etc
16213:56 <zenogais> Yeah, I found I went down some pretty interesting rabbit holes there
16313:56 <jnewbery> ok, a couple of minutes left. I had two other questions:
16413:56 <jnewbery> Are there any other CValidationInterface clients that use the BlockChecked callback?
16513:56 <jnewbery> The other place that the BlockChecked method is called is in ConnectTip() in ActivateBestChain(). Why does that need to be a callback? Why does net processing need to keep a mapBlockSource map from the block to the peer that provided it?
16613:57 <jnewbery> I'm happy to extend this for a few extra minutes if everyone else is
16713:57 <zenogais> submitblock_StateCatcher
16813:57 <jnewbery> zenogais: nice. What's that used for?
16913:58 <zenogais> I'm not exactly sure lol
17013:58 <zenogais> It's in mining.cpp so clearly related to that
17113:58 <zenogais> But hadn't had a chance to explore it
17213:58 <jnewbery> right. It temporarily registers with the validation interface
17313:58 <jnewbery> and then the block is submitted
17413:59 <jnewbery> and it catches whether a BlockChecked notification was sent
17513:59 <ariard> You need to keep a mapBlockSource to punish peer in case of invalidity ?
17613:59 <jnewbery> ariard: yes!
17713:59 <jnewbery> because we don't necessarily know whether a block was invalid until we try to connect it to the chain
17813:59 <ariard> exactly
17914:00 <jnewbery> and so at that point, if it is invalid (eg it contains a double-spend or invalid tx), we want to disconnect from the peer that sent it to us
18014:00 <jnewbery> so mapBlockSource keeps a map from the block hash to the peer that sent it to us
18114:01 <jnewbery> ok, we've overrun a bit, but does anyone have any final questions before we wrap it up?
18214:02 <amiti> yes! I'm confused by this comment: https://github.com/bitcoin/bitcoin/pull/16279/files#diff-349fbb003d5ae550a2e8fa658e475880R213
18314:02 <amiti> if fForceProcessing is set, then you force going through ProcessNewBlock again right
18414:03 <amiti> cause it passes that value through to AcceptBlock
18514:03 <jnewbery> yes, and I think it forces you to save to disk
18614:04 <amiti> so, reasons you would go through ProcessNewBlock are 1. fNewBlock is true 2. you force it with fForceProcessing 3. pruning enabled & trying to re-download a block that was pruned
18714:05 <jnewbery> fNewBlock I think is a return argument
18814:05 <amiti> oh right
18914:05 <amiti> ok so if you force it or pruned the block, you could repeat
19014:06 <amiti> but otherwise you wouldnt ever repeat with the same hash
19114:06 <zenogais> Have to head out. Thanks for hosting jnewberry!
19214:07 <jnewbery> I think the comment is trying to say - we don't need to call ProcessNewBlock again if we receive a block with the same hash, as long as the state returned was valid and (it was fForceProcessing or a new block)
19314:08 <jnewbery> the state.IsValid() means that the block hasn't been mutated - ie that the PoW is valid
19414:08 <amiti> ok
19514:09 <amiti> I think I get it
19614:09 <jnewbery> it's a confusing comment!
19714:09 <amiti> yeah and I was trying to fix it, but didn't actually understand what was trying to be said
19814:09 <pinheadmz> ive been down this road before - bc a legacy node could send a block w no witness data, but itd have the same hash
19914:10 <jnewbery> we're about 10 minutes over. Let's close it there.
20014:10 <amiti> thank you :)
20114:10 <jnewbery> Thanks everyone. Great discussion this week!
20214:10 <pinheadmz> yeah thanks JN!!!!
20314:11 <ariard> thanks
20414:11 <sebastianvstaa> thanks
20514:11 <ajonas> thanks John