#16279 Return the AcceptBlock CValidationState directly in ProcessNewBlock (validation)

https://github.com/bitcoin/bitcoin/16279

Host: jnewbery

Notes

  • ProcessNewBlock() is net processing’s entry point into validation, called whenever net processing receives a new block. It can be called in ProcessMessage() when a BLOCK, CMPCTBLOCK or BLOCKTXN message is received and net processing is ready to hand a block to the validation layer.
  • ProcessNewBlock() calls CheckBlock() to carry out non-contextual validation tests on the block. If those checks pass, then ProcessNewBlock() calls AcceptBlock(), which does more checks on the block and then stores it to disk.
  • If AcceptBlock() passes and the block is valid, then ActivateBestChain() is called to update the node’s view of the blockchain. If the new block has more work than the node’s current tip, then ActivateBestChain will try to connect it to the chain and update the UTXO set. This can fail, for example if a transaction in the block spends non-existent coins.
  • If a peer is sending us invalid blocks, then depending on the invalidity reason, we may wish to disconnect that peer.
  • If either CheckBlock() or AcceptBlock() fails, then the failure reason will be saved in a CValidationState object and returned to ProcessNewBlock().
  • Prior to this PR, ProcessNewBlock() would then call the BlockChecked method in the CValidationInterface. net processing would receive the BlockChecked callback and then potentially disconnect the peer.
  • This PR changes ProcessNewBlock so that if CheckBlock() or AcceptBlock() fail, then it will return the CValidationState directly to the caller, which can then decide whether to ban the peer.
  • This PR is a step towards PR 16175, which makes ProcessNewBlock() asynchronous, and would allow parallelization between net processing and validation.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. What steps did you take, beyond reading the code?

  3. Did anything about the way new blocks are checked and accepted surprise you? Do you have any questions about the series of steps to add a block to the block chain?

  4. The author describes this PR as “a first step towards a series of cleanups leading to a (more-complete) #16175. It’s just two rather-nice (IMO) cleanups.” Do you agree? Is this just a cleanup PR or are there behaviour changes?

  5. Are there any other CValidationInterface clients that use the BlockChecked callback?

  6. The other place that the BlockChecked method is called is in ConnectTip() in ActivateBestChain(). Why does that need to be a callback? Why does net processing need to keep a mapBlockSource map from the block to the peer that provided it?

Meeting Log

13:00 < jnewbery> hi
13:00 < zenogais> hi
13:00 < fox2p> hey
13:00 < ariard> hi
13:00 < amiti> hi
13:01 < jnewbery> what did everyone think of this week's PR?
13:01 < sebastianvstaa> hi
13:01 < ajonas> hi
13:01 < zenogais> Was interesting, small change that interacts with a lot of crucial systems.
13:01 < jnewbery> zenogais: yes! I agree
13:02 < jnewbery> very little functional change, but there's lots to dig into
13:02 < jonatack_> hi
13:02 < zenogais> Definitely caused me to explore parts of the codebase I'd never seen before.
13:02 < amiti> yeah agreed, not a ton changed but I learned/explored a lot to get the context
13:02 < jnewbery> we've touched on the CValidationInterface in a few previous reviews, but I think this is the first time we've looked at changes in validation.cpp
13:02 < pinheadmz> hi !
13:03 < amiti> very context heavy, as many of the conversations revealed
13:03 < jnewbery> So I think the first thing to do for this one is understand what's going on with the CValidationInterface. It's declared here: https://github.com/jnewbery/bitcoin/blob/2ec121f09d8f7117fc9a8f830a7242f9a3602b78/src/validationinterface.h#L71
13:03 < jnewbery> that's where the useful comments are
13:04 < jnewbery> you can see that there are 8 methods in the interface. Most of them are asynchronous, but a couple are synchronous. Can you see which ones?
13:05 < pinheadmz> well, like RegisterBackgroundSignalScheduler takes a callback as an arg?
13:06 < pinheadmz> so id guess that is async
13:06 < jnewbery> RegisterBackgroundSignalScheduler is adding the scheduler thread. I think it's only called at startup/initialization
13:06 < jnewbery> The methods are in the CValidationInterface() class
13:07 < jnewbery> UpdatedBlockTip, TransactionAddedToMempool, etc
13:07 < jnewbery> It might be easier if you look at where those methods are defined: https://github.com/jnewbery/bitcoin/blob/2ec121f09d8f7117fc9a8f830a7242f9a3602b78/src/validationinterface.cpp#L131
13:07 < ariard> ChainStateFlushed and BlockChecked
13:08 < ariard> and NewPowValidBlock?
13:08 < fox2p> and NewPoWValidBlock?
13:08 < fox2p> yeah
13:08 < zenogais> So anything that doesn't call `AddToProcessQueue`?
13:08 < jnewbery> zenogais: correct
13:08 < ajonas>  so not ChainStateFlushed then
13:09 < jnewbery> BlockChecked and NewPOWValidBlock are synchronous
13:09 < ariard> No ChainStateFlushed sorry
13:09 < jnewbery> you can see those two methods are making direct function calls. The other methods are all adding a lambda function to the queue
13:10 < jnewbery> the scheduler thread will come and service those functions in the background at some point.
13:10 < jnewbery> ok, so with that background, let's move onto the PR
13:10 < jnewbery> describe briefly what the PR is doing
13:11 < jnewbery> all of you... GO!
13:11 < pinheadmz> ha!
13:12 < pinheadmz> adds a new arg to ProcessNewBlock, a CValidationState
13:12 < pinheadmz> so every time you call PNB, you have to init a state first, pass it in, thenafter PNB returns, you can examine it for isValid()
13:12 < jnewbery> pinheadmz: right, and what's that CValidationState argument used for
13:12 < jnewbery> pinheadmz: yes
13:12 < pinheadmz> seems to just return validation errors and state
13:12 < pinheadmz> i was hoping the util tests would have a bad block in there and i could see a "bad" response
13:13 < pinheadmz> but I think all the tests return boring valid states
13:13 < zenogais> Validation state is used for DoS checks
13:13 < jnewbery> zenogais: correct. And where do we do those DoS checks?
13:13 < jnewbery> Which function handles invalid blocks?
13:14 < pinheadmz> is that checkblock and contextualcheckblock ?
13:14 < zenogais> InvalidBlockFound
13:14 < zenogais> and some others, let me check
13:15 < jnewbery> pinheadmz: I'm thinking more about on the net processing layer. Which function sees that we've received an invalid block and then handles whether to punish peers?
13:15 < zenogais> Ah and BlockChecked
13:15 < pinheadmz> BlockChecked()
13:15 < jnewbery> exactly
13:15 < pinheadmz> funny to name a function past-tense but ok i get it
13:16 < jnewbery> pinheadmz: the function is an implementation of that CValidationInterface method
13:16 < zenogais> which calls MaybePunishNode with the state given
13:16 < jnewbery> so that method is telling subscribers that 'a block has been checked'
13:17 < pinheadmz> 👍
13:17 < jnewbery> You can see here (in master) where the function is defined: https://github.com/bitcoin/bitcoin/blob/c34b88620dc8435b83e6744895f2ecd3c9ec8de7/src/net_processing.cpp#L1231
13:18 < jnewbery> It's a member function of the PeerLogicValidation class
13:19 < jnewbery> and you can see where that class is declared that it's inheriting from CValidationInterface: https://github.com/bitcoin/bitcoin/blob/c34b88620dc8435b83e6744895f2ecd3c9ec8de7/src/net_processing.h#L22
13:19 < jnewbery> so prior to this PR, we had net processing calling into validation (using ProcessNewBlock()), which then called directly into net processing (using the BlockChecked() validation interface method, which is a direct function call)
13:20 < jnewbery> so net processing -> validation -> net processing
13:20 < jnewbery> does that make sense? any questions about that?
13:20 < zenogais> Yep makes sense
13:20 < pinheadmz> cool.
13:20 < zenogais> Looks like it's registered here in init.cpp: https://github.com/bitcoin/bitcoin/blob/master/src/init.cpp#L1323
13:21 < jnewbery> zenogais: exactly. Any client that wants to register with the validation interface needs to call that RegisterValidationInterface() function
13:22 < jnewbery> and how does this PR change that?
13:22 < jnewbery> (the net processing -> validation -> net processing stack)
13:23 < zenogais> Adds CConnman* as an argument
13:23 < pinheadmz> oh yeah i was a bit confused - what would be a 'client' in this sense?
13:23 < pinheadmz> are we actually talking about a process outside bitcoind? like ZMQ notificaitons ?
13:24 < jnewbery> pinheadmz: no, the clients are the components or classes that register with the validation interface. They're all internal to bitcoind
13:24 < ariard> like the index or wallet
13:24 < jnewbery> ariard: exactly
13:25 < jnewbery> other clients are the ZMQ component, which receives notifcations over the validation interface and then sends ZMQ notifications out
13:25 < ariard> and mining stack IIRC
13:25 < jnewbery> and PeerLogicValidation, which is net processing
13:26 < jnewbery> ariard: right. That uses the validation interface to check that a submitted block was valid
13:26 < zenogais> Adds an argument to ProcessNewBlock as well to bubble up the CValidationState
13:26 < jnewbery> I think that's all of them. Just look for classes which inherit CValidationInterface
13:26 < jnewbery> and then call RegisterValidationInterface()
13:27 < jnewbery> zenogais: right, so ProcessNewBlock is no longer calling BlockChecked directly
13:27 < jnewbery> which means validation is not making a direct functional call into net processing
13:28 < jnewbery> why is that important/interesting here?
13:28 < zenogais> Breaks the dependency there
13:29 < pinheadmz> The PR emphasizes pushing validation into a backgroun thread there as well
13:29 < pinheadmz> does it allow the process to be async then?
13:29 < amiti> easier to make async if its just a message being passed from net processing -> validation ?
13:29 < jnewbery> great! Yes, that's the motivation for this PR
13:30 < jnewbery> PR 16175 (WIP, currently closed) is the PR to make ProcessNewBlock async: https://github.com/bitcoin/bitcoin/pull/16175
13:31 < jnewbery> really, I think we want net processing to just be passing blocks/transactions up to validation, and being informed asynchronously of any validation events
13:31 < jnewbery> validation shouldn't be calling functions directly in net processing
13:32 < jnewbery> any thoughts about that? Shall we move onto the questions: https://bitcoincore.reviews/16279.html#questions
13:32 < amiti> can you hash that out a bit further?
13:32 < amiti> why not?
13:32 < amiti> it mostly makes sense to me, but not entirely
13:32 < zenogais> My guess would be cleaner separation of concerns, and ability to run more work in parallel.
13:33 < jnewbery> zenogais: yeah, that's it
13:34 < jnewbery> if we want net processing and validation to run in parallel, we want net processing to be simply handing messages to validation and not have net processing and validation calling into each other
13:35 < ariard> we would be able to check validity of blocks in parallel
13:35 < amiti> ok ya. thanks
13:35 < zenogais> Also just cleaner to understand if its strict message passing rather than nested levels of function calls between components.
13:36 < jonatack> Yes, while avoiding reaching into internals across layers.
13:36 < jnewbery> Perhaps we can look at PR 16175 in a future review club meeting if people are interested
13:36 < jnewbery> to see where these changes are heading
13:36 < ariard> have a look on this https://bitcoincore.org/en/meetings/2018/05/03/
13:37 < ariard> about #12934, the OG refactoring PR
13:37 < jnewbery> ariard: thanks! Nice find
13:37 < jnewbery> ok, onto questions
13:37 < jnewbery> What steps did you take, beyond reading the code?
13:38 < jnewbery> Did you test this? If so, how?
13:38 < pinheadmz> ran the tests, tried adding extra logging with printf to see what happens to dos_state
13:38 < zenogais> I ran all unit and functional tests and reviewed the existing DoS tests to look for blind spots.
13:38 < jnewbery> pinheadmz: zenogais: that's great!
13:38 < zenogais> It seems like most of what is easy to test is tested already
13:38 < jnewbery> did you find anything interesting?
13:39 < pinheadmz> not really, the validation_block_tests dont seem to test bad blocks
13:40 < pinheadmz> probably could have been better for me to log in CValidationState and then run the python tests
13:40 < jnewbery> interesting. I don't think I've looked at that file. A lot of the block failure modes are tested in the feature_blocks.py functional test
13:40 < pinheadmz> yeah, but the PR didn't touch those :-)
13:41 < jnewbery> You all should go back and ACK https://bitcoincore.reviews/16688.html so we have better logging in the CValidationInterface :)
13:41 < zenogais> One interesting thing I found was `CheckForkWarningConditions`. Seems untested, but interesting system to have in place.
13:41 < jnewbery> ok, next question: Did anything about the way new blocks are checked and accepted surprise you? Do you have any questions about the series of steps to add a block to the block chain?
13:42 < pinheadmz> (heh - the link on that 16688.html page goes to the wrong PR...)
13:43 < jnewbery> pinheadmz: oops. Thanks - I'll fix after this meeting
13:43 < ariard> have a look on BlockStatus in src/chain.h
13:44 < ariard> we have different state according the validation work which has been done on a given Block
13:44 < zenogais> ah yeah, the BlockStatus stuff was interesting
13:44 < zenogais> still don't fully understandit
13:45 < jnewbery> Yeah, I went digging into that because of this change: https://github.com/bitcoin/bitcoin/pull/16279/commits/c16e139246f215965bb572da1a11382b4f61d957
13:45 < jnewbery> (which Matt moved into its own commit today)
13:47 < jnewbery> BlockStatus is kind of weird. It seems to be partly treated as an enum and partly as a bitfield
13:48 < ariard> jnewbery: where it's treated as a bitfield?
13:48 < jnewbery> everything above the bottom three bits are treated as a bitfield, no?
13:48 < jnewbery>     BLOCK_HAVE_DATA          =    8, //!< full block available in blk*.dat
13:48 < jnewbery>     BLOCK_HAVE_UNDO          =   16, //!< undo data available in rev*.dat
13:49 < jnewbery> etc
13:49 < zenogais> The bottom three bits seem to be treated as a bitfield
13:49 < zenogais> BLOCK_VALID_MASK ors some of those bits together AFAICT
13:49 < ariard> Oh I see what you mean
13:49 < zenogais> it's odd to or BLOCK_VALID_TRANSACTIONS w/ BLOCK_VALID_RESERVED and BLOCK_VALID_TREE since some of them flip the same bits on
13:50 < jnewbery> the bottom three bits are used to store the validity
13:50 < jnewbery> the fourth bit is whether the block is available in a blk*.dat file
13:51 < ariard> well a OR of enum is still an enum
13:51 < jnewbery> BLOCK_VALID_MASK could just be defined as 7 (or 0b11100000...)
13:51 < jnewbery> ok, we've got 10 minutes left. Let's keep moving through the questions
13:51 < jnewbery> The author describes this PR as “a first step towards a series of cleanups leading to a (more-complete) #16175. It’s just two rather-nice (IMO) cleanups.” Do you agree? Is this just a cleanup PR or are there behaviour changes?
13:52 < ajonas> you definitely don't agree in https://github.com/bitcoin/bitcoin/pull/16279/files#r333115791
13:52 < jnewbery> ajonas: :)
13:52 < ariard> I agree there is a confusion between validity of block and its availability as data in BlockStatus
13:52 < zenogais> Hard to draw the line between cleanup and behavior change. I think some of the stuff pointed out with BlockStatus by jnewbery leads me to believe there's a behavior change.
13:53 < jnewbery> yeah, I found that change very difficult to review.
13:54 < jnewbery> before I put my ACK on a PR I want to understand exactly how behaviour might change, especially in net processing and validation where very tiny changes can have surprising and dangerous effects
13:55 < jnewbery> changing from `if (pindex->IsValid(BLOCK_VALID_TRANSACTIONS))` to `if (dos_state.IsValid())` seems reasonable, but it sent me off in lots of different directions trying to work out when BlockStatus is changed, what MarkBlockAsReceived() is doing exactly, etc
13:56 < zenogais> Yeah, I found I went down some pretty interesting rabbit holes there
13:56 < jnewbery> ok, a couple of minutes left. I had two other questions:
13:56 < jnewbery> Are there any other CValidationInterface clients that use the BlockChecked callback?
13:56 < jnewbery> The other place that the BlockChecked method is called is in ConnectTip() in ActivateBestChain(). Why does that need to be a callback? Why does net processing need to keep a mapBlockSource map from the block to the peer that provided it?
13:57 < jnewbery> I'm happy to extend this for a few extra minutes if everyone else is
13:57 < zenogais> submitblock_StateCatcher
13:57 < jnewbery> zenogais: nice. What's that used for?
13:58 < zenogais> I'm not exactly sure lol
13:58 < zenogais> It's in mining.cpp so clearly related to that
13:58 < zenogais> But hadn't had a chance to explore it
13:58 < jnewbery> right. It temporarily registers with the validation interface
13:58 < jnewbery> and then the block is submitted
13:59 < jnewbery> and it catches whether a BlockChecked notification was sent
13:59 < ariard> You need to keep a mapBlockSource to punish peer in case of invalidity ?
13:59 < jnewbery> ariard: yes!
13:59 < jnewbery> because we don't necessarily know whether a block was invalid until we try to connect it to the chain
13:59 < ariard> exactly
14:00 < jnewbery> and so at that point, if it is invalid (eg it contains a double-spend or invalid tx), we want to disconnect from the peer that sent it to us
14:00 < jnewbery> so mapBlockSource keeps a map from the block hash to the peer that sent it to us
14:01 < jnewbery> ok, we've overrun a bit, but does anyone have any final questions before we wrap it up?
14:02 < amiti> yes! I'm confused by this comment: https://github.com/bitcoin/bitcoin/pull/16279/files#diff-349fbb003d5ae550a2e8fa658e475880R213
14:02 < amiti> if fForceProcessing is set, then you force going through ProcessNewBlock again right
14:03 < amiti> cause it passes that value through to AcceptBlock
14:03 < jnewbery> yes, and I think it forces you to save to disk
14:04 < amiti> so, reasons you would go through ProcessNewBlock are 1. fNewBlock is true 2. you force it with fForceProcessing 3. pruning enabled & trying to re-download a block that was pruned
14:05 < jnewbery> fNewBlock I think is a return argument
14:05 < amiti> oh right
14:05 < amiti> ok so if you force it or pruned the block, you could repeat
14:06 < amiti> but otherwise you wouldnt ever repeat with the same hash
14:06 < zenogais> Have to head out. Thanks for hosting jnewberry!
14:07 < jnewbery> I think the comment is trying to say - we don't need to call ProcessNewBlock again if we receive a block with the same hash, as long as the state returned was valid and (it was fForceProcessing or a new block)
14:08 < jnewbery> the state.IsValid() means that the block hasn't been mutated - ie that the PoW is valid
14:08 < amiti> ok
14:09 < amiti> I think I get it
14:09 < jnewbery> it's a confusing comment!
14:09 < amiti> yeah and I was trying to fix it, but didn't actually understand what was trying to be said
14:09 < pinheadmz> ive been down this road before - bc a legacy node could send a block w no witness data, but itd have the same hash
14:10 < jnewbery> we're about 10 minutes over. Let's close it there.
14:10 < amiti> thank you :)
14:10 < jnewbery> Thanks everyone. Great discussion this week!
14:10 < pinheadmz> yeah thanks JN!!!!
14:11 < ariard> thanks
14:11 < sebastianvstaa> thanks
14:11 < ajonas> thanks John