Return the AcceptBlock CValidationState directly in ProcessNewBlock (validation)

Oct 16, 2019

https://github.com/bitcoin/bitcoin/pull/16279

Host: jnewbery - PR author: TheBlueMatt

Notes

ProcessNewBlock() is net processing’s entry point into validation, called whenever net processing receives a new block. It can be called in ProcessMessage() when a BLOCK, CMPCTBLOCK or BLOCKTXN message is received and net processing is ready to hand a block to the validation layer.
ProcessNewBlock() calls CheckBlock() to carry out non-contextual validation tests on the block. If those checks pass, then ProcessNewBlock() calls AcceptBlock(), which does more checks on the block and then stores it to disk.
If AcceptBlock() passes and the block is valid, then ActivateBestChain() is called to update the node’s view of the blockchain. If the new block has more work than the node’s current tip, then ActivateBestChain will try to connect it to the chain and update the UTXO set. This can fail, for example if a transaction in the block spends non-existent coins.
If a peer is sending us invalid blocks, then depending on the invalidity reason, we may wish to disconnect that peer.
If either CheckBlock() or AcceptBlock() fails, then the failure reason will be saved in a CValidationState object and returned to ProcessNewBlock().
Prior to this PR, ProcessNewBlock() would then call the BlockChecked method in the CValidationInterface. net processing would receive the BlockChecked callback and then potentially disconnect the peer.
This PR changes ProcessNewBlock so that if CheckBlock() or AcceptBlock() fail, then it will return the CValidationState directly to the caller, which can then decide whether to ban the peer.
This PR is a step towards PR 16175, which makes ProcessNewBlock() asynchronous, and would allow parallelization between net processing and validation.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
What steps did you take, beyond reading the code?
Did anything about the way new blocks are checked and accepted surprise you? Do you have any questions about the series of steps to add a block to the block chain?
The author describes this PR as “a first step towards a series of cleanups leading to a (more-complete) #16175. It’s just two rather-nice (IMO) cleanups.” Do you agree? Is this just a cleanup PR or are there behaviour changes?
Are there any other CValidationInterface clients that use the BlockChecked callback?
The other place that the BlockChecked method is called is in ConnectTip() in ActivateBestChain(). Why does that need to be a callback? Why does net processing need to keep a mapBlockSource map from the block to the peer that provided it?

Meeting Log

13:00

<jnewbery> hi

13:00

<zenogais> hi

13:00

<fox2p> hey

13:00

<ariard> hi

13:00

<amiti> hi

13:01

<jnewbery> what did everyone think of this week's PR?

13:01

<sebastianvstaa> hi

13:01

<ajonas> hi

13:01

<zenogais> Was interesting, small change that interacts with a lot of crucial systems.

13:01

<jnewbery> zenogais: yes! I agree

13:02

<jnewbery> very little functional change, but there's lots to dig into

13:02

<jonatack_> hi

13:02

<zenogais> Definitely caused me to explore parts of the codebase I'd never seen before.

13:02

<amiti> yeah agreed, not a ton changed but I learned/explored a lot to get the context

13:02

<jnewbery> we've touched on the CValidationInterface in a few previous reviews, but I think this is the first time we've looked at changes in validation.cpp

13:02

<pinheadmz> hi !

13:03

<amiti> very context heavy, as many of the conversations revealed

13:03

<jnewbery> So I think the first thing to do for this one is understand what's going on with the CValidationInterface. It's declared here: https://github.com/jnewbery/bitcoin/blob/2ec121f09d8f7117fc9a8f830a7242f9a3602b78/src/validationinterface.h#L71

13:03

<jnewbery> that's where the useful comments are

13:04

<jnewbery> you can see that there are 8 methods in the interface. Most of them are asynchronous, but a couple are synchronous. Can you see which ones?

13:05

<pinheadmz> well, like RegisterBackgroundSignalScheduler takes a callback as an arg?

13:06

<pinheadmz> so id guess that is async

13:06

<jnewbery> RegisterBackgroundSignalScheduler is adding the scheduler thread. I think it's only called at startup/initialization

13:06

<jnewbery> The methods are in the CValidationInterface() class

13:07

<jnewbery> UpdatedBlockTip, TransactionAddedToMempool, etc

13:07

<jnewbery> It might be easier if you look at where those methods are defined: https://github.com/jnewbery/bitcoin/blob/2ec121f09d8f7117fc9a8f830a7242f9a3602b78/src/validationinterface.cpp#L131

13:07

<ariard> ChainStateFlushed and BlockChecked

13:08

<ariard> and NewPowValidBlock?

13:08

<fox2p> and NewPoWValidBlock?

13:08

<fox2p> yeah

13:08

<zenogais> So anything that doesn't call `AddToProcessQueue`?

13:08

<jnewbery> zenogais: correct

13:08

<ajonas> so not ChainStateFlushed then

13:09

<jnewbery> BlockChecked and NewPOWValidBlock are synchronous

13:09

<ariard> No ChainStateFlushed sorry

13:09

<jnewbery> you can see those two methods are making direct function calls. The other methods are all adding a lambda function to the queue

13:10

<jnewbery> the scheduler thread will come and service those functions in the background at some point.

13:10

<jnewbery> ok, so with that background, let's move onto the PR

13:10

<jnewbery> describe briefly what the PR is doing

13:11

<jnewbery> all of you... GO!

13:11

<pinheadmz> ha!

13:12

<pinheadmz> adds a new arg to ProcessNewBlock, a CValidationState

13:12

<pinheadmz> so every time you call PNB, you have to init a state first, pass it in, thenafter PNB returns, you can examine it for isValid()

13:12

<jnewbery> pinheadmz: right, and what's that CValidationState argument used for

13:12

<jnewbery> pinheadmz: yes

13:12

<pinheadmz> seems to just return validation errors and state

13:12

<pinheadmz> i was hoping the util tests would have a bad block in there and i could see a "bad" response

13:13

<pinheadmz> but I think all the tests return boring valid states

13:13

<zenogais> Validation state is used for DoS checks

13:13

<jnewbery> zenogais: correct. And where do we do those DoS checks?

13:13

<jnewbery> Which function handles invalid blocks?

13:14

<pinheadmz> is that checkblock and contextualcheckblock ?

13:14

<zenogais> InvalidBlockFound

13:14

<zenogais> and some others, let me check

13:15

<jnewbery> pinheadmz: I'm thinking more about on the net processing layer. Which function sees that we've received an invalid block and then handles whether to punish peers?

13:15

<zenogais> Ah and BlockChecked

13:15

<pinheadmz> BlockChecked()

13:15

<jnewbery> exactly

13:15

<pinheadmz> funny to name a function past-tense but ok i get it

13:16

<jnewbery> pinheadmz: the function is an implementation of that CValidationInterface method

13:16

<zenogais> which calls MaybePunishNode with the state given

13:16

<jnewbery> so that method is telling subscribers that 'a block has been checked'

13:17

<pinheadmz> 👍

13:17

<jnewbery> You can see here (in master) where the function is defined: https://github.com/bitcoin/bitcoin/blob/c34b88620dc8435b83e6744895f2ecd3c9ec8de7/src/net_processing.cpp#L1231

13:18

<jnewbery> It's a member function of the PeerLogicValidation class

13:19

<jnewbery> and you can see where that class is declared that it's inheriting from CValidationInterface: https://github.com/bitcoin/bitcoin/blob/c34b88620dc8435b83e6744895f2ecd3c9ec8de7/src/net_processing.h#L22

13:19

<jnewbery> so prior to this PR, we had net processing calling into validation (using ProcessNewBlock()), which then called directly into net processing (using the BlockChecked() validation interface method, which is a direct function call)

13:20

<jnewbery> so net processing -> validation -> net processing

13:20

<jnewbery> does that make sense? any questions about that?

13:20

<zenogais> Yep makes sense

13:20

<pinheadmz> cool.

13:20

<zenogais> Looks like it's registered here in init.cpp: https://github.com/bitcoin/bitcoin/blob/master/src/init.cpp#L1323

13:21

<jnewbery> zenogais: exactly. Any client that wants to register with the validation interface needs to call that RegisterValidationInterface() function

13:22

<jnewbery> and how does this PR change that?

13:22

<jnewbery> (the net processing -> validation -> net processing stack)

13:23

<zenogais> Adds CConnman* as an argument

13:23

<pinheadmz> oh yeah i was a bit confused - what would be a 'client' in this sense?

13:23

<pinheadmz> are we actually talking about a process outside bitcoind? like ZMQ notificaitons ?

13:24

<jnewbery> pinheadmz: no, the clients are the components or classes that register with the validation interface. They're all internal to bitcoind

13:24

<ariard> like the index or wallet

13:24

<jnewbery> ariard: exactly

13:25

<jnewbery> other clients are the ZMQ component, which receives notifcations over the validation interface and then sends ZMQ notifications out

13:25

<ariard> and mining stack IIRC

13:25

<jnewbery> and PeerLogicValidation, which is net processing

13:26

<jnewbery> ariard: right. That uses the validation interface to check that a submitted block was valid

13:26

<zenogais> Adds an argument to ProcessNewBlock as well to bubble up the CValidationState

13:26

<jnewbery> I think that's all of them. Just look for classes which inherit CValidationInterface

13:26

<jnewbery> and then call RegisterValidationInterface()

13:27

<jnewbery> zenogais: right, so ProcessNewBlock is no longer calling BlockChecked directly

13:27

<jnewbery> which means validation is not making a direct functional call into net processing

13:28

<jnewbery> why is that important/interesting here?

13:28

<zenogais> Breaks the dependency there

13:29

<pinheadmz> The PR emphasizes pushing validation into a backgroun thread there as well

13:29

<pinheadmz> does it allow the process to be async then?

13:29

<amiti> easier to make async if its just a message being passed from net processing -> validation ?

13:29

<jnewbery> great! Yes, that's the motivation for this PR

13:30

<jnewbery> PR 16175 (WIP, currently closed) is the PR to make ProcessNewBlock async: https://github.com/bitcoin/bitcoin/pull/16175

13:31

<jnewbery> really, I think we want net processing to just be passing blocks/transactions up to validation, and being informed asynchronously of any validation events

13:31

<jnewbery> validation shouldn't be calling functions directly in net processing

100

13:32

<jnewbery> any thoughts about that? Shall we move onto the questions: https://bitcoincore.reviews/16279.html#questions

101

13:32

<amiti> can you hash that out a bit further?

102

13:32

<amiti> why not?

103

13:32

<amiti> it mostly makes sense to me, but not entirely

104

13:32

<zenogais> My guess would be cleaner separation of concerns, and ability to run more work in parallel.

105

13:33

<jnewbery> zenogais: yeah, that's it

106

13:34

<jnewbery> if we want net processing and validation to run in parallel, we want net processing to be simply handing messages to validation and not have net processing and validation calling into each other

107

13:35

<ariard> we would be able to check validity of blocks in parallel

108

13:35

<amiti> ok ya. thanks

109

13:35

<zenogais> Also just cleaner to understand if its strict message passing rather than nested levels of function calls between components.

110

13:36

<jonatack> Yes, while avoiding reaching into internals across layers.

111

13:36

<jnewbery> Perhaps we can look at PR 16175 in a future review club meeting if people are interested

112

13:36

<jnewbery> to see where these changes are heading

113

13:36

<ariard> have a look on this https://bitcoincore.org/en/meetings/2018/05/03/

114

13:37

<ariard> about #12934, the OG refactoring PR

115

13:37

<jnewbery> ariard: thanks! Nice find

116

13:37

<jnewbery> ok, onto questions

117

13:37

<jnewbery> What steps did you take, beyond reading the code?

118

13:38

<jnewbery> Did you test this? If so, how?

119

13:38

<pinheadmz> ran the tests, tried adding extra logging with printf to see what happens to dos_state

120

13:38

<zenogais> I ran all unit and functional tests and reviewed the existing DoS tests to look for blind spots.

121

13:38

<jnewbery> pinheadmz: zenogais: that's great!

122

13:38

<zenogais> It seems like most of what is easy to test is tested already

123

13:38

<jnewbery> did you find anything interesting?

124

13:39

<pinheadmz> not really, the validation_block_tests dont seem to test bad blocks

125

13:40

<pinheadmz> probably could have been better for me to log in CValidationState and then run the python tests

126

13:40

<jnewbery> interesting. I don't think I've looked at that file. A lot of the block failure modes are tested in the feature_blocks.py functional test

127

13:40

<pinheadmz> yeah, but the PR didn't touch those :-)

128

13:41

<jnewbery> You all should go back and ACK https://bitcoincore.reviews/16688.html so we have better logging in the CValidationInterface :)

129

13:41

<zenogais> One interesting thing I found was `CheckForkWarningConditions`. Seems untested, but interesting system to have in place.

130

13:41

<jnewbery> ok, next question: Did anything about the way new blocks are checked and accepted surprise you? Do you have any questions about the series of steps to add a block to the block chain?

131

13:42

<pinheadmz> (heh - the link on that 16688.html page goes to the wrong PR...)

132

13:43

<jnewbery> pinheadmz: oops. Thanks - I'll fix after this meeting

133

13:43

<ariard> have a look on BlockStatus in src/chain.h

134

13:44

<ariard> we have different state according the validation work which has been done on a given Block

135

13:44

<zenogais> ah yeah, the BlockStatus stuff was interesting

136

13:44

<zenogais> still don't fully understandit

137

13:45

<jnewbery> Yeah, I went digging into that because of this change: https://github.com/bitcoin/bitcoin/pull/16279/commits/c16e139246f215965bb572da1a11382b4f61d957

138

13:45

<jnewbery> (which Matt moved into its own commit today)

139

13:47

<jnewbery> BlockStatus is kind of weird. It seems to be partly treated as an enum and partly as a bitfield

140

13:48

<ariard> jnewbery: where it's treated as a bitfield?

141

13:48

<jnewbery> everything above the bottom three bits are treated as a bitfield, no?

142

13:48

<jnewbery> BLOCK_HAVE_DATA = 8, //!< full block available in blk*.dat

143

13:48

<jnewbery> BLOCK_HAVE_UNDO = 16, //!< undo data available in rev*.dat

144

13:49

<jnewbery> etc

145

13:49

<zenogais> The bottom three bits seem to be treated as a bitfield

146

13:49

<zenogais> BLOCK_VALID_MASK ors some of those bits together AFAICT

147

13:49

<ariard> Oh I see what you mean

148

13:49

<zenogais> it's odd to or BLOCK_VALID_TRANSACTIONS w/ BLOCK_VALID_RESERVED and BLOCK_VALID_TREE since some of them flip the same bits on

149

13:50

<jnewbery> the bottom three bits are used to store the validity

150

13:50

<jnewbery> the fourth bit is whether the block is available in a blk*.dat file

151

13:51

<ariard> well a OR of enum is still an enum

152

13:51

<jnewbery> BLOCK_VALID_MASK could just be defined as 7 (or 0b11100000...)

153

13:51

<jnewbery> ok, we've got 10 minutes left. Let's keep moving through the questions

154

13:51

<jnewbery> The author describes this PR as “a first step towards a series of cleanups leading to a (more-complete) #16175. It’s just two rather-nice (IMO) cleanups.” Do you agree? Is this just a cleanup PR or are there behaviour changes?

155

13:52

<ajonas> you definitely don't agree in https://github.com/bitcoin/bitcoin/pull/16279/files#r333115791

156

13:52

<jnewbery> ajonas: :)

157

13:52

<ariard> I agree there is a confusion between validity of block and its availability as data in BlockStatus

158

13:52

<zenogais> Hard to draw the line between cleanup and behavior change. I think some of the stuff pointed out with BlockStatus by jnewbery leads me to believe there's a behavior change.

159

13:53

<jnewbery> yeah, I found that change very difficult to review.

160

13:54

<jnewbery> before I put my ACK on a PR I want to understand exactly how behaviour might change, especially in net processing and validation where very tiny changes can have surprising and dangerous effects

161

13:55

<jnewbery> changing from `if (pindex->IsValid(BLOCK_VALID_TRANSACTIONS))` to `if (dos_state.IsValid())` seems reasonable, but it sent me off in lots of different directions trying to work out when BlockStatus is changed, what MarkBlockAsReceived() is doing exactly, etc

162

13:56

<zenogais> Yeah, I found I went down some pretty interesting rabbit holes there

163

13:56

<jnewbery> ok, a couple of minutes left. I had two other questions:

164

13:56

<jnewbery> Are there any other CValidationInterface clients that use the BlockChecked callback?

165

13:56

<jnewbery> The other place that the BlockChecked method is called is in ConnectTip() in ActivateBestChain(). Why does that need to be a callback? Why does net processing need to keep a mapBlockSource map from the block to the peer that provided it?

166

13:57

<jnewbery> I'm happy to extend this for a few extra minutes if everyone else is

167

13:57

<zenogais> submitblock_StateCatcher

168

13:57

<jnewbery> zenogais: nice. What's that used for?

169

13:58

<zenogais> I'm not exactly sure lol

170

13:58

<zenogais> It's in mining.cpp so clearly related to that

171

13:58

<zenogais> But hadn't had a chance to explore it

172

13:58

<jnewbery> right. It temporarily registers with the validation interface

173

13:58

<jnewbery> and then the block is submitted

174

13:59

<jnewbery> and it catches whether a BlockChecked notification was sent

175

13:59

<ariard> You need to keep a mapBlockSource to punish peer in case of invalidity ?

176

13:59

<jnewbery> ariard: yes!

177

13:59

<jnewbery> because we don't necessarily know whether a block was invalid until we try to connect it to the chain

178

13:59

<ariard> exactly

179

14:00

<jnewbery> and so at that point, if it is invalid (eg it contains a double-spend or invalid tx), we want to disconnect from the peer that sent it to us

180

14:00

<jnewbery> so mapBlockSource keeps a map from the block hash to the peer that sent it to us

181

14:01

<jnewbery> ok, we've overrun a bit, but does anyone have any final questions before we wrap it up?

182

14:02

<amiti> yes! I'm confused by this comment: https://github.com/bitcoin/bitcoin/pull/16279/files#diff-349fbb003d5ae550a2e8fa658e475880R213

183

14:02

<amiti> if fForceProcessing is set, then you force going through ProcessNewBlock again right

184

14:03

<amiti> cause it passes that value through to AcceptBlock

185

14:03

<jnewbery> yes, and I think it forces you to save to disk

186

14:04

<amiti> so, reasons you would go through ProcessNewBlock are 1. fNewBlock is true 2. you force it with fForceProcessing 3. pruning enabled & trying to re-download a block that was pruned

187

14:05

<jnewbery> fNewBlock I think is a return argument

188

14:05

<amiti> oh right

189

14:05

<amiti> ok so if you force it or pruned the block, you could repeat

190

14:06

<amiti> but otherwise you wouldnt ever repeat with the same hash

191

14:06

<zenogais> Have to head out. Thanks for hosting jnewberry!

192

14:07

<jnewbery> I think the comment is trying to say - we don't need to call ProcessNewBlock again if we receive a block with the same hash, as long as the state returned was valid and (it was fForceProcessing or a new block)

193

14:08

<jnewbery> the state.IsValid() means that the block hasn't been mutated - ie that the PoW is valid

194

14:08

<amiti> ok

195

14:09

<amiti> I think I get it

196

14:09

<jnewbery> it's a confusing comment!