Parallelize CheckInputs() in AcceptToMemoryPool() (mempool)

Jul 17, 2019

https://github.com/bitcoin/bitcoin/pull/15169

Host: jachiang - PR author: sdaftuar

Notes

The CheckInputs() function is called when checking the inputs to a transaction. Each input has a script which is verified. That verification can be run:
- sequentially and synchronously, in which case CheckInputs() will return true if all scripts pass verification or pass back the error for the first script that failed verification; or
- in parallel, in which case CheckInputs() will emplace the script check onto a queue for future verification.
Previously, CheckInputs() would always check inputs sequentially when accepting a transaction to the mempool. Transaction inputs can be checked in parallel during block validation. Note the vChecks argument when calling CheckInputs() from ConnectBlock().
This PR enables the input checks to be performed in parallel when a transaction is entering the mempool.
It does so by replacing calls to CheckInputs() with RunCheckInputsMaybeParallel(), which will push the input checks to an existing CCheckQueue worker queue.
- CheckInputs
- CCheckQueue (Currently used for block validation)
- Script check worker loop (Currently used for block validation)
There is a significant performance gain resulting from this optimization.
This PR also changes behavior when transactions are denied mempool acceptance. RunCheckInputsMaybeParallel (and by extension CheckInputs()) no longer set TX_NOT_STANDARD during input check failure, but only a consensus failure state.
Note: This has an effect on peer connections.
- Current behaviour: A peer is disconnected when a consensus-invalid transaction is received, but remains connected if the transaction is only policy-invalid (non-standard).
- Preposed behaviour: A peer is no longer disconnected, even if a consensus-invalid transaction is received.

Questions

When does RunCheckInputsMaybeParallel() still potentially call CheckInputs()?
How are the input check jobs dispatched to the thread pool? How are the check results returned after execution?
Does RunCheckInputsMaybeParallel() set the same “invalid reason” on input check failure as CheckInputs()?
If not, why is this not possible?
How does this change affect RPC methods such as testmempoolaccept?
What effect does the modified input check failure behaviour have on peer connections?
What was the reason to modify the behavior in AcceptToMempoolWorker() when input checks fail? Should this be separate PR?

Meeting Log

12:57

<digi_james> Hello everyone :)

12:58

<jonatack> Hi James :)

12:59

<digi_james> jonatack: :)

12:59

<jnewbery> hi!

13:01

<sosthene> hi

13:01

<jnewbery> Hi folks. This week, digi_james has kindly offered to host the meeting. He wrote all the notes and questions at https://bitcoin-core-review-club.github.io/15169.html. Thanks James!

13:01

<jonas_> hi

13:01

<ariard> hi

13:01

<jnewbery> A reminder again that I'm always looking for suggestions for PRs to cover and volunteers to host. Please comment here: https://github.com/bitcoin-core-review-club/bitcoin-core-review-club.github.io/issues/14 or DM me for either of those.

13:01

<fjahr> hi

13:01

<elichai2> Hi

13:02

<jnewbery> It's a bit of work to prepare to host, but I think it's really worth it. You'll get a lot out of really digging into the PR and understanding it well enough to talk about. We already have lightlike and elichai2 lined up to host future meetings.

13:02

<sosthene> wow, that's great

13:02

<jnewbery> The PR this week is really interesting. Even though the number of lines touched is fairly small, it touches on some very important concepts: peer-to-peer, DoS protection and caching. I hadn't reviewed it before James suggested it for PR club, so thanks for pointing me at it!

13:02

<jnewbery> Ok, over to James.

13:03

<digi_james> Thanks John. This was a bit of a tricky PR for me to understand, but I hope you all will appreciate the nuances and perhaps even discover new ones today.

13:04

<digi_james> I suppose on the surface it would appear to be a performance optimization, verifying inputs fo new transactions being accepted to the mempool in parallel.

13:04

<digi_james> So I suggest we start with that, and then move onto the parts which change behaviour afterwards ...

13:05

<digi_james> So perhaps I'll start out by asking how the parallelisation is achieved?

13:06

<digi_james> Is it new code that is queuing the check jobs and executing these?

13:07

<pinheadmz> no I think it is using something from block validation

13:07

<digi_james> pinheadmz: exactly

13:07

<ariard> well it's now used scriptcheckqueue to parallelize script verification, scriptcheckqueue was only used by ConnectBlock before

13:07

<ariard> s/used/using/g

13:08

<pinheadmz> this applies to each tx entering the mempool 1 at a time though?

13:08

<pinheadmz> so a tx with 100 inputs can be checked in parrallel

13:08

<digi_james> pinheadmz: yes exactly, parallel input checks for a given transaction

13:09

<digi_james> Is there a difference in validation logic between the script check queue and previously?

13:10

<digi_james> so if you look at RunCheckInputsParallelMaybe

13:10

<jnewbery> Note that the CheckInputs() function is slightly confusingly named (and the comment above is incorrect in parts): https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L1244

13:11

<jnewbery> The comment says "Check whether all inputs of this transaction are valid (no double spends, scripts & sigs, amounts)

13:11

<jnewbery> historically, that was true (it'd check amounts and doublespends). Now it only checks scripts and sigs.

13:11

<jnewbery> That's what's being done in parallel in this PR - script verification

13:12

<lf> It's not clear to me why tx input check can be parallelized: is it not the case that if I create a transaction that uses two identical inputs a parallelized function will return false-positive validity

13:12

<digi_james> Thx John I wasnt aware it didnt do contextual verification

13:13

<lf> ok I think I understand: this is just verify valid inputs, not verify valid transaction

13:14

<jnewbery> yeah, it used to call CheckTxInputs that does some checking of amounts, etc.

13:14

<lf> Verify valid inputs aka assert that script conditions being met are allowed

13:14

<jnewbery> That was removed here: https://github.com/bitcoin/bitcoin/commit/832e0744cb8b1e1625cdb19b257f97316ac16a90#diff-24efdb00bfbe56b140fb006b562cc70bL1164

13:14

<digi_james> John: If there are no contextual checks, how can it be paralellized? Inputs from a given tx double spending for example?

13:15

<jnewbery> All it's checking is the scripts. Those are completely independent from contextual information

13:16

<jnewbery> contextual checks and amount checks are done elsewhere

13:16

<lf> jnewbery so a broad assesment of the change is: "lets separate CheckTxInputs function into more indepedently functional pieces s.t. performance can increase"?

13:16

<digi_james> I see

13:16

<jnewbery> eg AcceptToMemoryPoolWorker() calls CheckTxInputs here: https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L559

13:16

<lf> e.g. "before CheckTxInputs can complete, CheckInputs needs to finish and that can be parallelized"?

13:17

<jnewbery> lf: no, that change that I linked to is from an old PR

13:17

<lf> jnewbery ty, reviewing now

13:17

<jnewbery> this PR is simply changing AcceptToMemoryPool (ATMP) to call CheckInputs() with a checkqueue so that script verification for the different inputs happens in parallel

13:18

<hugohn> Q: is there any scenario where CCheckQueue (scriptcheckqueue) can contain more than one tx to be validated? Or is it guaranteed that the queue can only contain one pending tx at a time?

13:19

<digi_james> The ATMP function takes a TX as an argument

13:19

<digi_james> so it is the inputs of that transaction which are pushed the the CCheckQueue

13:19

<hugohn> yes, all inputs of a given tx

13:20

<digi_james> When the queue is then run, it processes all queued input checkscript jobs of a given transaction.

13:20

<hugohn> my question is is it possible to queue up vChecks for more than 1 tx into scriptcheckqueue? I assume the answer is no, but want to confirm

13:20

<digi_james> I think ATMP locks up cs_main so that would prevent other tx inputs to be queued in the meantime

13:21

<digi_james> Sorry it locks the cs.mempool

13:22

<digi_james> Ok, in the interest of time, I suggest also covering the behavioural changes this PR introduces, what are they?

13:22

<hugohn> I see. yes that makes sense. If check queue accepts more than 1 tx at a time, !control.Wait() would not make sense because the failure could be ambiguous (not sure which tx is invalid)

13:22

<digi_james> hugohn: Thats a good point

13:23

<ariard> hugohn: IMO CCheckQueue let you do that, but you have a wrapper around CCheckQueueControl to let verification showing up as atomic for callers

13:24

<jnewbery> hugohn: I believe during block validation, the inputs from all the txs are enqueued onto a single CCheckQueueControl and then dispatched, but like you and james have said, that can't happen for checking txs for mempool entry

13:24

<PaulTroon> hugohn: can a blockvalidation use the queue at the same time as a inputs from a tx is queued?

13:24

<digi_james> ariard: But can you call ATMP for multiple transactions concurrently?

13:24

<digi_james> It would seem the mempool lock acquisition prevents this

13:24

<hugohn> ariard: isn't CCheckQueueControl only to ensure all vChecks in the queue must be finished? it doesn't protect you against 2 concurrent txs in the queue

13:25

<jnewbery> ATMP holds the main lock, so it can only be running in one thread

13:26

<digi_james> jnewbery: can CCheckQueueControl allow two callers to dispatch separate job sets, returning the completion of each separately to each caller as ariard suggested?

13:26

<ariard> digi_james: you can call ATMP for multiple transactions concurrently but given there is only one set of ScriptCheck threads, CCheckQueueControl constructor should be pass the scriptcheckqueue to ensure there isn't conflict

13:27

<jonatack> ariard: bypass?

13:27

<ariard> I mean CCheckQueueControl constructor is locking the passed CCheckQueue

13:27

<ariard> jonatakc: bypass what you mean ? Using the raw scriptcheckqueue without CCheckQueueControl instanciation ?

13:27

<jonatack> ah, be passed

13:28

<ariard> yes sorry

13:29

<digi_james> Ok, I think we should move to the behavioural change regarding invalid transactions

13:29

<jnewbery> ariard: ATMP holds the main lock, so it can't be called concurrently

13:29

<jonatack> behavior changes: CheckInputs invokes scripts only once and always returns CONSENSUS for script failures. Ban behavior.

13:30

<digi_james> jonatack: Thank you!

13:30

<hugohn> jnewbery: sorry still a bit confused. I see 2 locks used in ATMP, cs_main and pool.cs. which lock prevents you against 1+ txs in the queue?

13:30

<ariard> jnewbery: ah thanks but so why you need to lock scriptcheckqueue in this case ?

13:30

<hugohn> cs_main right?

13:30

<digi_james> :)

13:30

<jnewbery> ariard: not sure, perhaps just to be extra safe!

13:30

<jnewbery> let's move on to behavioural changes

13:31

<ariard> hugohn: in fact in ConnectBlock transaction are mixed, you may have multiplee txs in queue

13:31

<ariard> IMO it's ok because you want atomic validity at block granularity

13:31

<ariard> and not a transaction granularity like in ATMP

13:32

<ariard> *at

13:32

<hugohn> ariard: it does makes sense for ConnectBlock I think? because block is invalid as long as any tx in the block is invalid

13:32

<hugohn> right

13:32

<jnewbery> hugohn ariard: please, let's move on

13:32

<ariard> yes we agree on this

13:32

<digi_james> As jonatack mentioned, RunCheckInputsMaybeParallel can no longer differentiate between policy invalid and consensus invalid failures during input checks. Rather it just sets the invalid state to Consensus.

13:33

<digi_james> CheckInputs previously would run two script checks during failure. One with policy flags active, so checks for standardness, and if this failed, a second check would be run with consensus flags only.

13:33

<digi_james> Why cant we do this in the new parallelized case?

13:35

<digi_james> Ok, one way to look at is in the CheckQueue design, which was originally written to be called when blocks are connected, there only consensus verification matters, standardness applies to mempool acceptance only

13:36

<jnewbery> digi_james: I think we could call CheckInputs() twice after this PR, but it'd require some code rearrangement, and we'd lose some of the performance benefit of the PR

13:37

<digi_james> john: Interesting. So both with all inputs in vChecks?

13:37

<jnewbery> (ie call CheckInputs() with policy flags, and then a second time with consensus flags)

100

13:37

<jnewbery> right

101

13:38

<digi_james> I am thinking out loud. So this would potentially be more scriptchecks than in the single threaded case, albeit parallelized.

102

13:38

<jnewbery> CheckInputs() is a bit weird right now. We call it and say "please run the scripts with these flags". CheckInputs() runs the scripts with the flags we pass it, but if those fails, it runs the scripts _again_ with different flags

103

13:38

<michaelf_> <digi_james>: What does "when blocks are connected" mean?

104

13:38

<jnewbery> this PR removes that second script verification

105

13:38

<digi_james> michaelf_: I mean when a block is connected to the tip of the chain.

106

13:39

<michaelf_> Ah ok thanks

107

13:39

<lf> chain tip update*?

108

13:39

<jnewbery> but it could also add back a second call to CheckInputs() with the consensus flags

109

13:39

<jnewbery> The reason Suhas didn't do that is contained in this comment: https://github.com/bitcoin/bitcoin/pull/15169#issuecomment-492232216

110

13:39

<ariard> jnewbery: or CheckInputs() may also push the script verification in pvChecks

111

13:39

<jnewbery> "The motivation for this is to mitigate adversarial behavior: in order to determine whether a transaction is invalid according to consensus rules (rather than policy) we end up validating a transaction's scripts twice, so an adversary trying a CPU DoS could construct transactions that only fail policy checks, but otherwise tie up our CPU for twice as long."

112

13:42

<digi_james> jnewbery: If we add a second CheckInputs run (in the new parallelized case) as you suggested above, how would this be resolved?

113

13:42

<jnewbery> how would what be resolved?

114

13:43

<jonatack> Unsure how it relates but Suhas also wrote: "AcceptToMemoryPoolWorker() already invokes CheckInputs multiple times, with different script flags, to detect certain kinds of segwit validation errors."

115

13:43

<digi_james> Transactions failing policy checks, but tying up CPU for a second round of checks

116

13:43

<jnewbery> the fact that a malicious peer can make us verify twice at no cost to themselves?

117

13:43

<jonatack> in the commit message of https://github.com/bitcoin/bitcoin/pull/15169/commits/5f4e514

118

13:43

<digi_james> jnewbery: exactly

119

13:43

<jnewbery> that's on open question. We probably want to deprioritize traffic from such peers

120

13:44

<jnewbery> jonatack: yes, that's right. There are 4 places that ATMP can call CheckInputs() I believe:

121

13:44

<jnewbery> first, when verifying against policy

122

13:45

<digi_james> btw I realize this PR review may be difficult to follow, it's not easy to understnad and as you can see there are remaining nuances I haven't fully processed myself. Sorry about that.

123

13:45

<jnewbery> if that fails, it calls CheckInputs twice more, to see if the failure was due to a segwit witness mutation

124

13:46

<jnewbery> if it succeeds, we call CheckInputs() again from CheckInputsFromMempoolAndCache()

125

13:46

<jnewbery> digi_james: don't apologise. It's a great PR to review!

126

13:47

<jonatack> jnewbery: thank you!

127

13:47

<digi_james> Are there any remaining questions in regards to the current PR state which we can cover?

128

13:47

<ariard> digi_james: that's fine you pick up a really nice one :)

129

13:47

<sosthene> I wonder why we check against policy before consensus rules? Wouldn't it make more sense the other way around?

130

13:47

<digi_james> policy is tighter than consensus

131

13:48

<digi_james> so consensus is a second check, with loosened rules if you will

132

13:48

<jnewbery> exactly right digi_james. The mainline case is that policy checking succeeds, in which case we don't need to check against consensus

133

13:48

<digi_james> a consensus failure will always be a policy failure, but not the other way around

134

13:48

<hugohn> nit: the (nScriptCheckThreads && !single_threaded_validation) check is used 3 times in RunCheckInputsMaybeParallel(). it seems suboptimal & makes things a bit hard to read.

135

13:49

<jonatack> digi_james: agree, great choice of PR :+1:

136

13:49

<sosthene> oh ok, I don't know why intuitively I was thinking it was the other way around (consensus tighter than policy)

137

13:49

<digi_james> jonatack: Thanks for joinin gus

138

13:49

<digi_james> *us

139

13:49

<jnewbery> (but in fact we do also check against consensus rules later (in CheckInputsFromMempoolAndCache(), but that's for slightly different and obscure reasons, which I don't think we have time to get into!)

140

13:50

<jnewbery> (the comment here: https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L770 explains why)

141

13:51

<sosthene> digi_james: thanks it was very interesting, it probably wasn't easy to get ready for this one :)

142

13:51

<sosthene> gotta go now, goodby everyone

143

13:51

<jonatack> jnewbery: how critical is the caching issue you mention: "I believe this breaks script caching - CheckInputs() will only cache script success if running non-parallel."

144

13:51

<jnewbery> jonatack: it's pretty bad!

145

13:52

<hugohn> I thought suhas addressed that?

146

13:52

<jnewbery> script caching means that once we've accepted the tx to our mempool, we don't need to revalidate the scripts when we receive a block with that tx

147

13:53

<jonatack> (link to jnewbery comment: https://github.com/bitcoin/bitcoin/pull/15169#discussion_r304044230)

148

13:53

<ariard> jnewbery: assuming script flags don't change right ?

149

13:53

<jnewbery> here's a blog post I wrote about script caching when it was introduced: https://bitcointechtalk.com/whats-new-in-bitcoin-core-v0-15-part-5-6a9cfa85821f

150

13:54

<jnewbery> ariard: exactly correct. The cache entry includes which script flags were used in verification

151

13:55

<jnewbery> that's one of the obscure reasons that we call CheckInputsFromMempoolAndCache() later - it's so we verify the scripts according to the consensus rules of the next block

152

13:55

<jnewbery> so according to that blog post, block validation is ~40-50% faster with script caching

153

13:56

<jonatack> How does this change affect RPC methods such as testmempoolaccept?

154

13:56

<jnewbery> (or put differently, breaking script caching would ~double block validation time)

155

13:57

<ariard> Hmmm verifying the scripts according to the consensus rules of the next block, it's really obscure it's in case of softfork ?

156

13:57

<digi_james> jonatack: as of the current PR, it will return consensus failure

157

13:57

<digi_james> with script errors, but the 2nd check for policy failure has been removed.

158

13:58

<digi_james> But the rpc call adds a single_threaded_validation argument, so the inputs are checked in sequence

159

13:58

<jonatack> thanks!

160

13:58

<digi_james> ....which allows the caller to receive specific script errors, unlike the parallel case

161

13:59

<digi_james> Previously, you would get policy or consensus failures

162

13:59

<jnewbery> ariard: yes that's particularly obscure, but just the fact that we're calling CheckInputs() again seems pretty weird until you realise it's specifically for caching

163

13:59

<digi_james> That no longer the case

164

13:59

<ariard> I mean CheckInputsFromMempoolAndCache seems only an optimization in case of softfork activation, which doesn't happen often,so you add a burden for every tx getting into memppol

165

13:59

<hugohn> jnewbery: the script caching is fixed with this commit, correct? https://github.com/bitcoin/bitcoin/pull/15169/commits/39dfbc9d5e58bdecac52267336ecf532c99de9a2

166

14:00

<jnewbery> ariard: no, it's an optimization for all txs

167

14:00

<hugohn> scriptExecutionCache.insert(hash_cache_entry); is called in the running parallel case

168

14:00

<jnewbery> before CheckInputsFromMempoolAndCache we haven't validated the scripts according to consensus rules

169

14:00

<ariard> but couldn't we cache at once in the first calls of CheckInputs ?

170

14:01

<ariard> oh gotcha it's because the check against consensus rules is only in case of standardness check failure

171

14:01

<jnewbery> ariard: no, because the first time round, if the scripts succeed verification with policy rules, we don't check against consensus rules

172

14:01

<jnewbery> exactly

173

14:01

<ariard> that's weird

174

14:02

<fjahr> Before everyone leaves, question on code structure/naming: I would like to understand the reasoning for wrapping CheckInputs() in RunCheckInputsMaybeParallel(). Rather than that I would probably have put that logic into ChecksInputs() itself and then have it delegate to CheckInputsParallel() or so. I would like to understand if there are other factors at play that I am not seeing, like keeping the changes in

175

14:02

<fjahr> CheckInputs() as minimal as possible for example. Then the chosen structure would make more sense to me.

176

14:03

<digi_james> +1

177

14:03

<jnewbery> I also thought that was a bit weird, but didn't look into alternative ways of doing it

178

14:04

<digi_james> Ok, if you don't mind, Id suggest wrapping it up here.