Parallelize CheckInputs() in AcceptToMemoryPool() (mempool)
- The CheckInputs() function is called when checking the inputs to a transaction. Each input has a script which is verified. That verification can be run:
- sequentially and synchronously, in which case
trueif all scripts pass verification or pass back the error for the first script that failed verification; or
- in parallel, in which case
CheckInputs()will emplace the script check onto a queue for future verification.
- sequentially and synchronously, in which case
CheckInputs()would always check inputs sequentially when accepting a transaction to the mempool. Transaction inputs can be checked in parallel during block validation. Note the
vChecksargument when calling
- This PR enables the input checks to be performed in parallel when a transaction is entering the mempool.
- It does so by replacing calls to
RunCheckInputsMaybeParallel(), which will push the input checks to an existing
- There is a significant performance gain resulting from this optimization.
- This PR also changes behavior when transactions are denied mempool acceptance.
RunCheckInputsMaybeParallel(and by extension
CheckInputs()) no longer set TX_NOT_STANDARD during input check failure, but only a consensus failure state.
- Note: This has an effect on peer connections.
- Current behaviour: A peer is disconnected when a consensus-invalid transaction is received, but remains connected if the transaction is only policy-invalid (non-standard).
- Preposed behaviour: A peer is no longer disconnected, even if a consensus-invalid transaction is received.
- When does
RunCheckInputsMaybeParallel()still potentially call
- How are the input check jobs dispatched to the thread pool? How are the check results returned after execution?
RunCheckInputsMaybeParallel()set the same “invalid reason” on input check failure as
- If not, why is this not possible?
- How does this change affect RPC methods such as
- What effect does the modified input check failure behaviour have on peer connections?
- What was the reason to modify the behavior in AcceptToMempoolWorker() when input checks fail? Should this be separate PR?
12:57 < digi_james> Hello everyone :) 12:58 < jonatack> Hi James :) 12:59 < digi_james> jonatack: :) 12:59 < jnewbery> hi! 13:01 < sosthene> hi 13:01 < jnewbery> Hi folks. This week, digi_james has kindly offered to host the meeting. He wrote all the notes and questions at https://bitcoin-core-review-club.github.io/15169.html. Thanks James! 13:01 < jonas_> hi 13:01 < ariard> hi 13:01 < jnewbery> A reminder again that I'm always looking for suggestions for PRs to cover and volunteers to host. Please comment here: https://github.com/bitcoin-core-review-club/bitcoin-core-review-club.github.io/issues/14 or DM me for either of those. 13:01 < fjahr> hi 13:01 < elichai2> Hi 13:02 < jnewbery> It's a bit of work to prepare to host, but I think it's really worth it. You'll get a lot out of really digging into the PR and understanding it well enough to talk about. We already have lightlike and elichai2 lined up to host future meetings. 13:02 < sosthene> wow, that's great 13:02 < jnewbery> The PR this week is really interesting. Even though the number of lines touched is fairly small, it touches on some very important concepts: peer-to-peer, DoS protection and caching. I hadn't reviewed it before James suggested it for PR club, so thanks for pointing me at it! 13:02 < jnewbery> Ok, over to James. 13:03 < digi_james> Thanks John. This was a bit of a tricky PR for me to understand, but I hope you all will appreciate the nuances and perhaps even discover new ones today. 13:04 < digi_james> I suppose on the surface it would appear to be a performance optimization, verifying inputs fo new transactions being accepted to the mempool in parallel. 13:04 < digi_james> So I suggest we start with that, and then move onto the parts which change behaviour afterwards ... 13:05 < digi_james> So perhaps I'll start out by asking how the parallelisation is achieved? 13:06 < digi_james> Is it new code that is queuing the check jobs and executing these? 13:07 < pinheadmz> no I think it is using something from block validation 13:07 < digi_james> pinheadmz: exactly 13:07 < ariard> well it's now used scriptcheckqueue to parallelize script verification, scriptcheckqueue was only used by ConnectBlock before 13:07 < ariard> s/used/using/g 13:08 < pinheadmz> this applies to each tx entering the mempool 1 at a time though? 13:08 < pinheadmz> so a tx with 100 inputs can be checked in parrallel 13:08 < digi_james> pinheadmz: yes exactly, parallel input checks for a given transaction 13:09 < digi_james> Is there a difference in validation logic between the script check queue and previously? 13:10 < digi_james> so if you look at RunCheckInputsParallelMaybe 13:10 < jnewbery> Note that the CheckInputs() function is slightly confusingly named (and the comment above is incorrect in parts): https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L1244 13:11 < jnewbery> The comment says "Check whether all inputs of this transaction are valid (no double spends, scripts & sigs, amounts) 13:11 < jnewbery> historically, that was true (it'd check amounts and doublespends). Now it only checks scripts and sigs. 13:11 < jnewbery> That's what's being done in parallel in this PR - script verification 13:12 < lf> It's not clear to me why tx input check can be parallelized: is it not the case that if I create a transaction that uses two identical inputs a parallelized function will return false-positive validity 13:12 < digi_james> Thx John I wasnt aware it didnt do contextual verification 13:13 < lf> ok I think I understand: this is just verify valid inputs, not verify valid transaction 13:14 < jnewbery> yeah, it used to call CheckTxInputs that does some checking of amounts, etc. 13:14 < lf> Verify valid inputs aka assert that script conditions being met are allowed 13:14 < jnewbery> That was removed here: https://github.com/bitcoin/bitcoin/commit/832e0744cb8b1e1625cdb19b257f97316ac16a90#diff-24efdb00bfbe56b140fb006b562cc70bL1164 13:14 < digi_james> John: If there are no contextual checks, how can it be paralellized? Inputs from a given tx double spending for example? 13:15 < jnewbery> All it's checking is the scripts. Those are completely independent from contextual information 13:16 < jnewbery> contextual checks and amount checks are done elsewhere 13:16 < lf> jnewbery so a broad assesment of the change is: "lets separate CheckTxInputs function into more indepedently functional pieces s.t. performance can increase"? 13:16 < digi_james> I see 13:16 < jnewbery> eg AcceptToMemoryPoolWorker() calls CheckTxInputs here: https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L559 13:16 < lf> e.g. "before CheckTxInputs can complete, CheckInputs needs to finish and that can be parallelized"? 13:17 < jnewbery> lf: no, that change that I linked to is from an old PR 13:17 < lf> jnewbery ty, reviewing now 13:17 < jnewbery> this PR is simply changing AcceptToMemoryPool (ATMP) to call CheckInputs() with a checkqueue so that script verification for the different inputs happens in parallel 13:18 < hugohn> Q: is there any scenario where CCheckQueue (scriptcheckqueue) can contain more than one tx to be validated? Or is it guaranteed that the queue can only contain one pending tx at a time? 13:19 < digi_james> The ATMP function takes a TX as an argument 13:19 < digi_james> so it is the inputs of that transaction which are pushed the the CCheckQueue 13:19 < hugohn> yes, all inputs of a given tx 13:20 < digi_james> When the queue is then run, it processes all queued input checkscript jobs of a given transaction. 13:20 < hugohn> my question is is it possible to queue up vChecks for more than 1 tx into scriptcheckqueue? I assume the answer is no, but want to confirm 13:20 < digi_james> I think ATMP locks up cs_main so that would prevent other tx inputs to be queued in the meantime 13:21 < digi_james> Sorry it locks the cs.mempool 13:22 < digi_james> Ok, in the interest of time, I suggest also covering the behavioural changes this PR introduces, what are they? 13:22 < hugohn> I see. yes that makes sense. If check queue accepts more than 1 tx at a time, !control.Wait() would not make sense because the failure could be ambiguous (not sure which tx is invalid) 13:22 < digi_james> hugohn: Thats a good point 13:23 < ariard> hugohn: IMO CCheckQueue let you do that, but you have a wrapper around CCheckQueueControl to let verification showing up as atomic for callers 13:24 < jnewbery> hugohn: I believe during block validation, the inputs from all the txs are enqueued onto a single CCheckQueueControl and then dispatched, but like you and james have said, that can't happen for checking txs for mempool entry 13:24 < PaulTroon> hugohn: can a blockvalidation use the queue at the same time as a inputs from a tx is queued? 13:24 < digi_james> ariard: But can you call ATMP for multiple transactions concurrently? 13:24 < digi_james> It would seem the mempool lock acquisition prevents this 13:24 < hugohn> ariard: isn't CCheckQueueControl only to ensure all vChecks in the queue must be finished? it doesn't protect you against 2 concurrent txs in the queue 13:25 < jnewbery> ATMP holds the main lock, so it can only be running in one thread 13:26 < digi_james> jnewbery: can CCheckQueueControl allow two callers to dispatch separate job sets, returning the completion of each separately to each caller as ariard suggested? 13:26 < ariard> digi_james: you can call ATMP for multiple transactions concurrently but given there is only one set of ScriptCheck threads, CCheckQueueControl constructor should be pass the scriptcheckqueue to ensure there isn't conflict 13:27 < jonatack> ariard: bypass? 13:27 < ariard> I mean CCheckQueueControl constructor is locking the passed CCheckQueue 13:27 < ariard> jonatakc: bypass what you mean ? Using the raw scriptcheckqueue without CCheckQueueControl instanciation ? 13:27 < jonatack> ah, be passed 13:28 < ariard> yes sorry 13:29 < digi_james> Ok, I think we should move to the behavioural change regarding invalid transactions 13:29 < jnewbery> ariard: ATMP holds the main lock, so it can't be called concurrently 13:29 < jonatack> behavior changes: CheckInputs invokes scripts only once and always returns CONSENSUS for script failures. Ban behavior. 13:30 < digi_james> jonatack: Thank you! 13:30 < hugohn> jnewbery: sorry still a bit confused. I see 2 locks used in ATMP, cs_main and pool.cs. which lock prevents you against 1+ txs in the queue? 13:30 < ariard> jnewbery: ah thanks but so why you need to lock scriptcheckqueue in this case ? 13:30 < hugohn> cs_main right? 13:30 < digi_james> :) 13:30 < jnewbery> ariard: not sure, perhaps just to be extra safe! 13:30 < jnewbery> let's move on to behavioural changes 13:31 < ariard> hugohn: in fact in ConnectBlock transaction are mixed, you may have multiplee txs in queue 13:31 < ariard> IMO it's ok because you want atomic validity at block granularity 13:31 < ariard> and not a transaction granularity like in ATMP 13:32 < ariard> *at 13:32 < hugohn> ariard: it does makes sense for ConnectBlock I think? because block is invalid as long as any tx in the block is invalid 13:32 < hugohn> right 13:32 < jnewbery> hugohn ariard: please, let's move on 13:32 < ariard> yes we agree on this 13:32 < digi_james> As jonatack mentioned, RunCheckInputsMaybeParallel can no longer differentiate between policy invalid and consensus invalid failures during input checks. Rather it just sets the invalid state to Consensus. 13:33 < digi_james> CheckInputs previously would run two script checks during failure. One with policy flags active, so checks for standardness, and if this failed, a second check would be run with consensus flags only. 13:33 < digi_james> Why cant we do this in the new parallelized case? 13:35 < digi_james> Ok, one way to look at is in the CheckQueue design, which was originally written to be called when blocks are connected, there only consensus verification matters, standardness applies to mempool acceptance only 13:36 < jnewbery> digi_james: I think we could call CheckInputs() twice after this PR, but it'd require some code rearrangement, and we'd lose some of the performance benefit of the PR 13:37 < digi_james> john: Interesting. So both with all inputs in vChecks? 13:37 < jnewbery> (ie call CheckInputs() with policy flags, and then a second time with consensus flags) 13:37 < jnewbery> right 13:38 < digi_james> I am thinking out loud. So this would potentially be more scriptchecks than in the single threaded case, albeit parallelized. 13:38 < jnewbery> CheckInputs() is a bit weird right now. We call it and say "please run the scripts with these flags". CheckInputs() runs the scripts with the flags we pass it, but if those fails, it runs the scripts _again_ with different flags 13:38 < michaelf_> <digi_james>: What does "when blocks are connected" mean? 13:38 < jnewbery> this PR removes that second script verification 13:38 < digi_james> michaelf_: I mean when a block is connected to the tip of the chain. 13:39 < michaelf_> Ah ok thanks 13:39 < lf> chain tip update*? 13:39 < jnewbery> but it could also add back a second call to CheckInputs() with the consensus flags 13:39 < jnewbery> The reason Suhas didn't do that is contained in this comment: https://github.com/bitcoin/bitcoin/pull/15169#issuecomment-492232216 13:39 < ariard> jnewbery: or CheckInputs() may also push the script verification in pvChecks 13:39 < jnewbery> "The motivation for this is to mitigate adversarial behavior: in order to determine whether a transaction is invalid according to consensus rules (rather than policy) we end up validating a transaction's scripts twice, so an adversary trying a CPU DoS could construct transactions that only fail policy checks, but otherwise tie up our CPU for twice as long." 13:42 < digi_james> jnewbery: If we add a second CheckInputs run (in the new parallelized case) as you suggested above, how would this be resolved? 13:42 < jnewbery> how would what be resolved? 13:43 < jonatack> Unsure how it relates but Suhas also wrote: "AcceptToMemoryPoolWorker() already invokes CheckInputs multiple times, with different script flags, to detect certain kinds of segwit validation errors." 13:43 < digi_james> Transactions failing policy checks, but tying up CPU for a second round of checks 13:43 < jnewbery> the fact that a malicious peer can make us verify twice at no cost to themselves? 13:43 < jonatack> in the commit message of https://github.com/bitcoin/bitcoin/pull/15169/commits/5f4e514 13:43 < digi_james> jnewbery: exactly 13:43 < jnewbery> that's on open question. We probably want to deprioritize traffic from such peers 13:44 < jnewbery> jonatack: yes, that's right. There are 4 places that ATMP can call CheckInputs() I believe: 13:44 < jnewbery> first, when verifying against policy 13:45 < digi_james> btw I realize this PR review may be difficult to follow, it's not easy to understnad and as you can see there are remaining nuances I haven't fully processed myself. Sorry about that. 13:45 < jnewbery> if that fails, it calls CheckInputs twice more, to see if the failure was due to a segwit witness mutation 13:46 < jnewbery> if it succeeds, we call CheckInputs() again from CheckInputsFromMempoolAndCache() 13:46 < jnewbery> digi_james: don't apologise. It's a great PR to review! 13:47 < jonatack> jnewbery: thank you! 13:47 < digi_james> Are there any remaining questions in regards to the current PR state which we can cover? 13:47 < ariard> digi_james: that's fine you pick up a really nice one :) 13:47 < sosthene> I wonder why we check against policy before consensus rules? Wouldn't it make more sense the other way around? 13:47 < digi_james> policy is tighter than consensus 13:48 < digi_james> so consensus is a second check, with loosened rules if you will 13:48 < jnewbery> exactly right digi_james. The mainline case is that policy checking succeeds, in which case we don't need to check against consensus 13:48 < digi_james> a consensus failure will always be a policy failure, but not the other way around 13:48 < hugohn> nit: the (nScriptCheckThreads && !single_threaded_validation) check is used 3 times in RunCheckInputsMaybeParallel(). it seems suboptimal & makes things a bit hard to read. 13:49 < jonatack> digi_james: agree, great choice of PR :+1: 13:49 < sosthene> oh ok, I don't know why intuitively I was thinking it was the other way around (consensus tighter than policy) 13:49 < digi_james> jonatack: Thanks for joinin gus 13:49 < digi_james> *us 13:49 < jnewbery> (but in fact we do also check against consensus rules later (in CheckInputsFromMempoolAndCache(), but that's for slightly different and obscure reasons, which I don't think we have time to get into!) 13:50 < jnewbery> (the comment here: https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L770 explains why) 13:51 < sosthene> digi_james: thanks it was very interesting, it probably wasn't easy to get ready for this one :) 13:51 < sosthene> gotta go now, goodby everyone 13:51 < jonatack> jnewbery: how critical is the caching issue you mention: "I believe this breaks script caching - CheckInputs() will only cache script success if running non-parallel." 13:51 < jnewbery> jonatack: it's pretty bad! 13:52 < hugohn> I thought suhas addressed that? 13:52 < jnewbery> script caching means that once we've accepted the tx to our mempool, we don't need to revalidate the scripts when we receive a block with that tx 13:53 < jonatack> (link to jnewbery comment: https://github.com/bitcoin/bitcoin/pull/15169#discussion_r304044230) 13:53 < ariard> jnewbery: assuming script flags don't change right ? 13:53 < jnewbery> here's a blog post I wrote about script caching when it was introduced: https://bitcointechtalk.com/whats-new-in-bitcoin-core-v0-15-part-5-6a9cfa85821f 13:54 < jnewbery> ariard: exactly correct. The cache entry includes which script flags were used in verification 13:55 < jnewbery> that's one of the obscure reasons that we call CheckInputsFromMempoolAndCache() later - it's so we verify the scripts according to the consensus rules of the next block 13:55 < jnewbery> so according to that blog post, block validation is ~40-50% faster with script caching 13:56 < jonatack> How does this change affect RPC methods such as testmempoolaccept? 13:56 < jnewbery> (or put differently, breaking script caching would ~double block validation time) 13:57 < ariard> Hmmm verifying the scripts according to the consensus rules of the next block, it's really obscure it's in case of softfork ? 13:57 < digi_james> jonatack: as of the current PR, it will return consensus failure 13:57 < digi_james> with script errors, but the 2nd check for policy failure has been removed. 13:58 < digi_james> But the rpc call adds a single_threaded_validation argument, so the inputs are checked in sequence 13:58 < jonatack> thanks! 13:58 < digi_james> ....which allows the caller to receive specific script errors, unlike the parallel case 13:59 < digi_james> Previously, you would get policy or consensus failures 13:59 < jnewbery> ariard: yes that's particularly obscure, but just the fact that we're calling CheckInputs() again seems pretty weird until you realise it's specifically for caching 13:59 < digi_james> That no longer the case 13:59 < ariard> I mean CheckInputsFromMempoolAndCache seems only an optimization in case of softfork activation, which doesn't happen often,so you add a burden for every tx getting into memppol 13:59 < hugohn> jnewbery: the script caching is fixed with this commit, correct? https://github.com/bitcoin/bitcoin/pull/15169/commits/39dfbc9d5e58bdecac52267336ecf532c99de9a2 14:00 < jnewbery> ariard: no, it's an optimization for all txs 14:00 < hugohn> scriptExecutionCache.insert(hash_cache_entry); is called in the running parallel case 14:00 < jnewbery> before CheckInputsFromMempoolAndCache we haven't validated the scripts according to consensus rules 14:00 < ariard> but couldn't we cache at once in the first calls of CheckInputs ? 14:01 < ariard> oh gotcha it's because the check against consensus rules is only in case of standardness check failure 14:01 < jnewbery> ariard: no, because the first time round, if the scripts succeed verification with policy rules, we don't check against consensus rules 14:01 < jnewbery> exactly 14:01 < ariard> that's weird 14:02 < fjahr> Before everyone leaves, question on code structure/naming: I would like to understand the reasoning for wrapping CheckInputs() in RunCheckInputsMaybeParallel(). Rather than that I would probably have put that logic into ChecksInputs() itself and then have it delegate to CheckInputsParallel() or so. I would like to understand if there are other factors at play that I am not seeing, like keeping the changes in 14:02 < fjahr> CheckInputs() as minimal as possible for example. Then the chosen structure would make more sense to me. 14:03 < digi_james> +1 14:03 < jnewbery> I also thought that was a bit weird, but didn't look into alternative ways of doing it 14:04 < digi_james> Ok, if you don't mind, Id suggest wrapping it up here. 14:04 < jnewbery> Thanks digi_james! 14:04 < PaulTroon> +1 14:04 < digi_james> jnewbery: Cheer! 14:04 < digi_james> *Cheers 14:04 < jnewbery> Tough PR this week. Great job uncovering the subtleties. 14:04 < hugohn> wouldn't one reason be that CheckInputs() is also called by BlockConnected(), which already does things in parallel? 14:05 < jonatack> Thanks digi_james and everyone!