#15169 Parallelize CheckInputs() in AcceptToMemoryPool() (mempool)

https://github.com/bitcoin/bitcoin/15169

Host: jachiang

Notes

  • The CheckInputs() function is called when checking the inputs to a transaction. Each input has a script which is verified. That verification can be run:
    • sequentially and synchronously, in which case CheckInputs() will return true if all scripts pass verification or pass back the error for the first script that failed verification; or
    • in parallel, in which case CheckInputs() will emplace the script check onto a queue for future verification.
  • Previously, CheckInputs() would always check inputs sequentially when accepting a transaction to the mempool. Transaction inputs can be checked in parallel during block validation. Note the vChecks argument when calling CheckInputs() from ConnectBlock().
  • This PR enables the input checks to be performed in parallel when a transaction is entering the mempool.
  • It does so by replacing calls to CheckInputs() with RunCheckInputsMaybeParallel(), which will push the input checks to an existing CCheckQueue worker queue.
  • There is a significant performance gain resulting from this optimization.
  • This PR also changes behavior when transactions are denied mempool acceptance. RunCheckInputsMaybeParallel (and by extension CheckInputs()) no longer set TX_NOT_STANDARD during input check failure, but only a consensus failure state.
  • Note: This has an effect on peer connections.
    • Current behaviour: A peer is disconnected when a consensus-invalid transaction is received, but remains connected if the transaction is only policy-invalid (non-standard).
    • Preposed behaviour: A peer is no longer disconnected, even if a consensus-invalid transaction is received.

Questions

  • When does RunCheckInputsMaybeParallel() still potentially call CheckInputs()?
  • How are the input check jobs dispatched to the thread pool? How are the check results returned after execution?
  • Does RunCheckInputsMaybeParallel() set the same “invalid reason” on input check failure as CheckInputs()?
  • If not, why is this not possible?
  • How does this change affect RPC methods such as testmempoolaccept?
  • What effect does the modified input check failure behaviour have on peer connections?
  • What was the reason to modify the behavior in AcceptToMempoolWorker() when input checks fail? Should this be separate PR?

Meeting Log

12:57 < digi_james>  Hello everyone :)
12:58 < jonatack> Hi James :)
12:59 < digi_james> jonatack: :)
12:59 < jnewbery> hi!
13:01 < sosthene> hi
13:01 < jnewbery> Hi folks. This week, digi_james has kindly offered to host the meeting. He wrote all the notes and questions at https://bitcoin-core-review-club.github.io/15169.html. Thanks James!
13:01 < jonas_> hi
13:01 < ariard> hi
13:01 < jnewbery> A reminder again that I'm always looking for suggestions for PRs to cover and volunteers to host. Please comment here: https://github.com/bitcoin-core-review-club/bitcoin-core-review-club.github.io/issues/14 or DM me for either of those.
13:01 < fjahr> hi
13:01 < elichai2> Hi
13:02 < jnewbery> It's a bit of work to prepare to host, but I think it's really worth it. You'll get a lot out of really digging into the PR and understanding it well enough to talk about. We already have lightlike and elichai2 lined up to host future meetings.
13:02 < sosthene> wow, that's great
13:02 < jnewbery> The PR this week is really interesting. Even though the number of lines touched is fairly small, it touches on some very important concepts: peer-to-peer, DoS protection and caching. I hadn't reviewed it before James suggested it for PR club, so thanks for pointing me at it!
13:02 < jnewbery> Ok, over to James.
13:03 < digi_james> Thanks John. This was a bit of a tricky PR for me to understand, but I hope you all will appreciate the nuances and perhaps even discover new ones today.
13:04 < digi_james> I suppose on the surface it would appear to be a performance optimization, verifying inputs fo new transactions being accepted to the mempool in parallel.
13:04 < digi_james> So I suggest we start with that, and then move onto the parts which change behaviour afterwards ...
13:05 < digi_james> So perhaps I'll start out by asking how the parallelisation is achieved?
13:06 < digi_james> Is it new code that is queuing the check jobs and executing these?
13:07 < pinheadmz> no I think it is using something from block validation
13:07 < digi_james> pinheadmz: exactly
13:07 < ariard> well it's now used scriptcheckqueue to parallelize script verification, scriptcheckqueue was only used by ConnectBlock  before
13:07 < ariard> s/used/using/g
13:08 < pinheadmz> this applies to each tx entering the mempool 1 at a time though?
13:08 < pinheadmz> so a tx with 100 inputs can be checked in parrallel
13:08 < digi_james> pinheadmz: yes exactly, parallel input checks for a given transaction
13:09 < digi_james> Is there a difference in validation logic between the script check queue and previously?
13:10 < digi_james> so if you look at RunCheckInputsParallelMaybe
13:10 < jnewbery> Note that the CheckInputs() function is slightly confusingly named (and the comment above is incorrect in parts): https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L1244
13:11 < jnewbery> The comment says "Check whether all inputs of this transaction are valid (no double spends, scripts & sigs, amounts)
13:11 < jnewbery> historically, that was true (it'd check amounts and doublespends). Now it only checks scripts and sigs.
13:11 < jnewbery> That's what's being done in parallel in this PR - script verification
13:12 < lf> It's not clear to me why tx input check can be parallelized: is it not the case that if I create a transaction that uses two identical inputs a parallelized function will return false-positive validity
13:12 < digi_james> Thx John I wasnt aware it didnt do contextual verification
13:13 < lf> ok I think I understand: this is just verify valid inputs, not verify valid transaction
13:14 < jnewbery> yeah, it used to call CheckTxInputs that does some checking of amounts, etc.
13:14 < lf> Verify valid inputs aka assert that script conditions being met are allowed
13:14 < jnewbery> That was removed here: https://github.com/bitcoin/bitcoin/commit/832e0744cb8b1e1625cdb19b257f97316ac16a90#diff-24efdb00bfbe56b140fb006b562cc70bL1164
13:14 < digi_james> John: If there are no contextual checks, how can it be paralellized? Inputs from a given tx double spending for example?
13:15 < jnewbery> All it's checking is the scripts. Those are completely independent from contextual information
13:16 < jnewbery> contextual checks and amount checks are done elsewhere
13:16 < lf> jnewbery so a broad assesment of the change is: "lets separate CheckTxInputs function into more indepedently functional pieces s.t. performance can increase"?
13:16 < digi_james> I see
13:16 < jnewbery> eg AcceptToMemoryPoolWorker() calls CheckTxInputs here: https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L559
13:16 < lf> e.g. "before CheckTxInputs can complete, CheckInputs needs to finish and that can be parallelized"?
13:17 < jnewbery> lf: no, that change that I linked to is from an old PR
13:17 < lf> jnewbery ty, reviewing now
13:17 < jnewbery> this PR is simply changing AcceptToMemoryPool (ATMP) to call CheckInputs() with a checkqueue so that script verification for the different inputs happens in parallel
13:18 < hugohn> Q: is there any scenario where CCheckQueue (scriptcheckqueue) can contain more than one tx to be validated? Or is it guaranteed that the queue can only contain one pending tx at a time?
13:19 < digi_james> The ATMP function takes a TX as an argument
13:19 < digi_james> so it is the inputs of that transaction which are pushed the the CCheckQueue
13:19 < hugohn> yes, all inputs of a given tx
13:20 < digi_james> When the queue is then run, it processes all queued input checkscript jobs of a given transaction.
13:20 < hugohn> my question is is it possible to queue up vChecks for more than 1 tx into scriptcheckqueue? I assume the answer is no, but want to confirm
13:20 < digi_james> I think ATMP locks up cs_main so that would prevent other tx inputs to be queued in the meantime
13:21 < digi_james> Sorry it locks the cs.mempool
13:22 < digi_james> Ok, in the interest of time, I suggest also covering the behavioural changes this PR introduces, what are they?
13:22 < hugohn> I see. yes that makes sense. If check queue accepts more than 1 tx at a time, !control.Wait() would not make sense because the failure could be ambiguous (not sure which tx is invalid)
13:22 < digi_james> hugohn: Thats a good point
13:23 < ariard> hugohn: IMO CCheckQueue let you do that, but you have a wrapper around CCheckQueueControl to let verification showing up as atomic for callers
13:24 < jnewbery> hugohn: I believe during block validation, the inputs from all the txs are enqueued onto a single CCheckQueueControl and then dispatched, but like you and james have said, that can't happen for checking txs for mempool entry
13:24 < PaulTroon> hugohn: can a blockvalidation use the queue at the same time as a inputs from a tx is queued?
13:24 < digi_james> ariard: But can you call ATMP for multiple transactions concurrently?
13:24 < digi_james> It would seem the mempool lock acquisition prevents this
13:24 < hugohn> ariard: isn't CCheckQueueControl only to ensure all vChecks in the queue must be finished? it doesn't protect you against 2 concurrent txs in the queue
13:25 < jnewbery> ATMP holds the main lock, so it can only be running in one thread
13:26 < digi_james> jnewbery: can CCheckQueueControl allow two callers to dispatch separate job sets, returning the completion of each separately to each caller as ariard suggested?
13:26 < ariard> digi_james: you can call ATMP for multiple transactions concurrently but given there is only one set of ScriptCheck threads, CCheckQueueControl constructor should be pass the scriptcheckqueue to ensure there isn't conflict
13:27 < jonatack> ariard: bypass?
13:27 < ariard> I mean CCheckQueueControl constructor is locking the passed CCheckQueue
13:27 < ariard> jonatakc: bypass what you mean ? Using the raw scriptcheckqueue without CCheckQueueControl instanciation ?
13:27 < jonatack> ah, be passed
13:28 < ariard> yes sorry
13:29 < digi_james> Ok, I think we should move to the behavioural change regarding invalid transactions
13:29 < jnewbery> ariard: ATMP holds the main lock, so it can't be called concurrently
13:29 < jonatack> behavior changes: CheckInputs invokes scripts only once and always returns CONSENSUS for script failures. Ban behavior.
13:30 < digi_james> jonatack: Thank you!
13:30 < hugohn> jnewbery: sorry still a bit confused. I see 2 locks used in ATMP, cs_main and pool.cs. which lock prevents you against 1+ txs in the queue?
13:30 < ariard> jnewbery: ah thanks but so why you need to lock scriptcheckqueue in this case ?
13:30 < hugohn> cs_main right?
13:30 < digi_james> :)
13:30 < jnewbery> ariard: not sure, perhaps just to be extra safe!
13:30 < jnewbery> let's move on to behavioural changes
13:31 < ariard> hugohn: in fact in ConnectBlock transaction are mixed, you may have multiplee txs in queue
13:31 < ariard> IMO it's ok because you want atomic validity at block granularity
13:31 < ariard> and not a transaction granularity like in ATMP
13:32 < ariard> *at
13:32 < hugohn> ariard: it does makes sense for ConnectBlock I think? because block is invalid as long as any tx in the block is invalid
13:32 < hugohn> right
13:32 < jnewbery> hugohn ariard: please, let's move on
13:32 < ariard> yes we agree on this
13:32 < digi_james> As jonatack mentioned, RunCheckInputsMaybeParallel can no longer differentiate between policy invalid and consensus invalid failures during input checks. Rather it just sets the invalid state to Consensus.
13:33 < digi_james> CheckInputs previously would run two script checks during failure. One with policy flags active, so checks for standardness, and if this failed, a second check would be run with consensus flags only.
13:33 < digi_james> Why cant we do this in the new parallelized case?
13:35 < digi_james> Ok, one way to look at is in the CheckQueue design, which was originally written to be called when blocks are connected, there only consensus verification matters, standardness applies to mempool acceptance only
13:36 < jnewbery> digi_james: I think we could call CheckInputs() twice after this PR, but it'd require some code rearrangement, and we'd lose some of the performance benefit of the PR
13:37 < digi_james> john: Interesting. So both with all inputs in vChecks?
13:37 < jnewbery> (ie call CheckInputs() with policy flags, and then a second time with consensus flags)
13:37 < jnewbery> right
13:38 < digi_james> I am thinking out loud. So this would potentially be more scriptchecks than in the single threaded case, albeit parallelized.
13:38 < jnewbery> CheckInputs() is a bit weird right now. We call it and say "please run the scripts with these flags". CheckInputs() runs the scripts with the flags we pass it, but if those fails, it runs the scripts _again_ with different flags
13:38 < michaelf_> <digi_james>: What does "when blocks are connected" mean?
13:38 < jnewbery> this PR removes that second script verification
13:38 < digi_james> michaelf_:  I mean when a block is connected to the tip of the chain.
13:39 < michaelf_> Ah ok thanks
13:39 < lf> chain tip update*?
13:39 < jnewbery> but it could also add back a second call to CheckInputs() with the consensus flags
13:39 < jnewbery> The reason Suhas didn't do that is contained in this comment: https://github.com/bitcoin/bitcoin/pull/15169#issuecomment-492232216
13:39 < ariard> jnewbery: or CheckInputs() may also push the script verification in pvChecks
13:39 < jnewbery> "The motivation for this is to mitigate adversarial behavior: in order to determine whether a transaction is invalid according to consensus rules (rather than policy) we end up validating a transaction's scripts twice, so an adversary trying a CPU DoS could construct transactions that only fail policy checks, but otherwise tie up our CPU for twice as long."
13:42 < digi_james> jnewbery: If we add a second CheckInputs run (in the new parallelized case) as you suggested above, how would this be resolved?
13:42 < jnewbery> how would what be resolved?
13:43 < jonatack> Unsure how it relates but Suhas also wrote: "AcceptToMemoryPoolWorker() already invokes CheckInputs multiple times, with different script flags, to detect certain kinds of segwit validation errors."
13:43 < digi_james> Transactions failing policy checks, but tying up CPU for a second round of checks
13:43 < jnewbery> the fact that a malicious peer can make us verify twice at no cost to themselves?
13:43 < jonatack> in the commit message of https://github.com/bitcoin/bitcoin/pull/15169/commits/5f4e514
13:43 < digi_james> jnewbery: exactly
13:43 < jnewbery> that's on open question. We probably want to deprioritize traffic from such peers
13:44 < jnewbery> jonatack: yes, that's right. There are 4 places that ATMP can call CheckInputs() I believe:
13:44 < jnewbery> first, when verifying against policy
13:45 < digi_james> btw I realize this PR review may be difficult to follow, it's not easy to understnad and as you can see there are remaining nuances I haven't fully processed myself. Sorry about that.
13:45 < jnewbery> if that fails, it calls CheckInputs twice more, to see if the failure was due to a segwit witness mutation
13:46 < jnewbery> if it succeeds, we call CheckInputs() again from CheckInputsFromMempoolAndCache()
13:46 < jnewbery> digi_james: don't apologise. It's a great PR to review!
13:47 < jonatack> jnewbery: thank you!
13:47 < digi_james> Are there any remaining questions in regards to the current PR state which we can cover?
13:47 < ariard> digi_james: that's fine you pick up a really nice one :)
13:47 < sosthene> I wonder why we check against policy before consensus rules? Wouldn't it make more sense the other way around?
13:47 < digi_james> policy is tighter than consensus
13:48 < digi_james> so consensus is a second check, with loosened rules if you will
13:48 < jnewbery> exactly right digi_james. The mainline case is that policy checking succeeds, in which case we don't need to check against consensus
13:48 < digi_james> a consensus failure will always be a policy failure, but not the other way around
13:48 < hugohn> nit: the (nScriptCheckThreads && !single_threaded_validation) check is used 3 times in RunCheckInputsMaybeParallel(). it seems suboptimal & makes things a bit hard to read.
13:49 < jonatack> digi_james: agree, great choice of PR :+1:
13:49 < sosthene> oh ok, I don't know why intuitively I was thinking it was the other way around (consensus tighter than policy)
13:49 < digi_james> jonatack: Thanks for joinin gus
13:49 < digi_james> *us
13:49 < jnewbery> (but in fact we do also check against consensus rules later (in CheckInputsFromMempoolAndCache(), but that's for slightly different and obscure reasons, which I don't think we have time to get into!)
13:50 < jnewbery> (the comment here: https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L770 explains why)
13:51 < sosthene> digi_james: thanks it was very interesting, it probably wasn't easy to get ready for this one :)
13:51 < sosthene> gotta go now, goodby everyone
13:51 < jonatack> jnewbery: how critical is the caching issue you mention: "I believe this breaks script caching - CheckInputs() will only cache script success if running non-parallel."
13:51 < jnewbery> jonatack: it's pretty bad!
13:52 < hugohn> I thought suhas addressed that?
13:52 < jnewbery> script caching means that once we've accepted the tx to our mempool, we don't need to revalidate the scripts when we receive a block with that tx
13:53 < jonatack> (link to jnewbery comment: https://github.com/bitcoin/bitcoin/pull/15169#discussion_r304044230)
13:53 < ariard> jnewbery: assuming script flags don't change right ?
13:53 < jnewbery> here's a blog post I wrote about script caching when it was introduced: https://bitcointechtalk.com/whats-new-in-bitcoin-core-v0-15-part-5-6a9cfa85821f
13:54 < jnewbery> ariard: exactly correct. The cache entry includes which script flags were used in verification
13:55 < jnewbery> that's one of the obscure reasons that we call CheckInputsFromMempoolAndCache() later - it's so we verify the scripts according to the consensus rules of the next block
13:55 < jnewbery> so according to that blog post, block validation is ~40-50% faster with script caching
13:56 < jonatack> How does this change affect RPC methods such as testmempoolaccept?
13:56 < jnewbery> (or put differently, breaking script caching would ~double block validation time)
13:57 < ariard> Hmmm verifying the scripts according to the consensus rules of the next block, it's really obscure it's in case of softfork ?
13:57 < digi_james> jonatack: as of the current PR, it will return consensus failure
13:57 < digi_james> with script errors, but the 2nd check for policy failure has been removed.
13:58 < digi_james> But the rpc call adds a single_threaded_validation argument, so the inputs are checked in sequence
13:58 < jonatack> thanks!
13:58 < digi_james> ....which allows the caller to receive specific script errors, unlike the parallel case
13:59 < digi_james> Previously, you would get policy or consensus failures
13:59 < jnewbery> ariard: yes that's particularly obscure, but just the fact that we're calling CheckInputs() again seems pretty weird until you realise it's specifically for caching
13:59 < digi_james> That no longer the case
13:59 < ariard> I mean CheckInputsFromMempoolAndCache seems only an optimization in case of softfork activation, which doesn't happen often,so you add a burden for every tx getting into memppol
13:59 < hugohn> jnewbery: the script caching is fixed with this commit, correct? https://github.com/bitcoin/bitcoin/pull/15169/commits/39dfbc9d5e58bdecac52267336ecf532c99de9a2
14:00 < jnewbery> ariard: no, it's an optimization for all txs
14:00 < hugohn> scriptExecutionCache.insert(hash_cache_entry); is called in the running parallel case
14:00 < jnewbery> before CheckInputsFromMempoolAndCache we haven't validated the scripts according to consensus rules
14:00 < ariard> but couldn't we cache at once in the first calls of CheckInputs ?
14:01 < ariard> oh gotcha it's because the check against consensus rules is only in case of standardness check failure
14:01 < jnewbery> ariard: no, because the first time round, if the scripts succeed verification with policy rules, we don't check against consensus rules
14:01 < jnewbery> exactly
14:01 < ariard> that's weird
14:02 < fjahr> Before everyone leaves, question on code structure/naming: I would like to understand the reasoning for wrapping CheckInputs() in RunCheckInputsMaybeParallel(). Rather than that I would probably have put that logic into ChecksInputs() itself and then have it delegate to CheckInputsParallel() or so. I would like to understand if there are other factors at play that I am not seeing, like keeping the changes in
14:02 < fjahr> CheckInputs() as minimal as possible for example. Then the chosen structure would make more sense to me.
14:03 < digi_james> +1
14:03 < jnewbery> I also thought that was a bit weird, but didn't look into alternative ways of doing it
14:04 < digi_james> Ok, if you don't mind, Id suggest wrapping it up here.
14:04 < jnewbery> Thanks digi_james!
14:04 < PaulTroon> +1
14:04 < digi_james> jnewbery: Cheer!
14:04 < digi_james> *Cheers
14:04 < jnewbery> Tough PR this week. Great job uncovering the subtleties.
14:04 < hugohn> wouldn't one reason be that CheckInputs() is also called by BlockConnected(), which already does things in parallel?
14:05 < jonatack> Thanks digi_james and everyone!