Parallelize CheckInputs() in AcceptToMemoryPool() (mempool)

https://github.com/bitcoin/bitcoin/pull/15169

Host: jachiang  -  PR author: sdaftuar

Notes

  • The CheckInputs() function is called when checking the inputs to a transaction. Each input has a script which is verified. That verification can be run:
    • sequentially and synchronously, in which case CheckInputs() will return true if all scripts pass verification or pass back the error for the first script that failed verification; or
    • in parallel, in which case CheckInputs() will emplace the script check onto a queue for future verification.
  • Previously, CheckInputs() would always check inputs sequentially when accepting a transaction to the mempool. Transaction inputs can be checked in parallel during block validation. Note the vChecks argument when calling CheckInputs() from ConnectBlock().
  • This PR enables the input checks to be performed in parallel when a transaction is entering the mempool.
  • It does so by replacing calls to CheckInputs() with RunCheckInputsMaybeParallel(), which will push the input checks to an existing CCheckQueue worker queue.
  • There is a significant performance gain resulting from this optimization.
  • This PR also changes behavior when transactions are denied mempool acceptance. RunCheckInputsMaybeParallel (and by extension CheckInputs()) no longer set TX_NOT_STANDARD during input check failure, but only a consensus failure state.
  • Note: This has an effect on peer connections.
    • Current behaviour: A peer is disconnected when a consensus-invalid transaction is received, but remains connected if the transaction is only policy-invalid (non-standard).
    • Preposed behaviour: A peer is no longer disconnected, even if a consensus-invalid transaction is received.

Questions

  • When does RunCheckInputsMaybeParallel() still potentially call CheckInputs()?
  • How are the input check jobs dispatched to the thread pool? How are the check results returned after execution?
  • Does RunCheckInputsMaybeParallel() set the same “invalid reason” on input check failure as CheckInputs()?
  • If not, why is this not possible?
  • How does this change affect RPC methods such as testmempoolaccept?
  • What effect does the modified input check failure behaviour have on peer connections?
  • What was the reason to modify the behavior in AcceptToMempoolWorker() when input checks fail? Should this be separate PR?

Meeting Log

  112:57 <digi_james> Hello everyone :)
  212:58 <jonatack> Hi James :)
  312:59 <digi_james> jonatack: :)
  412:59 <jnewbery> hi!
  513:01 <sosthene> hi
  613:01 <jnewbery> Hi folks. This week, digi_james has kindly offered to host the meeting. He wrote all the notes and questions at https://bitcoin-core-review-club.github.io/15169.html. Thanks James!
  713:01 <jonas_> hi
  813:01 <ariard> hi
  913:01 <jnewbery> A reminder again that I'm always looking for suggestions for PRs to cover and volunteers to host. Please comment here: https://github.com/bitcoin-core-review-club/bitcoin-core-review-club.github.io/issues/14 or DM me for either of those.
 1013:01 <fjahr> hi
 1113:01 <elichai2> Hi
 1213:02 <jnewbery> It's a bit of work to prepare to host, but I think it's really worth it. You'll get a lot out of really digging into the PR and understanding it well enough to talk about. We already have lightlike and elichai2 lined up to host future meetings.
 1313:02 <sosthene> wow, that's great
 1413:02 <jnewbery> The PR this week is really interesting. Even though the number of lines touched is fairly small, it touches on some very important concepts: peer-to-peer, DoS protection and caching. I hadn't reviewed it before James suggested it for PR club, so thanks for pointing me at it!
 1513:02 <jnewbery> Ok, over to James.
 1613:03 <digi_james> Thanks John. This was a bit of a tricky PR for me to understand, but I hope you all will appreciate the nuances and perhaps even discover new ones today.
 1713:04 <digi_james> I suppose on the surface it would appear to be a performance optimization, verifying inputs fo new transactions being accepted to the mempool in parallel.
 1813:04 <digi_james> So I suggest we start with that, and then move onto the parts which change behaviour afterwards ...
 1913:05 <digi_james> So perhaps I'll start out by asking how the parallelisation is achieved?
 2013:06 <digi_james> Is it new code that is queuing the check jobs and executing these?
 2113:07 <pinheadmz> no I think it is using something from block validation
 2213:07 <digi_james> pinheadmz: exactly
 2313:07 <ariard> well it's now used scriptcheckqueue to parallelize script verification, scriptcheckqueue was only used by ConnectBlock before
 2413:07 <ariard> s/used/using/g
 2513:08 <pinheadmz> this applies to each tx entering the mempool 1 at a time though?
 2613:08 <pinheadmz> so a tx with 100 inputs can be checked in parrallel
 2713:08 <digi_james> pinheadmz: yes exactly, parallel input checks for a given transaction
 2813:09 <digi_james> Is there a difference in validation logic between the script check queue and previously?
 2913:10 <digi_james> so if you look at RunCheckInputsParallelMaybe
 3013:10 <jnewbery> Note that the CheckInputs() function is slightly confusingly named (and the comment above is incorrect in parts): https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L1244
 3113:11 <jnewbery> The comment says "Check whether all inputs of this transaction are valid (no double spends, scripts & sigs, amounts)
 3213:11 <jnewbery> historically, that was true (it'd check amounts and doublespends). Now it only checks scripts and sigs.
 3313:11 <jnewbery> That's what's being done in parallel in this PR - script verification
 3413:12 <lf> It's not clear to me why tx input check can be parallelized: is it not the case that if I create a transaction that uses two identical inputs a parallelized function will return false-positive validity
 3513:12 <digi_james> Thx John I wasnt aware it didnt do contextual verification
 3613:13 <lf> ok I think I understand: this is just verify valid inputs, not verify valid transaction
 3713:14 <jnewbery> yeah, it used to call CheckTxInputs that does some checking of amounts, etc.
 3813:14 <lf> Verify valid inputs aka assert that script conditions being met are allowed
 3913:14 <jnewbery> That was removed here: https://github.com/bitcoin/bitcoin/commit/832e0744cb8b1e1625cdb19b257f97316ac16a90#diff-24efdb00bfbe56b140fb006b562cc70bL1164
 4013:14 <digi_james> John: If there are no contextual checks, how can it be paralellized? Inputs from a given tx double spending for example?
 4113:15 <jnewbery> All it's checking is the scripts. Those are completely independent from contextual information
 4213:16 <jnewbery> contextual checks and amount checks are done elsewhere
 4313:16 <lf> jnewbery so a broad assesment of the change is: "lets separate CheckTxInputs function into more indepedently functional pieces s.t. performance can increase"?
 4413:16 <digi_james> I see
 4513:16 <jnewbery> eg AcceptToMemoryPoolWorker() calls CheckTxInputs here: https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L559
 4613:16 <lf> e.g. "before CheckTxInputs can complete, CheckInputs needs to finish and that can be parallelized"?
 4713:17 <jnewbery> lf: no, that change that I linked to is from an old PR
 4813:17 <lf> jnewbery ty, reviewing now
 4913:17 <jnewbery> this PR is simply changing AcceptToMemoryPool (ATMP) to call CheckInputs() with a checkqueue so that script verification for the different inputs happens in parallel
 5013:18 <hugohn> Q: is there any scenario where CCheckQueue (scriptcheckqueue) can contain more than one tx to be validated? Or is it guaranteed that the queue can only contain one pending tx at a time?
 5113:19 <digi_james> The ATMP function takes a TX as an argument
 5213:19 <digi_james> so it is the inputs of that transaction which are pushed the the CCheckQueue
 5313:19 <hugohn> yes, all inputs of a given tx
 5413:20 <digi_james> When the queue is then run, it processes all queued input checkscript jobs of a given transaction.
 5513:20 <hugohn> my question is is it possible to queue up vChecks for more than 1 tx into scriptcheckqueue? I assume the answer is no, but want to confirm
 5613:20 <digi_james> I think ATMP locks up cs_main so that would prevent other tx inputs to be queued in the meantime
 5713:21 <digi_james> Sorry it locks the cs.mempool
 5813:22 <digi_james> Ok, in the interest of time, I suggest also covering the behavioural changes this PR introduces, what are they?
 5913:22 <hugohn> I see. yes that makes sense. If check queue accepts more than 1 tx at a time, !control.Wait() would not make sense because the failure could be ambiguous (not sure which tx is invalid)
 6013:22 <digi_james> hugohn: Thats a good point
 6113:23 <ariard> hugohn: IMO CCheckQueue let you do that, but you have a wrapper around CCheckQueueControl to let verification showing up as atomic for callers
 6213:24 <jnewbery> hugohn: I believe during block validation, the inputs from all the txs are enqueued onto a single CCheckQueueControl and then dispatched, but like you and james have said, that can't happen for checking txs for mempool entry
 6313:24 <PaulTroon> hugohn: can a blockvalidation use the queue at the same time as a inputs from a tx is queued?
 6413:24 <digi_james> ariard: But can you call ATMP for multiple transactions concurrently?
 6513:24 <digi_james> It would seem the mempool lock acquisition prevents this
 6613:24 <hugohn> ariard: isn't CCheckQueueControl only to ensure all vChecks in the queue must be finished? it doesn't protect you against 2 concurrent txs in the queue
 6713:25 <jnewbery> ATMP holds the main lock, so it can only be running in one thread
 6813:26 <digi_james> jnewbery: can CCheckQueueControl allow two callers to dispatch separate job sets, returning the completion of each separately to each caller as ariard suggested?
 6913:26 <ariard> digi_james: you can call ATMP for multiple transactions concurrently but given there is only one set of ScriptCheck threads, CCheckQueueControl constructor should be pass the scriptcheckqueue to ensure there isn't conflict
 7013:27 <jonatack> ariard: bypass?
 7113:27 <ariard> I mean CCheckQueueControl constructor is locking the passed CCheckQueue
 7213:27 <ariard> jonatakc: bypass what you mean ? Using the raw scriptcheckqueue without CCheckQueueControl instanciation ?
 7313:27 <jonatack> ah, be passed
 7413:28 <ariard> yes sorry
 7513:29 <digi_james> Ok, I think we should move to the behavioural change regarding invalid transactions
 7613:29 <jnewbery> ariard: ATMP holds the main lock, so it can't be called concurrently
 7713:29 <jonatack> behavior changes: CheckInputs invokes scripts only once and always returns CONSENSUS for script failures. Ban behavior.
 7813:30 <digi_james> jonatack: Thank you!
 7913:30 <hugohn> jnewbery: sorry still a bit confused. I see 2 locks used in ATMP, cs_main and pool.cs. which lock prevents you against 1+ txs in the queue?
 8013:30 <ariard> jnewbery: ah thanks but so why you need to lock scriptcheckqueue in this case ?
 8113:30 <hugohn> cs_main right?
 8213:30 <digi_james> :)
 8313:30 <jnewbery> ariard: not sure, perhaps just to be extra safe!
 8413:30 <jnewbery> let's move on to behavioural changes
 8513:31 <ariard> hugohn: in fact in ConnectBlock transaction are mixed, you may have multiplee txs in queue
 8613:31 <ariard> IMO it's ok because you want atomic validity at block granularity
 8713:31 <ariard> and not a transaction granularity like in ATMP
 8813:32 <ariard> *at
 8913:32 <hugohn> ariard: it does makes sense for ConnectBlock I think? because block is invalid as long as any tx in the block is invalid
 9013:32 <hugohn> right
 9113:32 <jnewbery> hugohn ariard: please, let's move on
 9213:32 <ariard> yes we agree on this
 9313:32 <digi_james> As jonatack mentioned, RunCheckInputsMaybeParallel can no longer differentiate between policy invalid and consensus invalid failures during input checks. Rather it just sets the invalid state to Consensus.
 9413:33 <digi_james> CheckInputs previously would run two script checks during failure. One with policy flags active, so checks for standardness, and if this failed, a second check would be run with consensus flags only.
 9513:33 <digi_james> Why cant we do this in the new parallelized case?
 9613:35 <digi_james> Ok, one way to look at is in the CheckQueue design, which was originally written to be called when blocks are connected, there only consensus verification matters, standardness applies to mempool acceptance only
 9713:36 <jnewbery> digi_james: I think we could call CheckInputs() twice after this PR, but it'd require some code rearrangement, and we'd lose some of the performance benefit of the PR
 9813:37 <digi_james> john: Interesting. So both with all inputs in vChecks?
 9913:37 <jnewbery> (ie call CheckInputs() with policy flags, and then a second time with consensus flags)
10013:37 <jnewbery> right
10113:38 <digi_james> I am thinking out loud. So this would potentially be more scriptchecks than in the single threaded case, albeit parallelized.
10213:38 <jnewbery> CheckInputs() is a bit weird right now. We call it and say "please run the scripts with these flags". CheckInputs() runs the scripts with the flags we pass it, but if those fails, it runs the scripts _again_ with different flags
10313:38 <michaelf_> <digi_james>: What does "when blocks are connected" mean?
10413:38 <jnewbery> this PR removes that second script verification
10513:38 <digi_james> michaelf_: I mean when a block is connected to the tip of the chain.
10613:39 <michaelf_> Ah ok thanks
10713:39 <lf> chain tip update*?
10813:39 <jnewbery> but it could also add back a second call to CheckInputs() with the consensus flags
10913:39 <jnewbery> The reason Suhas didn't do that is contained in this comment: https://github.com/bitcoin/bitcoin/pull/15169#issuecomment-492232216
11013:39 <ariard> jnewbery: or CheckInputs() may also push the script verification in pvChecks
11113:39 <jnewbery> "The motivation for this is to mitigate adversarial behavior: in order to determine whether a transaction is invalid according to consensus rules (rather than policy) we end up validating a transaction's scripts twice, so an adversary trying a CPU DoS could construct transactions that only fail policy checks, but otherwise tie up our CPU for twice as long."
11213:42 <digi_james> jnewbery: If we add a second CheckInputs run (in the new parallelized case) as you suggested above, how would this be resolved?
11313:42 <jnewbery> how would what be resolved?
11413:43 <jonatack> Unsure how it relates but Suhas also wrote: "AcceptToMemoryPoolWorker() already invokes CheckInputs multiple times, with different script flags, to detect certain kinds of segwit validation errors."
11513:43 <digi_james> Transactions failing policy checks, but tying up CPU for a second round of checks
11613:43 <jnewbery> the fact that a malicious peer can make us verify twice at no cost to themselves?
11713:43 <jonatack> in the commit message of https://github.com/bitcoin/bitcoin/pull/15169/commits/5f4e514
11813:43 <digi_james> jnewbery: exactly
11913:43 <jnewbery> that's on open question. We probably want to deprioritize traffic from such peers
12013:44 <jnewbery> jonatack: yes, that's right. There are 4 places that ATMP can call CheckInputs() I believe:
12113:44 <jnewbery> first, when verifying against policy
12213:45 <digi_james> btw I realize this PR review may be difficult to follow, it's not easy to understnad and as you can see there are remaining nuances I haven't fully processed myself. Sorry about that.
12313:45 <jnewbery> if that fails, it calls CheckInputs twice more, to see if the failure was due to a segwit witness mutation
12413:46 <jnewbery> if it succeeds, we call CheckInputs() again from CheckInputsFromMempoolAndCache()
12513:46 <jnewbery> digi_james: don't apologise. It's a great PR to review!
12613:47 <jonatack> jnewbery: thank you!
12713:47 <digi_james> Are there any remaining questions in regards to the current PR state which we can cover?
12813:47 <ariard> digi_james: that's fine you pick up a really nice one :)
12913:47 <sosthene> I wonder why we check against policy before consensus rules? Wouldn't it make more sense the other way around?
13013:47 <digi_james> policy is tighter than consensus
13113:48 <digi_james> so consensus is a second check, with loosened rules if you will
13213:48 <jnewbery> exactly right digi_james. The mainline case is that policy checking succeeds, in which case we don't need to check against consensus
13313:48 <digi_james> a consensus failure will always be a policy failure, but not the other way around
13413:48 <hugohn> nit: the (nScriptCheckThreads && !single_threaded_validation) check is used 3 times in RunCheckInputsMaybeParallel(). it seems suboptimal & makes things a bit hard to read.
13513:49 <jonatack> digi_james: agree, great choice of PR :+1:
13613:49 <sosthene> oh ok, I don't know why intuitively I was thinking it was the other way around (consensus tighter than policy)
13713:49 <digi_james> jonatack: Thanks for joinin gus
13813:49 <digi_james> *us
13913:49 <jnewbery> (but in fact we do also check against consensus rules later (in CheckInputsFromMempoolAndCache(), but that's for slightly different and obscure reasons, which I don't think we have time to get into!)
14013:50 <jnewbery> (the comment here: https://github.com/bitcoin/bitcoin/blob/459baa1756b7f2d10d261daa0d0f5f4b91cef21f/src/validation.cpp#L770 explains why)
14113:51 <sosthene> digi_james: thanks it was very interesting, it probably wasn't easy to get ready for this one :)
14213:51 <sosthene> gotta go now, goodby everyone
14313:51 <jonatack> jnewbery: how critical is the caching issue you mention: "I believe this breaks script caching - CheckInputs() will only cache script success if running non-parallel."
14413:51 <jnewbery> jonatack: it's pretty bad!
14513:52 <hugohn> I thought suhas addressed that?
14613:52 <jnewbery> script caching means that once we've accepted the tx to our mempool, we don't need to revalidate the scripts when we receive a block with that tx
14713:53 <jonatack> (link to jnewbery comment: https://github.com/bitcoin/bitcoin/pull/15169#discussion_r304044230)
14813:53 <ariard> jnewbery: assuming script flags don't change right ?
14913:53 <jnewbery> here's a blog post I wrote about script caching when it was introduced: https://bitcointechtalk.com/whats-new-in-bitcoin-core-v0-15-part-5-6a9cfa85821f
15013:54 <jnewbery> ariard: exactly correct. The cache entry includes which script flags were used in verification
15113:55 <jnewbery> that's one of the obscure reasons that we call CheckInputsFromMempoolAndCache() later - it's so we verify the scripts according to the consensus rules of the next block
15213:55 <jnewbery> so according to that blog post, block validation is ~40-50% faster with script caching
15313:56 <jonatack> How does this change affect RPC methods such as testmempoolaccept?
15413:56 <jnewbery> (or put differently, breaking script caching would ~double block validation time)
15513:57 <ariard> Hmmm verifying the scripts according to the consensus rules of the next block, it's really obscure it's in case of softfork ?
15613:57 <digi_james> jonatack: as of the current PR, it will return consensus failure
15713:57 <digi_james> with script errors, but the 2nd check for policy failure has been removed.
15813:58 <digi_james> But the rpc call adds a single_threaded_validation argument, so the inputs are checked in sequence
15913:58 <jonatack> thanks!
16013:58 <digi_james> ....which allows the caller to receive specific script errors, unlike the parallel case
16113:59 <digi_james> Previously, you would get policy or consensus failures
16213:59 <jnewbery> ariard: yes that's particularly obscure, but just the fact that we're calling CheckInputs() again seems pretty weird until you realise it's specifically for caching
16313:59 <digi_james> That no longer the case
16413:59 <ariard> I mean CheckInputsFromMempoolAndCache seems only an optimization in case of softfork activation, which doesn't happen often,so you add a burden for every tx getting into memppol
16513:59 <hugohn> jnewbery: the script caching is fixed with this commit, correct? https://github.com/bitcoin/bitcoin/pull/15169/commits/39dfbc9d5e58bdecac52267336ecf532c99de9a2
16614:00 <jnewbery> ariard: no, it's an optimization for all txs
16714:00 <hugohn> scriptExecutionCache.insert(hash_cache_entry); is called in the running parallel case
16814:00 <jnewbery> before CheckInputsFromMempoolAndCache we haven't validated the scripts according to consensus rules
16914:00 <ariard> but couldn't we cache at once in the first calls of CheckInputs ?
17014:01 <ariard> oh gotcha it's because the check against consensus rules is only in case of standardness check failure
17114:01 <jnewbery> ariard: no, because the first time round, if the scripts succeed verification with policy rules, we don't check against consensus rules
17214:01 <jnewbery> exactly
17314:01 <ariard> that's weird
17414:02 <fjahr> Before everyone leaves, question on code structure/naming: I would like to understand the reasoning for wrapping CheckInputs() in RunCheckInputsMaybeParallel(). Rather than that I would probably have put that logic into ChecksInputs() itself and then have it delegate to CheckInputsParallel() or so. I would like to understand if there are other factors at play that I am not seeing, like keeping the changes in
17514:02 <fjahr> CheckInputs() as minimal as possible for example. Then the chosen structure would make more sense to me.
17614:03 <digi_james> +1
17714:03 <jnewbery> I also thought that was a bit weird, but didn't look into alternative ways of doing it
17814:04 <digi_james> Ok, if you don't mind, Id suggest wrapping it up here.
17914:04 <jnewbery> Thanks digi_james!
18014:04 <PaulTroon> +1
18114:04 <digi_james> jnewbery: Cheer!
18214:04 <digi_james> *Cheers
18314:04 <jnewbery> Tough PR this week. Great job uncovering the subtleties.
18414:04 <hugohn> wouldn't one reason be that CheckInputs() is also called by BlockConnected(), which already does things in parallel?
18514:05 <jonatack> Thanks digi_james and everyone!