Benchmark Chainstate::ConnectBlock duration (resource usage, tests)

https://github.com/bitcoin/bitcoin/pull/31689

Host: davidgumberg  -  PR author: Eunovo

Notes

  • Bitcoin Core uses the nanobench library for a suite of “microbenchmarks” that measure the performance of individual components or functions in idealized conditions.

  • Chainstate::ConnectBlock() does double-duty: it is partly responsible for validating blocks being connected to the node’s tip, and partly responsible for applying their effects to the node’s view of the UTXO set (CCoinsViewCache).
    • One of the most “expensive” checks performed by ConnectBlock() is CheckInputScripts: which ensures that every input script of every transaction succeeds.
  • In the course of evaluating scripts, signature checks are often required, sometimes explicitly with opcodes like OP_CHECKSIG, OP_CHECKMULTISIG, and sometimes implicitly with Bitcoin output types like P2WPKH that have implicit signature checks.
    • In pre-SegWit and SegWit version 0 outputs, signatures are generated and validated using ECDSA over the secp256k1 curve. Taproot introduced the version 1 SegWit output type, which uses Schnorr signatures over the same curve. BIP-0340 describes the way signatures are generated and evaluated for taproot outputs.
      • One of the advantages of Schnorr signatures over ECDSA signatures is that they can be verified in batches. A simplified description of batch verification is that instead of needing to prove that signature $A$ is valid for input $X$, signature $B$ is valid for input $Y$, and that signature $C$ is valid for input $Z$, we can add up signatures $A$, $B$, and $C$, to produce signature $D$, and add inputs $X$, $Y$, and $Z$ to produce input $W$, and then only perform a single verification, that signature $D$ is valid for input $W$.
  • Although in principle schnorr signatures can be validated in batches, Bitcoin Core today validates them individually just like ECDSA signatures. There is a PR open, #29491, that implements Batch Validation in Bitcoin Core. The motivation for this PR is to establish a baseline for signature validation performance in Bitcoin Core today, which can then be used to validate and potentially quantify performance improvements of Batch validation.
    • #31689 introduces three ConnectBlock benchmarks, one for a block where all inputs that are spent use ECDSA signatures, one where all inputs are Schnorr signatures, and one where some are Schnorr and some are ECDSA.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. Did you run the benchmarks? What did you observe?

  3. What is TestChain100Setup? What does 100 mean? Why 100?

  4. The notes above suggest that CheckInputScripts() is “expensive”. Is it? Why?

  5. Some reviewers (and a code comment) observed that in their testing ConnectBlockMixed was the slowest of the three benchmarks. Is that possible?

  6. Some reviewers disagreed about whether or not the ‘mixed’ case should be 50/50 schnorr/ecdsa, or if it should be some mixture of Schnorr and ECDSA that would be likely to appear in a block, what are the tradeoffs of each approach?

  7. What is the purpose of the first transaction that gets created in CreateTestBlock()? Why couldn’t this transaction be created in the for loop like all the other transactions?

  8. ConnectBlock does a lot more than just checking input scripts. Is this PR introducing a ConnectBlock() benchmark or a signature validation benchmark? Why use ConnectBlock() instead of benchmarking CheckECDSASignature() and CheckSchnorrSignature() directly?

  9. Do you think the tests added here are sufficient in scope or are there other cases that should have been added in this PR? What additional benchmarks of ConnectBlock() would be good to have in a follow-up PR?

Meeting Log

  116:59 <dzxzg> #startmeeting
  216:59 <dzxzg> hi
  316:59 <glozow> hi
  417:00 <sliv3r__> hi
  517:00 <dzxzg> Hi everyone, thanks for coming to this benchmarking review club :)
  617:00 <monlovesmango> hey
  717:00 <stringintech> Hello
  817:00 <dzxzg> We're looking at #31689, and there are notes here: https://bitcoincore.reviews/31689
  917:00 <dzxzg> If you have a question or something to say, please feel free to jump in at any time
 1017:00 <stickies-v> hi
 1117:01 <enochazariah> hello everyone
 1217:01 <dzxzg> Diid you have a chance to review the PR, and/or look at the notes?
 1317:01 <dzxzg> s/you/anyone
 1417:01 <monlovesmango> yes
 1517:01 <janb84> somewhat :)
 1617:02 <sliv3r__> yes, tried to answer the questions in the notes
 1717:02 <Novo__> hi
 1817:02 <glozow> y
 1917:02 <oxfrank> y
 2017:02 <stringintech> y
 2117:03 <dzxzg> Awesome! I guess it's slightly unusual to review a PR *after* it's been merged, but I think it's still important and helpful, this code is just beginning it's life in Bitcoin Core!
 2217:04 <monlovesmango> I learned a lot about benchmarking :)
 2317:04 <dzxzg> Did you run the benchmarks? What did you observe?
 2417:04 <janb84> Yes, and my results are lot less verbose than the results shown in the comments
 2517:05 <glozow> Nothing wrong with reviewing a PR after merge! Presumably if you take a look at the batch validation PRs and use the benches to measure the performance changes, you should also know what the benches are doing :)
 2617:05 <monlovesmango> yes... I have a few comments but dont want to get ahead of the questions...
 2717:05 <sliv3r__> yes, mixed blocks a bit slower
 2817:05 <stringintech> mixed block the slowest and all schnorr fastest for me
 2917:05 <glozow> For me, in order from fastest to slowest: `ConnectBlockAllSchnorr`, `*AllEcdsa`, `*MixedEcdsaSchnorr`.
 3017:06 <dzxzg> janb84: what do you mean your results are less verbose?
 3117:06 <monlovesmango> yeah I did observe mixed was slowest consistently
 3217:06 <stickies-v> Same for me! But I got an instability warning for ConnectBlockMixedEcdsaSchnorr
 3317:06 <stickies-v> (I'm getting them for a fair amount of other benches too so it might be me)
 3417:06 <janb84> @dzxzg i'm missing the column cyc/block, IPC / , BRA/block
 3517:07 <sliv3r__> I have same results as glozow
 3617:07 <sliv3r__> I'm also missing thosecolumns @janb84
 3717:07 <oxfrank> MixedEcdsaSchnorr slower than all
 3817:07 <dzxzg> What is TestChain100Setup? What does 100 mean? Why 100?
 3917:08 <monlovesmango> sets up new chain and mines 100 blocks
 4017:08 <janb84> pre-creates a 100-block REGTEST-mode block chain
 4117:08 <monlovesmango> 100 bc thats how long it take coinbase to mature before you can spend it
 4217:08 <oxfrank> 100 mean no of blocks in test environment
 4317:09 <sliv3r__> as per a code comment: texting fixture that pre-creates 100 blocks in regtest mode to get the coinbase mature
 4417:09 <dzxzg> Yeah, all answers that make sense to me! Super useful, and appears all over the place in benchmarking and test code
 4517:10 <dzxzg> I'm going to quote something from the review notes in case anyone didn't have a chance to read them.
 4617:10 <dzxzg> "One of the most “expensive” checks performed by ConnectBlock() is CheckInputScripts: which ensures that every input script of every transaction succeeds."
 4717:10 <sipa> hi
 4817:10 <dzxzg> So, the author of the notes suggested that CheckInputScripts() is “expensive”. Is it? Why?
 4917:11 <sliv3r__> they normally contain signatures to verify and that's an expensive task
 5017:11 <monlovesmango> bc input scripts generally need signarture verification
 5117:12 <oxfrank> CheckInputScripts() most computationally intensive part of validation because of cryptographic signatures
 5217:13 <stickies-v> The CheckInputScripts docstring mentions "This involves ECDSA signature checks so can be computationally intensive." does it not do Schnorr signature checks or did the docstring just not get updated?
 5317:13 <sipa> That sounds outdated.
 5417:14 <dzxzg> +1
 5517:14 <sipa> All signature checks are done through the script interpreter, which is invoked from CheckInputScripts.
 5617:15 <dzxzg> Some reviewers (and a code comment) observed that in their testing ConnectBlockMixed was the slowest of the three benchmarks. Is that possible?
 5717:16 <dzxzg> I also noticed that there was a recent comment from someone present here on the PR
 5817:16 <janb84> I made the same observation in my test runs.
 5917:16 <dzxzg> That shed some new light
 6017:16 <sliv3r__> Yes! Because the two different types are used in the same transaction so they have to be hashed multiple times due to differences in the signature digest algorithm
 6117:17 <monlovesmango> I ahve a question about this one. when I run benchmarks as is I do see that mixed is slowest, but it also has 5 keys/outputs rather than 4 like the other 2. could that be why?
 6217:17 <monlovesmango> when I test all 3 with 5 keys/outputs they are all fairly similar
 6317:18 <janb84> @monlovesmango interesting !
 6417:18 <monlovesmango> but I was not able to run the 'pyperf system tune' bc it errored on my machine
 6517:18 <monlovesmango> so i was using the min-time arg to get consistent results
 6617:19 <sliv3r__> May I ask how did you realize that?
 6717:19 <monlovesmango> not sure if that is sufficient
 6817:20 <oxfrank> but why did they go with 5 keys/outputs in mixed?
 6917:20 <monlovesmango> bc in the code ConnectBlockAllSchnorr creates 4 schnorr keys/outputs, ConnectBlockAllEcdsa creats 4 ecdsa keys/outputs, and ConnectBlockMixedEcdsaSchnorr creates 1 schnorr and 4 ecdsa
 7017:20 <dzxzg> (https://github.com/bitcoin/bitcoin/blob/639279e86a6edd6cb68e8cf077d14337bcd13959/src/bench/connectblock.cpp#L110-L132)
 7117:20 <monlovesmango> oxfrank: bc they wanted 80/20 ratio
 7217:20 <monlovesmango> i'm assuming
 7317:21 <dzxzg> Yeah, but I think not bumping the others up to 5 was a mistake!
 7417:21 <sliv3r__> oh right is hardcoded
 7517:21 <dzxzg> I was able to reproduce the result monlovesmango had
 7617:21 <dzxzg> when I changed the number of inputs in all of the tests so that they all had 5 inputs, the mixed block didn't stand out any more as the slowest!
 7717:22 <sliv3r__> so the assumtiom that some users had about having to hash multiple times bc of the signature digest algorithm is wrong?
 7817:22 <monlovesmango> dzxzg: nice!
 7917:23 <oxfrank> monlovesmango I think so too
 8017:24 <dzxzg> sliv3r__: I'm not sure, I thought that explanation made sense when I wrote the notes, but it seems that at the very least even if it wasn't wrong about extra work needed for validating transactions with mixed inputs, but it seems to have been wrong about how significant that would be!
 8117:26 <dzxzg> Nice find monlovesmango, I think a PR to address this would be nice! Another feather in the cap of never trusting explanations for poor performance until you've measured them :)
 8217:26 <sipa> i haven't run the numbers, but i'm curious how the block verification times compare with the raw pubkey decompression + signature checking numbers
 8317:27 <monlovesmango> dzxzg: do you think it would be alright if I opened a pr for this?
 8417:28 <sliv3r__> manlovesmango: it make sense to fix
 8517:29 <dzxzg> monlovesmango: I don't know how other reviewers would feel, but it's probably a Concept ACK for me
 8617:29 <dzxzg> In the same vein as sipa's remark above: ConnectBlock does a lot more than just checking input scripts. Is this PR introducing a ConnectBlock() benchmark or a signature validation benchmark? Why use ConnectBlock() instead of benchmarking CheckECDSASignature() and CheckSchnorrSignature() directly?
 8717:31 <sipa> FWIW, the numbers i have on my system are 31.0 us/sig for ecdsa, 31.8 us/sig for schnorr, and ~3.1 us/key for pubkey decompression
 8817:31 <monlovesmango> it seemed like one goal was to assess performance with a mixed back of sig types, which can't be done with CheckECDSASignature() or CheckSchnorrSignature() alone
 8917:31 <monlovesmango> that was the best reason I could think of
 9017:32 <sliv3r__> also this will allow us to benchmark batch verification when it's implemented
 9117:32 <sliv3r__> so I guess we should see an improvement on schnorr
 9217:33 <sipa> yeah, the PR is a preparation for batch validation, which is applicable to schnorr signatures, but not ECDSA, so to get a realistic benchmark, it may make sense to see how it impacts a block with a mix of both (which, for the time being, is likely what we'll need to expect)
 9317:33 <sipa> i assume
 9417:35 <Novo__> batch verification implementation will modify connectblock a lot, so we also want to see if that our changes don't negatively impact overall conectblock performance even if it speeds up CheckSchnorrSignature
 9517:35 <sipa> Novo__: good point
 9617:36 <dzxzg> Some reviewers disagreed about whether or not the ‘mixed’ case should be 50/50 schnorr/ecdsa, or if it should be some mixture of Schnorr and ECDSA that would be likely to appear in a block, what are the tradeoffs of each approach?
 9717:37 <dzxzg> And to add to that question, or maybe this would be part of the tradeoffs, how would we decide what a "representative" mixture would be?
 9817:37 <sliv3r__> I don't have a strong opinion on that tbh but some argue that 80/20 is the actual ratio now while 50/50 is probably what we will have in a future
 9917:38 <monlovesmango> yeah what sliv3r said :)
10017:38 <janb84> sliv3r__: agree
10117:39 <dzxzg> What is the purpose of the first transaction that gets created in CreateTestBlock()? Why couldn’t this transaction be created in the for loop like all the other transactions?
10217:40 <monlovesmango> honestly might be good to have a few tiers, 80/20, 50/50, 20/80, just so we have a variety of benchmarks to compare changes against? or would this be redundant
10317:41 <monlovesmango> the first transaction is spending the coinbase and setting up the outputs that will used for the bench mark. so this tx is different than the others, and this way the benchmark is only measuring the specific sig checks we are interested in (bc first tx is excluded from bench)
10417:42 <sliv3r__> I guess if it was redundant there would not be discussion about the ratio so I guess it makes sense
10517:42 <dzxzg> monlovesmango: re: coinbase transaction, yep!
10617:43 <sliv3r__> agree with @monlovesmango also because testchain100setup is the one who decides the conditions on that tx
10717:43 <sliv3r__> so we don't have control on it
10817:43 <monlovesmango> yep that too
10917:46 <dzxzg> The "What ratio should we use question?" makes me think of a bigger question, when should your measurement try to as closely as possible approximate the real situation of interest, like in this case maybe, real nodes connecting blocks to their tips, and when should you try to create idealized conditions that might exaggerate, or be focused on some tiny element which rarely constitutes much of the real task
11017:47 <dzxzg> but you get the advantage of interpretability, when you exaggerate one element, it's really easy to interpret the outcome of a benchmark, if it's faster it's probably that thing, if it's slower it's probably that thing
11117:47 <monlovesmango> as far as benchmarking goes, I feel like that question should come after all the data is in
11217:48 <dzxzg> Okay, final question: Do you think the tests added here are sufficient in scope or are there other cases that should have been added in this PR? What additional benchmarks of ConnectBlock() would be good to have in a follow-up PR?
11317:49 <monlovesmango> like we should want to know each scenario performs, and then make decisions about whether real use or idealized use should be given more significance
11417:50 <monlovesmango> I think in the pr josie had mentioned testing mixed block composition (so instead of mixed transactions, each transaction would only have one type of signature but the block would have mixed bag of transactions)
11517:51 <monlovesmango> which might make sense depending on how batch verification is implemented
11617:51 <janb84> the mixed ratio seems arbitrary, would love to see if other ratios would change the outcome much
11717:51 <oxfrank> I think Possible follow-up benchmarks are still necessary i.e different script types (P2PKH, P2SH-wrapped SegWit), different block sizes, ..
11817:51 <sliv3r__> re: addition bencharmks - As this wants to benchmark batch validations I'm not sure how other parts of connectblock gets affected by that so...
11917:54 <sliv3r__> if we want to benchmark unrelated to batch validation we could get some numbers on how fast we update the utxo set or even how some of the changes from CC like nLockTime validation for coinbase tx affects here (that's not implemented yet)
12017:56 <dzxzg> Is there anything else that anyone wanted to say or ask that didn't fit into the topics we've talked about so far?
12117:56 <dzxzg> Or to add to any of the topics we have discussed?
12217:57 <Novo__> possible follow-up could also include TR script-path spend since batch verification should ultimately affect this too
12317:58 <monlovesmango> oh i did have one question actually, on line 55 the variable name is taproot_tx but it really is taproot or edcsa right?
12417:59 <monlovesmango> in connectblock.cpp
12517:59 <dzxzg> monlovesmango: I think you're right
12618:00 <dzxzg> That was awesome! Thank y'all for coming to this review club, have a peek at #29491 if this interested you!
12718:00 <janb84> dzxzg: thanks for hosting !
12818:00 <dzxzg> #endmeeting