O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation / Abstract out script execution out of VerifyWitnessProgram() (consensus, refactoring)

https://github.com/bitcoin/bitcoin/pulls/16902

Host: jnewbery  -  PR authors: sipa , ajtowns

The PR branch HEAD was e6e622e5a at the time of this review club meeting.

This week, we’ll review two (small) PRs:

Notes

Both of these PRs were pulled out of the Taproot/Schnorr demonstration branch in PR 17977. They both make small, non-behaviour-changing modifications to script execution.

Script execution is an area of the code that is very rarely modified. As the author notes in #18002 “As it touches consensus code, I don’t think this would ordinarily meet the bar for review cost vs benefit. However, it simplifies the changes for Taproot significantly, and [..] it’s going to be necessitated by inclusion of that code … “

Generally, for consensus code, the if it ain’t broke, don’t fix it rule prevails. We might also add if it is broke [in a non-critical way], probably don’t fix it either because you might break consensus.

O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation

  • Bitcoin Script is a stack language. A script consists of data elements that can be pushed onto the stack, and opcodes which manipulate the stack elements. See the Bitcoin wiki Script page for a good description of the language.

  • Script allows branching control flow by using the OP_IF/OP_NOTIF/OP_ELSE/OP_ENDIF opcodes. These branches can be nested multiple levels.

  • Script evaluation happens in EvalScript() in src/script/interpreter.cpp. The evaluation iterates through the script elements (while (pc < pend)) and switches on what that element is (switch (opcode)).

  • if/else branching is tracked by a boolean vector vfExec:

    • When execution encounters an OP_IF or OP_NOTIF opcode (and therefore descends a level of if/else nesting), a boolean is pushed onto this vector: true if we’re entering a branch that should be executed and false for a branch that shouldn’t be executed.

    • When execution encounters an OP_ELSE opcode, the topmost boolean in the vector is flipped.

    • When execution encounters an OP_ENDIF opcode, the topmost boolean in the vector is popped off.

  • At the beginning of the while (pc < pend) loop, we determine whether we’re in a branch that should be executed by examining the vfExec vector and setting a fExec boolean: true if all of the elements in the vector are true, and false otherwise. If fExec is false, then we’re in a branch that shouldn’t be executed and we continue to the next iteration of the while loop.

  • This check involves iterating through a vector which is the length of the nested if branches. The check is done for every element of the script. In the worst case the time to do all checks grows quadratically. See Sergio Demian Lerner’s blog post for a good description of the problem.

  • This quadratic behaviour is not currently a problem, since the maximum number of opcodes that can be included in a script is 201. In the pessimal case, the quadratic behaviour can be used to construct a block that takes 0.25s to validate. There are other expensive-to-validate scripts that can be used to construct a block that takes longer than that to validate.

  • Tapscript does not have a 201 opcode limit. If this behaviour is not fixed before taproot activation, it could therefore be exploited to construct blocks that are very slow to validate.

  • This PR changes the boolean vector to be a pair of integers, which makes the fExec check O(1) instead of O(n), and so removes the quadratic behaviour.

Abstract out script execution out of VerifyWitnessProgram()

  • This is a small refactor PR that abstracts out a part of the VerifyWitnessProgram() function into its own function called ExecuteWitnessProgram().

Questions

  1. Did you review the PRs? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)

  2. The functions in ConditionStack() contain asserts. What happens if these asserts are hit?

  3. How are we able to express a vector of booleans as a pair of integers? What observation about the use of the vector makes this possible?

  4. In the notes, I’ve written “If fExec is false, then we’re in a branch that shouldn’t be executed and we continue to the next iteration of the while loop.” That’s not quite the full story. When do we not continue to the next iteration while fExec is false?

  5. Did you run the verify_script() benchmark? Does this PR make a difference to performance?

Meeting Log

  113:00 <fode> hi
  213:00 <jnewbery> #startmeeting
  313:00 <ajonas> Hi
  413:00 <pinheadmz> hi
  513:00 <fode> hello
  613:00 <fjahr> hi
  713:00 <jonatack> hi
  813:00 <michaelfolkson> hi
  913:00 <lightlike> hi
 1013:00 <jnewbery> Hi folks. Welcome to this week's review club meeting!
 1113:01 <kanzure> hi
 1213:01 <jnewbery> Feel free to say hi to let everyone know you're here.
 1313:01 <jnewbery> Before we start, I'm always looking for volunteer hosts for these meetings. I'll help you choose a PR, and help you prepare notes and questions. Just let me know if you're interested and we'll slot you in.
 1413:01 <emzy> Hi
 1513:01 <michaelfolkson> DM you jnewbery?
 1613:02 <jnewbery> yes!
 1713:02 <fanquake> hi
 1813:02 <jnewbery> Hosting a review club meeting forces you to really understand a PR, so I think it's a really worthwhile learning experience. You can ask any of the previous hosts if you don't believe me
 1913:02 <jnewbery> Normal reminder: everyone is here to learn, and no questions are stupid. Feel free to jump in and ask questions at any point.
 2013:02 <jnewbery> I heard this nice quote from Steven Strogatz the other day:
 2113:02 <jnewbery> We're all confused; confusion is the normal state of affairs when you're trying something really hard, and when you're exploring the unknown, [...] you can trust us that we're all on the same team trying the figure this out together and don't worry about looking stupid, I'm confused over here too.
 2213:03 <jnewbery> I hope that's how people see review club. We're all just trying to be a little less confused
 2313:03 <michaelfolkson> Haha nice
 2413:03 <jnewbery> ok, first up: who managed to review the PR? (y/n)
 2513:03 <fode> n
 2613:03 <fjahr> y
 2713:03 <lightlike> y
 2813:03 <pinheadmz> cramming rn :-) But i have read through sipa taproot branch
 2913:03 <michaelfolkson> y
 3013:03 <jonatack> y
 3113:04 <nehan_> hi
 3213:04 <jnewbery> hi nehan_
 3313:04 <nehan_> y
 3413:04 <jnewbery> For the folks that looked at the PRs, any high-level thoughts? Concept ACK/NACKs?
 3513:05 <jonatack> acked both
 3613:05 <jnewbery> or any general questions about the script interpreter in Bitcoin Core?
 3713:06 <emzy> n
 3813:06 <lightlike> concept ack
 3913:06 <andrewtoth> hi
 4013:06 <michaelfolkson> Concept ACK. I like the PR being broken down into smaller PRs as Nicolas said. My only concern is that this approach kind of assumes Taproot is inevitable (which we really shouldn't do)
 4113:06 <SirRichard> hi
 4213:06 <fjahr> code review ack for the refactor one, the op_if one I have not found anything yet but want to put a little more time into it
 4313:07 <jnewbery> michaelfolkson: I don't think that's true. These PRs don't leave the code in a worse place, even if taproot doesn't happen
 4413:07 <michaelfolkson> That's fair. sipa said the motivation for touching consensus wouldn't be there without Taproot
 4513:07 <jnewbery> I'd agree if the intermediate state made the code less clear or performant, but these changes are both improvements
 4613:07 <lightlike> actually I was meaning to ask what the relation of #16902 to taproot/schnorr is?
 4713:08 <pinheadmz> i agree with jnewbery the fact that witness versions are extensible up to 31 makes it reasonable to abstract out executeWitnessProgram
 4813:08 <michaelfolkson> Just a thought. I really like splitting it (makes it much more understandable for me) so I'm not arguing against this approach
 4913:08 <jnewbery> right - the calculus of whether to spend time on this changes if we think taproot will happen, but the changes are all good I think
 5013:09 <jnewbery> lightlike: good question! Anyone know why the OP_IF change is relevant for taproot?
 5113:10 <jonatack> the 201 opcode limit
 5213:10 <pinheadmz> hm is it because you only "pay" for sigops that you execute?
 5313:10 <michaelfolkson> Because Taproot no longer has limits on script size, number of op codes
 5413:10 <pinheadmz> (in taproot)
 5513:10 <jnewbery> michaelfolkson: yes! exactly
 5613:10 <jonatack> from the BIP "The maximum non-push opcodes limit of 201 per script does not apply."
 5713:11 <jonatack> and footnote 10: https://github.com/bitcoin/bips/blob/master/bip-0342.mediawiki#cite_note-10
 5813:11 <jnewbery> exactly - in tapscript, there can be more than 201 opcodes in a script, so any quadratic performance behaviour could be much worse
 5913:12 <michaelfolkson> That was a great blog post of Sergio's. I should check some of his other blogs out
 6013:12 <michaelfolkson> https://bitslog.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/
 6113:12 <jnewbery> ok, let's get into some of the more low-level questions
 6213:12 <jnewbery> The fuctions in ConditionStack() contain asserts. What happens if these asserts are hit?
 6313:13 <michaelfolkson> Asserts generally terminate the program if false. Is that what you mean?
 6413:13 <jonatack> assertion failed -> Aborted
 6513:14 <jonatack> (right?)
 6613:14 <jonatack> https://en.cppreference.com/w/cpp/error/assert
 6713:15 <jnewbery> hmmm, don't they get caught?
 6813:15 <jnewbery> or is that just throws?
 6913:16 <jonatack> can reverse one to find out :p
 7013:17 <jnewbery> ok, I'm wrong. asserts don't get caught in C++
 7113:17 <jnewbery> (I told you I'm also confused)
 7213:17 <jnewbery> why do we expect these asserts never to be hit?
 7313:18 <lightlike> afaik the asserts are removed by the compiler if NDEBUG is not defined, so they should not be relied upon.
 7413:18 <jonatack> when i'm testing PRs I often add compile (static assert) or run time (assert) to check my assumptions and so far when they failed, it aborts
 7513:19 <jnewbery> lightlike: yes, I think you're right
 7613:19 <jonatack> lightlike: i'm not sure if that's a c++17 feature
 7713:20 <jonatack> (i mean, if it's c++17 and up or also with c++11)
 7813:20 <lightlike> jonatack: pretty sure that is a very old feature
 7913:20 <jnewbery> the asserts appear in `pop_back()` and `toggle_top()`, which are only called in one place each, where we've already checked the stack is non-empty
 8013:20 <jnewbery> https://github.com/bitcoin/bitcoin/pull/16902/files#diff-be2905e2f5218ecdbe4e55637dac75f3R561
 8113:20 <jnewbery> https://github.com/bitcoin/bitcoin/pull/16902/files#diff-be2905e2f5218ecdbe4e55637dac75f3R569
 8213:21 <jonatack> lightlike: agreed, i was thinking of this: "NDEBUG is defined at the point where assert is last defined or redefined (i.e., where the header <cassert> or <assert.h> was last included); (since c++17)"
 8313:21 <jnewbery> next question. This PR swaps out a vector of booleans for a pair of integers. How are we able to express a vector of booleans as a pair of integers? What observation about the use of the vector makes this possible?
 8413:21 <emzy> why using an assert insted of an error log entry?
 8513:22 <nehan_> why assert() instead of static_assert()?
 8613:23 <nehan_> as suggested in the developer docs
 8713:23 <jb55> lightlike: NDEBUG functionality is disabled in core. asserts happens with or without it enabled
 8813:23 <jonatack> jb55: ah
 8913:23 <lightlike> jb55: ok, thanks
 9013:24 <jonatack> nehan_: i think static asserts require a bool constexpr
 9113:24 <michaelfolkson> The two integers are size of implied stack (m_stack_size) and position of the first false value on the implied stack (m_first_false_pos)
 9213:24 <jnewbery> nehan_: I don't think it's possible to do a static_assert here because it's asserting on runtime behaviour
 9313:24 <jonatack> so they can't be dynamic values determined at run time... thought i could very well be confused, recently forgot that in a pr review i made
 9413:25 <jnewbery> emzy: I think this is a good description of what asserts are for: https://stackoverflow.com/questions/28973904/assert-in-try-catch-block/28974029#28974029
 9513:25 <nehan_> jnewbery: jonatack: ah thanks
 9613:25 <jnewbery> "The purpose of assert is the guarantee that the programmer doesn't make mistakes."
 9713:26 <lightlike> there was some trouble with misplaced asserts a some years ago with the Bitcoin Unlimited fork, see e.g. https://blog.erratasec.com/2017/03/assert-in-hands-of-bad-coders.html (with a comment of gmaxwell)
 9813:26 <jnewbery> michaelfolkson: yep, that's what they're being used for, but we're losing some information there. We don't know the state of all booleans on the stack. Why is that ok?
 9913:26 <michaelfolkson> So you'd use an assert over a test when it is critically important?
10013:27 <nehan_> might be good to discuss bitcoin-core policy on when to assert vs. log or bump up failure. is that documented anywhere?
10113:27 <jnewbery> asserts are a check on the programing logic. throws are for runtime error conditions that can be recovered from.
10213:28 <jnewbery> the expectation is that asserts are *never* hit when running the code in the wild
10313:28 <lightlike> so asserts, if they can somehow be triggered by our peers, have the potential to bring down large parts of the network.
10413:28 <emzy> as I understand it is a assert is triggert it is not anymore save to execute the program and better to exit.
10513:29 <fjahr> see also use of assert in the second pr where sipa explicitly comments that this code should never be hit
10613:29 <jnewbery> nehan_: I'm not sure it is documented anywhere. It's a good question
10713:29 <jnewbery> fjahr: I think that's slightly different, and is to quiet a build warning
10813:30 <jnewbery> here: https://github.com/bitcoin/bitcoin/pull/18002/files#diff-be2905e2f5218ecdbe4e55637dac75f3R1464
10913:30 <jonatack> agree, i see asserts as sanity checks that should never be hit under expected circumstances
11013:30 <nehan_> jnewbery: to answer your question about why you don't need all booleans on the stack, it's because once you hit a false you will not execute anything after that
11113:31 <MarcoFalke> It is documented a bit in ./src/util/check.h, I think
11213:31 <jnewbery> nehan_: yes, exactly. The calling code can only do one of the following:
11313:32 <jnewbery> - push/pop from the top of the stack
11413:32 <jnewbery> - toggle the top of the stack
11513:32 <jnewbery> - check if all values are true
11613:33 <jnewbery> because of that limited interface to the stack, internally the stack doesn't need to track more information than stack depth and first false entry
11713:33 <MarcoFalke> util/check.h defines a CHECK_NONFATAL, which actually throws, instead of terminating (like an assert)
11813:33 <jonatack> MarcoFalke: NonFatalCheckError/CHECK_NONFATAL?
11913:33 <jonatack> right
12013:34 <MarcoFalke> Obviously we don't want to use that in consensus or validation code
12113:34 <jnewbery> And to add a bit more confusion, we also have an 'AbortNode' function in logs and attempts to shutdown cleanly
12213:35 <jnewbery> *in validation which logs and attempts to shutdown cleanly
12313:35 <fjahr> jnewbery: I saw that but I understood it as the warning would be there "if the logic got messed up in future". I removed it now and did not see any new compiler warnings.
12413:36 <jnewbery> fjahr: oh, I think you're right
12513:37 <jnewbery> everyone happy with the optimized 'stack' implementation?
12613:38 <jnewbery> here: https://github.com/bitcoin/bitcoin/pull/16902/files#diff-be2905e2f5218ecdbe4e55637dac75f3R297
12713:39 <jnewbery> ok, next question: In the notes, I’ve written “If fExec is false, then we’re in a branch that shouldn’t be executed and we continue to the next iteration of the while loop.” That’s not quite the full story. When do we not continue to the next iteration while fExec is false?
12813:39 <jonatack> We don't continue if fExec is false, when we are inside an OP_IF...OP_ENDIF conditional?
12913:40 <jonatack> e.g. ... if (fExec || (OP_IF <= opcode && opcode <= OP_ENDIF))
13013:41 <jnewbery> jonatack: right. If there's we're in a non-executing branch and the next opcode is OP_IF, OP_NOTIF, OP_ELSE or OP_ENDIF, we drop into the switch statement
13113:42 <jnewbery> https://github.com/bitcoin/bitcoin/blob/2bdc476d4d23256d8396bb9051a511f540d87392/src/script/interpreter.cpp#L348-L349
13213:42 <jonatack> oof good, i can leave the ack :p
13313:43 <jnewbery> we also execute everything above that, even if we're in the unexecuted branch: https://github.com/bitcoin/bitcoin/blob/2bdc476d4d23256d8396bb9051a511f540d87392/src/script/interpreter.cpp#L310-L347
13413:44 <jnewbery> so if there's a non-existent opcode in the script, the script always fails (even if that non-existent opcode is in an unexecuted branch)
13513:45 <jnewbery> ok, final question: Did you run the verify_script() benchmark? Does this PR make a difference to performance?
13613:47 <jnewbery> Alright, anyone have any other questions? Thoughts?
13713:47 <michaelfolkson> Sorry I didn't get that far to do that
13813:48 <unbend> got here late is there a bouncer / log to check? ':0
13913:48 <nehan_> PRs like this are terrifying because they touch consensus code
14013:48 <jnewbery> nehan_: yeah, we don't touch this stuff much
14113:49 <jnewbery> hopefully these two are small and contained enough that they're ok to review
14213:49 <jnewbery> what would make it less terrifying?
14313:49 <michaelfolkson> Any chance of any more PRs being rolled out of sipa's Taproot PR?
14413:49 <lightlike> script seems to be even more terrifying than other consensus code like in validation
14513:50 <jnewbery> michaelfolkson: I expect so. I plan to discuss them here if more PRs are opened
14613:50 <jnewbery> lightlike: interesting. How so?
14713:50 <michaelfolkson> The Taproot PR would be a monstrous PR review club in its current form haha
14813:51 <lightlike> just my impression. there are quite some PRs changing validation code, while script is touched almost never. Possibly, the quality/accessibility of code is better in validation.cpp?
14913:52 <fjahr> did not run the benchmarks yet but will do so before finishing the review
15013:52 <jonatack> Sorry, the internet connection fitzed. I wanted to mention that I found the study notes really helpful.
15113:53 <fjahr> we could also review specific commits of taproot in the club
15213:53 <jnewbery> lightlike: maybe. There are ongoing projects that we want to do in validation.cpp (eg splitting net_processing/validation onto different threads). That stuff makes slow progression.
15313:53 <jonatack> And kudos to MarcoFalke for the related fuzzing test to check today's PR.
15413:53 <jnewbery> but we very rarely touch script except fo a softfork
15513:53 <MarcoFalke> lol, that fuzz test is really simple
15613:54 <MarcoFalke> The fuzzer should exhaust it in less than a minute. I was too lazy to write a unit test
15713:54 <jnewbery> I had a quick look for unit tests for the script interpreter and didn't find anything that tested the if/else opcodes
15813:54 <nehan_> question: suppose some nodes upgrade to use this new, faster code. is there any way the existence of the old nodes could cause a consensus failure, or upheaval on the network? they would take a longer time to validate a block that stressed this.
15913:55 <jnewbery> nehan_: great question!
16013:55 <jnewbery> does anyone have thoughts about that?
16113:56 <jnewbery> we have 4 minutes left. Now's your chance if you've been shy so far
16213:56 <fjahr> nehan_: you mean after taproot is activated?
16313:57 <jonatack> nehan_: Older software running non-tapscript? The limit is in place and should be backward compatible, no?
16413:57 <nehan_> I imagine old nodes won't understand taproot and so won't execute that script anyway, right? And perhaps the current opcode limit bounds the difference between the slowest un-upgraded node and the fastest upgraded node.
16513:58 <jonatack> It's a soft fork, not a hard fork, so a narrowing with backward compat preserved unless i'm confused
16613:58 <nehan_> old nodes won't execute taproot scripts which stressed that code, that is.
16713:59 <nehan_> yes, i'm not suggesting there would be a fork because of rules. but imagine if a performance improvement made new nodes many many times faster than old nodes, and the old nodes couldn't keep up.
16813:59 <jnewbery> nehan_: I think that's right. Pieter estimates in the PR notes that a block that fully exploits this slow behaviour can take up to 0.25 seconds to validate. That's not enough to cause any network partition concerns
16914:00 <jnewbery> ok, that's time folks. Thanks for coming. See y'all next week
17014:00 <jnewbery> #endmeeting
17114:00 <nehan_> has anyone thought about the impact of different verification times across the network? maybe it's not important?
17214:00 <nehan_> thanks!
17314:00 <emzy> thanks jnewbery and all others!
17414:01 <lightlike> thanks!
17514:02 <jonatack> yes, my thinking as well
17614:02 <fjahr> yeah, i think your right but it's interesting. the biggest difference in validation time would probably be between the fastest (hardware) non-taproot node and the slowest taproot node. but given that we have nodes with all kinds of different hardware on the network now and these have different validation times I think there is no real impact.
17714:03 <nehan_> fjahr: i think you're right
17814:04 <jonatack> Lost the connection again. Curious to catch up with the discussion. It seems to me though that these changes are strictly better.
17914:05 <jonatack> Great question though!