Script execution is an area of the code that is very rarely modified. As the
author notes in #18002: “As it touches consensus code, I don’t think this would
ordinarily meet the bar for review cost vs benefit. However, it simplifies the
changes for Taproot significantly, and [..] it’s going to be necessitated by
inclusion of that code …”
Generally, for consensus code, the “if it ain’t broke, don’t fix it” rule
prevails. We might also add: “if it is broke [in a non-critical way], probably
don’t fix it either, because you might break consensus.”
O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation
Bitcoin Script is a stack language. A script consists of data elements that
can be pushed onto the stack, and opcodes which manipulate the stack
elements. See the Bitcoin wiki Script page for a good description of the
language.
Script allows branching control flow by using the
OP_IF/OP_NOTIF/OP_ELSE/OP_ENDIF opcodes. These branches can
be nested multiple levels deep.
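For example, a schematic script with two levels of nesting (the <branch …>
placeholders stand for arbitrary opcode sequences; they are not real opcodes):

```
OP_IF
    OP_IF
        <branch A>
    OP_ELSE
        <branch B>
    OP_ENDIF
OP_ELSE
    <branch C>
OP_ENDIF
```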
Script evaluation happens in EvalScript() in src/script/interpreter.cpp. The
evaluation iterates through the script elements (while (pc < pend)) and switches
on what that element is (switch (opcode)).
if/else branching is tracked by a boolean vector vfExec:

- When execution encounters an OP_IF or OP_NOTIF opcode (and
  therefore descends a level of if/else nesting), a boolean is pushed onto
  this vector: true if we’re entering a branch that should be executed and
  false for a branch that shouldn’t be executed.
- When execution encounters an OP_ELSE opcode, the topmost boolean
  in the vector is flipped.
- When execution encounters an OP_ENDIF opcode, the topmost boolean in
  the vector is popped off.
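A minimal, self-contained model of this bookkeeping (the function names here
are invented for illustration; the real logic lives inline in EvalScript()):

```cpp
#include <cassert>
#include <vector>

// One bool per level of OP_IF/OP_NOTIF nesting, as in the pre-PR interpreter.
std::vector<bool> vfExec;

// OP_IF / OP_NOTIF: descend one nesting level. `branch_taken` is the
// (possibly negated) condition taken from the data stack.
void OnIf(bool branch_taken) { vfExec.push_back(branch_taken); }

// OP_ELSE: flip the condition of the innermost branch.
void OnElse() { assert(!vfExec.empty()); vfExec.back() = !vfExec.back(); }

// OP_ENDIF: ascend one nesting level.
void OnEndif() { assert(!vfExec.empty()); vfExec.pop_back(); }
```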
At the beginning of the while (pc < pend) loop, we determine whether we’re in
a branch that should be executed by examining the vfExec vector and setting
a fExec boolean: true if all of the elements in the vector are true, and false
otherwise. If fExec is false, then we’re in a branch that shouldn’t be
executed and we continue to the next iteration of the while loop.
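Concretely, the pre-PR code computes this with a linear scan at the top of the
loop, along these lines:

```cpp
// O(depth) scan on every iteration of the opcode loop:
// fExec is true only if no enclosing branch is false.
bool fExec = !count(vfExec.begin(), vfExec.end(), false);
```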
This check involves iterating through a vector whose length is the depth of the
nested if branches, and the check is done for every element of the script, so
in the worst case the total work grows quadratically with the script size. For
example, a script that opens d nested branches and then contains m further
opcodes performs on the order of d × m boolean reads. See Sergio Demian
Lerner’s blog post for a good description of the problem.
This quadratic behaviour is not currently a problem, since the maximum number of
opcodes that can be included in a script is 201. In the pessimal case, the quadratic
behaviour can be used to construct a block that takes 0.25s to validate. There
are other expensive-to-validate scripts that can be used to construct a block
that takes longer than that to validate.
Tapscript does not have a 201-opcode limit.
If this behaviour is not fixed before taproot activation, it could therefore
be exploited to construct blocks that are very slow to validate.
This PR changes the boolean vector to be a pair of integers, which makes the
fExec check O(1) instead of O(n), and so removes the quadratic behaviour.
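The new representation is encapsulated in a ConditionStack class. A condensed
sketch, closely following the PR (comments added; see src/script/interpreter.cpp
in the PR branch for the real thing):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

class ConditionStack {
private:
    //! Sentinel for m_first_false_pos meaning "no false value on the stack".
    static constexpr uint32_t NO_FALSE = std::numeric_limits<uint32_t>::max();
    //! Size of the implied boolean stack.
    uint32_t m_stack_size = 0;
    //! Position of the first false value, or NO_FALSE if all are true.
    uint32_t m_first_false_pos = NO_FALSE;

public:
    bool empty() const { return m_stack_size == 0; }
    //! O(1) replacement for scanning the whole vector.
    bool all_true() const { return m_first_false_pos == NO_FALSE; }
    void push_back(bool f) {
        if (m_first_false_pos == NO_FALSE && !f) {
            // First false value being pushed; remember its position.
            m_first_false_pos = m_stack_size;
        }
        ++m_stack_size;
    }
    void pop_back() {
        assert(m_stack_size > 0);
        --m_stack_size;
        if (m_first_false_pos == m_stack_size) {
            // The popped value was the first false; all remaining are true.
            m_first_false_pos = NO_FALSE;
        }
    }
    void toggle_top() {
        assert(m_stack_size > 0);
        if (m_first_false_pos == NO_FALSE) {
            // The top was true; it becomes the first (and only) false.
            m_first_false_pos = m_stack_size - 1;
        } else if (m_first_false_pos == m_stack_size - 1) {
            // The top was the first false; it becomes true, leaving no falses.
            m_first_false_pos = NO_FALSE;
        }
        // Otherwise a false exists below the top, so flipping the top
        // changes nothing observable.
    }
};
```

With this representation, the fExec check reduces to a single comparison
(all_true()), independent of nesting depth.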
Abstract out script execution out of VerifyWitnessProgram()
This is a small refactor PR that abstracts out part of the
VerifyWitnessProgram() function into its own function,
ExecuteWitnessProgram().
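For orientation, a rough sketch of the shape of the extracted helper, based on
the steps that previously lived inline in VerifyWitnessProgram(). This is an
illustration, not the PR’s exact diff: the parameter list and error handling
are abbreviated, and the Bitcoin Core types (CScript, BaseSignatureChecker,
ScriptError) and helpers (set_error(), CastToBool(), EvalScript()) are assumed
from src/script/interpreter.cpp:

```cpp
// Sketch only: simplified from the real code in src/script/interpreter.cpp.
static bool ExecuteWitnessProgram(std::vector<std::vector<unsigned char>>& stack,
                                  const CScript& exec_script, unsigned int flags,
                                  SigVersion sigversion,
                                  const BaseSignatureChecker& checker,
                                  ScriptError* serror)
{
    // Witness stack items are size-limited.
    for (const auto& elem : stack) {
        if (elem.size() > MAX_SCRIPT_ELEMENT_SIZE) return set_error(serror, SCRIPT_ERR_PUSH_SIZE);
    }
    // Run the interpreter over the witness script.
    if (!EvalScript(stack, exec_script, flags, checker, sigversion, serror)) return false;
    // Witness evaluation enforces a clean stack ending in a single true value.
    if (stack.size() != 1) return set_error(serror, SCRIPT_ERR_CLEANSTACK);
    if (!CastToBool(stack.back())) return set_error(serror, SCRIPT_ERR_EVAL_FALSE);
    return true;
}
```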
Questions

1. The functions in ConditionStack() contain asserts. What happens if these
   asserts are hit?
2. How are we able to express a vector of booleans as a pair of integers? What
   observation about the use of the vector makes this possible?
3. In the notes, I’ve written “If fExec is false, then we’re in a branch that
   shouldn’t be executed and we continue to the next iteration of the while
   loop.” That’s not quite the full story. When do we not continue to the next
   iteration while fExec is false?
4. Did you run the verify_script() benchmark? Does this PR make a difference
   to performance?

Meeting Log
<jnewbery> Before we start, I'm always looking for volunteer hosts for these meetings. I'll help you choose a PR, and help you prepare notes and questions. Just let me know if you're interested and we'll slot you in.
<jnewbery> Hosting a review club meeting forces you to really understand a PR, so I think it's a really worthwhile learning experience. You can ask any of the previous hosts if you don't believe me
<jnewbery> We're all confused; confusion is the normal state of affairs when you're trying something really hard, and when you're exploring the unknown, [...] you can trust us that we're all on the same team trying to figure this out together and don't worry about looking stupid, I'm confused over here too.
<michaelfolkson> Concept ACK. I like the PR being broken down into smaller PRs as Nicolas said. My only concern is that this approach kind of assumes Taproot is inevitable (which we really shouldn't do)
<jonatack> when i'm testing PRs I often add compile-time (static_assert) or run-time (assert) checks to verify my assumptions, and so far when they've failed, it aborts
<jnewbery> the asserts appear in `pop_back()` and `toggle_top()`, which are only called in one place each, where we've already checked the stack is non-empty
<jonatack> lightlike: agreed, i was thinking of this: "NDEBUG is defined at the point where assert is last defined or redefined (i.e., where the header <cassert> or <assert.h> was last included); (since c++17)"
<jnewbery> next question. This PR swaps out a vector of booleans for a pair of integers. How are we able to express a vector of booleans as a pair of integers? What observation about the use of the vector makes this possible?
<michaelfolkson> The two integers are size of implied stack (m_stack_size) and position of the first false value on the implied stack (m_first_false_pos)
<jnewbery> michaelfolkson: yep, that's what they're being used for, but we're losing some information there. We don't know the state of all booleans on the stack. Why is that ok?
<nehan_> jnewbery: to answer your question about why you don't need all booleans on the stack, it's because once you hit a false you will not execute anything after that
<jnewbery> because of that limited interface to the stack, internally the stack doesn't need to track more information than stack depth and first false entry
<fjahr> jnewbery: I saw that but I understood it as the warning would be there "if the logic got messed up in future". I removed it now and did not see any new compiler warnings.
<jnewbery> ok, next question: In the notes, I’ve written “If fExec is false, then we’re in a branch that shouldn’t be executed and we continue to the next iteration of the while loop.” That’s not quite the full story. When do we not continue to the next iteration while fExec is false?
<jnewbery> jonatack: right. If we're in a non-executing branch and the next opcode is OP_IF, OP_NOTIF, OP_ELSE or OP_ENDIF, we drop into the switch statement
<lightlike> just my impression. there are quite a few PRs changing validation code, while script is touched almost never. Possibly the quality/accessibility of the code is better in validation.cpp?
<jnewbery> lightlike: maybe. There are ongoing projects that we want to do in validation.cpp (eg splitting net_processing/validation onto different threads). That stuff makes for slow progress.
<nehan_> question: suppose some nodes upgrade to use this new, faster code. is there any way the existence of the old nodes could cause a consensus failure, or upheaval on the network? they would take a longer time to validate a block that stressed this.
<nehan_> I imagine old nodes won't understand taproot and so won't execute that script anyway, right? And perhaps the current opcode limit bounds the difference between the slowest un-upgraded node and the fastest upgraded node.
<nehan_> yes, i'm not suggesting there would be a fork because of rules. but imagine if a performance improvement made new nodes many many times faster than old nodes, and the old nodes couldn't keep up.
<jnewbery> nehan_: I think that's right. Pieter estimates in the PR notes that a block that fully exploits this slow behaviour can take up to 0.25 seconds to validate. That's not enough to cause any network partition concerns
<fjahr> yeah, i think you're right but it's interesting. the biggest difference in validation time would probably be between the fastest (hardware) non-taproot node and the slowest taproot node. but given that we have nodes with all kinds of different hardware on the network now and these have different validation times I think there is no real impact.