O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation / Abstract out script execution out of VerifyWitnessProgram() (consensus, refactoring)
The PR branch HEAD was e6e622e5a at the time of this review club meeting.
This week, we’ll review two (small) PRs:
- O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation
- Abstract out script execution out of VerifyWitnessProgram()
Both of these PRs were pulled out of the Taproot/Schnorr demonstration branch in PR 17977. They both make small, non-behaviour-changing modifications to script execution.
Script execution is an area of the code that is very rarely modified. As the author notes in #18002 “As it touches consensus code, I don’t think this would ordinarily meet the bar for review cost vs benefit. However, it simplifies the changes for Taproot significantly, and [..] it’s going to be necessitated by inclusion of that code … “
Generally, for consensus code, the if it ain’t broke, don’t fix it rule prevails. We might also add if it is broke [in a non-critical way], probably don’t fix it either because you might break consensus.
O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation
Bitcoin Script is a stack language. A script consists of data elements that can be pushed onto the stack, and opcodes which manipulate the stack elements. See the Bitcoin wiki Script page for a good description of the language.
Script allows branching control flow by using the OP_IF/OP_NOTIF/OP_ELSE/OP_ENDIF opcodes. These branches can be nested multiple levels.
Script evaluation happens in
src/script/interpreter.cpp. The evaluation iterates through the script elements (
while (pc < pend)) and switches on what that element is (
if/else branching is tracked by a boolean vector
When execution encounters an OP_IF or OP_NOTIF opcode (and therefore descends a level of if/else nesting), a boolean is pushed onto this vector:
trueif we’re entering a branch that should be executed and
falsefor a branch that shouldn’t be executed.
When execution encounters an OP_ELSE opcode, the topmost boolean in the vector is flipped.
When execution encounters an OP_ENDIF opcode, the topmost boolean in the vector is popped off.
At the beginning of the
while (pc < pend)loop, we determine whether we’re in a branch that should be executed by examining the
vfExecvector and setting a
fExecboolean: true if all of the elements in the vector are true, and false otherwise. If
fExecis false, then we’re in a branch that shouldn’t be executed and we continue to the next iteration of the while loop.
This check involves iterating through a vector which is the length of the nested if branches. The check is done for every element of the script. In the worst case the time to do all checks grows quadratically. See Sergio Demian Lerner’s blog post for a good description of the problem.
This quadratic behaviour is not currently a problem, since the maximum number of opcodes that can be included in a script is 201. In the pessimal case, the quadratic behaviour can be used to construct a block that takes 0.25s to validate. There are other expensive-to-validate scripts that can be used to construct a block that takes longer than that to validate.
Tapscript does not have a 201 opcode limit. If this behaviour is not fixed before taproot activation, it could therefore be exploited to construct blocks that are very slow to validate.
This PR changes the boolean vector to be a pair of integers, which makes the
fExeccheck O(1) instead of O(n), and so removes the quadratic behaviour.
Abstract out script execution out of VerifyWitnessProgram()
- This is a small refactor PR that abstracts out a part of the
VerifyWitnessProgram()function into its own function called
Did you review the PRs? Concept ACK, approach ACK, tested ACK, or NACK? (Don’t forget to put your PR review on GitHub.)
The functions in
ConditionStack()contain asserts. What happens if these asserts are hit?
How are we able to express a vector of booleans as a pair of integers? What observation about the use of the vector makes this possible?
In the notes, I’ve written “If
fExecis false, then we’re in a branch that shouldn’t be executed and we continue to the next iteration of the while loop.” That’s not quite the full story. When do we not continue to the next iteration while
Did you run the
verify_script()benchmark? Does this PR make a difference to performance?
13:00 < fode> hi 13:00 < jnewbery> #startmeeting 13:00 < ajonas> Hi 13:00 < pinheadmz> hi 13:00 < fode> hello 13:00 < fjahr> hi 13:00 < jonatack> hi 13:00 < michaelfolkson> hi 13:00 < lightlike> hi 13:00 < jnewbery> Hi folks. Welcome to this week's review club meeting! 13:01 < kanzure> hi 13:01 < jnewbery> Feel free to say hi to let everyone know you're here. 13:01 < jnewbery> Before we start, I'm always looking for volunteer hosts for these meetings. I'll help you choose a PR, and help you prepare notes and questions. Just let me know if you're interested and we'll slot you in. 13:01 < emzy> Hi 13:01 < michaelfolkson> DM you jnewbery? 13:02 < jnewbery> yes! 13:02 < fanquake> hi 13:02 < jnewbery> Hosting a review club meeting forces you to really understand a PR, so I think it's a really worthwhile learning experience. You can ask any of the previous hosts if you don't believe me 13:02 < jnewbery> Normal reminder: everyone is here to learn, and no questions are stupid. Feel free to jump in and ask questions at any point. 13:02 < jnewbery> I heard this nice quote from Steven Strogatz the other day: 13:02 < jnewbery> We're all confused; confusion is the normal state of affairs when you're trying something really hard, and when you're exploring the unknown, [...] you can trust us that we're all on the same team trying the figure this out together and don't worry about looking stupid, I'm confused over here too. 13:03 < jnewbery> I hope that's how people see review club. We're all just trying to be a little less confused 13:03 < michaelfolkson> Haha nice 13:03 < jnewbery> ok, first up: who managed to review the PR? (y/n) 13:03 < fode> n 13:03 < fjahr> y 13:03 < lightlike> y 13:03 < pinheadmz> cramming rn :-) But i have read through sipa taproot branch 13:03 < michaelfolkson> y 13:03 < jonatack> y 13:04 < nehan_> hi 13:04 < jnewbery> hi nehan_ 13:04 < nehan_> y 13:04 < jnewbery> For the folks that looked at the PRs, any high-level thoughts? Concept ACK/NACKs? 13:05 < jonatack> acked both 13:05 < jnewbery> or any general questions about the script interpreter in Bitcoin Core? 13:06 < emzy> n 13:06 < lightlike> concept ack 13:06 < andrewtoth> hi 13:06 < michaelfolkson> Concept ACK. I like the PR being broken down into smaller PRs as Nicolas said. My only concern is that this approach kind of assumes Taproot is inevitable (which we really shouldn't do) 13:06 < SirRichard> hi 13:06 < fjahr> code review ack for the refactor one, the op_if one I have not found anything yet but want to put a little more time into it 13:07 < jnewbery> michaelfolkson: I don't think that's true. These PRs don't leave the code in a worse place, even if taproot doesn't happen 13:07 < michaelfolkson> That's fair. sipa said the motivation for touching consensus wouldn't be there without Taproot 13:07 < jnewbery> I'd agree if the intermediate state made the code less clear or performant, but these changes are both improvements 13:07 < lightlike> actually I was meaning to ask what the relation of #16902 to taproot/schnorr is? 13:08 < pinheadmz> i agree with jnewbery the fact that witness versions are extensible up to 31 makes it reasonable to abstract out executeWitnessProgram 13:08 < michaelfolkson> Just a thought. I really like splitting it (makes it much more understandable for me) so I'm not arguing against this approach 13:08 < jnewbery> right - the calculus of whether to spend time on this changes if we think taproot will happen, but the changes are all good I think 13:09 < jnewbery> lightlike: good question! Anyone know why the OP_IF change is relevant for taproot? 13:10 < jonatack> the 201 opcode limit 13:10 < pinheadmz> hm is it because you only "pay" for sigops that you execute? 13:10 < michaelfolkson> Because Taproot no longer has limits on script size, number of op codes 13:10 < pinheadmz> (in taproot) 13:10 < jnewbery> michaelfolkson: yes! exactly 13:10 < jonatack> from the BIP "The maximum non-push opcodes limit of 201 per script does not apply." 13:11 < jonatack> and footnote 10: https://github.com/bitcoin/bips/blob/master/bip-0342.mediawiki#cite_note-10 13:11 < jnewbery> exactly - in tapscript, there can be more than 201 opcodes in a script, so any quadratic performance behaviour could be much worse 13:12 < michaelfolkson> That was a great blog post of Sergio's. I should check some of his other blogs out 13:12 < michaelfolkson> https://bitslog.com/2017/04/17/new-quadratic-delays-in-bitcoin-scripts/ 13:12 < jnewbery> ok, let's get into some of the more low-level questions 13:12 < jnewbery> The fuctions in ConditionStack() contain asserts. What happens if these asserts are hit? 13:13 < michaelfolkson> Asserts generally terminate the program if false. Is that what you mean? 13:13 < jonatack> assertion failed -> Aborted 13:14 < jonatack> (right?) 13:14 < jonatack> https://en.cppreference.com/w/cpp/error/assert 13:15 < jnewbery> hmmm, don't they get caught? 13:15 < jnewbery> or is that just throws? 13:16 < jonatack> can reverse one to find out :p 13:17 < jnewbery> ok, I'm wrong. asserts don't get caught in C++ 13:17 < jnewbery> (I told you I'm also confused) 13:17 < jnewbery> why do we expect these asserts never to be hit? 13:18 < lightlike> afaik the asserts are removed by the compiler if NDEBUG is not defined, so they should not be relied upon. 13:18 < jonatack> when i'm testing PRs I often add compile (static assert) or run time (assert) to check my assumptions and so far when they failed, it aborts 13:19 < jnewbery> lightlike: yes, I think you're right 13:19 < jonatack> lightlike: i'm not sure if that's a c++17 feature 13:20 < jonatack> (i mean, if it's c++17 and up or also with c++11) 13:20 < lightlike> jonatack: pretty sure that is a very old feature 13:20 < jnewbery> the asserts appear in `pop_back()` and `toggle_top()`, which are only called in one place each, where we've already checked the stack is non-empty 13:20 < jnewbery> https://github.com/bitcoin/bitcoin/pull/16902/files#diff-be2905e2f5218ecdbe4e55637dac75f3R561 13:20 < jnewbery> https://github.com/bitcoin/bitcoin/pull/16902/files#diff-be2905e2f5218ecdbe4e55637dac75f3R569 13:21 < jonatack> lightlike: agreed, i was thinking of this: "NDEBUG is defined at the point where assert is last defined or redefined (i.e., where the header <cassert> or <assert.h> was last included); (since c++17)" 13:21 < jnewbery> next question. This PR swaps out a vector of booleans for a pair of integers. How are we able to express a vector of booleans as a pair of integers? What observation about the use of the vector makes this possible? 13:21 < emzy> why using an assert insted of an error log entry? 13:22 < nehan_> why assert() instead of static_assert()? 13:23 < nehan_> as suggested in the developer docs 13:23 < jb55> lightlike: NDEBUG functionality is disabled in core. asserts happens with or without it enabled 13:23 < jonatack> jb55: ah 13:23 < lightlike> jb55: ok, thanks 13:24 < jonatack> nehan_: i think static asserts require a bool constexpr 13:24 < michaelfolkson> The two integers are size of implied stack (m_stack_size) and position of the first false value on the implied stack (m_first_false_pos) 13:24 < jnewbery> nehan_: I don't think it's possible to do a static_assert here because it's asserting on runtime behaviour 13:24 < jonatack> so they can't be dynamic values determined at run time... thought i could very well be confused, recently forgot that in a pr review i made 13:25 < jnewbery> emzy: I think this is a good description of what asserts are for: https://stackoverflow.com/a/28974029/933705 13:25 < nehan_> jnewbery: jonatack: ah thanks 13:25 < jnewbery> "The purpose of assert is the guarantee that the programmer doesn't make mistakes." 13:26 < lightlike> there was some trouble with misplaced asserts a some years ago with the Bitcoin Unlimited fork, see e.g. https://blog.erratasec.com/2017/03/assert-in-hands-of-bad-coders.html (with a comment of gmaxwell) 13:26 < jnewbery> michaelfolkson: yep, that's what they're being used for, but we're losing some information there. We don't know the state of all booleans on the stack. Why is that ok? 13:26 < michaelfolkson> So you'd use an assert over a test when it is critically important? 13:27 < nehan_> might be good to discuss bitcoin-core policy on when to assert vs. log or bump up failure. is that documented anywhere? 13:27 < jnewbery> asserts are a check on the programing logic. throws are for runtime error conditions that can be recovered from. 13:28 < jnewbery> the expectation is that asserts are *never* hit when running the code in the wild 13:28 < lightlike> so asserts, if they can somehow be triggered by our peers, have the potential to bring down large parts of the network. 13:28 < emzy> as I understand it is a assert is triggert it is not anymore save to execute the program and better to exit. 13:29 < fjahr> see also use of assert in the second pr where sipa explicitly comments that this code should never be hit 13:29 < jnewbery> nehan_: I'm not sure it is documented anywhere. It's a good question 13:29 < jnewbery> fjahr: I think that's slightly different, and is to quiet a build warning 13:30 < jnewbery> here: https://github.com/bitcoin/bitcoin/pull/18002/files#diff-be2905e2f5218ecdbe4e55637dac75f3R1464 13:30 < jonatack> agree, i see asserts as sanity checks that should never be hit under expected circumstances 13:30 < nehan_> jnewbery: to answer your question about why you don't need all booleans on the stack, it's because once you hit a false you will not execute anything after that 13:31 < MarcoFalke> It is documented a bit in ./src/util/check.h, I think 13:31 < jnewbery> nehan_: yes, exactly. The calling code can only do one of the following: 13:32 < jnewbery> - push/pop from the top of the stack 13:32 < jnewbery> - toggle the top of the stack 13:32 < jnewbery> - check if all values are true 13:33 < jnewbery> because of that limited interface to the stack, internally the stack doesn't need to track more information than stack depth and first false entry 13:33 < MarcoFalke> util/check.h defines a CHECK_NONFATAL, which actually throws, instead of terminating (like an assert) 13:33 < jonatack> MarcoFalke: NonFatalCheckError/CHECK_NONFATAL? 13:33 < jonatack> right 13:34 < MarcoFalke> Obviously we don't want to use that in consensus or validation code 13:34 < jnewbery> And to add a bit more confusion, we also have an 'AbortNode' function in logs and attempts to shutdown cleanly 13:35 < jnewbery> *in validation which logs and attempts to shutdown cleanly 13:35 < fjahr> jnewbery: I saw that but I understood it as the warning would be there "if the logic got messed up in future". I removed it now and did not see any new compiler warnings. 13:36 < jnewbery> fjahr: oh, I think you're right 13:37 < jnewbery> everyone happy with the optimized 'stack' implementation? 13:38 < jnewbery> here: https://github.com/bitcoin/bitcoin/pull/16902/files#diff-be2905e2f5218ecdbe4e55637dac75f3R297 13:39 < jnewbery> ok, next question: In the notes, I’ve written “If fExec is false, then we’re in a branch that shouldn’t be executed and we continue to the next iteration of the while loop.” That’s not quite the full story. When do we not continue to the next iteration while fExec is false? 13:39 < jonatack> We don't continue if fExec is false, when we are inside an OP_IF...OP_ENDIF conditional? 13:40 < jonatack> e.g. ... if (fExec || (OP_IF <= opcode && opcode <= OP_ENDIF)) 13:41 < jnewbery> jonatack: right. If there's we're in a non-executing branch and the next opcode is OP_IF, OP_NOTIF, OP_ELSE or OP_ENDIF, we drop into the switch statement 13:42 < jnewbery> https://github.com/bitcoin/bitcoin/blob/2bdc476d4d23256d8396bb9051a511f540d87392/src/script/interpreter.cpp#L348-L349 13:42 < jonatack> oof good, i can leave the ack :p 13:43 < jnewbery> we also execute everything above that, even if we're in the unexecuted branch: https://github.com/bitcoin/bitcoin/blob/2bdc476d4d23256d8396bb9051a511f540d87392/src/script/interpreter.cpp#L310-L347 13:44 < jnewbery> so if there's a non-existent opcode in the script, the script always fails (even if that non-existent opcode is in an unexecuted branch) 13:45 < jnewbery> ok, final question: Did you run the verify_script() benchmark? Does this PR make a difference to performance? 13:47 < jnewbery> Alright, anyone have any other questions? Thoughts? 13:47 < michaelfolkson> Sorry I didn't get that far to do that 13:48 < unbend> got here late is there a bouncer / log to check? ':0 13:48 < nehan_> PRs like this are terrifying because they touch consensus code 13:48 < jnewbery> nehan_: yeah, we don't touch this stuff much 13:49 < jnewbery> hopefully these two are small and contained enough that they're ok to review 13:49 < jnewbery> what would make it less terrifying? 13:49 < michaelfolkson> Any chance of any more PRs being rolled out of sipa's Taproot PR? 13:49 < lightlike> script seems to be even more terrifying than other consensus code like in validation 13:50 < jnewbery> michaelfolkson: I expect so. I plan to discuss them here if more PRs are opened 13:50 < jnewbery> lightlike: interesting. How so? 13:50 < michaelfolkson> The Taproot PR would be a monstrous PR review club in its current form haha 13:51 < lightlike> just my impression. there are quite some PRs changing validation code, while script is touched almost never. Possibly, the quality/accessibility of code is better in validation.cpp? 13:52 < fjahr> did not run the benchmarks yet but will do so before finishing the review 13:52 < jonatack> Sorry, the internet connection fitzed. I wanted to mention that I found the study notes really helpful. 13:53 < fjahr> we could also review specific commits of taproot in the club 13:53 < jnewbery> lightlike: maybe. There are ongoing projects that we want to do in validation.cpp (eg splitting net_processing/validation onto different threads). That stuff makes slow progression. 13:53 < jonatack> And kudos to MarcoFalke for the related fuzzing test to check today's PR. 13:53 < jnewbery> but we very rarely touch script except fo a softfork 13:53 < MarcoFalke> lol, that fuzz test is really simple 13:54 < MarcoFalke> The fuzzer should exhaust it in less than a minute. I was too lazy to write a unit test 13:54 < jnewbery> I had a quick look for unit tests for the script interpreter and didn't find anything that tested the if/else opcodes 13:54 < nehan_> question: suppose some nodes upgrade to use this new, faster code. is there any way the existence of the old nodes could cause a consensus failure, or upheaval on the network? they would take a longer time to validate a block that stressed this. 13:55 < jnewbery> nehan_: great question! 13:55 < jnewbery> does anyone have thoughts about that? 13:56 < jnewbery> we have 4 minutes left. Now's your chance if you've been shy so far 13:56 < fjahr> nehan_: you mean after taproot is activated? 13:57 < jonatack> nehan_: Older software running non-tapscript? The limit is in place and should be backward compatible, no? 13:57 < nehan_> I imagine old nodes won't understand taproot and so won't execute that script anyway, right? And perhaps the current opcode limit bounds the difference between the slowest un-upgraded node and the fastest upgraded node. 13:58 < jonatack> It's a soft fork, not a hard fork, so a narrowing with backward compat preserved unless i'm confused 13:58 < nehan_> old nodes won't execute taproot scripts which stressed that code, that is. 13:59 < nehan_> yes, i'm not suggesting there would be a fork because of rules. but imagine if a performance improvement made new nodes many many times faster than old nodes, and the old nodes couldn't keep up. 13:59 < jnewbery> nehan_: I think that's right. Pieter estimates in the PR notes that a block that fully exploits this slow behaviour can take up to 0.25 seconds to validate. That's not enough to cause any network partition concerns 14:00 < jnewbery> ok, that's time folks. Thanks for coming. See y'all next week 14:00 < jnewbery> #endmeeting 14:00 < nehan_> has anyone thought about the impact of different verification times across the network? maybe it's not important? 14:00 < nehan_> thanks! 14:00 < emzy> thanks jnewbery and all others! 14:01 < lightlike> thanks! 14:02 < jonatack> yes, my thinking as well 14:02 < fjahr> yeah, i think your right but it's interesting. the biggest difference in validation time would probably be between the fastest (hardware) non-taproot node and the slowest taproot node. but given that we have nodes with all kinds of different hardware on the network now and these have different validation times I think there is no real impact. 14:03 < nehan_> fjahr: i think you're right 14:04 < jonatack> Lost the connection again. Curious to catch up with the discussion. It seems to me though that these changes are strictly better. 14:05 < jonatack> Great question though!