Script execution is an area of the code that is very rarely modified. As the
author notes in #18002: “As it touches consensus code, I don’t think this would
ordinarily meet the bar for review cost vs benefit. However, it simplifies the
changes for Taproot significantly, and [..] it’s going to be necessitated by
inclusion of that code …”
Generally, for consensus code, the “if it ain’t broke, don’t fix it” rule
prevails. We might also add: “if it is broke [in a non-critical way], probably
don’t fix it either, because you might break consensus.”
O(1) OP_IF/NOTIF/ELSE/ENDIF script implementation
Bitcoin Script is a stack language. A script consists of data elements that
can be pushed onto the stack, and opcodes which manipulate the stack
elements. See the Bitcoin wiki Script page for a good description of the
language.
Script allows branching control flow by using the
OP_IF/OP_NOTIF/OP_ELSE/OP_ENDIF opcodes. These branches can
be nested multiple levels deep.
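For example, a schematic script with two levels of nesting (the <branch …>
placeholders stand for arbitrary opcode sequences; they are not real opcodes):

```
OP_IF
    OP_IF
        <branch A>
    OP_ELSE
        <branch B>
    OP_ENDIF
OP_ELSE
    <branch C>
OP_ENDIF
```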
Script evaluation happens in EvalScript() in src/script/interpreter.cpp. The
evaluation iterates through the script elements (while (pc < pend)) and switches
on what that element is (switch (opcode)).
if/else branching is tracked by a boolean vector vfExec:

- When execution encounters an OP_IF or OP_NOTIF opcode (and
  therefore descends a level of if/else nesting), a boolean is pushed onto
  this vector: true if we’re entering a branch that should be executed and
  false for a branch that shouldn’t be executed.
- When execution encounters an OP_ELSE opcode, the topmost boolean
  in the vector is flipped.
- When execution encounters an OP_ENDIF opcode, the topmost boolean in
  the vector is popped off.
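A minimal, self-contained model of this bookkeeping (the function names here
are invented for illustration; the real logic lives inline in EvalScript()):

```cpp
#include <cassert>
#include <vector>

// One bool per level of OP_IF/OP_NOTIF nesting, as in the pre-PR interpreter.
std::vector<bool> vfExec;

// OP_IF / OP_NOTIF: descend one nesting level. `branch_taken` is the
// (possibly negated) condition taken from the data stack.
void OnIf(bool branch_taken) { vfExec.push_back(branch_taken); }

// OP_ELSE: flip the condition of the innermost branch.
void OnElse() { assert(!vfExec.empty()); vfExec.back() = !vfExec.back(); }

// OP_ENDIF: ascend one nesting level.
void OnEndif() { assert(!vfExec.empty()); vfExec.pop_back(); }
```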
At the beginning of the while (pc < pend) loop, we determine whether we’re in
a branch that should be executed by examining the vfExec vector and setting
a fExec boolean: true if all of the elements in the vector are true, and false
otherwise. If fExec is false, then we’re in a branch that shouldn’t be
executed and we continue to the next iteration of the while loop.
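Concretely, the pre-PR code computes this with a linear scan at the top of the
loop, along these lines:

```cpp
// O(depth) scan on every iteration of the opcode loop:
// fExec is true only if no enclosing branch is false.
bool fExec = !count(vfExec.begin(), vfExec.end(), false);
```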
This check involves iterating through a vector whose length is the depth of the
nested if branches, and the check is done for every element of the script, so
in the worst case the total work grows quadratically with the script size. For
example, a script that opens d nested branches and then contains m further
opcodes performs on the order of d × m boolean reads. See Sergio Demian
Lerner’s blog post for a good description of the problem.
This quadratic behaviour is not currently a problem, since the maximum number of
opcodes that can be included in a script is 201. In the pessimal case, the quadratic
behaviour can be used to construct a block that takes 0.25s to validate. There
are other expensive-to-validate scripts that can be used to construct a block
that takes longer than that to validate.
Tapscript does not have a 201-opcode limit.
If this behaviour is not fixed before taproot activation, it could therefore
be exploited to construct blocks that are very slow to validate.
This PR changes the boolean vector to be a pair of integers, which makes the
fExec check O(1) instead of O(n), and so removes the quadratic behaviour.
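The new representation is encapsulated in a ConditionStack class. A condensed
sketch, closely following the PR (comments added; see src/script/interpreter.cpp
in the PR branch for the real thing):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

class ConditionStack {
private:
    //! Sentinel for m_first_false_pos meaning "no false value on the stack".
    static constexpr uint32_t NO_FALSE = std::numeric_limits<uint32_t>::max();
    //! Size of the implied boolean stack.
    uint32_t m_stack_size = 0;
    //! Position of the first false value, or NO_FALSE if all are true.
    uint32_t m_first_false_pos = NO_FALSE;

public:
    bool empty() const { return m_stack_size == 0; }
    //! O(1) replacement for scanning the whole vector.
    bool all_true() const { return m_first_false_pos == NO_FALSE; }
    void push_back(bool f) {
        if (m_first_false_pos == NO_FALSE && !f) {
            // First false value being pushed; remember its position.
            m_first_false_pos = m_stack_size;
        }
        ++m_stack_size;
    }
    void pop_back() {
        assert(m_stack_size > 0);
        --m_stack_size;
        if (m_first_false_pos == m_stack_size) {
            // The popped value was the first false; all remaining are true.
            m_first_false_pos = NO_FALSE;
        }
    }
    void toggle_top() {
        assert(m_stack_size > 0);
        if (m_first_false_pos == NO_FALSE) {
            // The top was true; it becomes the first (and only) false.
            m_first_false_pos = m_stack_size - 1;
        } else if (m_first_false_pos == m_stack_size - 1) {
            // The top was the first false; it becomes true, leaving no falses.
            m_first_false_pos = NO_FALSE;
        }
        // Otherwise a false exists below the top, so flipping the top
        // changes nothing observable.
    }
};
```

With this representation, the fExec check reduces to a single comparison
(all_true()), independent of nesting depth.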
Abstract out script execution out of VerifyWitnessProgram()
This is a small refactor PR that abstracts out part of the
VerifyWitnessProgram() function into its own function,
ExecuteWitnessProgram().
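For orientation, a rough sketch of the shape of the extracted helper, based on
the steps that previously lived inline in VerifyWitnessProgram(). This is an
illustration, not the PR’s exact diff: the parameter list and error handling
are abbreviated, and the Bitcoin Core types (CScript, BaseSignatureChecker,
ScriptError) and helpers (set_error(), CastToBool(), EvalScript()) are assumed
from src/script/interpreter.cpp:

```cpp
// Sketch only: simplified from the real code in src/script/interpreter.cpp.
static bool ExecuteWitnessProgram(std::vector<std::vector<unsigned char>>& stack,
                                  const CScript& exec_script, unsigned int flags,
                                  SigVersion sigversion,
                                  const BaseSignatureChecker& checker,
                                  ScriptError* serror)
{
    // Witness stack items are size-limited.
    for (const auto& elem : stack) {
        if (elem.size() > MAX_SCRIPT_ELEMENT_SIZE) return set_error(serror, SCRIPT_ERR_PUSH_SIZE);
    }
    // Run the interpreter over the witness script.
    if (!EvalScript(stack, exec_script, flags, checker, sigversion, serror)) return false;
    // Witness evaluation enforces a clean stack ending in a single true value.
    if (stack.size() != 1) return set_error(serror, SCRIPT_ERR_CLEANSTACK);
    if (!CastToBool(stack.back())) return set_error(serror, SCRIPT_ERR_EVAL_FALSE);
    return true;
}
```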
Questions

1. The functions in ConditionStack() contain asserts. What happens if these
   asserts are hit?
2. How are we able to express a vector of booleans as a pair of integers? What
   observation about the use of the vector makes this possible?
3. In the notes, I’ve written “If fExec is false, then we’re in a branch that
   shouldn’t be executed and we continue to the next iteration of the while
   loop.” That’s not quite the full story. When do we not continue to the next
   iteration while fExec is false?
4. Did you run the verify_script() benchmark? Does this PR make a difference
   to performance?

Meeting Log
<jnewbery> Before we start, I'm always looking for volunteer hosts for these meetings. I'll help you choose a PR, and help you prepare notes and questions. Just let me know if you're interested and we'll slot you in.
<jnewbery> Hosting a review club meeting forces you to really understand a PR, so I think it's a really worthwhile learning experience. You can ask any of the previous hosts if you don't believe me
<jnewbery> We're all confused; confusion is the normal state of affairs when you're trying something really hard, and when you're exploring the unknown, [...] you can trust us that we're all on the same team trying to figure this out together and don't worry about looking stupid, I'm confused over here too.
<michaelfolkson> Concept ACK. I like the PR being broken down into smaller PRs as Nicolas said. My only concern is that this approach kind of assumes Taproot is inevitable (which we really shouldn't do)
<jonatack> when i'm testing PRs I often add compile-time (static_assert) or run-time (assert) checks to verify my assumptions, and so far when they've failed, it aborts
<jnewbery> the asserts appear in `pop_back()` and `toggle_top()`, which are only called in one place each, where we've already checked the stack is non-empty
<jonatack> lightlike: agreed, i was thinking of this: "NDEBUG is defined at the point where assert is last defined or redefined (i.e., where the header <cassert> or <assert.h> was last included); (since c++17)"
<jnewbery> next question. This PR swaps out a vector of booleans for a pair of integers. How are we able to express a vector of booleans as a pair of integers? What observation about the use of the vector makes this possible?
<michaelfolkson> The two integers are size of implied stack (m_stack_size) and position of the first false value on the implied stack (m_first_false_pos)
<jnewbery> michaelfolkson: yep, that's what they're being used for, but we're losing some information there. We don't know the state of all booleans on the stack. Why is that ok?
<nehan_> jnewbery: to answer your question about why you don't need all booleans on the stack, it's because once you hit a false you will not execute anything after that
<jnewbery> because of that limited interface to the stack, internally the stack doesn't need to track more information than stack depth and first false entry
<fjahr> jnewbery: I saw that but I understood it as the warning would be there "if the logic got messed up in future". I removed it now and did not see any new compiler warnings.
<jnewbery> ok, next question: In the notes, I’ve written “If fExec is false, then we’re in a branch that shouldn’t be executed and we continue to the next iteration of the while loop.” That’s not quite the full story. When do we not continue to the next iteration while fExec is false?
<jnewbery> jonatack: right. If we're in a non-executing branch and the next opcode is OP_IF, OP_NOTIF, OP_ELSE or OP_ENDIF, we drop into the switch statement
<lightlike> just my impression. there are quite a few PRs changing validation code, while script is touched almost never. Possibly the quality/accessibility of the code is better in validation.cpp?
<jnewbery> lightlike: maybe. There are ongoing projects that we want to do in validation.cpp (eg splitting net_processing/validation onto different threads). That stuff makes for slow progress.
<nehan_> question: suppose some nodes upgrade to use this new, faster code. is there any way the existence of the old nodes could cause a consensus failure, or upheaval on the network? they would take a longer time to validate a block that stressed this.
<nehan_> I imagine old nodes won't understand taproot and so won't execute that script anyway, right? And perhaps the current opcode limit bounds the difference between the slowest un-upgraded node and the fastest upgraded node.
<nehan_> yes, i'm not suggesting there would be a fork because of rules. but imagine if a performance improvement made new nodes many many times faster than old nodes, and the old nodes couldn't keep up.
<jnewbery> nehan_: I think that's right. Pieter estimates in the PR notes that a block that fully exploits this slow behaviour can take up to 0.25 seconds to validate. That's not enough to cause any network partition concerns
<fjahr> yeah, i think you're right but it's interesting. the biggest difference in validation time would probably be between the fastest (hardware) non-taproot node and the slowest taproot node. but given that we have nodes with all kinds of different hardware on the network now and these have different validation times I think there is no real impact.