Implement 64 bit arithmetic op codes in the Script interpreter (consensus)

Mar 20, 2024

https://github.com/bitcoin/bitcoin/pull/29221

Host: christewart - PR author: Christewart

Many future extensions of the bitcoin protocol - such as OP_TLUV - want to create smart contracts based on the amount of satoshis in a bitcoin output.

Unfortunately, satoshi values can require up to 51 bits, but Script arithmetic only accepts operands of up to 32 bits (4 bytes).

This means we cannot safely do math on Satoshi values in the interpreter without 64bit arithmetic!

This PR introduces 64bit arithmetic op codes and a new (to the interpreter) number encoding.

How arithmetic works currently in Script

Bitcoin has an embedded programming language called Script. Script has op codes such as OP_ADD and OP_SUB that allow you to pop 2 elements off of the stack, perform the arithmetic operation and push the resulting value back onto the stack. For instance, consider a Script that uses the existing op codes.

Example of how arithmetic currently works

OP_1 OP_2 OP_ADD OP_3 OP_EQUAL

[Stack: ] --(OP_1)--> [Stack: 1] --(OP_2)--> [Stack: 1, 2] --(OP_ADD)--> [Stack: 3] --(OP_3)--> [Stack: 3, 3] --(OP_EQUAL)--> [Stack: true]

Explanation:

The initial state of the stack is empty.
OP_1 pushes the number 1 onto the stack.
OP_2 pushes the number 2 onto the stack.
OP_ADD pops the top two elements (2 and 1), adds them, and pushes the result (3) onto the stack.
OP_3 pushes the number 3 onto the stack.
OP_EQUAL pops the top two elements (3 and 3), compares them, and pushes true onto the stack if they are equal. Otherwise, it pushes false.

In this case, since the values are equal, the final state of the stack is a single true, and the Script succeeds.
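The trace above can be reproduced with a toy evaluator. This is only a minimal C++ sketch, not the Bitcoin Core interpreter: stack items are modeled as plain integers rather than byte vectors, only the five op codes used in the example are supported, and run_toy_script is a hypothetical name chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model: real Script stack items are byte vectors, not integers.
// Returns the final stack after executing the given op codes in order.
std::vector<int64_t> run_toy_script(const std::vector<std::string>& ops) {
    std::vector<int64_t> stack;

    // Pops the top item, failing if the stack is empty.
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        int64_t v = stack.back();
        stack.pop_back();
        return v;
    };

    for (const std::string& op : ops) {
        if (op == "OP_1") {
            stack.push_back(1);
        } else if (op == "OP_2") {
            stack.push_back(2);
        } else if (op == "OP_3") {
            stack.push_back(3);
        } else if (op == "OP_ADD") {
            int64_t b = pop();
            int64_t a = pop();
            stack.push_back(a + b);
        } else if (op == "OP_EQUAL") {
            int64_t b = pop();
            int64_t a = pop();
            stack.push_back(a == b ? 1 : 0);  // 1 models true, 0 models false
        } else {
            throw std::runtime_error("unsupported opcode: " + op);
        }
    }
    return stack;
}
```

Running it on OP_1 OP_2 OP_ADD OP_3 OP_EQUAL leaves a single item equal to 1 (true) on the stack, matching the last step of the trace.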

Simple enough. Now let's create a Script with some larger number values that would not be possible without this PR.

In this example, we are going to assume we are doing math on 1,000 BTC. In satoshis, this number is 100,000,000,000. Encoded as CScriptNum, the hex representation for 1,000 BTC is 0x00e8764817.

0x00e8764817 0x00e8764817 OP_ADD 0x00d0ed902e OP_EQUAL

[Stack: ] --(0x00e8764817)--> [Stack: 0x00e8764817] --(0x00e8764817)--> [Stack: 0x00e8764817, 0x00e8764817] --(OP_ADD)--> [Stack: OP_ADD ERROR]

Explanation:

The initial state of the stack is empty.
0x00e8764817 pushes the hexadecimal value 0x00e8764817 onto the stack.
0x00e8764817 pushes another instance of the same value onto the stack.
OP_ADD consumes the two top stack elements and FAILS with an overflow exception.

This version fails because OP_ADD can only consume inputs of up to 4 bytes, and 0x00e8764817 is 5 bytes long. Even worse, this does not give the Script programmer the ability to handle the exception thrown by CScriptNum.
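To see where the 5-byte value comes from, the sketch below encodes an integer in the CScriptNum style (little endian, minimal length, sign carried in the top bit of the last byte) and checks the result against the 4-byte operand limit. The helper names encode_script_num, fits_numeric_operand and to_hex are made up for this illustration and are not Bitcoin Core functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal sign-magnitude little-endian encoding, modeled on CScriptNum.
std::vector<uint8_t> encode_script_num(int64_t value) {
    std::vector<uint8_t> out;
    if (value == 0) return out;  // zero is the empty byte vector

    const bool neg = value < 0;
    // Take the magnitude in unsigned arithmetic to avoid overflow on INT64_MIN.
    uint64_t abs = neg ? ~static_cast<uint64_t>(value) + 1
                       : static_cast<uint64_t>(value);
    while (abs != 0) {
        out.push_back(static_cast<uint8_t>(abs & 0xff));
        abs >>= 8;
    }
    // If the top bit is already used, add a byte to hold the sign;
    // otherwise set the sign bit on the last byte for negatives.
    if (out.back() & 0x80) {
        out.push_back(neg ? 0x80 : 0x00);
    } else if (neg) {
        out.back() |= 0x80;
    }
    return out;
}

// Existing arithmetic op codes reject operands longer than 4 bytes.
bool fits_numeric_operand(const std::vector<uint8_t>& bytes, std::size_t max_size = 4) {
    return bytes.size() <= max_size;
}

// Lowercase hex string of a byte vector, without a 0x prefix.
std::string to_hex(const std::vector<uint8_t>& bytes) {
    static const char* digits = "0123456789abcdef";
    std::string out;
    for (uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
    }
    return out;
}
```

With these helpers, to_hex(encode_script_num(100000000000)) yields 00e8764817, which is 5 bytes long, so fits_numeric_operand returns false for it.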

How arithmetic works with #29221

Three key differences exist in how 64-bit opcodes function compared to their previous counterparts:

Enhanced Precision: They support 64 bits of precision, so operands and results are no longer limited to 32 bits.
Error Handling Capability: These opcodes provide error handling by pushing either true or false onto the stack, depending on whether the operation succeeds or fails.
Standardized Encoding: They utilize a consistent fixed-length 8-byte number encoding format, aligning with conventions elsewhere in the Bitcoin codebase, such as in CTxOut::nValue.

As an illustration of the third difference, consider the encoding of 1,000 BTC. It would now be represented in the same format as seen on a block explorer (0x00e8764817000000) rather than 0x00e8764817 which is the CScriptNum encoding.
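The fixed-length format can be sketched in a few lines of C++. The names encode_le64 and decode_le64 are hypothetical helpers for this illustration (the PR's own helpers may differ), and the round trip through uint64_t assumes the usual two's complement representation of int64_t.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode a signed 64-bit value as exactly 8 little-endian bytes.
std::array<uint8_t, 8> encode_le64(int64_t value) {
    const uint64_t u = static_cast<uint64_t>(value);
    std::array<uint8_t, 8> out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<uint8_t>(u >> (8 * i));  // least significant byte first
    }
    return out;
}

// Decode 8 little-endian bytes back into a signed 64-bit value.
int64_t decode_le64(const std::array<uint8_t, 8>& bytes) {
    uint64_t u = 0;
    for (int i = 0; i < 8; ++i) {
        u |= static_cast<uint64_t>(bytes[i]) << (8 * i);
    }
    return static_cast<int64_t>(u);
}
```

Encoding 100,000,000,000 this way gives the bytes 00 e8 76 48 17 00 00 00, and every value occupies 8 bytes regardless of magnitude.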

Example: Adding 1,000 BTC together with OP_ADD64

Here’s the same example as above, using OP_ADD64 instead of OP_ADD and the new 8-byte little-endian encoding instead of CScriptNum:

0x00e8764817000000 0x00e8764817000000 OP_ADD64 OP_DROP 0x00d0ed902e000000 OP_EQUAL

[Stack: ] --(0x00e8764817000000)--> [Stack: 0x00e8764817000000]
          --(0x00e8764817000000)--> [Stack: 0x00e8764817000000, 0x00e8764817000000]
          --(OP_ADD64)--> [Stack: 0x00d0ed902e000000, true]
          --(OP_DROP)--> [Stack: 0x00d0ed902e000000]
          --(0x00d0ed902e000000)--> [Stack: 0x00d0ed902e000000, 0x00d0ed902e000000]
          --(OP_EQUAL)--> [Stack: true]

Explanation:

The initial state of the stack is empty.
0x00e8764817000000 pushes the hexadecimal value 0x00e8764817000000 onto the stack (representing 100,000,000,000 satoshis).
Another instance of 0x00e8764817000000 is pushed onto the stack.
OP_ADD64 pops the top two elements (0x00e8764817000000 and 0x00e8764817000000) and adds them. The result of the addition, 0x00d0ed902e000000 (representing 200,000,000,000 satoshis), is pushed onto the stack first, followed by true, indicating that the arithmetic executed correctly.
OP_DROP drops the true pushed onto the stack by OP_ADD64, which indicated that the arithmetic operation was successful.
0x00d0ed902e000000 pushes the hexadecimal value 0x00d0ed902e000000 onto the stack (representing 200,000,000,000 satoshis).
OP_EQUAL compares the two top stack values (both 0x00d0ed902e000000) and pushes true onto the stack.
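The behavior of OP_ADD64 in this trace can be sketched as follows. This is an illustrative C++ model and not the PR's implementation: op_add64, encode_le64 and decode_le64 are hypothetical names, true is modeled as the single byte 0x01 and false as the empty vector, and leaving both operands in place when the addition overflows is an assumption made here (the notes only state that false is pushed on failure).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using StackItem = std::vector<uint8_t>;

// Encode a signed 64-bit value as 8 little-endian bytes.
StackItem encode_le64(int64_t value) {
    const uint64_t u = static_cast<uint64_t>(value);
    StackItem out(8);
    for (int i = 0; i < 8; ++i) out[i] = static_cast<uint8_t>(u >> (8 * i));
    return out;
}

// Decode an 8-byte little-endian item; the caller must check the size first.
int64_t decode_le64(const StackItem& item) {
    uint64_t u = 0;
    for (int i = 0; i < 8; ++i) u |= static_cast<uint64_t>(item[i]) << (8 * i);
    return static_cast<int64_t>(u);
}

// Returns false for a script error (too few items or wrong operand size).
// Otherwise pushes the success flag on top and returns true.
bool op_add64(std::vector<StackItem>& stack) {
    if (stack.size() < 2) return false;
    const StackItem& vb = stack[stack.size() - 1];
    const StackItem& va = stack[stack.size() - 2];
    if (va.size() != 8 || vb.size() != 8) return false;

    const int64_t a = decode_le64(va);
    const int64_t b = decode_le64(vb);
    const int64_t max = std::numeric_limits<int64_t>::max();
    const int64_t min = std::numeric_limits<int64_t>::min();

    // Detect overflow before adding, since signed overflow is undefined behavior.
    if ((b > 0 && a > max - b) || (b < 0 && a < min - b)) {
        stack.push_back(StackItem{});  // assumption: operands stay, false goes on top
        return true;
    }
    stack.pop_back();
    stack.pop_back();
    stack.push_back(encode_le64(a + b));
    stack.push_back(StackItem{0x01});  // true: the addition succeeded
    return true;
}
```

Because the success flag ends up on top of the stack, a Script can branch on it (or simply OP_DROP it, as in the example) instead of having the whole Script abort as it does with OP_ADD.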

Design questions

Signed vs unsigned arithmetic

Much of the implementation uses code from the Elements blockchain. In Elements, the new arithmetic opcodes were implemented on fixed size 64 bit signed integers. Do we have a use case for using signed math rather than unsigned math? The satoshi example would work with unsigned math (outputs can’t have negative value) even though sats are encoded as int64_t in the bitcoin protocol. Signed integer overflow is undefined behavior in the C++ standard.
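The practical difference shows up in how overflow has to be detected. In the sketch below (function names are hypothetical, not taken from the PR or from Elements), the signed version must test the operands before adding, because the overflowing addition itself would be undefined behavior, while the unsigned version can add first and check for wraparound afterwards, since unsigned arithmetic is defined to wrap modulo 2^64.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed: check BEFORE adding so the overflowing addition never executes.
bool checked_add_signed(int64_t a, int64_t b, int64_t& out) {
    const int64_t max = std::numeric_limits<int64_t>::max();
    const int64_t min = std::numeric_limits<int64_t>::min();
    if ((b > 0 && a > max - b) || (b < 0 && a < min - b)) return false;
    out = a + b;
    return true;
}

// Unsigned: wraparound is well defined, so check AFTER adding.
// On failure, out holds the wrapped value and should be ignored.
bool checked_add_unsigned(uint64_t a, uint64_t b, uint64_t& out) {
    out = a + b;
    return out >= a;  // a wrapped sum is always smaller than either operand
}
```

Either variant has ample room for amounts: the maximum supply of 2,100,000,000,000,000 satoshis is far below both the int64_t and uint64_t limits.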

Existing opcode interop

What is the best way to interop with existing op codes such as OP_WITHIN, OP_SIZE, OP_CHECKSIGADD, etc? They may be explicitly or implicitly converted:

Explicit conversion op codes

Elements and, as a by-product, this PR implement explicit casting op codes. They are OP_SCRIPTNUMTOLE64, OP_LE64TOSCRIPTNUM and OP_LE32TOLE64.

This means a Script programmer must explicitly convert the top stack element before passing it to an existing opcode. For instance, extending our example above:

0x00e8764817000000 0x00e8764817000000 OP_ADD64 OP_DROP OP_LE64TOSCRIPTNUM OP_SIZE OP_8 OP_EQUALVERIFY OP_SCRIPTNUMTOLE64 0x00d0ed902e000000 OP_EQUAL

Implicit conversion opcodes

You could redefine opcodes such as OP_WITHIN, OP_SIZE, OP_CHECKSIGADD to be context dependent on the SigVersion. Let's look at a potential implementation for OP_SIZE:

case OP_SIZE:
{
    // (in -- in size)
    if (stack.size() < 1)
        return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);

    if (sigversion == SigVersion::BASE || sigversion == SigVersion::WITNESS_V0 || sigversion == SigVersion::TAPROOT || sigversion == SigVersion::TAPSCRIPT) {
        // This is for backwards compatibility: we always want to use the old
        // numbering system for already deployed versions of the bitcoin protocol.
        CScriptNum bn(stacktop(-1).size());
        stack.push_back(bn.getvch());
    } else {
        // All future soft forks assume 64-bit math.
        // Don't push variable length encodings onto
        // the stack when we are using SigVersion::TAPSCRIPT_64BIT.
        int64_t result = stacktop(-1).size();
        push8_le(stack, result);
    }
}
break;

The key here is the else clause which assumes that every SigVersion that is NOT specified in the if clause uses 64bit signed integer fixed length numbers. This removes the need for conversion/casting op codes and makes the developer experience much nicer, IMO.

Encoding debate

There is an ongoing debate along 2 dimensions:

Whether fixed size encodings will encumber us for features introduced in future soft forks (such as 256bit scalar arithmetic)
Whether moving away from CScriptNum will be too disruptive to the ecosystem and force everyone to update their tooling.

I’m not going to go into further detail about this debate, as it’s been written about at length on Delving Bitcoin.