Many future extensions of the bitcoin protocol - such as OP_TLUV - want to create smart contracts based on the number of satoshis in a bitcoin output.
Unfortunately, satoshi values can require up to 51 bits, but Script can only do math on 32-bit values.
This means we cannot safely do math on satoshi values in the interpreter without 64-bit arithmetic!
This PR introduces 64-bit arithmetic op codes and a new (to the interpreter) number encoding.
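To make the size mismatch concrete, here is a minimal sketch that counts the bits needed for the 21 million BTC supply cap and checks it against the 4-byte operand range of Script arithmetic. The names `MAX_MONEY_SATS`, `BitsNeeded` and `FitsInScript32` are illustrative and are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Supply cap in satoshis: 21,000,000 BTC * 100,000,000 sat/BTC.
constexpr int64_t MAX_MONEY_SATS = 21'000'000LL * 100'000'000LL;

// Number of bits needed to represent a non-negative value (0 needs 0 bits).
constexpr int BitsNeeded(int64_t value)
{
    int bits = 0;
    while (value > 0) {
        value >>= 1;
        ++bits;
    }
    return bits;
}

// True if the value fits the 4-byte sign-magnitude range that Script
// arithmetic accepts as an operand: -(2^31 - 1) .. 2^31 - 1.
constexpr bool FitsInScript32(int64_t value)
{
    return value >= -std::numeric_limits<int32_t>::max() &&
           value <= std::numeric_limits<int32_t>::max();
}
```

Under these definitions `BitsNeeded(MAX_MONEY_SATS)` is 51, and the largest amount that passes `FitsInScript32` is 2,147,483,647 satoshis, roughly 21.47 BTC.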
How arithmetic works currently in Script
Bitcoin has an embedded programming language called Script. Script has op codes such as OP_ADD and OP_SUB
that allow you to pop 2 elements off of the stack, perform the arithmetic operation and push the
resulting value back onto the stack. For instance, consider a Script that uses the old op codes to push 1 and 2, add them, and compare the sum with 3:
OP_ADD pops the top two elements (2 and 1), adds them, and pushes the result (3) onto the stack.
OP_3 pushes the number 3 onto the stack.
OP_EQUAL pops the top two elements (3 and 3), compares them, and pushes true if they are equal. Otherwise, it pushes false and verification fails.
In this case, since the values are equal, verification succeeds, and the final stack holds the single true value pushed by OP_EQUAL.
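The walkthrough can be reproduced with a toy evaluator. This is only a sketch: it assumes the Script is `OP_1 OP_2 OP_ADD OP_3 OP_EQUAL` (which matches the steps described), it stores stack items as plain integers rather than the byte vectors the real interpreter uses, and `RunToyScript` is a hypothetical name. OP_EQUAL leaves its boolean result on the stack; OP_EQUALVERIFY is the variant that consumes it.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Toy evaluator for a handful of opcodes. Returns the final stack.
std::vector<int64_t> RunToyScript(const std::vector<std::string>& script)
{
    std::vector<int64_t> stack;
    for (const std::string& op : script) {
        if (op == "OP_1") {
            stack.push_back(1);
        } else if (op == "OP_2") {
            stack.push_back(2);
        } else if (op == "OP_3") {
            stack.push_back(3);
        } else if (op == "OP_ADD" || op == "OP_EQUAL") {
            // Both opcodes pop two operands and push one result.
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            const int64_t b = stack.back();
            stack.pop_back();
            const int64_t a = stack.back();
            stack.pop_back();
            stack.push_back(op == "OP_ADD" ? a + b : (a == b ? 1 : 0));
        } else {
            throw std::runtime_error("unsupported opcode: " + op);
        }
    }
    return stack;
}
```

Running it on the five opcodes above leaves one element, 1 (true), on the stack.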
Simple enough. Now let's create a Script with some larger number values that would not be possible without this PR.
In this example, we are going to assume we are doing math on 1,000 BTC. In satoshis, this number is 100,000,000,000.
Encoded as CScriptNum, the hex representation for 1,000 BTC is 0x00e8764817.
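The 5-byte value can be derived by hand: 100,000,000,000 is 0x174876E800, written least significant byte first. The following is a minimal sketch of a CScriptNum-style encoder (little-endian magnitude with the sign in the top bit of the last byte). `EncodeScriptNum` is an illustrative name, not the function used in Bitcoin Core.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal variable-length, little-endian, sign-magnitude encoding.
std::vector<uint8_t> EncodeScriptNum(int64_t value)
{
    std::vector<uint8_t> out;
    if (value == 0) return out; // zero is the empty byte vector

    const bool negative = value < 0;
    // Negate in unsigned space so the most negative value cannot overflow.
    uint64_t magnitude = negative ? ~static_cast<uint64_t>(value) + 1
                                  : static_cast<uint64_t>(value);

    while (magnitude > 0) {
        out.push_back(static_cast<uint8_t>(magnitude & 0xff));
        magnitude >>= 8;
    }
    // The top bit of the last byte is the sign bit. If the magnitude already
    // occupies it, append an extra byte that carries only the sign.
    if (out.back() & 0x80) {
        out.push_back(negative ? 0x80 : 0x00);
    } else if (negative) {
        out.back() |= 0x80;
    }
    return out;
}
```

Because the length depends on the value, small numbers stay short (1 encodes as a single byte) while 1,000 BTC needs five bytes, one more than Script arithmetic accepts as an operand.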
Three key differences exist in how 64-bit opcodes function compared to their previous counterparts:
Enhanced Precision: They support 64 bits of precision, so values too large for 32-bit Script arithmetic (such as large satoshi amounts) can be operated on.
Error Handling Capability: These opcodes provide error handling by pushing either true or false onto the stack, depending on whether the operation succeeds or fails.
Standardized Encoding: They utilize a consistent fixed-length 8-byte number encoding format, aligning with conventions elsewhere in the Bitcoin codebase, such as in CTxOut::nValue.
As an illustration of the third difference, consider the encoding of 1,000 BTC. It would now be represented in the same format as seen on a block explorer (0x00e8764817000000) rather than 0x00e8764817 which is the CScriptNum encoding.
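A fixed-length encoding is simple to sketch: every value occupies exactly 8 bytes, least significant byte first. `EncodeLE64` and `DecodeLE64` below are hypothetical helper names chosen for this example; the PR's own helper appears elsewhere on this page as `push8_le`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode a signed 64-bit value as exactly 8 little-endian bytes.
std::array<uint8_t, 8> EncodeLE64(int64_t value)
{
    const uint64_t bits = static_cast<uint64_t>(value);
    std::array<uint8_t, 8> out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<uint8_t>(bits >> (8 * i));
    }
    return out;
}

// Decode 8 little-endian bytes back into a signed 64-bit value.
// The final cast assumes a two's complement platform (guaranteed from C++20).
int64_t DecodeLE64(const std::array<uint8_t, 8>& bytes)
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits |= static_cast<uint64_t>(bytes[i]) << (8 * i);
    }
    return static_cast<int64_t>(bits);
}
```

Encoding 100,000,000,000 yields the bytes 00 e8 76 48 17 00 00 00, the CScriptNum bytes padded with zeros to 8 bytes.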
Example: Adding 1,000 BTC together with OP_ADD64
Here’s the same example from above, using OP_ADD64 instead of OP_ADD and the new 8-byte little-endian encoding instead of CScriptNum:
0x00e8764817000000 pushes the hexadecimal value 0x00e8764817000000 onto the stack (representing 100,000,000,000 satoshis).
Another instance of 0x00e8764817000000 is pushed onto the stack.
OP_ADD64 pops the top two elements (0x00e8764817000000 and 0x00e8764817000000) and adds them. The result of the addition, 0x00d0ed902e000000 (representing 200,000,000,000 satoshis), is pushed onto the stack first, followed by true, indicating that the arithmetic executed correctly.
OP_DROP drops the true pushed onto the stack by OP_ADD64, which indicated that the arithmetic operation was successful.
0x00d0ed902e000000 pushes the hexadecimal value 0x00d0ed902e000000 onto the stack (representing 200,000,000,000 satoshis).
OP_EQUAL compares the top two stack values (both 0x00d0ed902e000000) and pushes true onto the stack.
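The OP_ADD64 step can be modelled roughly as follows. This is a sketch, not the PR's code: `ToLE64`, `FromLE64` and `ExecAdd64` are hypothetical names, and the stack layout on overflow (operands left in place, false pushed on top) is an assumption, since the description above only says that false is pushed when the operation fails.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using StackItem = std::vector<uint8_t>;

// Encode a signed 64-bit value as 8 little-endian bytes.
StackItem ToLE64(int64_t value)
{
    const uint64_t bits = static_cast<uint64_t>(value);
    StackItem out(8);
    for (int i = 0; i < 8; ++i) out[i] = static_cast<uint8_t>(bits >> (8 * i));
    return out;
}

// Decode exactly 8 little-endian bytes; returns false for any other length.
bool FromLE64(const StackItem& item, int64_t& out)
{
    if (item.size() != 8) return false;
    uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) bits |= static_cast<uint64_t>(item[i]) << (8 * i);
    out = static_cast<int64_t>(bits);
    return true;
}

// Hypothetical model of OP_ADD64. Returns false when the script should
// abort (too few items, or an operand that is not 8 bytes long).
bool ExecAdd64(std::vector<StackItem>& stack)
{
    if (stack.size() < 2) return false;
    int64_t a = 0;
    int64_t b = 0;
    if (!FromLE64(stack[stack.size() - 2], a) || !FromLE64(stack.back(), b)) return false;

    // Test for overflow before adding, because signed overflow is undefined in C++.
    const bool overflow =
        (b > 0 && a > std::numeric_limits<int64_t>::max() - b) ||
        (b < 0 && a < std::numeric_limits<int64_t>::min() - b);
    if (overflow) {
        // Assumption of this sketch: keep the operands and push false (empty vector).
        stack.push_back(StackItem{});
        return true;
    }
    stack.pop_back();
    stack.pop_back();
    stack.push_back(ToLE64(a + b));
    stack.push_back(StackItem{0x01}); // true
    return true;
}
```

With two copies of the 1,000 BTC value on the stack, the model leaves the sum (bytes 00 d0 ed 90 2e 00 00 00) with true on top of it, which is the state OP_DROP then operates on.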
Design questions
Signed vs unsigned arithmetic
Much of the implementation uses code from the elements blockchain. In elements they implemented new arithmetic opcodes as fixed size 64 bit signed integers.
Do we have a use case for using signed math rather than unsigned math? The satoshi example would work with unsigned math (outputs can’t have negative value) even though sats are encoded
as int64_t in the bitcoin protocol. Signed integer overflow is undefined behavior in the C++ standard, so a signed implementation must check for overflow before performing the operation.
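The practical difference between the two choices shows up in how overflow is detected. The sketch below uses hypothetical helper names and is not taken from the PR or from Elements: the signed version has to test the operands before adding, while the unsigned version can add first because unsigned wraparound is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed addition: check the bounds first, since a + b must never overflow.
bool CheckedAddSigned(int64_t a, int64_t b, int64_t& result)
{
    if (b > 0 && a > std::numeric_limits<int64_t>::max() - b) return false;
    if (b < 0 && a < std::numeric_limits<int64_t>::min() - b) return false;
    result = a + b;
    return true;
}

// Unsigned addition: wraparound is defined behavior, so a wrapped sum
// can be detected afterwards because it is smaller than either operand.
bool CheckedAddUnsigned(uint64_t a, uint64_t b, uint64_t& result)
{
    const uint64_t sum = a + b;
    if (sum < a) return false;
    result = sum;
    return true;
}
```

Either approach can report failure cleanly; the signed one additionally allows negative intermediate results, for example when subtracting a fee from an amount.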
Existing opcode interop
What is the best way to interop with existing op codes such as OP_WITHIN, OP_SIZE, OP_CHECKSIGADD, etc? They may be explicitly or implicitly converted:
Explicit conversion op codes
Elements and, as a byproduct, this PR implement explicit casting op codes. They are OP_SCRIPTNUMTOLE64, OP_LE64TOSCRIPTNUM, OP_LE32TOLE64.
This means a Script programmer must explicitly cast the top stack element before handing it to an opcode that expects the other number encoding. For instance, from our example above
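To make the casts concrete, here is a minimal sketch of what converting an 8-byte little-endian value to a CScriptNum-style value involves, and why the result may still be unusable. All names are hypothetical, and the 4-byte limit models the default operand size of the legacy numeric opcodes; the sketch does not claim to show where the real implementation rejects the value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using StackItem = std::vector<uint8_t>;

// Default operand size limit (in bytes) for legacy numeric opcodes.
constexpr std::size_t LEGACY_MAX_NUM_SIZE = 4;

// Encode a signed 64-bit value as 8 little-endian bytes.
StackItem ToLE64(int64_t value)
{
    const uint64_t bits = static_cast<uint64_t>(value);
    StackItem out(8);
    for (int i = 0; i < 8; ++i) out[i] = static_cast<uint8_t>(bits >> (8 * i));
    return out;
}

// Minimal CScriptNum-style encoding (little-endian sign-magnitude).
StackItem ToScriptNum(int64_t value)
{
    StackItem out;
    if (value == 0) return out;
    const bool negative = value < 0;
    uint64_t magnitude = negative ? ~static_cast<uint64_t>(value) + 1
                                  : static_cast<uint64_t>(value);
    while (magnitude > 0) {
        out.push_back(static_cast<uint8_t>(magnitude & 0xff));
        magnitude >>= 8;
    }
    if (out.back() & 0x80) {
        out.push_back(negative ? 0x80 : 0x00);
    } else if (negative) {
        out.back() |= 0x80;
    }
    return out;
}

// Hypothetical model of the OP_LE64TOSCRIPTNUM cast on a single stack item.
bool LE64ToScriptNum(const StackItem& in, StackItem& out)
{
    if (in.size() != 8) return false;
    uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) bits |= static_cast<uint64_t>(in[i]) << (8 * i);
    out = ToScriptNum(static_cast<int64_t>(bits));
    return true;
}

// Would a legacy numeric opcode accept this operand?
bool FitsLegacyOperand(const StackItem& scriptnum)
{
    return scriptnum.size() <= LEGACY_MAX_NUM_SIZE;
}
```

A small value such as 3 casts to a single byte and passes the check, while 1,000 BTC casts to 5 bytes and does not, so casting alone cannot bridge large values to the legacy opcodes.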
Alternatively, the conversion could be implicit: you could redefine opcodes such as OP_WITHIN, OP_SIZE, OP_CHECKSIGADD to be context dependent on the SigVersion. Let's look at a potential implementation for OP_SIZE:
    case OP_SIZE:
    {
        // (in -- in size)
        if (stack.size() < 1)
            return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);
        if (sigversion == SigVersion::BASE ||
            sigversion == SigVersion::WITNESS_V0 ||
            sigversion == SigVersion::TAPROOT ||
            sigversion == SigVersion::TAPSCRIPT) {
            // this is for backwards compatibility, we always want to use the old numbering
            // system for already deployed versions of the bitcoin protocol
            CScriptNum bn(stacktop(-1).size());
            stack.push_back(bn.getvch());
        } else {
            // All future soft forks assume 64-bit math.
            // Don't push variable length encodings onto
            // the stack when we are using SigVersion::TAPSCRIPT_64BIT.
            int64_t result = stacktop(-1).size();
            push8_le(stack, result);
        }
    }
The key here is the else clause, which assumes that every SigVersion NOT listed in the if clause uses fixed-length 64-bit signed integers.
This removes the need for conversion/casting op codes and makes the developer experience much nicer, IMO.
Encoding debate
There is an ongoing debate along 2 dimensions:
Whether fixed size encodings will encumber us for features introduced in future soft forks (such as 256-bit scalar arithmetic)
Whether moving away from CScriptNum will be too disruptive to the ecosystem and force everyone to update their tooling.
I’m not going to go into further detail about this debate, as it has been written about at length on Delving Bitcoin.
<Chris_Stewart_5> I put some examples of how Script currently works on the bitcoincore.reviews webpage. I used chatGPT to generate some (hopefully) readable ASCII art to show how Script and the stack work together
<Chris_Stewart_5> Guest93: The interpreter takes in instructions (such as 'OP_ADD') and data (such as encoded numbers) and manipulates the stack based on the given instruction.
<Chris_Stewart_5> stickies-v: I actually don't know why 32 bits specifically. That is a good historical Q that I will have to look up. I've done a bit of archaeology on where we got CScriptNum from (I believe openSSL, still confirming though).
<Chris_Stewart_5> ion-: Thats why I've got you fine folks to review my (and the elements' team) work. This implementation is mostly pulled over from the elements blockchain: https://github.com/ElementsProject/elements/
<Chris_Stewart_5> stickies-v: Yes! So the key point here is we already have carve outs for specific opcodes we have implemented that are time sensitive. For others following along, I recommend reading the comments in the c++ codebase for 'why we need 5 byte inputs for numbers related to time'
<abubakarsadiq> Also I have a question while reading the BIP why are we introducing new opcodes that does the same thing with current opcodes but with 64 bit values, why not just upgrade the old ones to support both 32 and 64 bit?
<Chris_Stewart_5> abubakarsadiq: Ok this is a great question, and i'm currently prototyping an implementation that does just this. I've been asking myself the same question lately that it may not be necessary to make new opcodes, rather re-purpose old ones.
<Chris_Stewart_5> abubakarsadiq: I'm going to table discussion on that topic for now as it is not what is in #29221, but if you would like to see what that looks like follow my work on this branch: https://github.com/Christewart/bitcoin/tree/64bit-arith-implicit
<stickies-v> (those seem to be the only 2 use cases btw, everything compiles when patching those 2 lines in OP_CHECKLOCKTIMEVERIFY and OP_CHECKSEQUENCEVERIFY and then removing nMaxNumSize altogether)
<glozow> PR 5065 links to BIP62, which mentions "zero-padded number pushes" as a source of malleability. So we require minimal representation i.e. no zero-padding
<Chris_Stewart_5> glozow: Yes! Beforehand we were vulnerable to malleability attacks. Since CScriptNum has a _variable length_ encoding, numbers can be represented multiple ways
<Chris_Stewart_5> A wise guy on the p2p network would modify your zero encoding and change your txid making you (potentially) lose track of your transaction. This is allegedly what took MtGox down
<Chris_Stewart_5> stickies-v: Exactly. The current implementation in #29221 uses a _fixed length_ number encoding rather than a _variable length_ number encoding used by CScriptNum
<stickies-v> what's the rationale behind the fixed length approach? is it mostly to make implementation simpler (and thus more bug-proof), at the cost of (slightly? i don't know) higher scripts?
<Chris_Stewart_5> stickies-v: Exactly. In raw bytes, Scripts will be larger. I did an analysis of the mainnet blockchain and found that it would be ~1GB (0.17%) larger if this proposal was enacted from the genesis block.
<Chris_Stewart_5> stickies-v: In terms of analyzing blockchain usage patterns, it is difficult to say what the size increase would be. It would be smaller than the fixed length proposal, but since we cannot use 8 byte CScriptNums, we can't make assumptions about how much larger the chain would be since genesis
<Chris_Stewart_5> Q: The Script in the Explicit conversion op codes section will not work. Can you guess why? Hint: it has something to do with OP_LE64TOSCRIPTNUM.
<Chris_Stewart_5> glozow: Yes! So this presents a fundamental problem with the design of this PR currently, and is why i'm working on alternative designs
<Chris_Stewart_5> The problem is, 'How do I get the _new_ number format that supports 8 bytes to interop with legacy opcodes that only support 4 bytes (5 bytes in the case of the locktime op codes)?'
<Chris_Stewart_5> stickies-v: That is a great idea. Although it doesn't solve the fundamental problem, it at least introduces error handling capability
<Chris_Stewart_5> stickies-v: If you were writing a production Script, the OP_DROP should be replaced with OP_IF OP_ELSE and then you handle the failure case in the OP_ELSE clause. Since these are demo Scripts I cheated a bit :-)
<Chris_Stewart_5> Do people understand the fundamental problem introduced by a _larger_ (8 bytes, in our case) number format than the existing number format (4 bytes, occasionally 5 bytes)? This is a really key point and i'm happy to answer any more questions on the topic since it is absolutely crucial to understand imo
<stickies-v> sorry - unrelated question, is there any debate around whether these new opcodes should indeed return a success code? is this something that's absolutely required for the proposal to work?
<stickies-v> i think it's nice that we allow scripts to gracefully handle overflows btw, but the downside is that we're forcing the cost even for scripts that don't require it, so just wondering how essential that is?
<Chris_Stewart_5> stickies-v: I don't think its absolutely required, no. It provides better developer ergonomics, imo. But that is a personal preference ig. FWIW, that was a design choice I pulled over from elements.
<stickies-v> if we're modifying the current opcodes to allow both 32 and 64 bit arithmetic, as suggested earlier, perhaps we can skip the success codes and _potentially_ add OP_ADDSAFE, OP_SUBSAFE etc opcodes in the future if there's developer demand?
<Chris_Stewart_5> stickies-v: I don't believe it is necessary to retain _old_ semantics for _new_ soft forks. I'm working on this implementation, so I haven't 100% confirmed it yet.
<Chris_Stewart_5> For instance, pre TAPSCRIPT_64BIT we throw exceptions when there are overflows with OP_ADD, but if `sigversion == SigVersion::TAPSCRIPT_64BIT` we can redefine semantics to push true/false onto the stack, accept up to 8 byte numeric inputs etc
<Chris_Stewart_5> to add even more fuel to the fire, I believe (again, haven't coded to confirm) that we could use this same mechanism to extend OP_ADD in the future to accept even bigger inputs, such as 256bit scalars. This is something that people are already wanting it seems on delvingbitcoin
<Chris_Stewart_5> Thank you everyone for coming out and asking GREAT questions. I'm happy to keep the convo going on irc, twitter, github etc. Don't hesitate to reach out!
<stickies-v> i'm very unfamiliar with script, so apologies if this doesn't make sense, but i guess where i'm getting at is it seems like this PR is trying to do 2 things: introduce 64 bit arithmetic, and allow scripts to handle overflows through success codes, and maybe it's better to do those separately and just do 64 bit here?
<Chris_Stewart_5> stickies-v: That is a very reasonable take. Unfortunately with the pace we deploy soft forks (every ~4 years), you have a tendency to want to cram as much in as possible :-)