Apply strict verification flags for transaction tests and assert backwards compatibility (tests, consensus)

https://github.com/bitcoin/bitcoin/pull/19698

Host: glozow  -  PR author: glozow

The PR branch HEAD was 110239f at the time of this review club meeting.

Notes

  • Bitcoin script allows users to attach spending conditions to transaction outputs. A “locking” script in a transaction output specifies some spending condition, such as providing a public key that hashes to a specified digest and a valid signature for that public key. An “unlocking” script in a transaction input supplies the operations and/or data to satisfy that condition. Transaction validation includes interpreting these scripts and running their specified operations such as verifying signatures, hashing data, checking timestamps, etc.

  • The Bitcoin script interpreter includes code that understands all types of scripts and accepts script verification flags to adjust which rules to enforce. Determining which flags to apply is very important.

    • Example 1: When a node validates a block, it determines what the consensus rules are (see GetBlockScriptFlags, which also serves as a little tour of Bitcoin consensus history) before validation. During Initial sync, the consensus rules to apply will change as the block chain advances.

    • Example 2: Bitcoin Core nodes also enforce policy (or “standardness”) rules in addition to consensus rules when accepting transactions into their mempool by applying additional script verification flags.

    • The SCRIPT_VERIFY_DISCOURAGE_UPGRADEABLE_NOPS policy flag prevents a node from accepting a transaction into its mempool if that transaction uses an upgradeable OP_NOP. During activation for CHECKLOCKTIMEVERIFY, for example, an in-mempool transaction using NOP2 could have been invalidated between blocks. “Discouraging” transactions in this way is preferable to re-validating all transactions in the mempool after every block.

    • PR #10699 describes a procedure for adding new consensus rules via upgradable NOPs or witness versions which incorporates this idea.

  • PR #10699 established the precedent that all script verification flags must be soft forks, i.e. more script verification flags = more strict spending rules.

  • PR #19698 extends the current transaction unit tests by asserting that each test applies flags precisely by asserting an opposite result when flags are removed or added. The goal is to ensure that valid transaction tests don’t pass due to a lack of flags and invalid transaction tests don’t fail due to an excess of flags.

Questions

  1. What is the difference between PolicyScriptChecks and ConsensusScriptChecks?

  2. Since the code for taproot is already in interpreter.cpp, why aren’t taproot rules being enforced yet (what condition in the code would it exit on)? How would we invoke VerifyWitnessProgram to make it apply taproot spending rules?

  3. This PR edits some of the CheckLockTimeVerify tests by removing the 1 (push a 1 to the stack) at the end of the scripts. However, it doesn’t do this for all of them (example). Why? (Hint: try removing it and running the transaction tests, src/test/test_bitcoin --log_level=all --run_test=transaction_tests)

  4. What does TrimFlags do?

  5. What does it mean for script verify flags to be “minimal” and “maximal?”

  6. How do the tests check that valid transaction tests have maximal/minimal flags?

  7. What does it mean for script verify flags to be “soft forks?” How do the tests check this?

Meeting Log

  117:57 <glozow> ____ ____ ____ ____ _____
  217:57 <glozow> / __"| u U /"___| U | _"\ u ___ U| _"\ u |_ " _| 🏴‍☠️
  317:57 <glozow> <\___ \/ \| | \| |_) |/ |_"_| \| |_) |/ | |__/
  417:58 <glozow> u___) | | |___ | _ < | | | __/ /| |
  517:58 <glozow> |____/>> \____| |_| \_\ U/| |\U |_| u |_|
  617:58 <glozow> )) (__) // \\ // \\_ .-,_|___|_,-. ||>>_ _// \\_
  717:58 <glozow> (__) (__)(__) (__) (__) \_)-' '-(_/ (__)__) (__) (__)
  817:58 <glozow> (in 2 minutes)
  917:58 <jnewbery> \o/
 1017:59 <MarcoFalke> I think I am using the wrong IRC client to see ascii art
 1117:59 <willcl_ark> very nifty
 1218:00 <jnewbery> #startmeeting
 1318:00 <sequel> fixed width font helps
 1418:00 <MarcoFalke> copy-paste in vim helped :)
 1518:00 <glozow> hi everyone!
 1618:00 <willcl_ark> hi
 1718:00 <sequel> hi
 1818:00 <robot-dreams> hi
 1918:00 <MarcoFalke> hi
 2018:00 <elle> hi!
 2118:00 <mango> hi
 2218:00 <thomasb06> hi
 2318:00 <dhruvm> hi
 2418:00 <michaelfolkson> hi
 2518:00 <schulzemic> hi
 2618:00 <emzy> hi
 2718:00 <jnewbery> hi all. Welcome to review club! Feel free to say hi to let everyone know you're here
 2818:00 <cangr> hi
 2918:00 <jnewbery> anyone here for the first time?
 3018:01 <schulzemic> yes
 3118:01 <glozow> Welcome schulzemic! ☜(゚ヮ゚☜)
 3218:01 <emzy> MarcoFalke: hehe, that was what I also did.
 3318:01 <sequel> welcome schulzemic
 3418:01 <murch> hi
 3518:01 <cangr> yes
 3618:01 <jnewbery> welcome schulzemic :)
 3718:01 <schulzemic> Thanks!
 3818:01 <jnewbery> glozow is hosting today, but I just wanted to mention something first
 3918:01 <jnewbery> I'm currently looking for hosts for future review club meetings.
 4018:02 <jnewbery> I've got a lot of commitments right now, so anyone volunteering to host would be really helping me!
 4118:02 <jnewbery> I can help you prepare/review notes and questions, and help you get ready for the meeting.
 4218:02 <jnewbery> If you've been coming to the meetings for a few weeks/months, then you're ready to host! Take a look at all the people who've hosted before you here: https://bitcoincore.reviews/meetings-hosts/.
 4318:02 <jnewbery> I find that teaching something is one of the best ways of learning. Hosting a review club will force you to understand the PR that week. So if you're serious about improving your understanding of Bitcoin Core, then I'd highly recommend hosting a review club.
 4418:02 <jnewbery> Feel free to message me after the meeting in this channel or privately.
 4518:03 <jnewbery> ok, over to glozow for the fun part..
 4618:03 <glozow> woo thanks jnewbery! Today we’re talking about PR #19698 https://github.com/bitcoin/bitcoin/pull/19698
 4718:03 <robot-dreams> Welcome cangr and thanks jnewbery!
 4818:03 <glozow> Did anyone get the chance to review the PR? (y/n)
 4918:03 <anir> y
 5018:03 <jnewbery> y
 5118:03 <willcl_ark> y
 5218:03 <elle> y
 5318:03 <thomasb06> y
 5418:03 <glozow> (☞゚ヮ゚)☞ cangr welcome!
 5518:03 <robot-dreams> y-ish but want to take a closer look after the meeting
 5618:04 <murch> y
 5718:04 <cangr> thanks
 5818:04 <emzy> y (reading it and running make check)
 5918:04 <glozow> I just hope that everyone learns something, so feel free to ask any questions :) a lot of the questions today are also more conceptual than implementation
 6018:04 <glozow> Let’s get this party started! First warm-up question: What is the difference between `PolicyScriptChecks` and `ConsensusScriptChecks`?
 6118:05 <willcl_ark> As I understand it, policies are soft local rules regarding what our node will create and accept. Consensus rules by contrast are hard limits which define what we accept as valid transactions and blocks on the network.
 6218:05 <glozow> link to code btw: https://github.com/bitcoin/bitcoin/blob/a35b948836db20fab9b48d3b77cf9f23ffee109a/src/validation.cpp#L924
 6318:05 <pinheadmz> policy is the ruleset that validates entry into the mempool, consensus is the ruleset that validates entry into blocks -- confusing but mempool rules are always *more* restrictive!
 6418:05 <thomasb06> PolicyScriptChecks is executed first, and ConsensusScriptChecks only checks if the script doesn't pass because of flags
 6518:06 <dhruvm> Policy is the prerogative of the node, Consensus is what the network expects and considers honest.
 6618:06 <willcl_ark> and we use policy checks as a "cheap" DOS prevention before running expensive? consensus checks
 6718:06 <glozow> willcl_ark pinheadmz dhruvm correct about policy vs consensus! can anyone tell me what the differences are in the code?
 6818:06 <sequel> policy screens on the node level, consensus on the network level (should never make it into the chain)
 6918:07 <michaelfolkson> If a transaction doesn't follow consensus rules then it would fork the chain if included a block
 7018:07 <glozow> willcl_ark not really, they are both script checks and aren't that different in terms of computational complexity
 7118:07 <sipa> sdaftuar just wrote a great answer on this question: https://bitcoin.stackexchange.com/questions/100317/what-is-the-difference-between-policy-and-consensus-when-it-comes-to-a-bitcoin-c
 7218:08 <glozow> sipa: that's an awesome post
 7318:08 <michaelfolkson> That was amazing. Thanks for that sdaftuar
 7418:08 <pinheadmz> in the code, there are bitfields like this one https://github.com/bitcoin/bitcoin/blob/a35b948836db20fab9b48d3b77cf9f23ffee109a/src/policy/policy.h#L60 that define a set of rules to check
 7518:08 <sipa> glozow: it depends, i think
 7618:08 <willcl_ark> sipa: oh wow that is a nice answer
 7718:09 <sipa> some policy checks (specifically resource limit related ones, size of scripts, ...) let us avoid doing signature checks before it's needed
 7818:09 <sipa> policy is really a whole bunch of unrelated things
 7918:09 <glozow> ah that's true
 8018:09 <glozow> most of those are in PreChecks though, I believe
 8118:10 <murch> Policy checks only apply to unconfirmed transactinos, though, right?
 8218:10 <MarcoFalke> (mempool) policy even includes non-script reject reasons like tx dependencies
 8318:10 <michaelfolkson> And so we try to do checks from cheapest to most expensive in that order?
 8418:10 <pinheadmz> and IIRC sipa, for soft forks the rules apply to the mempool (policy) before the actual soft fork activation. Does that hold true for taproot? For example with the next release of bitcoind, taproot rules will actually apply in the mempool even though they are not enforced in blocks yet?
 8518:10 <sipa> glozow: oh, this is just about the script validation flags for policy? in that case you're right
 8618:10 <glozow> what I wanted to get to was `PolicyScriptChecks` and `ConsensusScriptChecks` both call `CheckInputScripts`, but with different script verification flags :)
 8718:10 <MarcoFalke> murch: yes
 8818:10 <robot-dreams> One question about the the code difference, why does Consensus use CheckInputsFromMempoolAndCache?
 8918:11 <pinheadmz> actually answer my own question: https://github.com/bitcoin/bitcoin/blob/a35b948836db20fab9b48d3b77cf9f23ffee109a/src/policy/policy.h#L76
 9018:11 <sipa> pinheadmz: actually, no: https://github.com/bitcoin/bitcoin/pull/20165
 9118:13 <murch> Hehe, is everyone busy reading Suhas's post or why did we stop? :D
 9218:13 <glozow> robot-dreams: I think it's to use cached coins or signature verification result, I'd have to check more closely
 9318:13 <glozow> ok yes, next question: Since the code for taproot is already in interpreter.cpp, why aren’t taproot rules being enforced yet (what condition in the code would it exit on)? How would we invoke `VerifyWitnessProgram` to make it apply taproot spending rules?
 9418:13 <glozow> Code we’re looking at: https://github.com/bitcoin/bitcoin/blob/2ee954daaee5758d04935492139808cb6fd95233/src/script/interpreter.cpp#L1885-L1926
 9518:14 <thomasb06> (ko)
 9618:15 <pinheadmz> well this is interesting, because if you look at SCRIPT_VERIFY_TAPROOT -- it is included in the standard verify flags in the code i posted, but then those txs are actually not relayed because of the code sipa posted -- right?
 9718:15 <willcl_ark> Looks like a Taproot tx would currently fail at src/validation.cpp#MemPoolAccept::PreChecks#L696
 9818:15 <jnewbery> robot-dreams: that's an excellent question. There's a comment here: https://github.com/bitcoin/bitcoin/blob/a35b948836db20fab9b48d3b77cf9f23ffee109a/src/validation.cpp#L959-L975 that explains why that function is being called
 9918:16 <willcl_ark> We should call VerifyWitnessProgram with `witversion=1` to make it apply taproot rules
10018:17 <robot-dreams> I'm guessing a taproot transaction would not make it into the mempool due to the policy flag `SCRIPT_VERIFY_DISCOURAGE_UPGRADABLE_TAPROOT_VERSION`
10118:18 <sipa> no, that's for unknown leaf versions
10218:18 <sipa> the only defined leaf version is 0xc0 for tapscript currently
10318:19 <glozow> is there a condition within that control block that would exit early if we... say... didn't have a script verification flag set? :P
10418:19 <elle> im assuming that the SCRIPT_VERIFY_TAPROOT bit is not yet added to the flags variable. Meaning that this line is false: if (VersionBitsState(pindex->pprev, consensusparams, Consensus::DEPLOYMENT_TAPROOT, versionbitscache) == ThresholdState::ACTIVE) {
10518:20 <elle> (https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1861)
10618:20 <glozow> elle: ding ding ding!!! winner winner
10718:20 <glozow> today is all about script verification flags, in case y'all haven't noticed :)
10818:20 <pinheadmz> well but isnt it included in standard flags? https://github.com/bitcoin/bitcoin/blob/a35b948836db20fab9b48d3b77cf9f23ffee109a/src/policy/policy.h#L76
10918:21 <glozow> not consensus
11018:21 <pinheadmz> aha, missed the context :-P
11118:22 <glozow> and as elle pointed out, whenever we validate blocks, we get our consensus rules in the form of flags in `GetBlockScriptFlags`: https://github.com/bitcoin/bitcoin/blob/04670ef81ea2300fcba4e1a492c4c6b0e0752848/src/validation.cpp#L1822
11218:22 <glozow> okie dokie
11318:22 <sipa> in the mempool, taproot script validation currently happens in 0.21... except that as long as it isn't active on the network, a pre-script policy check will reject it before the script interpreter even runs
11418:22 <glozow> Let’s dive into the test JSON files :D This PR edits some of the CTV and CSV tests by removing the `1` (push a 1 to the stack) at the end of the scripts. However, it doesn’t do this for all of them. Why?
11518:22 <sipa> CLTV?
11618:22 <glozow> CLTV OOPS
11718:22 <glozow> no CTV yet
11818:22 <michaelfolkson> I missed the CTV soft fork happened ;)
11918:23 <jnewbery> glozow: is it something to do with the V part of CLTV?
12018:24 <robot-dreams> Wild guess, is it because you now want maximal flags and are enforcing `CLEANSTACK`?
12118:24 <murch> robot-dreams: That would explain why they are removed, but not why some were left in
12218:25 <glozow> btw here's an example of a test that _isn't_ getting its 1 removed: https://github.com/bitcoin/bitcoin/pull/19698/files#diff-7e4229911841f1d419c71a0d0df95feb07b77f90c0ff39f09182eb8ca50779b9L199
12318:25 <murch> Or maybe it does
12418:26 <glozow> what does the script "0 CHECKLOCKTIMEVERIFY 1" do?
12518:26 <glozow> vs a script "499999999 CHECKLOCKTIMEVERIFY 1" (which we're removing the 1 from)?
12618:27 <sipa> the first leaves "0 1" on the stack if it succeeds; the second leaves "499999999 1" on the stack
12718:27 <nehan> why are the policy script checks defined in STANDARD_SCRIPT_VERIFY_FLAGS in policy.h but the consensus script checks are enumerated separately in the function GetBlockScriptFlags?
12818:27 <pinheadmz> sipa CLTV doesnt pop a number off the stack when it compares to the tx locktime ?
12918:27 <willcl_ark> Well we need the script to end non-zero, which the first won't do without the final `1`
13018:28 <sipa> pinheadmz: it cannot! it's a redefined NOP
13118:28 <glozow> pinheadmz: nope! why :P
13218:28 <glozow> AW MAN SIPA U GAVE IT AWAY
13318:28 <jnewbery> if an opcode has verify in the name, then it leaves the stack unchanged
13418:28 <sipa> sowwwy
13518:28 <glozow> NOPs can't touch the stack!
13618:28 <sipa> jnewbery: wrong
13718:28 <jnewbery> oh really?
13818:28 <sipa> CHECKSIGVERIFY definitely modifies the stack
13918:28 <jnewbery> yes, totally wrong
14018:28 <jnewbery> ignore me
14118:28 <sipa> /ignore jnewbery
14218:28 <michaelfolkson> That's generally right though
14318:28 <michaelfolkson> Just some counterexamples
14418:29 <glozow> ok we can still use our brains a little bit... WHY can't upgradeable NOPs touch the stack?
14518:29 <pinheadmz> if the CLTV failed though, these examples would fail technically with n empty stack because the 1 at the end would never get pushed to the stack
14618:29 <murch> Because that would be a hardfork
14718:29 <willcl_ark> so that old clients don't fail the scripts
14818:29 <pinheadmz> glozow bc they were soft forked in! i shouldve known
14918:29 <robot-dreams> nehan: I think it's because consensus checks depends on which block you're on (so you can't just set hardcoded flags)
15018:29 <glozow> yaaaaaaas
15118:30 <pinheadmz> oops i mean they would fail without the 1 but still the first number bc of the NOPness
15218:30 <glozow> and yes, sorry nehan, i forgot to get back to you on your question - block script flags change from block to block, so we re-calculate them every time
15318:30 <sipa> robot-dreams: yeah, policy just always enables everything we know about; consensus flags depend on context
15418:30 <glozow> thanks robot-dreams :)
15518:30 <nehan> robot-dreams: thanks!
15618:30 <glozow> alrighty, so why don't we take off the 1 at the end of "0 CHECKLOCKTIMEVERIFY 1" ?
15718:31 <glozow> (this is a valid tx btw)
15818:31 <willcl_ark> we need non-zero stack at the end
15918:31 <glozow> willcl_ark: exactly!
16018:31 <MarcoFalke> some consensus flags can be and have been backdated to the genesis block, though
16118:32 <willcl_ark> but does `499999999 CHECKLOCKTIMEVERIFY` not leave us with nothing too?
16218:32 <glozow> well, what does 499999999 evaluate to?
16318:32 <sipa> pinheadmz: if the CLTV fails the script immediately aborts entirely with failure; no code after it is executed anymore, but that's not really relevant if we've already aborted
16418:32 <willcl_ark> ah it just gives 499999999 I see now
16518:32 <willcl_ark> was looking at the wrong opcode
16618:32 <glozow> 👍
16718:32 <glozow> NEXT! What does `TrimFlags` do? (Extra credit question: why does SCRIPT_VERIFY_CLEANSTACK need to be used with SCRIPT_VERIFY_P2SH and SCRIPT_VERIFY_WITNESS?)
16818:33 <pinheadmz> MarcoFalke example ?
16918:33 <MarcoFalke> pinheadmz: the witness v0 script flags
17018:33 <glozow> link to the TrimFlags code: https://github.com/bitcoin/bitcoin/blob/110239f2ff673eaea8f59c650792f3641855263d/src/test/transaction_tests.cpp#L138-L148
17118:33 <robot-dreams> I'm a bit behind... I understand why we cannot take off the 1 at the end of "0 CLTV 1", but just to confirm, why DO we take off the 1 at the end of "499999999 CLTV 1"?
17218:34 <MarcoFalke> pinheadmz: ok wait, they aren't enabled for the p2sh exception, so still depend on context
17318:34 <sipa> robot-dreams: what would the script do if you left the "1" off?
17418:34 <glozow> robot-dreams: we'd like a clean stack, and in that case we'd have 4999999 and 1 at the end
17518:34 <sipa> robot-dreams: oh, i misread
17618:34 <glozow> would anyone who's been quiet today like to tell us what CLEANSTACK means? :)
17718:35 <murch> robot-dreams: I am also still wondering that.
17818:35 <sipa> glozow: you could turn it into "0 CLTV DROP 1" if you wanted cleanstack for the first :)
17918:35 <felixweis> it means that after all opcodes have executed on the stack, the stack must be empty or 1 (i forgot)
18018:35 <willcl_ark> Single stack element remaining after execution
18118:35 <robot-dreams> sipa: Yes, that was my confusion. I didn't understand why `0 1` would be considered a claen stack
18218:36 <sipa> robot-dreams: it's not
18318:36 <murch> Ah, thanks.
18418:36 <pinheadmz> is the cleanstack rule just a malleability fix?
18518:37 <willcl_ark> So am I right in reading CLEANSTACk is not consensus critical before taproot, but is for taproot?
18618:37 <glozow> felixweis: yup! we don't want any extra items on the stack. I think 0 vs 1 depends on context too, sipa explains it here https://github.com/bitcoin/bitcoin/pull/20006#issuecomment-698487304
18718:37 <sipa> willcl_ark: CLEANSTACK really means two different things
18818:37 <felixweis> its required since segwit and p2sh
18918:37 <murch> So previously the test was merely considering the tx valid because we didn't enforce the cleanstack rule, and now that we're defaulting to all flags, it would fail on the cleanstack rule
19018:38 <willcl_ark> aha
19118:38 <sipa> the script flag CLEANSTACK is only for top-level script evaluation
19218:38 <willcl_ark> also: "Note: CLEANSTACK should never be used without P2SH or WITNESS."
19318:38 <sipa> witv0/taproot have an implicit cleanstack-like behavior as part of their consensus rules
19418:39 <willcl_ark> thanks sipa.
19518:39 <sipa> but that isn't controlled by the CLEANSTACK flag
19618:39 <glozow> right
19718:39 <willcl_ark> Just to keep devs on their toes :P
19818:39 <felixweis> so if the taproot stack after execution is empty and cleanstack is enabled the input still fails?
19918:40 <glozow> ok everybody, I don't want to lose people so let's just go back to the first question :) what does TrimFlags do?
20018:40 <robot-dreams> `TrimFlags` enforces constraints between flags (e.g. if CLEANSTACK is on, WITNESS must be on), and does so by turning off flags.
20118:40 <glozow> robot-dreams: correct!
20218:41 <glozow> what does `~` do in this context?
20318:41 <michaelfolkson> bitwise NOT
20418:41 <glozow> michaelfolkson: yup!
20518:41 <robot-dreams> Do we expect many more such constraints in the future?
20618:41 <murch> felixweis: Empty stack is also a problem, I think it should end on a single element.
20718:41 <glozow> (not a flag destructor, in some countries that is illegal)
20818:42 <felixweis> single element != 0
20918:42 <glozow> Arriving at our main course today: What does it mean for script verify flags to be “minimal” and “maximal?”
21018:42 <pinheadmz> ha
21118:42 <glozow> robot-dreams: not sure. depends on sipa i guess
21218:43 <sipa> help
21318:43 <sipa> i sincerely hope not
21418:43 <murch> glozow: Valid txes should pass with all flags that are not explicitly excluded in this specific transactino, and invalid txes should fail on exactly the specific flag that's tested
21518:44 <glozow> murch: yes! we don't want transaction tests to be passing/failing because we omitted/added extra flags
21618:44 <willcl_ark> yes, maximal should mean that adding any extra flag will make the transaction fail
21718:44 <murch> felixweis: I'm not sure what you mean. Do you mean to express that "0" is empty stack?
21818:45 <glozow> willcl_ark: yes! thanks for elaborating more concretely
21918:45 <glozow> so how do we implement this? what does the PR do to test for maximal/minimal flags?
22018:45 <murch> And minimal means that no flags can be removed without making the tx pass as valid
22118:45 <glozow> Code at https://github.com/bitcoin/bitcoin/blob/110239f2ff673eaea8f59c650792f3641855263d/src/test/transaction_tests.cpp#L241-L246
22218:45 <felixweis> murch: i thought it has to be a single element value. does the value have to be boolan true?
22318:46 <sipa> felixweis: [] is an empty stack, [""] is not
22418:46 <sipa> OP_0 pushes "" onto the stack
22518:46 <robot-dreams> For a valid transaction, we try every unset flag and make sure adding it invalidates the transaction.
22618:46 <felixweis> thanks
22718:47 <glozow> robot-dream: correct 👌
22818:47 <murch> sipa: But does a stack have to be empty or end on a positive single element in the end. I thought it was the latter for valid txes.
22918:47 <robot-dreams> I had a question about that, would it be helpful to try every valid *combination* of unset flags, or would that blow up the test time way too much?
23018:47 <glozow> robot-dreams: that's the next question :P
23118:47 <sipa> murch: you have to have a stack with a single element in it, which has to be nonzero
23218:47 <willcl_ark> stack has to terminate with any element not `0`
23318:48 <murch> Thx
23418:48 <sipa> murch: (for CLEANSTACK)
23518:48 <felixweis> what about NaN? /s
23618:48 <murch> Right
23718:48 <glozow> ok and what about testing that invalid transaction tests are minimal? how's that implemented?
23818:48 <sipa> without CLEANSTACK, a non-empty stack whose top element is nonzero is enough
23918:48 <glozow> have minimal flags* i mean
24018:48 <murch> glozow: Presumably by trying to remove every set flag in turn and checking that this makes the tx vaild
24118:48 <glozow> murch: correct
24218:49 <glozow> but hey, is it possible that we're removing a flag and getting an invalid combination? e.g. CLEANSTACK without P2SH?
24318:49 <felixweis> isnt that what trimflags is for?
24418:49 <murch> No, because if CLEANSTACK is set P2SH is turned back on by trimFlags()
24518:49 <robot-dreams> Yeah, I think glozow addresses this with FillFlags / TrimFlags calls
24618:50 <glozow> felixweis murch robot-dreams yah it's almost like last question feeds into this question
24718:50 <michaelfolkson> Just to confirm sipa. For CLEANSTACK you have to have a stack with a single element in it which has to be nonzero. Without CLEANSTACK a non-empty stack with multiple elements is ok as long as the top element is nonzero?
24818:50 <glozow> WOOT
24918:50 <murch> glozow: You've set us up
25018:50 <glozow> last question! What does it mean for script verify flags to be “soft forks?” How do the tests check this?
25118:50 <sipa> michaelfolkson: correct
25218:51 <glozow> hint: https://github.com/bitcoin/bitcoin/pull/10699
25318:51 <emzy> michaelfolkson: thx for the sum up.
25418:52 <glozow> bigger hint: https://github.com/bitcoin/bitcoin/blob/3caee16946575e71e90ead9ac531f5a3a1259307/src/script/interpreter.h#L36-L40
25518:52 <murch> glozow: I think it means that no flags should increase the set of acceptable txes when they're turned on
25618:52 <felixweis> these flags only apply to mempool policy. so older nodes will not accept and forward new transaction features.
25718:52 <robot-dreams> michaelfolkson: Thanks! Just to confirm, in this PR would it be necessary to change `0 CLTV 1` into `0 CLTV DROP 1` as sipa mentioned above?
25818:53 <michaelfolkson> sipa: And so without CLEANSTACK you could have a horrible non-empty stack with TRUE FALSE 3 and it would pass. I don't think I understand this...
25918:53 <felixweis> because we can't verify the new rules, people might be ddosing us with invalid transactions under new (yet to be definded) soft fork rules
26018:53 <murch> glozow: TBH, I'm only getting that from interpreting your commit message, though (^_^)/
26118:53 <glozow> robot-dreams: yes, this PR enforces that the tests do CLEANSTACK
26218:53 <glozow> (when they can)
26318:53 <sipa> michaelfolkson: yes; what's hard to understand about that?
26418:54 <michaelfolkson> Why should it pass? It is a mess haha
26518:54 <sipa> why wouldn't it?
26618:54 <dhruvm> glozow: script verify flags are soft forks because each new flag can only be more restrictive. The tests check that here: https://github.com/bitcoin/bitcoin/pull/19698/files#diff-cc0c6a9039a1c9fe38b8a21fe28391fffbac9b8531dfda0f658919a9f74b46baR229 by making sure that the flags are maximal after `TrimFlags`?
26718:54 <sipa> the computation succeeded
26818:54 <felixweis> the final check just looks at the top element
26918:54 <glozow> murch dhruvm: yup, adding a flag can never increase the space of valid scripts
27018:55 <sipa> michaelfolkson: say the script is a boring "<pubkey> OP_CHECKSIG"; normally you'd pass it a [<sig>] as input, and it'd leave [1] on the stack
27118:55 <sipa> if you pass it [<garbage1>,<garbage2>,<sig>], it'll leave [<garbage1>,<garbage2>,1] on the stack instead
27218:55 <glozow> so we add/subtract individual flags to check this, but what about combinations of flags like robot-dreams mentioned earlier?
27318:55 <felixweis> michaelfolkson: maybe some fancy script designers wanted to do multiple conditional execution paths and the shortest definition to do both didn't leave the stack clean
27418:55 <sipa> whatever the script was enforcing worked just fine, but there is garbage left
27518:56 <glozow> hint: https://github.com/bitcoin/bitcoin/pull/19698/files#diff-cc0c6a9039a1c9fe38b8a21fe28391fffbac9b8531dfda0f658919a9f74b46baR328
27618:56 <sipa> michaelfolkson: i think you have to see bitcoin script as a programming language, and that language used to be defined without cleanstack... so you write scripts that do what you want given those semantics
27718:57 <sipa> now the language does have cleanstack, and in theory that means you need slightly different scripts to do what you want
27818:57 <sipa> in practice... not really
27918:57 <dhruvm> glozow: since the flags have dependencies, you probably can't do all combinations?
28018:57 <michaelfolkson> Ok I guess I'm surprised because I always thought a dirty stack (in the context of CLEANSTACK) would always fail. I misunderstood the other ways a script could still pass
28118:57 <glozow> dhruvm: yes! so what do we do instead?
28218:58 <dhruvm> glozow: TrimFlags?
28318:58 <glozow> er i mean, we don't do all of them because that'd be 2^n
28418:58 <michaelfolkson> But thanks sipa felixweis
28518:58 <felixweis> was there ever a transaction using "p2sh" before p2sh activated? in other words would p2sh soft fork from genesis fork the chain?
28618:58 <glozow> that's a lotta testing
28718:58 <robot-dreams> glozow pulls out the RNG
28818:58 <glozow> yeah! we just do random combinations :P
28918:58 <sdaftuar> felixweis: one i think
29018:58 <glozow> sorry, rushed the last question a little
29118:58 <sipa> sdaftuar: oh, really?
29218:59 <glozow> We've reached the end of our program. Thanks everyone! And now, a word from our sponsor, jnewbery
29318:59 <jnewbery> it's the reason we can't apply P2SH/segwit back to genesis, right?
29418:59 <dhruvm> glozow: Does that makes the test failures hard to reproduce?
29518:59 <felixweis> sdaftuar: thanks, was it someone just experimenting with all the opcodes?
29618:59 <MarcoFalke> (slightly off-topic) There is also a fuzz test that does random combinations: https://github.com/bitcoin/bitcoin/blob/3caee16946575e71e90ead9ac531f5a3a1259307/src/test/fuzz/script_flags.cpp#L54
29718:59 <sdaftuar> jnewbery: yes, trying to find my notes on it
29818:59 <jnewbery> thanks glozow. That was great. Really good dive into some pretty complex parts of the code.
29918:59 <jnewbery> Hope everyone enjoyed!
30018:59 <glozow> dhruvm: yeah i guess, but these are mostly sanity checks for making sure the test you wrote was correct. meta-testing 🤷
30118:59 <willcl_ark> thanks glozow!
30218:59 <jnewbery> I just wanted to repeat my request for hosts from the start. I have a slot open next week for anyone who's keen.
30319:00 <felixweis> thanks glozow!!
30419:00 <sipa> dhruvm: i expect that in practice this is not a problem; if there were an actual softforkability bug somewhere that these tests catch, you'll suddenly have tons of combinations fail
30519:00 <cangr> thanks a lot
30619:00 <robot-dreams> sdaftaur: Is this related to the comment at the start of `GetBlockScriptFlags` that says "one historical block violated the P2SH rules"?
30719:00 <thomasb06> thanks glozow
30819:00 <jnewbery> (and every week after that)
30919:00 <elle> Thanks glozow! that was awesome. really enjoy learning about Script
31019:00 <sipa> glozow: thanks!
31119:00 <robot-dreams> Thanks glozow, great session!
31219:00 <pinheadmz> good show glozow and jnewbery !
31319:00 <emzy> Thanks glozow and all. I lerned a lot!
31419:00 <glozow> thanks everyone ^_^ tough review club today, thanks for sticking with me
31519:00 <darius27> thanks glozow!! I learned a lot
31619:00 <MarcoFalke> dhruvm: I think it makes it harder to reproduce, as the randomness seed is unknown
31719:00 <dhruvm> sipa: that makes sense.
31819:00 <jnewbery> just message me if you're interested in hosting and I can put you on the roster
31919:00 <jnewbery> #endmeeting