Miniscript support in Output Descriptors (part 2) (wallet)

https://github.com/bitcoin/bitcoin/pull/24148

Host: stickies-v  -  PR author: darosior

The PR branch HEAD was ec72f35 at the time of this review club meeting.

Notes

  • This is a 2-part Review Club. See the notes of the first part for an introduction. If you weren’t able to attend, please go through the meeting logs as we won’t discuss those Miniscript basic concepts in this second session again.

  • In this second part, we’ll look at the Miniscript Output Descriptor implementation. We’ll focus on the last 9 commits from “miniscript: tiny doc fixups” to “qa: functional test Miniscript watchonly support”.

  • Output script descriptors are strings that contain all the information necessary to allow a wallet or other program to track payments made to or spent from a particular script or set of related scripts (i.e. an address or a set of related addresses such as in an HD wallet).

  • Descriptors combine well with Miniscript in allowing a wallet to handle tracking and signing for a larger variety of scripts. Since Bitcoin Core 23.0 descriptor wallets have become the default wallet type.

  • This PR #24148 introduces watch-only support for Miniscript descriptors, extending the already existing descriptor language. You’ve probably noticed that both languages have very similar syntax; this is intentional.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. Which function is responsible for parsing the output descriptor strings? How does it determine whether the string represents a MiniscriptDescriptor, instead of any other type (including a WSHDescriptor)

  3. Does MiniscriptDescriptor accept Miniscript policy or Miniscript or both?

  4. Node<Key>::ContainsDuplicateKey returns a bool. What is the return type of TreeEvalMaybe<std::set<Key>>(upfn), and how does it get cast to a bool? What does Key represent, and why is it templated?

  5. Why does ScriptMaker use a vector of CPubKey and StringMaker a vector of PubkeyProvider pointers? What’s the difference between the two?

  6. In MiniscriptDescriptor::MakeScripts, why do we store a mapping of keys from their IDs in the FlatSigningProvider provider?

  7. When choosing between two available satisfactions, why should the one that involves less or no signatures be preferred? For example, consider the policy or(and(older(21), pk(B)), thresh(2, pk(A), pk(B))) which can always be spent when both A and B sign, and can be spent after 21 blocks when just B signs. After 21 blocks, both satisfactions are available, but why would the satisfaction that involves just B’s signature be preferable?

  8. In your own words, how does Node::TreeEvalMaybe() work? (Note: there is a helpful example further down the method)

  9. In Node<Key>::FindInsaneSub(), for a given Node& node in the tree, what do we expect Span<const Node*> subs to be? In practice, in if (sub) return sub;, what value would sub have in order to return sub? (Or: what value would it not have?)

Meeting Log

  117:00 <stickies-v> #startmeeting
  217:00 <larryruane> hi
  317:00 <kevkevin> Hey guys new here, will be mostly lurking today
  417:00 <theStack> hi
  517:00 <schmidty> hi
  617:01 <svav> Hi
  717:01 <stickies-v> welcome everyone! We'll be continuing our review of #24148 (https://bitcoincore.reviews/24148-2). Last week we covered some of the Miniscript fundamentals, this week we'll dive deeper into the new MiniscriptDescriptor output descriptor implementation.
  817:01 <stickies-v> hey kevkevin, that's great! don't hesitate to ask any questions though, we're always happy to explain
  917:01 <stickies-v> do we have any other first timers around? even if you're just lurking, feel free to say hi!
 1017:01 <paul_c> Hey guys
 1117:02 <OliverOff> hi
 1217:02 <svav> I'd be interested to hear from the newcomers how they found out about this meeting, if they don't mind sharing.
 1317:03 <stickies-v> today's session will be more code heavy than last week, but feel free to ask about general concepts too (please check if we didn't cover it last week already to avoid repetition: https://bitcoincore.reviews/24148-2 )
 1417:03 <paul_c> was shilled this group from onstage at BTC 2022
 1517:03 <b10c> hi
 1617:03 <kevkevin> svav I found this irc after reading Amiti Uttarwar's Onboarding to bitcoin core
 1717:03 <svav> OK thanks paul_c
 1817:03 <stickies-v> who got the chance to review the PR or read the notes?
 1917:04 <svav> OK thanks kevkevin and welcome to the newcomers
 2017:04 <OliverOff> reviewed the code
 2117:04 <__gotcha> read parts of the notes
 2217:05 <paul_c> yes, reviewed before meeting
 2317:05 — __gotcha still need to not forget to prepare the meeting better
 2417:05 <svav> I had a quick read of the notes
 2517:06 <stickies-v> nice to see some code review this time, it's quite a bit to go through I'm aware! but kudos for reading the notes too
 2617:06 <stickies-v> for those of you who were able to review, would you give it a Concept ACK, Approach ACK, Tested ACK, or NACK?
 2717:06 <stickies-v> __gotcha: that's alright, as long as it's interesting, you're having fun and you're learning, do keep coming back!
 2817:07 <__gotcha> stickies-v: I will, thanks
 2917:08 <stickies-v> alright feel free to post your (N)ACK later on but let's dive into the questions, we've got a lot to cover
 3017:08 <stickies-v> first up: which function is responsible for parsing the output descriptor strings? How does it determine whether the string represents a `MiniscriptDescriptor`, instead of any other type (including a `WSHDescriptor`)?
 3117:08 <OliverOff> Approach ACK (caveat: I'm still a n00b)
 3217:09 <theStack> there is a function `ParseScript` in script/descriptor.cpp, which in turn calls `miniscript::FromString`
 3317:10 <stickies-v> theStack: yes exactly! to anyone interested, this `ParseScript` function is here: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/descriptor.cpp#L1228
 3417:11 <stickies-v> so then for the follow up question... how does ParseScript distinguish between all the different descriptors functions that we support?
 3517:12 <theStack> seems like it tries to parse descriptor functions first and miniscript quite at the end
 3617:12 <sipa> Actually something I wonder about, related to this, which I don't remember despite writing it: if you have a wsh(pk(...)) descriptor, how does it get parsed? Miniscript, or pk?
 3717:12 <sipa> And does it matter that both would be possible?
 3817:13 <stickies-v> theStack: exactly, it tries all the other descriptor functions first, and if all fail then we move on to MiniscriptDescriptor
 3917:13 <sipa> I guess my question is answered.
 4017:13 <stickies-v> sipa: I believe it would get parsed as pk
 4117:14 <stickies-v> bonus question: what's the behaviour of `Func("func_name", expr)` that is used quite frequently in `ParseScript`? are there side effects?
 4217:17 <__gotcha> I do not find the definition of that `Func` function.
 4317:17 <stickies-v> sipa just to confirm, first we would remove the "wsh()" wrapper here: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/descriptor.cpp#L1353; and then since the very first check we do in `ParseScript` is for "pk()" (https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/descriptor.cpp#L1233), I'm quite confident that as said before
 4417:17 <stickies-v> it would be parsed as pk and not as Miniscript
 4517:17 <sipa> That sounds correct.
 4617:18 <theStack> i think it extracts the arguments of a function call expression... e.g. Func("foo", "foo(bar,xyz)") would result in "bar,xyz"
 4717:18 <sipa> @__gotcha It's in spanparsing
 4817:18 <stickies-v> __gotcha: it's here: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/util/spanparsing.cpp#L23, or the header here: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/util/spanparsing.h#L28
 4917:19 <theStack> though the result is not returned directly, but the passed expression span is modified
 5017:19 <__gotcha> for the newbie, is grep your only friend to help to find code ?
 5117:19 <stickies-v> yes exactly! if the "func_name" and a parenthesis wrapper is found, they get removed from expr and Func returns true. if it is not found, Func returns false and expr remains unchanged
 5217:20 <sipa> @__gotcha `git grep` works pretty well.
 5317:20 <stickies-v> __gotcha: I know it's not OG but personally I rely quite a bit on vscode's intellisense, it allows you to jump through the code very quickly once you know the hotkeys
 5417:21 <stickies-v> alright, moving on, but as always feel free to continue the discussion on previous questions - we can manage async!
 5517:21 <stickies-v> does `MiniscriptDescriptor` accept Miniscript policy or Miniscript or both?
 5617:21 <OliverOff> Is there a reason for naming it "Func" instead of something like "ExtractArgs"?
 5717:21 — __gotcha will remember to check how to setup C++ LSP for neovim
 5817:22 <stickies-v> OliverOff: I would guess that's because it was specifically designed to parse function calls, but I wasn't there...
 5917:22 <theStack> __gotcha: some people have made good experiences with using vim + ctags (i guess with neovim that also works)
 6017:23 <__gotcha> stickies-v: are there tests for Func function ?
 6117:23 <OliverOff> __gotcha: src/test/util_tests.cpp
 6217:23 <__gotcha> well, the Func function does not return args
 6317:24 <theStack> __gotcha: yes, there are unit tests, see ./test/util_tests.cpp
 6417:24 <stickies-v> __gotcha: in other parts of the code there are quite a few functions called "Extract..." or "Parse..." that don't return whatever is parsed, but rather use an out argument
 6517:25 <theStack> OliverOff: ah didn't see, you were faster :D
 6617:26 <stickies-v> anyone got an idea for the second question? "does `MiniscriptDescriptor` accept Miniscript policy or Miniscript or both?"
 6717:26 <__gotcha> not sure what you call an out argument, but afaics, here the args are removed from expr
 6817:27 <theStack> i wrote down that there is only miniscript supported (no miniscript policy), but don't remember how i came to that conclusion
 6917:28 <sipa> That's correct.
 7017:28 <stickies-v> __gotcha: only the function wrapper is removed from expr, so essentially expr becomes the arguments. Like theStack gave as an example: Func("foo", "foo(bar,xyz)") would change expr to become "bar,xyz"
 7117:28 <sipa> The PR just doesn't include a policy compiler.
 7217:29 <__gotcha> Oops, misunderstood
 7317:29 <stickies-v> alright that was a short one, moving on
 7417:29 <stickies-v> `Node<Key>::ContainsDuplicateKey` returns a bool. What is the return type of `TreeEvalMaybe<std::set<Key>>(upfn)`, and how does it get cast to a `bool`? What does `Key` represent, and why is it templated?
 7517:30 <stickies-v> hmm let's break that up in a first part to not get too confusing
 7617:30 <stickies-v> `Node<Key>::ContainsDuplicateKey` returns a bool. What is the return type of `TreeEvalMaybe<std::set<Key>>(upfn)`, and how does it get cast to a `bool`?
 7717:30 <OliverOff> Hm sorry I can't find "ContainsDuplicateKey" anywhere in the codebase/PR
 7817:31 <stickies-v> link: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/miniscript.h#L781
 7917:31 <stickies-v> (note: I always include links in the notes as well: https://bitcoincore.reviews/24148-2, the formatting there is a bit easier)
 8017:34 <__gotcha> Is considered bad etiquette to complain about naming ?
 8117:35 <__gotcha> iow, is review a chance to hint where code is hard to read ?
 8217:36 <stickies-v> __gotcha: personally I find that quite important. It's such a waste of time to have to go read the docs when a better func/arg name would have explained it right away. Obviously it's personal preference and not everyone may agree with your suggestion, but if you have a better alternative you should definitely suggest that in a review
 8317:37 <stickies-v> I've got a couple of those lined up in my review too, FYI. Note that sometimes "bad" names are chosen for consistency, which is also important. Trade-offs...
 8417:37 <larryruane> I think that an expression of type std::optional can be coerced to a boolean, and it will be true if the expression yields a value, or else false if std::nullopt
 8517:37 <larryruane> (all that is from memory)
 8617:37 <otech> __gotcha maybe bad etiquette if you do not include suggestions for alternative names and leave it up to the author to fix if they like. I use `NIT` for "nit-picking" for that kind of thing
 8717:38 <stickies-v> larryruane: yes exactly! so TreeEvalMaybe returns an std::optional<Result>, where `Result` is a templated typename
 8817:38 <stickies-v> and as per https://en.cppreference.com/w/cpp/utility/optional: when cast to a bool, `std::optional` becomes true if the object contains a value and false if it does not contain a value
 8917:40 <stickies-v> this is actually a question for later on, but maybe now is a better time to cover it. We use `TreeEvalMaybe()` (and `TreeEval()`) quite a lot throughout this part of the codebase. Could anyone explain how that works? Or are there specific parts that are unclear and we can cover here?
 9017:40 <larryruane> in a way it's consistent with what std::optional is often replacing, a pointer that can be NULL (nullptr) ... lots of old code would say !p (where p is a pointer) to test if p is not valid
 9117:41 <stickies-v> (link: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/miniscript.h#L338)
 9217:41 <stickies-v> larryruane: yeah, I like that with new code we're using std::optional though, much clearer and helpful interface with value_or() etc
 9317:42 <OliverOff> For me, the interesting thing about `TreeEvalMaybe` is that the way it was implemented allows for a recursive walk of the three without actual recursive function calls. When you think of it, it's just parsing left-to-right and accumulating values. Not much different than when I'm writing LISP and need to count the parentheses ;)
 9417:44 <stickies-v> yep, that is a nice feature. The abstraction and templating does make it a bit harder to wrap your head around exactly how it works, but it is nice how flexibly it can be used
 9517:44 <theStack> first a "state" is computed from root down the leaves, then a "result" is computed up back from the leaves to the root... for that one needs to pass a down- and an up-function
 9617:45 <theStack> or not pass, but rather specify in the template instantiation
 9717:45 <stickies-v> the lambda functions are actually passed, it's just that their signature is templated
 9817:45 <theStack> ah, yes that makes sense
 9917:46 <sipa> @larryruane Note that std::optional only has an *explicit* operator bool, so it won't get automatically converted to bool or int, only in places where only a bool makes sense (such as in an if condition).
10017:47 <OliverOff> is TreeEval ever called with a "custom" `downfn`?
10117:47 <larryruane> sipa: +1 thanks
10217:47 <sipa> @OliverOff I don't know what you mean by "custom"; a downfn just has to be provided.
10317:48 <sipa> (in the overload that provides the ability to specify one; if you don't have state, you don't need one)
10417:49 <stickies-v> OliverOff: see e.g. https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/miniscript.h#L483 for an example of how downfn and upfn are constructed and passed to TreeEval (which calls TreeEvalMaybe)
10517:49 <OliverOff> Thank you
10617:50 <stickies-v> next up: when choosing between two available satisfactions, why should the one that involves less or no signatures be preferred?
10717:50 <stickies-v> for example, consider the policy `or(and(older(21), pk(B)), thresh(2, pk(A), pk(B)))` which can always be spent when both A and B sign, and can be spent after 21 blocks when just B signs. After 21 blocks, both satisfactions are available, but why would the satisfaction that involves just B’s signature be preferable?
10817:50 <sipa> @OliverOff See miniscript::Node::ToScript for an example.
10917:51 <stickies-v> sipa: yup that's the one I linked a few lines earlier
11017:51 <sipa> Oops, I was being slow.
11117:51 <theStack> less signatures -> less witness data -> smaller tx -> less fees to pay
11217:52 <sipa> @theStack What if somehow the smaller one involved no signatures?
11317:52 <stickies-v> theStack: yes, probably the most important one to most users! but, there is another less obvious reason
11417:52 <__gotcha> less specific ?
11517:52 <sipa> (You can construct pathological cases for which this is case)
11617:53 <stickies-v> hint: it's got to do with malleability
11717:54 <sipa> More hint: 3rd parties on the network can remove signatures, but not add them.
11817:54 <OliverOff> I'd guess it's because in order to prevent malleability we need satisfaction to be deterministic
11917:54 <__gotcha> sipa: remove signatures from transaction data ?
12017:55 <stickies-v> OliverOff: what do you mean with being deterministic?
12117:55 <svav> Can someone remind me what malleability is?
12217:55 <sipa> Imagine in @[stickies-v]'s example that the 21 blocks have passed, so B can sign alone, but instead the path that involves a signature from both A and B is used.
12317:55 <OliverOff> stickies-v: mean that all nodes should agree on what the preferable satisfaction is
12417:56 <theStack> sipa: not sure if i understand your question. did you mean "if the _larger_ one involved no signatures?" if the smaller one involved no signatures, that's what i would expect
12517:56 <stickies-v> __gotcha: witness data is not covered by signatures, so third parties (e.g. when relaying a tx) can change this at will. In most cases, this will make the tx invalid because the witness doesn't satisfy the scriptPubKey anymore, but there are cases where the witness remains valid even after it is modified
12617:57 <sipa> @theStack Imagine a situation where you have the choice between two satisfaction paths. One involves no signatures, but is bigger. One involves signature but is smaller. You should *still* prefer the bigger, no signatures one. Why?
12717:57 <OliverOff> sipa:  that scenario, a 3rd party could drop A and force the script to go through the first path. However, the other way around is not possible. If you only announce A, the 3rd party couldn't possibly drop B.
12817:57 <sipa> @OliverOff Not really; it has to do with third parties not being able to change which satisfaction path was used.
12917:58 <sipa> @OliverOff Exactly.
13017:58 <stickies-v> svav: we mean third-party malleability here, which is, as explained in my previous example to __gotcha, the ability for someone not participating in the tx (not holding the required keys to sign) to modify the witness of a transaction
13117:58 <theStack> sipa: ah, now i understand the question at least (still don't know the answer though xD)
13217:58 <stickies-v> as we're getting close to time:
13317:58 <stickies-v> if the second branch is satisfied, a third party can malleate the satisfaction by removing the signature for A, since having the signature for B is enough
13417:59 <stickies-v> (the second branch is `thresh(2, pk(A), pk(B))`)
13517:59 <sipa> @theStack A third party can't add signatures but can drop them. So if the honest signers use a construction with a "redundant" signature, even if it is smaller, third parties can mutate it into the bigger no-sig variant.
13618:00 <stickies-v> #endmeeting