Miniscript support in Output Descriptors (part 2) (wallet)

May 25, 2022

https://github.com/bitcoin/bitcoin/pull/24148

Host: stickies-v - PR author: darosior

The PR branch HEAD was ec72f35 at the time of this review club meeting.

Notes

This is a 2-part Review Club. See the notes of the first part for an introduction. If you weren’t able to attend, please go through the meeting logs as we won’t discuss those Miniscript basic concepts in this second session again.
In this second part, we’ll look at the Miniscript Output Descriptor implementation. We’ll focus on the last 9 commits from “miniscript: tiny doc fixups” to “qa: functional test Miniscript watchonly support”.
Output script descriptors are strings that contain all the information necessary to allow a wallet or other program to track payments made to or spent from a particular script or set of related scripts (i.e. an address or a set of related addresses such as in an HD wallet).
Descriptors combine well with Miniscript in allowing a wallet to handle tracking and signing for a larger variety of scripts. Since Bitcoin Core 23.0 descriptor wallets have become the default wallet type.
This PR #24148 introduces watch-only support for Miniscript descriptors, extending the already existing descriptor language. You’ve probably noticed that both languages have very similar syntax; this is intentional.

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?
Which function is responsible for parsing the output descriptor strings? How does it determine whether the string represents a MiniscriptDescriptor, instead of any other type (including a WSHDescriptor)
Does MiniscriptDescriptor accept Miniscript policy or Miniscript or both?
Node<Key>::ContainsDuplicateKey returns a bool. What is the return type of TreeEvalMaybe<std::set<Key>>(upfn), and how does it get cast to a bool? What does Key represent, and why is it templated?
Why does ScriptMaker use a vector of CPubKey and StringMaker a vector of PubkeyProvider pointers? What’s the difference between the two?
In MiniscriptDescriptor::MakeScripts, why do we store a mapping of keys from their IDs in the FlatSigningProvider provider?
When choosing between two available satisfactions, why should the one that involves less or no signatures be preferred? For example, consider the policy or(and(older(21), pk(B)), thresh(2, pk(A), pk(B))) which can always be spent when both A and B sign, and can be spent after 21 blocks when just B signs. After 21 blocks, both satisfactions are available, but why would the satisfaction that involves just B’s signature be preferable?
In your own words, how does Node::TreeEvalMaybe() work? (Note: there is a helpful example further down the method)
In Node<Key>::FindInsaneSub(), for a given Node& node in the tree, what do we expect Span<const Node*> subs to be? In practice, in if (sub) return sub;, what value would sub have in order to return sub? (Or: what value would it not have?)

Meeting Log

17:00

<stickies-v> #startmeeting

17:00

<larryruane> hi

17:00

<kevkevin> Hey guys new here, will be mostly lurking today

17:00

<theStack> hi

17:00

<schmidty> hi

17:01

<svav> Hi

17:01

<stickies-v> welcome everyone! We'll be continuing our review of #24148 (https://bitcoincore.reviews/24148-2). Last week we covered some of the Miniscript fundamentals, this week we'll dive deeper into the new MiniscriptDescriptor output descriptor implementation.

17:01

<stickies-v> hey kevkevin, that's great! don't hesitate to ask any questions though, we're always happy to explain

17:01

<stickies-v> do we have any other first timers around? even if you're just lurking, feel free to say hi!

17:01

<paul_c> Hey guys

17:02

<OliverOff> hi

17:02

<svav> I'd be interested to hear from the newcomers how they found out about this meeting, if they don't mind sharing.

17:03

<stickies-v> today's session will be more code heavy than last week, but feel free to ask about general concepts too (please check if we didn't cover it last week already to avoid repetition: https://bitcoincore.reviews/24148-2 )

17:03

<paul_c> was shilled this group from onstage at BTC 2022

17:03

<b10c> hi

17:03

<kevkevin> svav I found this irc after reading Amiti Uttarwar's Onboarding to bitcoin core

17:03

<svav> OK thanks paul_c

17:03

<stickies-v> who got the chance to review the PR or read the notes?

17:04

<svav> OK thanks kevkevin and welcome to the newcomers

17:04

<OliverOff> reviewed the code

17:04

<__gotcha> read parts of the notes

17:05

<paul_c> yes, reviewed before meeting

17:05

— __gotcha still need to not forget to prepare the meeting better

17:05

<svav> I had a quick read of the notes

17:06

<stickies-v> nice to see some code review this time, it's quite a bit to go through I'm aware! but kudos for reading the notes too

17:06

<stickies-v> for those of you who were able to review, would you give it a Concept ACK, Approach ACK, Tested ACK, or NACK?

17:06

<stickies-v> __gotcha: that's alright, as long as it's interesting, you're having fun and you're learning, do keep coming back!

17:07

<__gotcha> stickies-v: I will, thanks

17:08

<stickies-v> alright feel free to post your (N)ACK later on but let's dive into the questions, we've got a lot to cover

17:08

<stickies-v> first up: which function is responsible for parsing the output descriptor strings? How does it determine whether the string represents a `MiniscriptDescriptor`, instead of any other type (including a `WSHDescriptor`)?

17:08

<OliverOff> Approach ACK (caveat: I'm still a n00b)

17:09

<theStack> there is a function `ParseScript` in script/descriptor.cpp, which in turn calls `miniscript::FromString`

17:10

<stickies-v> theStack: yes exactly! to anyone interested, this `ParseScript` function is here: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/descriptor.cpp#L1228

17:11

<stickies-v> so then for the follow up question... how does ParseScript distinguish between all the different descriptors functions that we support?

17:12

<theStack> seems like it tries to parse descriptor functions first and miniscript quite at the end

17:12

<sipa> Actually something I wonder about, related to this, which I don't remember despite writing it: if you have a wsh(pk(...)) descriptor, how does it get parsed? Miniscript, or pk?

17:12

<sipa> And does it matter that both would be possible?

17:13

<stickies-v> theStack: exactly, it tries all the other descriptor functions first, and if all fail then we move on to MiniscriptDescriptor

17:13

<sipa> I guess my question is answered.

17:13

<stickies-v> sipa: I believe it would get parsed as pk

17:14

<stickies-v> bonus question: what's the behaviour of `Func("func_name", expr)` that is used quite frequently in `ParseScript`? are there side effects?

17:17

<__gotcha> I do not find the definition of that `Func` function.

17:17

<stickies-v> sipa just to confirm, first we would remove the "wsh()" wrapper here: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/descriptor.cpp#L1353; and then since the very first check we do in `ParseScript` is for "pk()" (https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/descriptor.cpp#L1233), I'm quite confident that as said before

17:17

<stickies-v> it would be parsed as pk and not as Miniscript

17:17

<sipa> That sounds correct.

17:18

<theStack> i think it extracts the arguments of a function call expression... e.g. Func("foo", "foo(bar,xyz)") would result in "bar,xyz"

17:18

<sipa> @__gotcha It's in spanparsing

17:18

<stickies-v> __gotcha: it's here: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/util/spanparsing.cpp#L23, or the header here: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/util/spanparsing.h#L28

17:19

<theStack> though the result is not returned directly, but the passed expression span is modified

17:19

<__gotcha> for the newbie, is grep your only friend to help to find code ?

17:19

<stickies-v> yes exactly! if the "func_name" and a parenthesis wrapper is found, they get removed from expr and Func returns true. if it is not found, Func returns false and expr remains unchanged

17:20

<sipa> @__gotcha `git grep` works pretty well.

17:20

<stickies-v> __gotcha: I know it's not OG but personally I rely quite a bit on vscode's intellisense, it allows you to jump through the code very quickly once you know the hotkeys

17:21

<stickies-v> alright, moving on, but as always feel free to continue the discussion on previous questions - we can manage async!

17:21

<stickies-v> does `MiniscriptDescriptor` accept Miniscript policy or Miniscript or both?

17:21

<OliverOff> Is there a reason for naming it "Func" instead of something like "ExtractArgs"?

17:21

— __gotcha will remember to check how to setup C++ LSP for neovim

17:22

<stickies-v> OliverOff: I would guess that's because it was specifically designed to parse function calls, but I wasn't there...

17:22

<theStack> __gotcha: some people have made good experiences with using vim + ctags (i guess with neovim that also works)

17:23

<__gotcha> stickies-v: are there tests for Func function ?

17:23

<OliverOff> __gotcha: src/test/util_tests.cpp

17:23

<__gotcha> well, the Func function does not return args

17:24

<theStack> __gotcha: yes, there are unit tests, see ./test/util_tests.cpp

17:24

<stickies-v> __gotcha: in other parts of the code there are quite a few functions called "Extract..." or "Parse..." that don't return whatever is parsed, but rather use an out argument

17:25

<theStack> OliverOff: ah didn't see, you were faster :D

17:26

<stickies-v> anyone got an idea for the second question? "does `MiniscriptDescriptor` accept Miniscript policy or Miniscript or both?"

17:26

<__gotcha> not sure what you call an out argument, but afaics, here the args are removed from expr

17:27

<theStack> i wrote down that there is only miniscript supported (no miniscript policy), but don't remember how i came to that conclusion

17:28

<sipa> That's correct.

17:28

<stickies-v> __gotcha: only the function wrapper is removed from expr, so essentially expr becomes the arguments. Like theStack gave as an example: Func("foo", "foo(bar,xyz)") would change expr to become "bar,xyz"

17:28

<sipa> The PR just doesn't include a policy compiler.

17:29

<__gotcha> Oops, misunderstood

17:29

<stickies-v> alright that was a short one, moving on

17:29

<stickies-v> `Node<Key>::ContainsDuplicateKey` returns a bool. What is the return type of `TreeEvalMaybe<std::set<Key>>(upfn)`, and how does it get cast to a `bool`? What does `Key` represent, and why is it templated?

17:30

<stickies-v> hmm let's break that up in a first part to not get too confusing

17:30

<stickies-v> `Node<Key>::ContainsDuplicateKey` returns a bool. What is the return type of `TreeEvalMaybe<std::set<Key>>(upfn)`, and how does it get cast to a `bool`?

17:30

<OliverOff> Hm sorry I can't find "ContainsDuplicateKey" anywhere in the codebase/PR

17:31

<stickies-v> link: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/miniscript.h#L781

17:31

<stickies-v> (note: I always include links in the notes as well: https://bitcoincore.reviews/24148-2, the formatting there is a bit easier)

17:34

<__gotcha> Is considered bad etiquette to complain about naming ?

17:35

<__gotcha> iow, is review a chance to hint where code is hard to read ?

17:36

<stickies-v> __gotcha: personally I find that quite important. It's such a waste of time to have to go read the docs when a better func/arg name would have explained it right away. Obviously it's personal preference and not everyone may agree with your suggestion, but if you have a better alternative you should definitely suggest that in a review

17:37

<stickies-v> I've got a couple of those lined up in my review too, FYI. Note that sometimes "bad" names are chosen for consistency, which is also important. Trade-offs...

17:37

<larryruane> I think that an expression of type std::optional can be coerced to a boolean, and it will be true if the expression yields a value, or else false if std::nullopt

17:37

<larryruane> (all that is from memory)

17:37

<otech> __gotcha maybe bad etiquette if you do not include suggestions for alternative names and leave it up to the author to fix if they like. I use `NIT` for "nit-picking" for that kind of thing

17:38

<stickies-v> larryruane: yes exactly! so TreeEvalMaybe returns an std::optional<Result>, where `Result` is a templated typename

17:38

<stickies-v> and as per https://en.cppreference.com/w/cpp/utility/optional: when cast to a bool, `std::optional` becomes true if the object contains a value and false if it does not contain a value

17:40

<stickies-v> this is actually a question for later on, but maybe now is a better time to cover it. We use `TreeEvalMaybe()` (and `TreeEval()`) quite a lot throughout this part of the codebase. Could anyone explain how that works? Or are there specific parts that are unclear and we can cover here?

17:40

<larryruane> in a way it's consistent with what std::optional is often replacing, a pointer that can be NULL (nullptr) ... lots of old code would say !p (where p is a pointer) to test if p is not valid

17:41

<stickies-v> (link: https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/miniscript.h#L338)

17:41

<stickies-v> larryruane: yeah, I like that with new code we're using std::optional though, much clearer and helpful interface with value_or() etc

17:42

<OliverOff> For me, the interesting thing about `TreeEvalMaybe` is that the way it was implemented allows for a recursive walk of the three without actual recursive function calls. When you think of it, it's just parsing left-to-right and accumulating values. Not much different than when I'm writing LISP and need to count the parentheses ;)

17:44

<stickies-v> yep, that is a nice feature. The abstraction and templating does make it a bit harder to wrap your head around exactly how it works, but it is nice how flexibly it can be used

17:44

<theStack> first a "state" is computed from root down the leaves, then a "result" is computed up back from the leaves to the root... for that one needs to pass a down- and an up-function

17:45

<theStack> or not pass, but rather specify in the template instantiation

17:45

<stickies-v> the lambda functions are actually passed, it's just that their signature is templated

17:45

<theStack> ah, yes that makes sense

17:46

<sipa> @larryruane Note that std::optional only has an *explicit* operator bool, so it won't get automatically converted to bool or int, only in places where only a bool makes sense (such as in an if condition).

100

17:47

<OliverOff> is TreeEval ever called with a "custom" `downfn`?

101

17:47

<larryruane> sipa: +1 thanks

102

17:47

<sipa> @OliverOff I don't know what you mean by "custom"; a downfn just has to be provided.

103

17:48

<sipa> (in the overload that provides the ability to specify one; if you don't have state, you don't need one)

104

17:49

<stickies-v> OliverOff: see e.g. https://github.com/darosior/bitcoin/blob/ec72f351134bed229baaefc8ffaa1f72688c5435/src/script/miniscript.h#L483 for an example of how downfn and upfn are constructed and passed to TreeEval (which calls TreeEvalMaybe)

105

17:49

<OliverOff> Thank you

106

17:50

<stickies-v> next up: when choosing between two available satisfactions, why should the one that involves less or no signatures be preferred?

107

17:50

<stickies-v> for example, consider the policy `or(and(older(21), pk(B)), thresh(2, pk(A), pk(B)))` which can always be spent when both A and B sign, and can be spent after 21 blocks when just B signs. After 21 blocks, both satisfactions are available, but why would the satisfaction that involves just B’s signature be preferable?

108

17:50

<sipa> @OliverOff See miniscript::Node::ToScript for an example.

109

17:51

<stickies-v> sipa: yup that's the one I linked a few lines earlier

110

17:51

<sipa> Oops, I was being slow.

111

17:51

<theStack> less signatures -> less witness data -> smaller tx -> less fees to pay

112

17:52

<sipa> @theStack What if somehow the smaller one involved no signatures?

113

17:52

<stickies-v> theStack: yes, probably the most important one to most users! but, there is another less obvious reason

114

17:52

<__gotcha> less specific ?

115

17:52

<sipa> (You can construct pathological cases for which this is case)

116

17:53

<stickies-v> hint: it's got to do with malleability

117

17:54

<sipa> More hint: 3rd parties on the network can remove signatures, but not add them.

118

17:54

<OliverOff> I'd guess it's because in order to prevent malleability we need satisfaction to be deterministic

119

17:54

<__gotcha> sipa: remove signatures from transaction data ?

120

17:55

<stickies-v> OliverOff: what do you mean with being deterministic?

121

17:55

<svav> Can someone remind me what malleability is?

122

17:55

<sipa> Imagine in @[stickies-v]'s example that the 21 blocks have passed, so B can sign alone, but instead the path that involves a signature from both A and B is used.

123

17:55

<OliverOff> stickies-v: mean that all nodes should agree on what the preferable satisfaction is

124

17:56

<theStack> sipa: not sure if i understand your question. did you mean "if the _larger_ one involved no signatures?" if the smaller one involved no signatures, that's what i would expect

125

17:56

<stickies-v> __gotcha: witness data is not covered by signatures, so third parties (e.g. when relaying a tx) can change this at will. In most cases, this will make the tx invalid because the witness doesn't satisfy the scriptPubKey anymore, but there are cases where the witness remains valid even after it is modified

126

17:57

<sipa> @theStack Imagine a situation where you have the choice between two satisfaction paths. One involves no signatures, but is bigger. One involves signature but is smaller. You should *still* prefer the bigger, no signatures one. Why?

127

17:57

<OliverOff> sipa: that scenario, a 3rd party could drop A and force the script to go through the first path. However, the other way around is not possible. If you only announce A, the 3rd party couldn't possibly drop B.

128

17:57

<sipa> @OliverOff Not really; it has to do with third parties not being able to change which satisfaction path was used.

129

17:58

<sipa> @OliverOff Exactly.

130

17:58

<stickies-v> svav: we mean third-party malleability here, which is, as explained in my previous example to __gotcha, the ability for someone not participating in the tx (not holding the required keys to sign) to modify the witness of a transaction

131

17:58

<theStack> sipa: ah, now i understand the question at least (still don't know the answer though xD)

132

17:58

<stickies-v> as we're getting close to time:

133

17:58

<stickies-v> if the second branch is satisfied, a third party can malleate the satisfaction by removing the signature for A, since having the signature for B is enough

134

17:59

<stickies-v> (the second branch is `thresh(2, pk(A), pk(B))`)

135

17:59

<sipa> @theStack A third party can't add signatures but can drop them. So if the honest signers use a construction with a "redundant" signature, even if it is smaller, third parties can mutate it into the bigger no-sig variant.

136

18:00

<stickies-v> #endmeeting