#15443 Add getdescriptorinfo functional test (tests)

https://github.com/bitcoin/bitcoin/15443

Host: jnewbery

Notes

  • Output script descriptor are a language for describing individual scriptPubKeys or HD chains of scriptPubKeys. See the Bitcoin Core documentation.

  • PR 15368 added checksums to descriptors. Since descriptors contain private or public keys where a transcription error can lead to loss of funds, having to checksum to ensure correct transcription is a sensible precaution.

  • Those checksums are based on a similar BCH encoding scheme as the bech32 address format. There are very detailed comments in the implementation explaining how the checksum is calculated.

  • The PR also added a new RPC method getdescriptorinfo that can be called with a descriptor to analyse the descriptor and add the checksum. That RPC method:
    • is a pure function. It has no side effects.
    • doesn’t need access to any blockchain or mempool data
    • doesn’t have access to the wallet.
  • The PR also added unit tests and updated several functional tests to use the checksums. A python implementation of the checksum was also added.

  • PR #15443 adds functional tests specifically for the new getdescriptorinfo RPC method.

Questions

  • What type of testing is used in Bitcoin Core?

  • What are the uses for unit tests and functional tests?

  • What is meant by solvable when talking about a descriptor?

  • Why are address and raw type descriptors always unsolvable?

Meeting Log

13:00 < jnewbery> hi!
13:00 < lightlike> hi!
13:00 < ariard> hi!
13:00 < jonatack> hi!
13:00 < ccdle12> hi!
13:01 < jnewbery> whilst we give everyone a couple of minutes to get here, I suggest we all review the notes and questions at https://bitcoin-core-review-club.github.io/15443.html
13:01 < amiti> hi!
13:02 < jnewbery> This week's PR is adding new tests. There are no changes to the C++ code.
13:02 < jnewbery> I don't think we've covered a test-only PR yet, which is a bit of an oversight, since they're a good place to start contributing
13:03 < jnewbery> I thought this one was interesting because it gives us a chance to look at output script descriptors, which are going to be important in the Bitcoin Core wallet
13:04 < jnewbery> Before we start with the *Questions* section of https://bitcoin-core-review-club.github.io/15443.html, did everything make sense in the *Notes*? Did anyone have any questions about those points?
13:05 < jnewbery> ok, great. digi_james: I know you had some questions about solvability and spendability this week. Can you state what your question was and what you learned?
13:05 < sosthene> I wonder when the descriptors were added? I don't remember seeing it before upgrading to 0.18, but maybe I wasn't paying attention
13:06 < jnewbery> sosthene: look at the docs here: https://github.com/bitcoin/bitcoin/blob/master/doc/descriptors.md . The first RPC to add support for descriptors was scantxoutset in 0.17
13:06 < digi_james> jnewbery: a descriptor is solvable if the output script and sigscript (minus signatures) can be derived from the output descriptor.
13:07 < digi_james> jnewbery: a descriptor is spendable if the signature can also be derived if the private key is also known.
13:08 < jnewbery> digi_james: that's right. So if the descriptor allows us to derive the scriptPubKey but not the scriptSig to spend it, then it's not solvable
13:08 < digi_james> So for example, a addr(P2SH_addr) is not solvable because the P2SH preimage is not known.
13:08 < jnewbery> exactly
13:09 < jnewbery> these definitions of solvable / spendable were originally used in the wallet
13:09 < jnewbery> ok, next question: What type of testing is used in Bitcoin Core?
13:10 < sosthene> unit and functional
13:11 < jnewbery> sosthene: yeah, that's mostly it. I believe there's also some fuzz testing and property-based testing (https://github.com/bitcoin/bitcoin/pull/12775)
13:11 < lightlike> I've also seen code for fuzzing. Is this used continually?
13:12 < jonatack> (plus a bit of linting in the CI and an optional serving of fuzzing that can use more data and examples)
13:12 < jnewbery> lightlike: I've never used fuzz testing myself. I'm pretty sure it's not used in the travis CI.
13:13 < drbrule> Is the testing described in the /doc folder? Wasn't sure if it's documented or just self-documented.
13:13 < sosthene> sorry can we explain what's fuzz testing?
13:13 < jnewbery> ok, so this PR adds functional tests. What are the advantages/disadvantages of that for this testing?
13:13 < jnewbery> ariard: mind taking the question about fuzz testing?
13:14 < fjahr> functional test are slow since they test the full stack
13:14 < ccdle12> jnewbury: functional tests simulate behaviour of an end user making rpc calls
13:14 < jonatack> Fuzzing: see doc/fuzzing.md
13:15 < jnewbery> jonatack: thanks
13:15 < ariard> sosthene: basically you throw millons of random-generated inputs into full-node components to discover crashs or bugs
13:15 < jnewbery> other test documentation:
13:15 < jnewbery> - unit - https://github.com/bitcoin/bitcoin/blob/master/src/test/README.md
13:15 < jnewbery> - functional/integration - https://github.com/bitcoin/bitcoin/blob/master/test/README.md
13:16 < ariard> it may be somehow semi-structurated to incrase bugs discovering success
13:16 < lightlike> also https://en.wikipedia.org/wiki/Fuzzing for a general overview
13:16 < drbrule> jnewbury: Thank you.
13:16 < ariard> s/structurated/structured
13:16 < jonatack> also: test/functional/README.md
13:17 < sosthene> ariard: got it, thanks
13:17 < jnewbery> fjahr: ccdle12: yeah, that's right. Functional tests spin up one or more full running instances of the bitcoind node, so they test the full functional behaviour, but are much slower to execute
13:18 < jnewbery> any thoughts about whether this functionality should be tested with unit or functional tests?
13:19 < fjahr> the rpc should be tested but especially the different success paths should be unit tests
13:19 < fjahr> as ccdle12 noted funcitonal is about the interaction of the user, so it should test for responding with the correct error messages etc.
13:20 < ccdle12> maybe unit for the core logic? the descriptors seem to be almost "standalone" meaning the rpc calls don't touch any other part of the codebase/affect other parts of the running daemon
13:20 < jnewbery> fjahr: that was my initial reaction too. Why is getdescriptorinfo a good candidate for unit tests?
13:21 < jonatack> it has few dependencies and can be tested in isolation
13:21 < hugohn> because it doesn't require external dependencies
13:21 < jnewbery> ccdle12: right. Because it's a pure utility, it doesn't need any setup with the rest of the node.
13:23 < jnewbery> I'm definitely not a concept NACK (some tests are better than no tests), but I think it's something to be aware of
13:23 < jnewbery> my feeling is that we rely on functional tests a bit too much in Bitcoin Core where sometimes unit tests might be more appropriate
13:23 < lightlike> jnewbery: but then we probably wouldn't test getdescriptorinfo directly, just stumbled today over a comment (https://github.com/bitcoin/bitcoin/blob/1381ddbcfcb6429b1327fd3db91ef97d8603aef9/src/test/setup_common.cpp#L74) stating that RPC tests should be moved to functional.
13:24 < jnewbery> oh, interesting
13:25 < hugohn>  RE: disadvantages of functional/integration testing. If I might add, besides the fact that it's slow, IMO it might not be possible to have full 100% test coverage with functional/integration testing (unlike unit testing), because your functional tests usually test things in very specific orders and not all the possible ways the components can react. So it's helpful but, shouldn't be relied on exclusively.
13:25 < hugohn> *can interact
13:26 < jnewbery> lightlike: there's definitely room for both. I disagree that all RPC testing should be done using the functional test suite
13:26 < jnewbery> especially for pure utility functions like `getdescriptorinfo`
13:26 < jnewbery> but opinions vary!
13:27 < jnewbery> Any thoughts about Sjors comment here: https://github.com/bitcoin/bitcoin/pull/15443#pullrequestreview-205311643 ?
13:29 < jnewbery> ok, any other questions about the code itself?
13:29 < digi_james> I was confused about sjors comments.
13:30 < digi_james> descsum_create is part of the test framework, no?
13:30 < sosthene> do we have to use functional testing to test the output of a rpc call or can it be done with a unittest?
13:30 < lightlike> I haven't understood why does a (second) python implementation for checksum has to exist (and then needs to be tested).
13:30 < digi_james> I suppose we could pass the checksum as part of the input rather than generate it via a test function.
13:31 < jnewbery> digi_james: yes, descsum_create is part of the functional test framework
13:32 < jnewbery> These tests call `getdescriptorinfo` to get the checksum from the bitcoind node, and also call descsum_create() to get the checksum from the python implementation
13:32 < jnewbery> checking that the two implementations of the checksum match is a good way to test
13:33 < jnewbery> so that's an argument for doing this as a functional test: it forces us to use implementations in two different languages
13:33 < digi_james> Why arent the checksums hardcoded in the test?
13:33 < digi_james> I see...
13:33 < jnewbery> digi_james: they could be, but having a python implementation that can be re-used makes it easier to add more tests in future
13:34 < jnewbery> but also, where would you get the hard-coded values from if you didn't implement the checksum in the test code?
13:34 < fjahr> So there are tests for descsum_create() as well?
13:34 < digi_james> jnewbery: yup that makes sense to me now ...
13:35 < jnewbery> fjahr: I don't think so. We generally don't have tests for the test code.
13:35 < jnewbery> The tests test the code and the code tests the test
13:35 < fjahr> so if we introduce logic errors in both implementations they would not be noticed
13:35 < hugohn> that's def an interesting approach. introduce new logic in the test code itself...
13:36 < hugohn> yeah... you're testing for consistency not necessarily accurary
13:36 < jnewbery> one way to test the test code is to somehow break the node implementation and verify that the test starts failing
13:37 < jonatack> fjahr: with test code I think we depend a lot on review
13:37 < jonatack>  (Sjors comment seems good to me... I have a few general questions regarding writing/reviewing tests, when it's time)
13:37 < digi_james> I suppose Sjors is suggesting to test a test :)
13:37 < lightlike> jnewbery: but doesn't this reasoning apply to basically everything (e.g. all kind of cryptographic functions)? I'm not sure why it is worth reprogramming function in python just for testing purposes and then keep it in the codebase.
13:38 < lightlike> (as an initial proof of concept, of course)
13:38 < fjahr> I think having some hardcoded values to test against would be good to make regressions explicit
13:38 < jnewbery> digi_james: one way to test the test code would be to use python's unittest module
13:39 < jnewbery> but in most cases, I don't think that's worthwhile
13:39 < digi_james> :)
13:40 < jnewbery> lightlike: yes, I think it's useful to have python implementations of cryptographic functions in the test framework. See https://github.com/bitcoin/bitcoin/blob/master/test/functional/test_framework/key.py for example
13:42 < fjahr> I guess another way to deal with it is to never change the python and the c++ implementation in the same PR
13:42 < fjahr> but not sure how that could be enforced
13:42 < jnewbery> any other questions about the code in https://github.com/bitcoin/bitcoin/pull/15443/files
13:42 < jonatack> Could use some logging :p
13:43 < jnewbery> yeah, logging would be good!
13:44 < jonatack> perhaps separate into subtests too
13:44 < jnewbery> fjahr: I wonder if https://github.com/bitcoin/bitcoin/blob/master/doc/descriptors.md#examples should be updated to include the checksums so they can be used as test vectors for other implementations
13:45 < digi_james> I wonder why the test vectors did not include unsolvable and private keys = true
13:46 < digi_james> Or the vectors in descriptor.md should be updated with such, as fjahr suggest
13:46 < fjahr> yeah, that would make sense to me
13:46 < jnewbery> digi_james: I think because that's mainly a Bitcoin Core wallet concept
13:47 < jonatack> A general question, if ok, about test assertion argument order:
13:47 < jonatack> In some test frameworks there is a convention that can be useful for test error reports to make better sense.
13:47 < jonatack> In Bitcoin Core it looks like there is an implicit convention of assert_bla(actual, expected)...?
13:47 < jonatack> I didn't see it stipulated in the testing docs... perhaps I overlooked it.
13:47 < jonatack> Is there a strict (or loose) convention?
13:48 < jnewbery> jonatack: the convention is to use the assert_bla(actual, expected) rather than assert actual == expected
13:48 < jnewbery> because assert_bla(a,e) will print out a and e if they don't match
13:49 < jnewbery> whereas assert a == e will just fail with no helpful logging
13:49 < jonatack> agreed... my question pertains to the arg order
13:50 < jnewbery> oh right. I don't think there's a convention
13:50 < jnewbery> I think we have assert_greater_than() but not assert_lesser_than(), so the order depends on which you want to be greater for that one :)
13:50 < jonatack> it looks like actual, expected in practice
13:51 < jonatack> but unsure how hard or soft that is
13:51 < jonatack> for assert_equal in any case :)
13:51 < jnewbery> yeah, I think I've mostly seen actual, expected
13:51 < jonatack> Another question about running linters before pushing test commits
13:52 < jonatack> I could bit by not doing that this morning
13:52 < jonatack> got bit by not running linters before pushing
13:52 < jonatack> which do you run...
13:53 < jonatack> I automated lint-python.sh and lint-python-deadcode.sh
13:54 < jnewbery> I generally don't run them locally
13:54 < jnewbery> travis tells you pretty quickly when they fail
13:54 < lightlike> also a general question: does anyone know if there is a debug.log generated for unit tests (where the LogPrintf() go to)?
13:55 < jnewbery> lighlike: I don't know about debug.log for unit tests, but you can run with --log_level=all to get more verbose output: https://github.com/bitcoin/bitcoin/blob/master/src/test/README.md#running-individual-tests
13:56 < jonatack> jnewbery: thanks
13:56 < jnewbery> ok, 5 mkinutes until the hour. Any final questions before we wrap up?
13:56 < digi_james> jnewbery: Is there an formal/informal roadmap for descriptors? Whist there are very neat, I wonder if they could exist in PBSTs? I suppose they will play a larger role in wallets managing taproot paths?
13:56 < jnewbery> Also, any requests for PRs to cover next week and beyond?
13:57 < digi_james> Internal node/wallet interfaces?
13:57 < jnewbery> digi_james: we want the wallet to move entirely to descriptors. There's a project tracking that here: https://github.com/bitcoin/bitcoin/projects/12
13:57 < jnewbery> as for the specification of the language itself, yes, it would need to be extended to support taproot
13:58 < jnewbery> I'd also recommend watching the talk on miniscript here: https://www.youtube.com/watch?v=XM1lzN4Zfks&feature=youtu.be which is related
13:58 < digi_james> jnewberry: Ah thank you, I will have a look at the project, awesome!
13:59 < jnewbery> alright, let's wrap it up there. Thanks everyone!
13:59 < ariard> I need to think more about it but I think a lightning-prefix could be useful too
13:59 < hugohn> thanks jnewbery
13:59 < lightlike> thanks!
14:00 < digi_james> Thanks jnewbery, and everybody else!
14:00 < jonatack> Thanks jnewbery and everyone!
21:03 < wumpus> jnewbery: I think the reasoning is that RPC is a separate 'subsystem' from whatever is being tested (it needs to be specifically initialized/torn down), so anything testing RPC *and something else* is already a functional/integration test
21:05 < wumpus> the internal functions used by RPC methods can of course, independently, be tested in unit tests, especially if they're pure utility functions
21:05 < wumpus> but really calling RPC methods from unit tests is somewhat frowned upon, unless it's unit testing of the RPC system