The PR branch HEAD was cbc077e at the time of this review club meeting.
Notes
Benchmark tests help us monitor the performance of specific parts of the code by running a piece of code a large number of times and measuring how long each run takes, as well as how stable the results are.
Benchmark tests are defined in src/bench and are based on the nanobench framework. Instructions on how to compile and run the benchmarks can be found in doc/benchmarking.md. It is highly recommended to do this before the review club meeting.
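To see what a benchmark looks like, here is a minimal sketch of a benchmark definition in the style of src/bench (illustrative only; `ExampleSort` is a made-up benchmark, and the two-argument `BENCHMARK` registration is the shape this PR introduces — the exact macro may differ at the reviewed commit):

```cpp
#include <bench/bench.h>

#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// nanobench calls the lambda passed to bench.run() many times and
// reports timing statistics for it.
static void ExampleSort(benchmark::Bench& bench)
{
    std::vector<int> v(1000);
    bench.run([&] {
        std::iota(v.begin(), v.end(), 0);
        std::sort(v.begin(), v.end(), std::greater<int>{});
    });
}

// With this PR, a benchmark is registered together with a priority
// level instead of just its name.
BENCHMARK(ExampleSort, benchmark::PriorityLevel::HIGH);
```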
To ensure all benchmark tests work (i.e. can actually be run without failing), make check runs the bench_bitcoin binary with the -sanity-check option, which sets the number of iterations for each test to just 1.
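Under the hood, a sanity check only needs each benchmark body to execute once, so the nanobench instance can be clamped to a single measurement. A minimal sketch of the idea (`ConfigureBench` is a hypothetical helper, not the actual bench_bitcoin code path; `epochs()` and `epochIterations()` are real nanobench setters):

```cpp
#include <bench/nanobench.h>

// Sketch: when sanity checking we only care that the code runs
// without failing, not about timing quality, so one epoch with a
// single iteration is enough.
void ConfigureBench(ankerl::nanobench::Bench& bench, bool sanity_check)
{
    if (sanity_check) {
        bench.epochs(1).epochIterations(1);
    }
}
```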
During an IRC meeting (#topic acceptable runtimes of new benchmarks (achow101)), it was suggested that not all benchmarks need to be run or sanity checked all the time.
Questions
What is the initial motivation behind this proposed change? Does the approach in this PR sufficiently address that? Can you think of another, not necessarily (strictly) better, approach?
Are there any downsides to make check only running -priority-level=high benchmarks by default? Do you think the benefits outweigh the downsides?
In which sense is PriorityLevel an enum? In which sense is it a bit vector? What do you think about this approach?
Which levels can we pass to -priority-level? Are they all represented in map_priority_level? How are the other one(s) resolved? Can you think of a different approach?
Were you able to compile and run the benchmarks? How did you test that the -priority-level argument works as expected?
Which benchmarks do you think should/could be labeled as LOW or MEDIUM priority? What did you base that decision on?
Quiz: Which of the following are valid PriorityLevels: a. PriorityLevel{0x00}, b. PriorityLevel{"low"}, c. PriorityLevel::DEFAULT_PRIORITY_LEVEL, d. PriorityLevel{3}, e. PriorityLevel{4}, f. PriorityLevel{0xff}, g. static_cast<PriorityLevel>(8)?
Meeting Log
<stacie> approach ACK. the IRC meeting linked in the PR review club page was helpful in understanding the motivation for this PR (new wallet benchmark tests taking a long time to set up)
<pablomartin> compiled the code changes, changed the priorities on some benchs, ran the benchs with the diff priorities (low, high, med, all and a combination)
<kouloumos> I understand how priority levels can be useful; what I think I haven't grasped yet is the tradeoff between not running all the benchmarks and the time saved
<stacie> I like that this PR preserves existing behavior (running all the current benchmarks) but it does lead to an important question of what makes a benchmark test worth running during make check.
<pablomartin> classifying all the benchmarks as high, as the PR does, doesn't make any difference to the current behaviour... I guess perhaps some of the current benchmarks would have their priority classification re-evaluated/discussed
<stickies-v> stacie: the CI is pretty slow feedback when you're developing, it takes a while for tests to fail. Having `make check` fail locally is much much faster
<kouloumos> stickies-v: couldn't we catch those segfaults, along with other regressions, during unit and functional tests? Do benchmarks offer any other benefit on that front?
<stickies-v> amovfx_ kouloumos: unit tests would catch segfaults in the functions they're testing, but the benchmark tests themselves can also have segfaults that we'd ideally catch before they're merged into master
<kouloumos> pablomartin: I believe that's not the case when we run them with `make check`, because of the `-sanity-check` flag, which only tests that they run; we don't measure performance at that point
<stickies-v> Why do we need to `static_cast<PriorityLevel>` the return value when `operator|(PriorityLevel a, PriorityLevel b)`'s return type is `PriorityLevel` already?
<stacie> I have a question about that line, the PR author furszy says this code "can make the software crash if I pass a combination of tags that isn't declared (e.g. "medium, low") and call PriorityToString(levels) (at the assertion point)." https://github.com/bitcoin/bitcoin/pull/26158#discussion_r982590159 Is this concern still valid?
<kouloumos> Also, a note, for anyone interested in this kind of stuff, nanobench (the benchmarking framework we are using) is a relatively new addition, and it's maintained by a Bitcoin Core contributor. It was added with https://github.com/bitcoin/bitcoin/pull/18011
<stickies-v> to which the answer is: neither scoped nor unscoped enums can be implicitly converted _from_ their underlying type, which is why we use the explicit `static_cast` here
<stickies-v> mmm I wouldn't call it essentially an int, it's an enum and every enum has an underlying type, in this case it's a uint8_t because we don't need a larger int
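Putting those answers together, the pattern under discussion looks roughly like this (a sketch, not line-for-line PR code): the enum has a fixed `uint8_t` underlying type with one bit per level, and `operator|` needs an explicit cast because `a | b` promotes to `int`, which does not implicitly convert back to the enum:

```cpp
#include <cstdint>

enum PriorityLevel : uint8_t {
    LOW = 1 << 0,
    MEDIUM = 1 << 1,
    HIGH = 1 << 2,
};

constexpr PriorityLevel operator|(PriorityLevel a, PriorityLevel b)
{
    // The bitwise OR of two uint8_t values yields an int, and neither
    // scoped nor unscoped enums implicitly convert *from* an integral
    // type, hence the explicit cast.
    return static_cast<PriorityLevel>(static_cast<uint8_t>(a) | static_cast<uint8_t>(b));
}
```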
<furszy> stacie: on the latest push, changed the PriorityToString function to incrementally add the priority types to the string instead of only looking up the enum value in the `map_priority_level` map (which doesn't cover every possible combination).
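That incremental construction could look roughly like the following (a sketch of the approach furszy describes, reusing the `PriorityLevel` enum sketched above; the real function's exact shape may differ):

```cpp
#include <cstdint>
#include <string>

// Sketch: build the label flag by flag, so that any OR-combination of
// levels (e.g. LOW | MEDIUM) produces a readable string instead of
// requiring every combination to exist in a lookup map.
std::string PriorityToString(uint8_t priority)
{
    std::string str;
    if (priority & PriorityLevel::LOW) str += "low,";
    if (priority & PriorityLevel::MEDIUM) str += "medium,";
    if (priority & PriorityLevel::HIGH) str += "high,";
    if (!str.empty()) str.pop_back(); // drop the trailing comma
    return str;
}
```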
<pablomartin> if I understood the questions correctly... PriorityLevel is used to make it easy for users/devs to understand and set up... internally, the mappings and the resolution of "all" (so that it includes any new categories that could be added) use the bit vector for practicality
<stickies-v> do you think it's confusing that we use the PriorityLevel both to represent an actual priority level (the enum use case), as well as the aggregation of multiple priority levels (the bit vector approach)?
<stickies-v> amovfx_: but I'd argue an enum is conceptually entirely different to a bitvector? an enum item is meant to represent one item of a limited, specified set of items
<stickies-v> personally, I'd prefer keeping the PriorityLevel enum for actual priority levels, and just using a uint8_t whenever we want to represent the aggregation - that way the intent is more clear imo
<stickies-v> stacie: and then also choices for all combinations of all the options :D in which case, you might as well just have people check off multiple boxes instead
<stacie> stickies-v: yes! same energy as trying to pick a radio button or a check box for a UI. It didn't initially stand out as confusing to me, but I see the discrepancy now. I'm just not sure if as a developer, I've been trained to think a certain way haha
<stickies-v> alright next question: Which levels can we pass to `-priority-level`? Are they all represented in `map_priority_level`? How are the other one(s) resolved? Can you think of a different approach?
<furszy> stickies-v: could be, but.. we might be over-engineering it with that. The same behavior is being used in the sources already for the net processing ServiceFlags field.
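One way the string-to-level resolution can work (a sketch reusing the enum above; the PR's actual `map_priority_level` contents and its handling of "all" may differ): "low", "medium" and "high" live in the map, while "all" is resolved separately as the OR of every defined level.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

const std::map<std::string, PriorityLevel> map_priority_level{
    {"low", PriorityLevel::LOW},
    {"medium", PriorityLevel::MEDIUM},
    {"high", PriorityLevel::HIGH},
};

// Sketch: "all" is not in the map, so it is special-cased here.
uint8_t StringToPriority(const std::string& str)
{
    if (str == "all") {
        return PriorityLevel::LOW | PriorityLevel::MEDIUM | PriorityLevel::HIGH;
    }
    const auto it{map_priority_level.find(str)};
    if (it == map_priority_level.end()) {
        throw std::runtime_error("Unknown priority level: " + str);
    }
    return it->second;
}
```

A comma-separated argument such as `-priority-level=low,medium` would then just OR the per-token results together.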
<kouloumos> Talking about those transformation functions, I'm still not sure about the usefulness of PriorityToString. What's the benefit of using it instead of hardcoding the strings in the help message and the default value?
<stickies-v> kouloumos: then you'd be hardcoding that in 2 separate places which incurs the risk of people forgetting to update both locations in future updates
<stickies-v> Which of the following are valid PriorityLevels? a. PriorityLevel{0x00} | b. PriorityLevel{"low"} | c. PriorityLevel::DEFAULT_PRIORITY_LEVEL | d. PriorityLevel{3} | e. PriorityLevel{4} | f. PriorityLevel{0xff} | g. static_cast<PriorityLevel>(8)
<stickies-v> c. is invalid because `DEFAULT_PRIORITY_LEVEL` is a const defined in the `benchmark` namespace, so it's `benchmark::DEFAULT_PRIORITY_LEVEL` - it's not a member of `PriorityLevel`
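For reference, here is how each quiz option behaves under a fixed-underlying-type enum like the sketch above (my annotations under C++17 rules, not from the meeting log):

```cpp
auto a = PriorityLevel{0x00};           // valid: list-init from an integral constant (C++17), no narrowing
// auto b = PriorityLevel{"low"};       // invalid: a string literal cannot initialize an enum
// auto c = PriorityLevel::DEFAULT_PRIORITY_LEVEL; // invalid: not an enumerator of PriorityLevel
auto d = PriorityLevel{3};              // valid: equals LOW | MEDIUM, even without a named enumerator
auto e = PriorityLevel{4};              // valid: same value as HIGH
auto f = PriorityLevel{0xff};           // valid: fits in uint8_t, so no narrowing
auto g = static_cast<PriorityLevel>(8); // valid: explicit cast within the underlying type's range
```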