The PR branch HEAD was b053472 at the time of this review club meeting.
Notes
The BlockAssembler is responsible for constructing a block template on top of the current chain tip. CreateNewBlock()
selects transactions from the node's mempool to maximize fees and constructs the coinbase
transaction based on a scriptPubKey passed in. As the last step, it calls TestBlockValidity() to
verify that this block would be consensus-valid (apart from the missing proof of work).
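The flow described above can be sketched as a standalone skeleton with stand-in types and stub steps; the names and simplifications here are illustrative only (the real code lives in src/node/miner.cpp):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in types; the real code uses CTransactionRef and CBlock.
struct Tx { int64_t fee; };
struct Block { std::vector<Tx> txs; int64_t coinbase_value = 0; };

// Skeleton mirroring the CreateNewBlock() steps described above.
Block CreateNewBlockSketch(const std::vector<Tx>& mempool, int64_t subsidy) {
    Block block;
    // 1. Select mempool transactions. Here we simply take every tx,
    //    standing in for the fee-maximizing selection algorithm.
    int64_t fees = 0;
    for (const Tx& tx : mempool) { block.txs.push_back(tx); fees += tx.fee; }
    // 2. Construct the coinbase: subsidy plus collected fees, paying the
    //    scriptPubKey supplied by the caller (omitted here).
    block.coinbase_value = subsidy + fees;
    // 3. The real code finishes by calling TestBlockValidity() to check
    //    consensus validity apart from proof of work (stubbed out here).
    return block;
}
```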
Miners may use the getblocktemplate RPC to retrieve a block template, utilize external hardware
to compute the proof of work, and then publish their block using the submitblock RPC.
The transaction selection algorithm takes transaction ancestors and descendants into account in order to
estimate the incentive compatibility of including or excluding some set of transactions in the
block template. We discussed the implementation of this algorithm in a previous review club
meeting.
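The ancestor-aware selection can be illustrated with a simplified, self-contained sketch. This is not the actual addPackageTxs() implementation: entries are scored by the combined feerate of a transaction plus its unconfirmed ancestors and selected greedily, and the step where descendants' scores are updated after inclusion is omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy mempool entry: fee/vsize of the tx itself and of the tx plus its
// unconfirmed ancestors. A simplified stand-in for CTxMemPoolEntry.
struct Entry {
    int64_t fee;            // satoshis
    int64_t vsize;          // virtual bytes
    int64_t ancestor_fee;   // fee including unconfirmed ancestors
    int64_t ancestor_vsize; // vsize including unconfirmed ancestors
};

// Compare by ancestor feerate, cross-multiplied to avoid floating point:
// a is "better" if a.ancestor_fee / a.ancestor_vsize is larger.
static bool BetterAncestorFeerate(const Entry& a, const Entry& b) {
    return a.ancestor_fee * b.ancestor_vsize > b.ancestor_fee * a.ancestor_vsize;
}

// Greedily fill a block up to max_vsize by descending ancestor feerate,
// returning the total fee collected. The real algorithm also updates the
// scores of remaining descendants as their ancestors are included.
int64_t SelectForBlock(std::vector<Entry> mempool, int64_t max_vsize) {
    std::sort(mempool.begin(), mempool.end(), BetterAncestorFeerate);
    int64_t used = 0, fees = 0;
    for (const Entry& e : mempool) {
        if (used + e.ancestor_vsize > max_vsize) continue;
        used += e.ancestor_vsize;
        fees += e.ancestor_fee;
    }
    return fees;
}
```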
Bitcoin Core has a benchmarking
framework to measure the
performance of various tasks such as adding to and clearing a CRollingBloomFilter, accessing a
CCoinsCache, deserializing and checking a block, and checking a mempool with complex contents.
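The core idea of such a framework, timing a body over many iterations so per-iteration cost can be compared across runs, can be sketched in standalone C++. The function name here is illustrative; the real framework (src/bench) additionally handles warmup, statistics, and output formatting:

```cpp
#include <chrono>
#include <functional>

// Minimal benchmark harness sketch: run the body `iters` times and
// return the average nanoseconds per iteration. Illustrative only.
double BenchNsPerIter(const std::function<void()>& body, int iters) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) body();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / iters;
}
```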
Bitcoin Core is multi-threaded. Some operations require holding one or more mutexes, which means
deadlock is a potential concern. One method of avoiding
deadlock is to enforce a consistent order in which every thread acquires locks. Compile with
-DDEBUG_LOCKORDER to help find instances in which locks are grabbed in inconsistent orders.
Why might we care about the performance of BlockAssembler::CreateNewBlock()?
Examining the
implementation
of CreateNewBlock(), which tasks do you expect to take the most time?
There are two BlockAssembler constructors: with and without the Options parameter. Given
references to the active chainstate and mempool, chainstate and mempool respectively, what would
be the difference between calling BlockAssembler(chainstate, mempool) and
BlockAssembler(chainstate, mempool, Options())?
Given that block assembly time depends on what transactions there are in the mempool, is it a
problem that PopulateMempool() randomly determines the transaction fees?
Commit
5021832
changes the order in which PopulateMempool() grabs two locks. Why is this necessary?
Can you think of other mempool-related activities or scenarios worth benchmarking?
<LarryRuane> When a new block arrives, miners want to build on it and start mining as soon as possible (any delay is wasted hash power). Profit margins are super thin, so every millisecond counts!
<LarryRuane> A miner could initially try to mine an empty block (we do see those sometimes), while assembling the new block in parallel, but then they lose out on the fees
<stickies-v> and also, if CreateNewBlock is really inefficient then large miners might fork and implement a faster version, giving them a slight benefit over smaller miners that can't afford that development cost
<rozehnal_paul> wasn't it called antbleed or something? there was a nonce-exploit or something...was that something all miners were capable of or just higher development miners?
<schmidty_> My thought was that the TestBlockValidity check would be most time intensive since it looks like it calls CheckBlock which checks each transaction
<glozow> schmidty_: stickies-v: good thinking, I don't know the exact answer but I would expect that those 2 are the biggest bits. they definitely take more time than constructing the coinbase tx.
<rozehnal_paul> it fills the blocks with the highest fee transactions, which cannot be done ahead of time because a miner can't know for certain which txs will be left in the mempool once the previous block is mined...im not sure if addpackagetxs() also does merkleroot calculation...probably doesn't
<stickies-v> it looks at the txs in mempool and sorts them by their total (incl ancestors) feerate - I think the results are then stored in the mempool's `mapModifiedTxs` member?
<stickies-v> I was a bit confused how the transactions are actually added into the block though, I weirdly don't see that happening in `CreateNewBlock`
<rozehnal_paul> +1 stickies-v it's the sorting of the mempool that takes time...im not sure how ancestor-finding works but that seems non-trivial as well
<glozow> rozehnal_paul: hernanmarino: stickies-v: yep that's what it does! correct, we can't really precalculate it. `mapModifiedTx` is how we "update" the fee information on mempool entries without modifying `mapTx` itself. just fyi it isn't a member of `CTxMemPool`, it's a data structure local to `BlockAssembler`
<glozow> the sorting doesn't take too much time thanks to the beauty of the multi index container. but as we include transactions, we need to update the entries that depend on them
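The mapModifiedTx mechanism glozow describes, adjusting entries' scores in a side structure while leaving the mempool's own map untouched, can be sketched with ordinary std::map as a simplified stand-in (the real code uses multi-index containers and CTxMemPoolEntry):

```cpp
#include <cstdint>
#include <map>
#include <string>

struct Score { int64_t anc_fee; int64_t anc_vsize; };

using BaseMap = std::map<std::string, Score>;     // stand-in for mapTx
using ModifiedMap = std::map<std::string, Score>; // stand-in for mapModifiedTx

// After including `parent` in the block, deduct its fee/vsize from the
// child's ancestor totals, recording the result as an override instead
// of mutating the base map.
void OnParentIncluded(const BaseMap& base, ModifiedMap& modified,
                      const std::string& child, const Score& parent) {
    Score s = modified.count(child) ? modified[child] : base.at(child);
    s.anc_fee -= parent.anc_fee;
    s.anc_vsize -= parent.anc_vsize;
    modified[child] = s;
}

// Look up the effective score: the override wins if present.
Score Effective(const BaseMap& base, const ModifiedMap& modified,
                const std::string& tx) {
    auto it = modified.find(tx);
    return it != modified.end() ? it->second : base.at(tx);
}
```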
<rozehnal_paul> Just to confirm: Since block-mining follows a Poisson distribution, miners can update their prospective blocks to include higher fee tx's without lowering their chances at finding the next block...so i imagine they DO do this...or is there some reason why a miner would create a potential block and stick to it in a 1-and-done style?
<glozow> rozehnal_paul: correct that there's no "progress" made trying to mine a particular block template, so you can definitely update it to include new transactions
<glozow> Next question. There are two BlockAssembler constructors: with and without the Options parameter. Given references to the active chainstate and mempool, `chainstate` and `mempool` respectively, what would be the difference between calling `BlockAssembler(chainstate, mempool)` and `BlockAssembler(chainstate, mempool, Options())`?
<schmidty_> Usually the mining pool creates the block right? So you have time to create the new block + time to propagate the instructions to the miners of the pool.
<LarryRuane> rozehnal_paul: "update their prospective blocks" -- I'm pretty sure most or all miners do this, about every 2 seconds (that's what I heard once) .. tradeoff between doing that too often (need to communicate to all ASICs) or not often enough (missing out on fees)
<glozow> let's be more specific. when we say "default Options" I assume we mean an `Options` instance constructed using the default, no-params constructor?
<schmidty_> In DefaultOptions(), why is "blockmaxweight" assigned in a straightforward call to GetIntArg(), while "blockmintxfee" has all the ceremony around its assignment?
<stickies-v> glozow: to answer your initial question, I'd say the first option constructs BlockAssembler taking startup options (e.g. -blockmaxweight) into account, or otherwise fallback to hardcoded defaults (e.g. DEFAULT_BLOCK_MAX_WEIGHT), whereas the second option will go straight to the hardcoded defaults and ignore user options
<glozow> stickies-v: yes exactly. the point of this question was mostly so everybody pays attention to the ctors. but this scared me, because I was imagining constructing a testing setup with -blockMinFeeRate set, but then I use `Options()` and the config doesn't apply
<glozow> just to embarrass myself a little bit here. I once wrote a benchmark for mempool `check()` and had this exact issue, I set `-check_ratio=1` and the config was ignored, so `check()` was actually not run in the bench at all https://github.com/bitcoin/bitcoin/issues/24634
<glozow> Ok next commit. Given that block assembly time depends on what transactions there are in the mempool, is it a problem that `PopulateMempool()` randomly determines the transaction fees?
<stickies-v> for benchmarks, we want to be able to compare across iterations (time, versions, ...) - so if the test is too variable, it's not a great benchmark
<glozow> schmidty_: stickies-v: excellent deductions. random fee could mean variance in the benchmark results! good thing we use a deterministic random seed :P
<LarryRuane> Yeah I think a principle we're supposed to follow is that each iteration of a benchmark should do the same work... so any variation is due to slight variations in cpu speed or disk speed, etc.
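A fixed seed makes the "random" fees reproducible, so every benchmark iteration does identical work. A standalone sketch of the idea (Bitcoin Core uses a deterministic FastRandomContext; std::mt19937 here is just for illustration):

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Generate n pseudo-random "fees". The same seed always yields the same
// sequence, so results are comparable across benchmark runs.
std::vector<uint64_t> RandomFees(uint32_t seed, int n) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<uint64_t> dist(1000, 100000);
    std::vector<uint64_t> fees;
    fees.reserve(n);
    for (int i = 0; i < n; ++i) fees.push_back(dist(rng));
    return fees;
}
```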
<hernanmarino> Without digging deeper in the code, my guess is there was a deadlock . I tested this PR with enable-debug and found no problems, but perhaps before this commit, there were.
<stickies-v> Assume two locks `L1` and `L2`, and two functions `F1` and `F2` where `F1` grabs locks `(L1, L2)` and `F2` grabs locks `(L2, L1)`. If one thread `T1` calls `F1` while another thread `T2` simultaneously calls `F2`, it's possible that `T1` acquires `L1` and `T2` acquires `L2` but both threads cannot proceed because the second lock they need is already acquired.