bench BlockAssembler on a mempool with packages (bench)

https://github.com/bitcoin/bitcoin/pull/26695

Host: glozow  -  PR author: glozow

The PR branch HEAD was b053472 at the time of this review club meeting.

Notes

  • The BlockAssembler is responsible for constructing a block template on top of the current chain tip. CreateNewBlock selects transactions from the node’s mempool to maximize fees and constructs the coinbase transaction based on a scriptPubKey passed in. As the last step, it calls TestBlockValidity() to verify that this block would be consensus-valid (apart from the missing proof of work).

  • Miners may use the getblocktemplate RPC to retrieve a block template, utilize external hardware to compute the proof of work, and then publish their block using the submitblock RPC.

  • The algorithm takes into account transaction ancestors and descendants in order to estimate the incentive compatibility of including or excluding some set of transactions in the block template. We have discussed the implementation of this algorithm in a previous review club meeting.

  • Bitcoin Core has a benchmarking framework to measure the performance of various tasks such as adding to and clearing a CRollingBloomFilter, accessing a CCoinsCache, deserializing and checking a block, and checking a mempool with complex contents.

  • Bitcoin Core is multi-threaded. Some operations require holding one or more mutexes, which means deadlock is a potential concern. One method of avoiding deadlock is to enforce a consistent order in which every thread acquires locks. Compile with -DDEBUG_LOCKORDER to help find instances in which locks are grabbed in inconsistent orders.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. Why might we care about the performance of BlockAssembler::CreateNewBlock()?

  3. Examining the implementation of CreateNewBlock(), which tasks do you expect take the most time?

  4. There are two BlockAssembler constructors: with and without the Options parameter. Given references to the active chainstate and mempool, chainstate and mempool respectively, what would be the difference between calling BlockAssembler(chainstate, mempool) and BlockAssembler(chainstate, mempool, Options())?

  5. Given that block assembly time depends on what transactions there are in the mempool, is it a problem that PopulateMempool() randomly determines the transaction fees?

  6. Commit 5021832 changes the order in which PopulateMempool() grabs two locks. Why is this necessary?

  7. Can you think of other mempool-related activities or scenarios worth benchmarking?

Meeting Log

  117:03 <glozow> #startmeeting
  217:03 <stickies-v> hello!
  317:03 <hernanmarino> Hi
  417:03 <pablomartin> hello
  517:03 <emzy> hi
  617:03 <glozow> apologies for the late start. this is PR Review Club, and we're looking at #26695 today: https://bitcoincore.reviews/26695
  717:04 <schmidty_> hi
  817:04 <glozow> has anyone had a chance to review the PR and/or the notes? how about a y/n
  917:04 <stickies-v> y
 1017:04 <LarryRuane> hi
 1117:04 <d33r_gee> y
 1217:04 <rozehnal_paul> y
 1317:05 <pablomartin> y
 1417:05 <emzy> n (I'm testing right now)
 1517:05 <LarryRuane> y
 1617:05 <hernanmarino> y
 1717:05 <glozow> amazing! would anybody like to summarize the commits in the PR?
 1817:07 <stickies-v> first we decouple the construction of `BlockAssembler` from the global `gArgs`
 1917:07 <hernanmarino> The first two deal with BlockAssembler Options
 2017:08 <hernanmarino> then there's the bypassing the TestBlockValidity
 2117:08 <stickies-v> then we introduce a new option to bypass block validation checks and make PopulateMempool more realistic by using random fees
 2217:09 <stickies-v> and finally add a benchmark to performance test more realistically sized/shaped packages (which is the purpose of the PR)
 2317:09 <hernanmarino> and after what stickies-v mentioned, there is a change in the LOCKs order
 2417:09 <glozow> stickies-v: hernanmarino: ⭐ ⭐ ⭐
 2517:09 <hernanmarino> :)
 2617:10 <glozow> so yes exactly, the point is to add a benchmark. Why might we care about the performance of `BlockAssembler::CreateNewBlock()`?
 2717:10 <hernanmarino> Because miners are interested in fast block generation
 2817:11 <pablomartin> + hernanmarino -> miners want to start mining asap after a block is found.
 2917:11 <schmidty_> One reason would be if its slow, miners wont use Bitcoin Core for block creation (or would use modified versions).
 3017:11 <rozehnal_paul> +1 pablomartin
 3117:11 <LarryRuane> hen a new block arrives, miners want to build on it and start mining as soon as possible (any delay is wasted hash power). Profit margins are super thin, so every millisecond counts!
 3217:11 <stickies-v> it minimizes the amount of empty blocks!
 3317:12 <LarryRuane> A miner could initially try to mine an empty block (we do see those sometimes), while assembling the new block in parallel, but then they lose out on the fees
 3417:12 <glozow> hernanmarino: pablomartin: schmidty_: LarryRuane: stickies-v: good answers!
 3517:12 <LarryRuane> There have been several empty blocks recently, wrote a little script to find them: https://gist.github.com/LarryRuane/35eb30cd2051e3629bbb768a19f0c320
 3617:12 <hernanmarino> LarryRuane: cool
 3717:12 <LarryRuane> (sorry about that, didn't know it would expand so much!)
 3817:12 <stickies-v> and also, if CreateNewBlock is really inefficient then large miners might fork and implement a faster version, giving them a slight benefit over smaller miners that can't afford that development cost
 3917:13 <LarryRuane> stickies-v: +1 (similar to @schmidty_'s answer)
 4017:13 <stickies-v> oh woops I missed that, yes schmidty_ was faster!
 4117:14 <glozow> very big brain answers 🧠
 4217:14 <LarryRuane> "implement a faster version" -- and not open-source it!
 4317:14 <glozow> Examining the implementation of `CreateNewBlock()` : https://github.com/bitcoin/bitcoin/blob/7386da7a0b08cd2df8ba88dae1fab9d36424b15c/src/node/miner.cpp#L106
 4417:14 <rozehnal_paul> wasn't this happening pre-segwit days?
 4517:14 <glozow> What does it do, and which tasks do you expect to take the most time?
 4617:15 <rozehnal_paul> wasn't it called antbleed or something? there was a nonce-exploit or something...was that something all miners were capable of or just higher development miners?
 4717:15 <rozehnal_paul> i may be way off
 4817:15 <rozehnal_paul> something i'll read about later
 4917:15 <schmidty_> My thought was that the TestBlockValidity check would be most time intensive since it looks like it calls CheckBlock which checks each transaction
 5017:16 <stickies-v> rozehnal_paul: I think you're referring to ASICboost?
 5117:16 <rozehnal_paul> stickies-v yup!
 5217:16 <schmidty_> rozehnal_paul: asicboost https://bitcoinops.org/en/topics/asicboost/
 5317:17 <pablomartin> creates a block from template, create coinbase transaction, fill in header, validates block state...
 5417:17 <pablomartin> +1 shmidty_
 5517:17 <LarryRuane> "which tasks do you expect to take the most time" (schmidty beat me to it) i *think* TestBlockValidity calls ConnectBlock
 5617:18 <glozow> rozehnal_paul: you may want to google "covert asic boost," empty blocks could make it more efficient https://blog.bitmex.com/an-overview-of-the-covert-asicboost-allegation-2/
 5717:18 <stickies-v> I think calling `addPackageTxs()` is also a slow task because that selects the transactions to be included?
 5817:19 <LarryRuane> stickies-v: +1 especially because it can't do that ahead of time (don't know which tx will be present in the new latest block)
 5917:19 <glozow> schmidty_: stickies-v: good thinking, I don't know the exact answer but I would expect that those 2 are the biggest bits. they definitely take more time than constructing the coinbase tx.
 6017:20 <hernanmarino> My two guesses were TestBlockValidity and addPackageTxs also, but I was unsure
 6117:21 <glozow> Just so we're all on the same page - what does `addPackageTxs()` do and why would it take time?
 6217:23 <rozehnal_paul> it fills the blocks with the highest fee transactions, which cannot be done ahead of time because a miner can't know for certain which txs will be left in the mempool once the previous block is mined...im not sure if addpackagetxs() also does merkleroot calculation...probably doesn't
 6317:23 <rozehnal_paul> definitely doesn't
 6417:23 <hernanmarino> it 's the transaction selection algorithm
 6517:23 <stickies-v> it looks at the txs in mempool and sorts them by their total (incl ancestors) feerate - I think the results are then stored in the mempool's `mapModifiedTxs` member?
 6617:24 <stickies-v> I was a bit confused how the transactions are actually added into the block though, I weirdly don't see that happening in `CreateNewBlock`
 6717:24 <rozehnal_paul> +1 stickies-v it's the sorting of the mempool that takes time...im not sure how ancestor-finding works but that seems non-trivial as well
 6817:25 <hernanmarino> and it's not a trivial problem to solve, in terms of complexity
 6917:26 <glozow> rozehnal_paul: hernanmarino: stickies-v: yep that's what it does! correct, we can't really precalculate it. `mapModifiedTx` is how we "update" the fee information on mempool entries without modifying `mapTx` itself. just fyi it isn't a member of `CTxMemPool`, it's a data structure local to `BlockAssembler`
 7017:27 <glozow> the sorting doesn't take too much time thanks to the beauty of the multi index container. but as we include transactions, we need to update the entries that depend on them
 7117:28 <rozehnal_paul> Just to confirm: Since block-mining is found on a poisson distribution, miners can update their prospective blocks to include higher fee tx's without lowering their chances at finding the next block...so i imagine they DO do this...or is there some reason why a miner would create a potential block and stick to it in a 1-and-done style?
 7217:29 <stickies-v> ohhh thanks glozow, I totally misunderstood the scopes there. had another look and now I get it πŸ‘
 7317:29 <glozow> stickies-v: see `AddToBlock()` https://github.com/bitcoin/bitcoin/blob/7386da7a0b08cd2df8ba88dae1fab9d36424b15c/src/node/miner.cpp#L222-L231
 7417:29 <glozow> which updates `block.vtx`
 7517:30 <rozehnal_paul> sorry if im getting off-base, i can do my own research later
 7617:30 <glozow> rozehnal_paul: correct that there's no "progress" made trying to mine a particular block template, so you can definitely update it to include new transactions
 7717:30 <hernanmarino> rozehnal_paul: I don't think they do that , block creation takes time, as we are discussing
 7817:30 <glozow> as for whether they do it, i have no idea
 7917:32 <glozow> Next question. There are two BlockAssembler constructors: with and without the Options parameter. Given references to the active chainstate and mempool, `chainstate` and `mempool` respectively, what would be the difference between calling `BlockAssembler(chainstate, mempool)` and `BlockAssembler(chainstate, mempool, Options())`?
 8017:32 <schmidty_> Usually the mining pool creates the block right? So you have time to create the new block + time to propagate the instructions to the miners of the pool.
 8117:32 <LarryRuane> rozehnal_paul: "update their prospective blocks" -- I'm pretty sure most or all miners do this, about every 2 seconds (that's that I heard once) .. tradeoff between doing that too often (need to communicate to all ASICs) or not often enough (missing out on fees)
 8217:33 <glozow> links to the 2 ctors: https://github.com/bitcoin/bitcoin/blob/7386da7a0b08cd2df8ba88dae1fab9d36424b15c/src/node/miner.cpp#L90-L91
 8317:33 <glozow> https://github.com/bitcoin/bitcoin/blob/7386da7a0b08cd2df8ba88dae1fab9d36424b15c/src/node/miner.cpp#L65-L73
 8417:33 <hernanmarino> the first one calls the 2nd one with default Options
 8517:33 <pablomartin> you could change the options (weight and feerate) rather than using the defaults
 8617:34 <glozow> let's be more specific. when we say "default Options" I assume we mean an `Options` instance constructed using the default, no-params constructor?
 8717:34 <glozow> Or do we mean `DefaultOptions()`?
 8817:35 <schmidty_> That was confusing for me. The DefaultOptions actually reads from the args, if provided.
 8917:35 <hernanmarino> I was referring to DefaultOptions()
 9017:36 <hernanmarino> which is kind of a shallow answer, perhaps we cant talk about the details ...
 9117:37 <glozow> schmidty_: was confusing for me as well that the default `Options` ctor *doesn't* read gArgs
 9217:37 <pablomartin> if you call the first constructor, it will use the default options...
 9317:39 <schmidty_> In DefaultOptions(), why is "blockmaxweight" assigned in a straightforward call to GetIntArg(), while "blockmintxfee" has all the ceremony around its assignment?
 9417:39 <pablomartin> yeah and if you haven't passed by params... eg DEFAULT_BLOCK_MIN_TX_FEE
 9517:39 <schmidty_> Is it simply a matter of type? (int vs "money?(?))
 9617:41 <glozow> haha, i'll point out that there is some ceremony afterwards: https://github.com/bitcoin/bitcoin/blob/7386da7a0b08cd2df8ba88dae1fab9d36424b15c/src/node/miner.cpp#L72
 9717:41 <glozow> `nBlockMaxWeight` gets a lil sanitization later
 9817:43 <LarryRuane> schmidty_: I think you're right, it's just because of the need to ParseMoney on the blockmintxfee argument
 9917:44 <stickies-v> glozow: to answer your initial question, I'd say the first option constructs BlockAssembler taking startup options (e.g. -blockmaxweight) into account, or otherwise fallback to hardcoded defaults (e.g. DEFAULT_BLOCK_MAX_WEIGHT), whereas the second option will go straight to the hardcoded defaults and ignore user options
10017:45 <glozow> stickies-v: yes exactly. the point of this question was mostly so everybody pays attention to the ctors. but this scared me, because I was imagining constructing a testing setup with -blockMinFeeRate set, but then I use `Options()` and the config doesn't apply πŸ˜…
10117:46 <stickies-v> yeah, it's confusing. the name `DefaultOptions()` doesn't help
10217:46 <hernanmarino> mmhh , interesting
10317:47 <schmidty_> And does why is the value_or required here? https://github.com/bitcoin/bitcoin/blob/7386da7a0b08cd2df8ba88dae1fab9d36424b15c/src/node/miner.cpp#L83 Weve just checked for presence of the arg and parsed it?
10417:47 <stickies-v> schmidty_: the amount provided could be an invalid CAmount amount
10517:48 <glozow> just to embarass myself a little bit here. I once wrote a benchmark for mempool `check()` and had this exact issue, I set `-check_ratio=1` and the config was ignored, so `check()` was actually not run in the bench at all https://github.com/bitcoin/bitcoin/issues/24634
10617:48 <glozow> which is why this spooked me so much
10717:48 <schmidty_> stickies-v: Shouldnt something yell at me instead of defaulting to another fee?
10817:48 <glozow> Ok next commit. Given that block assembly time depends on what transactions there are in the mempool, is it a problem that `PopulateMempool()` randomly determines the transaction fees?
10917:49 <LarryRuane> schmidty_: does seem like it should do that
11017:49 <stickies-v> schmidty_: at first sight, I would agree with that - not sure why this is silent
11117:49 <schmidty_> glozow: The randomness could affect the timing based on if certain random values are chosen that would speed or slow things
11217:50 <stickies-v> for benchmarks, we want to be able to compare across iterations (time, versions, ...) - so if the test is too variable, it's not a great benchmark
11317:51 <stickies-v> luckily glozow had the foresight to use a deterministic randomness generator
11417:51 <glozow> schmidty_: stickies-v: excellent deductions. random fee could mean variance in the benchmark results! good thing we use a deterministic random seed :P
11517:51 <schmidty_> Does this random tx fee selection qualify as fuzz testing
11617:51 <schmidty_> Oh ok :) Deterministic
11717:51 <stickies-v> i'm wondering - is the randomness deterministic across environments, too?
11817:52 <glozow> stickies-v: unsure. i'd assume so since it's a test we might want to reproduce results for.
11917:52 <LarryRuane> Yeah I think a principle we're supposed to follow is that each iteration of a benchmark should do the same work... so any variation is due to slight variations in cpu speed or disk speed, etc.
12017:53 <LarryRuane> (i just wrote a benchmark for that fee bump PR and almost made that mistake!
12117:53 <glozow> Ok next question: one of the commits changes the order in which `PopulateMempool()` grabs locks. Why? https://github.com/bitcoin-core-review-club/bitcoin/commit/50218324dac18556d87688dc1a8e89bbe4d5f69e
12217:55 <stickies-v> (I checked: it indeed seems to be deterministic across environments, since we just set the seed to 0: https://github.com/bitcoin/bitcoin/blob/6d40a1a7e7f09ff2c32e53237f968adf8300d028/src/random.cpp#L683-L684)
12317:55 <LarryRuane> was that not just a bug fix? elsewhere, it's always cs_main then mempool.cs
12417:56 <glozow> stickies-v: cool! thanks for checking!
12517:56 <hernanmarino> Without digging deeper in the code, my guess is there was a deadlock . I tested this PR with enable-debug and found no problems, but perhaps before this commit, there were.
12617:56 <LarryRuane> (i count 6 occurrences of the cs_main then mempool.cs)
12717:56 <schmidty_> For those that missed the deterministic random like myself, I think its here which uses I believe uses a Bitcoin Core class: https://github.com/bitcoin/bitcoin/pull/26695/files#diff-6cecf3c89a982a4375a9112f3aff4d076760d84a2f3da41f5d862a6823b0b6c4R51
12817:57 <glozow> LarryRuane: correct, it's a bug fix
12917:57 <stickies-v> Assume two locks `L1` and `L2`, and two functions `F1` and `F2` where `F1` grabs locks `(L1, L2)` and `F2 grabs locks `(L2, L1)`. If one thread `T1` calls `F1` while another thread `T2` simultaneously calls `F2`, it’s possible that `T1` acquires `L1` and `T2` acquires `L2` but both threads cannot proceed because the second lock they need is already acquired.
13017:57 <stickies-v> (dang, messed up the markdown - sorry)
13117:57 <LarryRuane> stickies-v: +1
13217:58 <glozow> yep exactly. ensuring that locks are acquired in a consistent order is a way to prevent deadlock
13317:58 <LarryRuane> (better than messing down on the markup)
13417:59 <stickies-v> (🀣)
13517:59 <glozow> Last question: can you think of other mempool-related activities or scenarios worth benchmarking?
13618:00 <stickies-v> reorgs!
13718:00 <rozehnal_paul> a reorg is what happens right after a block is found, right?
13818:01 <hernanmarino> stickies-v: yes ! I haven thought of that initially
13918:01 <rozehnal_paul> ie. reorganizing a mempool now that a chunk has been taken out?
14018:01 <rozehnal_paul> wait
14118:01 <rozehnal_paul> nvm
14218:01 <rozehnal_paul> it's a block-race condition
14318:01 <glozow> stickies-v: very good idea yes. inserting things back into the mempool that have descendants in the mempool πŸ‘€ πŸ‘€ πŸ‘€
14418:01 <rozehnal_paul> oh
14518:01 <LarryRuane> stickies-v: just to play devils advocate, are reorgs rare enough that we don't really care about their performance?
14618:01 <glozow> i'm going to end the meeting here but very interested to hear if people have more ideas
14718:02 <glozow> #endmeeting