The PR branch HEAD was b053472 at the time of this review club meeting.
Notes
The BlockAssembler is responsible for constructing a block template on top of the current chain tip. CreateNewBlock()
selects transactions from the node's mempool to maximize fees and constructs the coinbase
transaction based on a scriptPubKey passed in. As the last step, it calls TestBlockValidity() to
verify that this block would be consensus-valid (apart from the missing proof of work).
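The flow described above can be sketched as a standalone skeleton with stand-in types and stub steps; the names and simplifications here are illustrative only (the real code lives in src/node/miner.cpp):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in types; the real code uses CTransactionRef and CBlock.
struct Tx { int64_t fee; };
struct Block { std::vector<Tx> txs; int64_t coinbase_value = 0; };

// Skeleton mirroring the CreateNewBlock() steps described above.
Block CreateNewBlockSketch(const std::vector<Tx>& mempool, int64_t subsidy) {
    Block block;
    // 1. Select mempool transactions. Here we simply take every tx,
    //    standing in for the fee-maximizing selection algorithm.
    int64_t fees = 0;
    for (const Tx& tx : mempool) { block.txs.push_back(tx); fees += tx.fee; }
    // 2. Construct the coinbase: subsidy plus collected fees, paying the
    //    scriptPubKey supplied by the caller (omitted here).
    block.coinbase_value = subsidy + fees;
    // 3. The real code finishes by calling TestBlockValidity() to check
    //    consensus validity apart from proof of work (stubbed out here).
    return block;
}
```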
Miners may use the getblocktemplate RPC to retrieve a block template, utilize external hardware
to compute the proof of work, and then publish their block using the submitblock RPC.
The transaction selection algorithm takes transaction ancestors and descendants into account in order to
estimate the incentive compatibility of including or excluding some set of transactions in the
block template. We discussed the implementation of this algorithm in a previous review club
meeting.
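The ancestor-aware selection can be illustrated with a simplified, self-contained sketch. This is not the actual addPackageTxs() implementation: entries are scored by the combined feerate of a transaction plus its unconfirmed ancestors and selected greedily, and the step where descendants' scores are updated after inclusion is omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy mempool entry: fee/vsize of the tx itself and of the tx plus its
// unconfirmed ancestors. A simplified stand-in for CTxMemPoolEntry.
struct Entry {
    int64_t fee;            // satoshis
    int64_t vsize;          // virtual bytes
    int64_t ancestor_fee;   // fee including unconfirmed ancestors
    int64_t ancestor_vsize; // vsize including unconfirmed ancestors
};

// Compare by ancestor feerate, cross-multiplied to avoid floating point:
// a is "better" if a.ancestor_fee / a.ancestor_vsize is larger.
static bool BetterAncestorFeerate(const Entry& a, const Entry& b) {
    return a.ancestor_fee * b.ancestor_vsize > b.ancestor_fee * a.ancestor_vsize;
}

// Greedily fill a block up to max_vsize by descending ancestor feerate,
// returning the total fee collected. The real algorithm also updates the
// scores of remaining descendants as their ancestors are included.
int64_t SelectForBlock(std::vector<Entry> mempool, int64_t max_vsize) {
    std::sort(mempool.begin(), mempool.end(), BetterAncestorFeerate);
    int64_t used = 0, fees = 0;
    for (const Entry& e : mempool) {
        if (used + e.ancestor_vsize > max_vsize) continue;
        used += e.ancestor_vsize;
        fees += e.ancestor_fee;
    }
    return fees;
}
```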
Bitcoin Core has a benchmarking
framework to measure the
performance of various tasks such as adding to and clearing a CRollingBloomFilter, accessing a
CCoinsCache, deserializing and checking a block, and checking a mempool with complex contents.
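The core idea of such a framework, timing a body over many iterations so per-iteration cost can be compared across runs, can be sketched in standalone C++. The function name here is illustrative; the real framework (src/bench) additionally handles warmup, statistics, and output formatting:

```cpp
#include <chrono>
#include <functional>

// Minimal benchmark harness sketch: run the body `iters` times and
// return the average nanoseconds per iteration. Illustrative only.
double BenchNsPerIter(const std::function<void()>& body, int iters) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) body();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / iters;
}
```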
Bitcoin Core is multi-threaded. Some operations require holding one or more mutexes, which means
deadlock is a potential concern. One method of avoiding
deadlock is to enforce a consistent order in which every thread acquires locks. Compile with
-DDEBUG_LOCKORDER to help find instances in which locks are grabbed in inconsistent orders.
Why might we care about the performance of BlockAssembler::CreateNewBlock()?
Examining the
implementation
of CreateNewBlock(), which tasks do you expect to take the most time?
There are two BlockAssembler constructors: with and without the Options parameter. Given
references to the active chainstate and mempool, chainstate and mempool respectively, what would
be the difference between calling BlockAssembler(chainstate, mempool) and
BlockAssembler(chainstate, mempool, Options())?
Given that block assembly time depends on what transactions there are in the mempool, is it a
problem that PopulateMempool() randomly determines the transaction fees?
Commit
5021832
changes the order in which PopulateMempool() grabs two locks. Why is this necessary?
Can you think of other mempool-related activities or scenarios worth benchmarking?
<LarryRuane> When a new block arrives, miners want to build on it and start mining as soon as possible (any delay is wasted hash power). Profit margins are super thin, so every millisecond counts!
<LarryRuane> A miner could initially try to mine an empty block (we do see those sometimes), while assembling the new block in parallel, but then they lose out on the fees
<stickies-v> and also, if CreateNewBlock is really inefficient then large miners might fork and implement a faster version, giving them a slight benefit over smaller miners that can't afford that development cost
<rozehnal_paul> wasn't it called antbleed or something? there was a nonce-exploit or something...was that something all miners were capable of or just higher development miners?
<schmidty_> My thought was that the TestBlockValidity check would be most time intensive since it looks like it calls CheckBlock which checks each transaction
<glozow> schmidty_: stickies-v: good thinking, I don't know the exact answer but I would expect that those 2 are the biggest bits. they definitely take more time than constructing the coinbase tx.
<rozehnal_paul> it fills the blocks with the highest fee transactions, which cannot be done ahead of time because a miner can't know for certain which txs will be left in the mempool once the previous block is mined...im not sure if addpackagetxs() also does merkleroot calculation...probably doesn't
<stickies-v> it looks at the txs in mempool and sorts them by their total (incl ancestors) feerate - I think the results are then stored in the mempool's `mapModifiedTxs` member?
<stickies-v> I was a bit confused how the transactions are actually added into the block though, I weirdly don't see that happening in `CreateNewBlock`
<rozehnal_paul> +1 stickies-v it's the sorting of the mempool that takes time...im not sure how ancestor-finding works but that seems non-trivial as well
<glozow> rozehnal_paul: hernanmarino: stickies-v: yep that's what it does! correct, we can't really precalculate it. `mapModifiedTx` is how we "update" the fee information on mempool entries without modifying `mapTx` itself. just fyi it isn't a member of `CTxMemPool`, it's a data structure local to `BlockAssembler`
<glozow> the sorting doesn't take too much time thanks to the beauty of the multi index container. but as we include transactions, we need to update the entries that depend on them
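The mapModifiedTx mechanism glozow describes, adjusting entries' scores in a side structure while leaving the mempool's own map untouched, can be sketched with ordinary std::map as a simplified stand-in (the real code uses multi-index containers and CTxMemPoolEntry):

```cpp
#include <cstdint>
#include <map>
#include <string>

struct Score { int64_t anc_fee; int64_t anc_vsize; };

using BaseMap = std::map<std::string, Score>;     // stand-in for mapTx
using ModifiedMap = std::map<std::string, Score>; // stand-in for mapModifiedTx

// After including `parent` in the block, deduct its fee/vsize from the
// child's ancestor totals, recording the result as an override instead
// of mutating the base map.
void OnParentIncluded(const BaseMap& base, ModifiedMap& modified,
                      const std::string& child, const Score& parent) {
    Score s = modified.count(child) ? modified[child] : base.at(child);
    s.anc_fee -= parent.anc_fee;
    s.anc_vsize -= parent.anc_vsize;
    modified[child] = s;
}

// Look up the effective score: the override wins if present.
Score Effective(const BaseMap& base, const ModifiedMap& modified,
                const std::string& tx) {
    auto it = modified.find(tx);
    return it != modified.end() ? it->second : base.at(tx);
}
```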
<rozehnal_paul> Just to confirm: Since block-mining follows a Poisson distribution, miners can update their prospective blocks to include higher fee tx's without lowering their chances at finding the next block...so i imagine they DO do this...or is there some reason why a miner would create a potential block and stick to it in a 1-and-done style?
<glozow> rozehnal_paul: correct that there's no "progress" made trying to mine a particular block template, so you can definitely update it to include new transactions
<glozow> Next question. There are two BlockAssembler constructors: with and without the Options parameter. Given references to the active chainstate and mempool, `chainstate` and `mempool` respectively, what would be the difference between calling `BlockAssembler(chainstate, mempool)` and `BlockAssembler(chainstate, mempool, Options())`?
<schmidty_> Usually the mining pool creates the block right? So you have time to create the new block + time to propagate the instructions to the miners of the pool.
<LarryRuane> rozehnal_paul: "update their prospective blocks" -- I'm pretty sure most or all miners do this, about every 2 seconds (that's what I heard once) .. tradeoff between doing that too often (need to communicate to all ASICs) or not often enough (missing out on fees)
<glozow> let's be more specific. when we say "default Options" I assume we mean an `Options` instance constructed using the default, no-params constructor?
<schmidty_> In DefaultOptions(), why is "blockmaxweight" assigned in a straightforward call to GetIntArg(), while "blockmintxfee" has all the ceremony around its assignment?
<stickies-v> glozow: to answer your initial question, I'd say the first option constructs BlockAssembler taking startup options (e.g. -blockmaxweight) into account, or otherwise fallback to hardcoded defaults (e.g. DEFAULT_BLOCK_MAX_WEIGHT), whereas the second option will go straight to the hardcoded defaults and ignore user options
<glozow> stickies-v: yes exactly. the point of this question was mostly so everybody pays attention to the ctors. but this scared me, because I was imagining constructing a testing setup with -blockMinFeeRate set, but then I use `Options()` and the config doesn't apply
<glozow> just to embarrass myself a little bit here. I once wrote a benchmark for mempool `check()` and had this exact issue, I set `-check_ratio=1` and the config was ignored, so `check()` was actually not run in the bench at all https://github.com/bitcoin/bitcoin/issues/24634
<glozow> Ok next commit. Given that block assembly time depends on what transactions there are in the mempool, is it a problem that `PopulateMempool()` randomly determines the transaction fees?
<stickies-v> for benchmarks, we want to be able to compare across iterations (time, versions, ...) - so if the test is too variable, it's not a great benchmark
<glozow> schmidty_: stickies-v: excellent deductions. random fee could mean variance in the benchmark results! good thing we use a deterministic random seed :P
<LarryRuane> Yeah I think a principle we're supposed to follow is that each iteration of a benchmark should do the same work... so any variation is due to slight variations in cpu speed or disk speed, etc.
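A fixed seed makes the "random" fees reproducible, so every benchmark iteration does identical work. A standalone sketch of the idea (Bitcoin Core uses a deterministic FastRandomContext; std::mt19937 here is just for illustration):

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Generate n pseudo-random "fees". The same seed always yields the same
// sequence, so results are comparable across benchmark runs.
std::vector<uint64_t> RandomFees(uint32_t seed, int n) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<uint64_t> dist(1000, 100000);
    std::vector<uint64_t> fees;
    fees.reserve(n);
    for (int i = 0; i < n; ++i) fees.push_back(dist(rng));
    return fees;
}
```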
<hernanmarino> Without digging deeper in the code, my guess is there was a deadlock . I tested this PR with enable-debug and found no problems, but perhaps before this commit, there were.
<stickies-v> Assume two locks `L1` and `L2`, and two functions `F1` and `F2` where `F1` grabs locks `(L1, L2)` and `F2` grabs locks `(L2, L1)`. If one thread `T1` calls `F1` while another thread `T2` simultaneously calls `F2`, it's possible that `T1` acquires `L1` and `T2` acquires `L2` but both threads cannot proceed because the second lock they need is already acquired.