Add pool based memory resource (utxo db and indexes)
https://github.com/bitcoin/bitcoin/pull/25325
Host: larryruane -
The PR branch HEAD was d87cb99bb37637e26a9e00b9f7de4bc6f44cb79d at the time of this review club meeting.
Notes
-
A pool-based resource is a resource that is pre-allocated and managed by a pool.
-
A pool-based resource is typically a limited or expensive resource, such as a database connection, thread, or memory buffer. The pool keeps a number of pre-allocated resources ready for use, so that when a program requests a resource, it can be quickly and efficiently provided from the pool.
-
When a resource user is finished using a pool-based resource, it returns it to the pool rather than releasing it back to the system. This means that the resource can be reused by another request without the overhead of allocating a new resource.
-
(From the PR description:) This PR implements a memory resource similar to
std::pmr::unsynchronized_pool_resource
, but optimized for node-based containers. The goal is to be able to cache more coins with the same memory usage, and allocate/deallocate faster. -
The Bitcoin Core client,
bitcoind
, spends a large fraction of its time validating transactions. -
We validate a transaction when we receive it over the P2P network, before adding it to our mempool; and also when we receive it in a new block, if we haven’t already validated it (not already in our mempool).
-
An important part of validating a transaction is ensuring that each of its inputs refers (using a
COutPoint
) to an existing transaction output, and that this output is unspent. -
The speed of validating transactions critically depends on how quickly we can locate the UTXOs referred to by their inputs.
-
When a transaction is confirmed, the UTXOs its inputs refer to become spent, and must be removed from the UTXO set. The transaction’s outputs create new UTXOs which must be added to the UTXO set. So the UTXO set is modified very often, especially during Initial Block Download (IBD), when almost all we’re doing is validating transactions as we receive each historical block.
-
The entire UTXO set is stored on disk in the
chainstate
subdirectory of the data directory, in LevelDB format. Since reading from disk is orders of magnitude slower than reading from memory, we would like to have the UTXOs that we’re most likely to need to access cached in memory. -
This UTXO cache, also called the coins cache, is one of the most memory-intensive data structures in Bitcoin Core. Its CPU performance is also important, both for lookup, and for modification since entries are added to and deleted from this cache at a high rate.
-
The size of this memory cache can be controlled using the
-dbcache
configuration setting. -
A memory-resident unspent transaction output, UTXO, is represented in the codebase as a
Coin
object. The key information in aCoin
is aCTxOut
, which contains an amount (nValue
) and ascriptPubKey
, sometimes referred to as the locking script because it prevents the coin from being spent without the authorizing witness as an input in a later transaction. -
For inclusion in the coins cache, each
Coin
is wrapped within aCCoinsCacheEntry
-
The coins cache itself, containing the individual
CCoinsCacheEntry
items, is implemented as astd::unordered_map
container with the alias nameCCoinsMap
. This map is part of theCCoinsViewCache
class. -
To summarize, we’d like this
CCoinsMap
(unordered map) to have the minimum memory requirement and the highest possible performance. Reducing its memory requirement means that it can store more entries (UTXOs) within a given amount of physical memory, which increases the cache hit rate.
Questions
-
Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
-
At a high level, what is the goal of this PR?
-
Why is
CCoinsMap
implemented as anstd::unordered_map
instead of astd::map
? What are the advantages of each? -
How does the PR change the implementation of
CCoinsMap
? - Regarding
class PoolResource
- What does the
final
keyword do? Why is it needed here? - What does
static_assert
do? Why is it used here instead of a regular assert? - What does the expression
(ALIGN_BYTES & (ALIGN_BYTES - 1)) == 0
do? - Why is this class templated by
MAX_BLOCK_SIZE_BYTES
andALIGN_BYTES
, - Why do some methods specify the
[[nodiscard]]
attribute?
- What does the
-
The
PoolResource
takes one argument,chunk_size_bytes
. What does this argument do, and what are the tradeoffs when deciding on this value? -
Why is this allocator faster (for the coins cache) than the standard allocator?
-
Why does this allocator use less memory (for the coins cache) than the standard allocator?
-
Can you think of a usage pattern (perhaps outside of the coins cache) that might cause this allocator to use more memory than the standard allocator?
- Are there other data structures that can take advantage of this new allocator?
Meeting Log
1 | 17:00 | <LarryRuane> #startmeeting |
2 | 17:00 | <pablomartin> hello all! |
3 | 17:00 | <effexzi> Hello every1 |
4 | 17:00 | <glozow> hi |
5 | 17:00 | <DaveBeer> hi |
6 | 17:00 | <martinus> hi! |
7 | 17:00 | <abubakar> hi |
8 | 17:00 | <hernanmarino> Helloooo |
9 | 17:00 | <svav> Hi |
10 | 17:00 | <LarryRuane> hi everyone, please say hi so we know who's here! |
11 | 17:00 | <glozow> hi martinus, great to have you here! |
12 | 17:00 | <codo> hi |
13 | 17:00 | <LarryRuane> yes, thank you for being here, @martinus ! |
14 | 17:00 | <LarryRuane> any first-time attendees here today? |
15 | 17:00 | <jonatack1> hi |
16 | 17:00 | <martinus> thanks, and thanks LarryRuane for having my PR here! |
17 | 17:01 | <DaveBeer> yes, I'm here for first time |
18 | 17:01 | <LarryRuane> here's the write-up for today's review club session: https://bitcoincore.reviews/25325 |
19 | 17:02 | <LarryRuane> welcome, @DaveBeer ! Feel free to ask away, any questions, even if we've moved on, continuing with previous threads is fine! |
20 | 17:02 | <pakaro> hi |
21 | 17:03 | <LarryRuane> first question, our usual: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach? |
22 | 17:03 | <DaveBeer> y, read notes, briefly looked at code |
23 | 17:03 | <abubakar> Yes, Approach ACK, read through the PR commits and have an understanding of the PR code |
24 | 17:03 | <LarryRuane> Also, does anyone have any questions on the Notes (intended to be background material)? We can discuss those first if there are any |
25 | 17:03 | <hernanmarino> 70% yes |
26 | 17:04 | <martinus> I'd ACK it because I wrote it ;) |
27 | 17:04 | <pakaro> I was wondering where the data, if any, for cache hit rate is |
28 | 17:04 | <LarryRuane> *whew* +1 @martinus |
29 | 17:04 | <pakaro> ie. how often we access utxo from cache and how often we reach to disk |
30 | 17:05 | <LarryRuane> that's a good question, I don't know if there's any tracking of that, does anyone know? |
31 | 17:05 | <pablomartin> yes, read pr, notes, answered some questions, pending running some benchs & testing before commenting on the pr |
32 | 17:06 | <jonatack> Reviewed both the original PR (https://github.com/bitcoin/bitcoin/pull/22702) and then this newer one. Have also been running them for a long time on my nodes. |
33 | 17:06 | <Steve68> Hello, this is my first time here, I may be a little lost. Is there a Bitcoin core meeting on wednesdays? |
34 | 17:07 | <LarryRuane> I was looking over the notes just before the meeting, and even though there are lots of notes, there may be a few concepts that people aren't too familiar with... maybe we can take a couple minutes to cover some basics... |
35 | 17:07 | <LarryRuane> were most of you aware that C++ standard library contains allow you to use a custom memory allocator? |
36 | 17:07 | <hernanmarino> jonatack: great, this deserves a long testing |
37 | 17:07 | <LarryRuane> for example, take a look at the `Allocator` template argument here https://en.cppreference.com/w/cpp/container/unordered_map |
38 | 17:07 | <jonatack> Steve68: https://github.com/jonatack/bitcoin-development/blob/master/bitcoin-core-dev-irc-meetings.txt |
39 | 17:08 | <LarryRuane> the default memory allocator is the standard system allocator (which basically just does `new` and `delete`), but you can override the default |
40 | 17:09 | <Steve68> thank you! |
41 | 17:09 | <LarryRuane> (note that in c++ if you override a template argument, you must specify all preceding template arguments, even if you just want the defaults for those) |
42 | 17:09 | <LarryRuane> when a container object (like an `std::unordered_map` instance) wants to allocate memory, it uses its configured allocator (standard or custom) |
43 | 17:10 | <LarryRuane> (@martinus correct me if I'm wrong on any of this!) |
44 | 17:10 | <LarryRuane> this PR changes the allocator for the `unordered_map` used in the coins cache, and _only_ that one (not all unordered maps) |
45 | 17:10 | <martinus> all good from my side :) |
46 | 17:11 | <LarryRuane> it changes it to one implemented by this PR (so the PR has two main parts, implementing a new allocator, and making this one particular unordered_map use it) |
47 | 17:11 | <LarryRuane> Any of that unclear, or want to discuss? |
48 | 17:12 | <pakaro> very well described |
49 | 17:13 | <jonatack> for info, martinus has a blog at https://martin.ankerl.com and wrote the benchmarking library that we use since a couple of years now in bitcoin core |
50 | 17:13 | <LarryRuane> Okay we can get into the questions then ... one question I forgot to ask, why is there no mutex locking in this PR? |
51 | 17:13 | <martinus> Maybe let me add the main reasons for this custom allocator: unordered_map does one allocation per entry, and this can be costly: malloc/free can be relatively slow, and it also has a memory overhead. This PR makes it faster and reduces that overhead. |
52 | 17:13 | <LarryRuane> jonatack: TIL -- thanks for the link!! |
53 | 17:14 | <jonatack> (on the blog you can also see his extensive work on hashmap data structures in C++) |
54 | 17:14 | <LarryRuane> is that like standard c++ (not just bitcoin core)? |
55 | 17:15 | <martinus> It's per the standard that std::unordered_map has to perform at least one allocation per node (entry), and there's no way around this. Other hashmap implementations do this differently so they can be faster |
56 | 17:16 | <martinus> But changing the whole hashmap of such an integral part can be dangerous (hashmaps are hard to get right), so this PR just changes the allocator to get most of the benefits |
57 | 17:17 | <LarryRuane> martinus: thanks, and you mentioned that malloc/free (the lowest-level dynamic memory allocation primitives) can be slow, this is question 7 (ok if we go out of order!) ... can you give us reasons why? |
58 | 17:18 | <pablomartin> perhaps it's worth mentioning that originally, the focus was on the hashmap (to make std::unordered_map faster...) - https://github.com/bitcoin/bitcoin/pull/16718 - by jamesob |
59 | 17:19 | <martinus> sure, I mean malloc is really well optimized, but it has to be generically implemented for whatever size whatever thread currently wants to allocate. But with the unordered_map we know that we need a lot of nodes, and we know that each of them has exactly the same size; so we can use that to optimize for this specific case. |
60 | 17:19 | <abubakar> The Allocator implementation avoids memory fragmentation. This ensures no gaps between allocated blocks of memory, also avoids the overhead of calling malloc() and free() for every allocation and deallocation. Instead, the allocator manages its own pool of memory and allocates and deallocates blocks from this pool as needed. reduces the number of system calls required for memory |
61 | 17:19 | <abubakar> allocation and deallocation |
62 | 17:21 | <LarryRuane> martinus: abubakar: yes, thanks ... I'm not sure about reducing the number of system calls, though... aren't most malloc and free just within the runtime library? |
63 | 17:22 | <DaveBeer> yes they manage some kind of tree like structure for memory chunks, as long as large enough chunk is available, system call should not be needed |
64 | 17:22 | <DaveBeer> in runtime library that is |
65 | 17:22 | <LarryRuane> what I was thinking about the performance improvement was that no mutex locking is required ... i mean, there may be locking at a higher level (around the entire map, if multiple threads can access it), but not for each individual memory alloc and dealloc |
66 | 17:24 | <abubakar> LarryRuane: yes thanks. |
67 | 17:24 | <LarryRuane> and for question 8, "Why does this allocator use less memory (for the coins cache) than the standard allocator?", yes, I think @abubakar hit on the main reason, all the allocations are tightly packed together ... as opposed to individual allocations, where there's some (hidden to us) overhead |
68 | 17:25 | <LarryRuane> so I think martinus answered question 2, the goal of this PR ... shall we go to question 3? |
69 | 17:25 | <jonatack> LarryRuane: re locking: that said, my understanding is that locking itself isn't usually expensive, but lock contention is |
70 | 17:26 | <LarryRuane> yes, and there would be a lot of contention for memory allocation and deallocation, is that right? |
71 | 17:27 | <jonatack> I don't know offhand |
72 | 17:27 | <jonatack> just making the distinction |
73 | 17:27 | <LarryRuane> i know there's a build that measures lock contention (`--enable-debug` i think), but i don't know if it measures dynamic memory lock contention |
74 | 17:28 | <LarryRuane> Here's question 3, might be kind of easy: Why is CCoinsMap implemented as an std::unordered_map used instead of a std::map? What are the advantages of each? |
75 | 17:28 | <hernanmarino> it's faster |
76 | 17:28 | <abubakar> for question 3.. Because Coins are not ordered, we want fast read time, std::unordered_map uses hash table while std::map uses BST. |
77 | 17:28 | <hernanmarino> unordered_map, I mean |
78 | 17:29 | <abubakar> Std::unordered is good to access element fast that dont care about order, whereas std::map. Is good for sorted entries. |
79 | 17:29 | <DaveBeer> unordered has O(1) access time while regular map has O(log(N)) access time and Coins don't need to be visited in any particular order, so unordered_map is enough |
80 | 17:29 | <pakaro> there's no need for order, since each utxo in the utxo is unique. +1 davebeer abubakar |
81 | 17:30 | <jonatack> yes, DEBUG_LOCKCONTENTION is now built into the debug build and you can enable the lock logging category to see it |
82 | 17:30 | <jonatack> see https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#debug_lockcontention for more |
83 | 17:30 | <LarryRuane> hernanmarino: abubakar: DaveBeer: pakaro: +1 - good answers, those are my understandings too |
84 | 17:31 | <LarryRuane> i think i've noticed quite a few maps that could be unordered_maps ... but if they're not performance-critical, i guess it doesn't matter |
85 | 17:31 | <pablomartin> faster lookup, customizable hashing, faster insertion & removal, less memory usage... |
86 | 17:31 | <LarryRuane> i think someone said that when bitcoin core was first written, there was no `std::unordered_map` yet, only added later ... does anyone know if that's true? |
87 | 17:31 | <LarryRuane> so (if that's true), some are just leftover from the early days |
88 | 17:32 | <martinus> When maps are very small (say 20 elements or so) sometimes std::map can be faster, but in this case the CCoinMap is really huge so std::unordered_map is definitely faster |
89 | 17:32 | <DaveBeer> unordered_map was introduced only in C++11 I think |
90 | 17:32 | <hernanmarino> LarryRuane, I believe it s true |
91 | 17:32 | <hernanmarino> Davebeer: +1 |
92 | 17:32 | <jonatack> (but our lock contention logging is only related to the locks we call in src/sync.h) |
93 | 17:33 | <LarryRuane> jonatack: i see, not standard library locks |
94 | 17:33 | <hernanmarino> Also, the custom allocators where used in this PR where introduced in C++17 |
95 | 17:33 | <hernanmarino> that* |
96 | 17:33 | <LarryRuane> continue discussing but let me put question 6 out there... The PoolResource takes one argument, chunk_size_bytes. What does this argument do, and what are the tradeoffs when deciding on this value? |
97 | 17:35 | <martinus> hernanmarino: I think the std::allocators were introduced in C++11, in C++17 the pmr allocators were introduced. But they are still not implemented everywhere... |
98 | 17:35 | <hernanmarino> martinus: thanks for clarifying |
99 | 17:35 | <pakaro> the number of bytes to give to each request for memory-space? |
100 | 17:36 | <LarryRuane> martinus: oh that's actually something i was wondering, the PR description says "A memory resource similar to std::pmr::unsynchronized_pool_resource, but optimized for node-based containers" ... I'm unsure if `pmr` could have been used, but would not have been as efficient? Or not possible? |
101 | 17:36 | <pakaro> if it's too large, we are wasting memory, if it is too small, an entry would not fit, and i assume the program would crash...this is why we do the *4 iirc...i may be off here |
102 | 17:37 | <jonatack> Bitcoin Core began with C++98, so yes, unordered_map didn't arrive until later IIRC |
103 | 17:38 | <LarryRuane> pakaro: +1 on the first part, there's always some unused space in the most recent chunk ... but if it's too small, the PoolAllocator just does a regular new (and later delete) |
104 | 17:38 | <LarryRuane> won't crash |
105 | 17:38 | <martinus> LarryRuane: Actually I first implemented this PR by using the pmr allocators and that way it was quite a bit simpler, but unfortunately that did not compile everywhere because libc++ still doesn't implement it. I think on MacOS that didn't work. It worked locally though with my libstdc++ |
106 | 17:39 | <LarryRuane> oh i see... that must have been a frustrating day, if you had it all working on your machine! |
107 | 17:40 | <LarryRuane> so on the question of choosing the chunk size, if it's too large there's wasted memory, and if it's too small, then there are more system allocations (which we know are somewhat slow) |
108 | 17:42 | <LarryRuane> martinus: I was wondering, did you consider making `chunk_size_bytes` a template argument? What were the considerations there? |
109 | 17:44 | <martinus> LarryRuane: yes I thought about making it either a template or an argument in the constructor, but then I though I should try to keep the PR as simple as possible. It's already complex enough and requires quite a lot of knowledge of internals. It might be a future extension if the allocator is used more widely |
110 | 17:44 | <LarryRuane> I guess also, the more stuff you template, the more actual code memory usage there can be! |
111 | 17:45 | <LarryRuane> since each combination of template values creates a whole new instance of all the code (IIUC) |
112 | 17:46 | <LarryRuane> (but probably code memory usage is not important) |
113 | 17:47 | <martinus> Also, allocating a big chunk of memory is actually quite cheap, at least in Linux this is done lazily: the malloc of the chunk doesn't really do much, only when the memory is actually used (read / write) the operating system goes and makes that page available. |
114 | 17:47 | <Murch> Is that why you needed the two different variants of new? |
115 | 17:47 | <LarryRuane> As long as you mentioned "if the allocator is used more widely" -- question 10 is: Are there other data structures that can take advantage of this new allocator? |
116 | 17:48 | <Murch> To make sure it actually is made available? |
117 | 17:48 | <martinus> Murch: no this is completely hidden and done from the kernel as far as I know, this has nothing to do with the different versions of new |
118 | 17:48 | <Murch> Oh okay |
119 | 17:49 | <LarryRuane> yes, there's the operator new: https://en.cppreference.com/w/cpp/memory/new/operator_new and the expression new: https://en.cppreference.com/w/cpp/language/new ... which i never knew about before! |
120 | 17:49 | <Murch> TBH, this all seems like higher arcane mysteries to me ^^ |
121 | 17:49 | <LarryRuane> It's like the expression new can be given _already allocated_ memory, right? (they call it "placement-params") |
122 | 17:50 | <hernanmarino> LarryRuane: On q. 10 Mempool is the first that comes to mind , but i don't think it can benefit a lot from this... perhaps something more heavily used during IBD? Blockindexes ?? I am not sure if they really use a lot of memory ... |
123 | 17:50 | <LarryRuane> yes, the uses of `new` in this PR were unfamiliar |
124 | 17:50 | <jonatack> Murch: once I tweeted about enjoying C++, and a long-time Ruby friend replied "Jon, are you OK? Blink twice if under duress" :) |
125 | 17:51 | <martinus> ah yes, allocating with operator new and in-place construction with new(...) are completely different things, but they have the same name... It's C++ being as obfuscated as it can be |
126 | 17:51 | <LarryRuane> hernanmarino: +1 - I was thinking mempool also, but I'm unsure |
127 | 17:51 | <hernanmarino> I'm just guessing, didn thoroughly reviewd the code to answer this one |
128 | 17:51 | <Murch> jonatack: Yeah, I’m definitely still not in it because of C++ 😅 |
129 | 17:51 | <hernanmarino> :) |
130 | 17:52 | <Murch> 😀😆😀😆 |
131 | 17:52 | <LarryRuane> martinus: kudos for figuring out some of this very weird (but needed) c++! |
132 | 17:53 | <Murch> hernanmarino: I’m not sure. A lot of transactions have pretty similar sizes, and the big ones could just be allocated separately. Might be useful to cover the ~24% one input one output txs, and the 46% one input two output txs |
133 | 17:54 | <LarryRuane> I think this is a great PR, I really want it to merge ... it was really interesting to read about people's concerns with safety (i think that's why the earlier attempt didn't happen) |
134 | 17:55 | <LarryRuane> Okay only 5 minutes left, anyone want to take on question 9: Can you think of a usage pattern (perhaps outside of the coins cache) that might cause this allocator to use more memory than the standard allocator? |
135 | 17:55 | <LarryRuane> Also, if anyone would like to throw out any answers to the sub-questions in question 5... I think those are interesting c++ questions |
136 | 17:56 | <hernanmarino> Murch: Agree. I was also thinking about something more heavily used during IBD, and perhaps it's not the case with the mempool |
137 | 17:56 | <pakaro> my understanding of static_assert is that it will throw a compilation error in debug mode, when an 'assert' will not throw an error in debug mode. |
138 | 17:57 | <pakaro> re. q5 |
139 | 17:57 | <LarryRuane> pakaro: no actually, assert crashes the node even in a non-debug build |
140 | 17:58 | <pakaro> then im not sure of the difference between static_assert & assert |
141 | 17:58 | <hernanmarino> re q9 I'm just guessing, but perhaps allocating only a few big chunks of memory, instead of several small ones ... |
142 | 17:58 | <LarryRuane> it's interesting to look at *where* those `static_assert`s are ... they're not inside functions! (for fun I tried changing one of them to a regular assert, and it wouldn't compile) |
143 | 17:58 | <abubakar> pre allocated memory more than necessary might result to alot of unused memory |
144 | 17:58 | <jonatack> pakaro: static assert is checked at compilation time |
145 | 17:58 | <jonatack> vs runtime for assert |
146 | 17:58 | <pablomartin> q9: perhaps in scenarios where the data structure contains a mixture of large and small objects |
147 | 17:58 | <LarryRuane> yes, static_assert triggers for any kind of build |
148 | 17:59 | <LarryRuane> pablomartin: yes, that's what i was thinking ... like if you allocate a bunch of one size of objects, then free them all ... then allocate a bunch of other sizes but never that first size, then they're sort of stuck in their freelist |
149 | 18:00 | <LarryRuane> ok we're at time, thanks to all of you!! and especially @martinus ! Please go review the PR! |
150 | 18:00 | <martinus> static_asserts are great because it immediately gives you a compile error. So if it compiles you can be sure that all static_assert's are correct. assert() are only evaluated when that code is actually run, so ideally use static_assert when possible |
151 | 18:00 | <abubakar> final ensures that the PoolResource class can be modified outside it's scope |
152 | 18:00 | <pablomartin> thanks all! |
153 | 18:00 | <LarryRuane> #endmeeting |
154 | 18:00 | <effexzi> Thanks every1 |
155 | 18:00 | <martinus> Thanks a lot LarryRuane for moderating this meeting! |
156 | 18:00 | <abubakar> LarryRuane: thanks for hosting |
157 | 18:01 | <Murch> Thanks Larry |
158 | 18:01 | <hernanmarino> thanks Larry for hosting, and martinus for great improvement |
159 | 18:01 | <LarryRuane> yes, and a regular assert might not be hit ... until post-release, on user machines! |
160 | 18:01 | <Murch> Also thanks to martinus for joining and giving more details! |
161 | 18:02 | <DaveBeer> thanks Larry and martinus |
162 | 18:02 | <pakaro> computer crashed. assuming end has been called. thanks everyone! |
163 | 18:02 | <LarryRuane> pakaro: yes, thanks for being here! |
164 | 18:03 | <jonatack> LarryRuane: (ALIGN_BYTES & (ALIGN_BYTES - 1)) == 0 tests for being a multiple of two IIRC, and the reason to use it over modulo could be performance. If I'm not confused, I reviewed it a while ago. |
165 | 18:03 | <LarryRuane> abubakar: I think of `final` as meaning that you can't create a class that uses this one as its base class ... but I'm not really sure if there's any special reason it's used here |
166 | 18:04 | <Murch> jonatack: Yeah, that was pretty nifty |
167 | 18:04 | <jonatack> (IIRC modulo is a fairly slow operation) |
168 | 18:04 | <Amirreza> Hi everyone |
169 | 18:04 | <pakaro> +1 murch very nifty |
170 | 18:04 | <LarryRuane> jonatack: yes that's correct ... only a power of 2, or zero, will make that expression zero |
171 | 18:04 | <LarryRuane> but you can't even use modulo, right? |
172 | 18:05 | <jonatack> LarryRuane: was referring to the comment at https://github.com/bitcoin/bitcoin/pull/25325/files#r1124914676 |
173 | 18:05 | <Murch> Amirreza: Welcome, however the Bitcoin Core PR Review Club just finished |
174 | 18:05 | <LarryRuane> you essentially want to check of just 1 bit is set ... there are some instruction sets that have a "popcount" (number of bits set) instruction, but obviously that's pretty low-level |
175 | 18:05 | <jonatack> I recall messing around with adding static asserts with modulo as you suggested and checking |
176 | 18:06 | <martinus> yeah the "final" here is not really necessary, it's just that no one can use it as a base class. Actually I'm not sure why I have it here, but it doesn't hurt |
177 | 18:06 | <Amirreza> Murch, Oh, I'm too late. I just did a mistake in timing! |
178 | 18:06 | <LarryRuane> martinus: ok cool ... is there any special reason for the `[[nodiscard]]`? I think it's nice, but any special reason? |
179 | 18:07 | <jonatack> LarryRuane: nodiscard is good to add for getters to ensure that the result is in fact used |
180 | 18:08 | <jonatack> LarryRuane: TIL wrt to your popcount comment, thanks! |
181 | 18:08 | <LarryRuane> ok, makes sense, because it's a pure function i think it's called (no side-effects), so if you're not using the result why are you calling it?? |
182 | 18:08 | <martinus> I usually use [[nodiscard]] everywhere, because when I return something I usually want that to be used; and if one doesn't use the return value it usually means that someone called the method but didn't have to. Especially all const methods should be [[nodiscard]] |
183 | 18:09 | <LarryRuane> martinus: i see, good to know... almost seems like `const` functions can be nodiscard by default? |
184 | 18:09 | <martinus> LarryRuane: yes I think it would be better to have that as default, but It's C++ so it won't change :) |
185 | 18:09 | <jonatack> martinus: good points |
186 | 18:10 | <LarryRuane> but it is interesting i guess that even `const` methods can have side-effects -- the `mutable` variables ... but callers shouldn't know about those side-effects |
187 | 18:10 | <LarryRuane> martinus: thanks.. I think your code is really good for all of us to use as a model! |
188 | 18:11 | <martinus> thanks! |
189 | 18:12 | <jonatack> Thanks for your long-standing work and patience to improve the performance here martinus, and thank you LarryRuane for choosing to review this pull and host a review club about it. |
190 | 18:13 | <pakaro> +1 jonatack - reading the comments from 2019 puts in perspective the amount of deliberation that goes into a pr like this |
191 | 18:13 | <DaveBeer> yeah this one was hard for me to parse '=D   void* storage = ::operator new (m_chunk_size_bytes, std::align_val_t{ELEM_ALIGN_BYTES}); |
192 | 18:13 | <DaveBeer> Â Â Â Â m_available_memory_it = new (storage) std::byte[m_chunk_size_bytes]; |
193 | 18:14 | <LarryRuane> jonatack: +1 about martinus, and you're very welcome! I really enjoyed learning about this one! |
194 | 18:15 | <pakaro> +larryruane i learnt a lot from your notes and comments in the pr |
195 | 18:16 | <LarryRuane> DaveBeer: yes me too, but i think it's *almost* the same as a normal new, like `void* storage = new std::bytes[m_chunk_size_bytes]` except that we want it aligned (the default new with a one-byte size wouldn't be aligned by 8 bytes) |
196 | 18:16 | <abubakar> same really good notes Larry :) |
197 | 18:16 | <LarryRuane> pakaro: abubakar: thanks |
198 | 18:18 | <jonatack> +1 excellent notes. Oh and whoever PRs the meeting log could maybe also drop the word "used" from Question 3. |
199 | 18:19 | <martinus> right, I want it to be aligned correctly. Also note that I'm always using ::operator new (the initial ::), because otherwise it could be that we accidentally use an overloaded version of operator new |
200 | 18:19 | <DaveBeer> LarryRuane: I see, thanks, I wasn't even aware of the new expression until today |
201 | 18:20 | <LarryRuane> oh good catch @jonatack yes I'll fix that! |
202 | 18:21 | <LarryRuane> martinus: oh that's really interesting, i thought it was only so you could specify the alignment ... can you specify alignment without the `::operator`? |
203 | 18:22 | <DaveBeer> * new expression with placement params |
204 | 18:23 | <LarryRuane> DaveBeer: i wasn't either, until a few days ago! |
205 | 18:25 | <martinus> You can use void* storage = operator new (m_chunk_size_bytes, std::align_val_t{ELEM_ALIGN_BYTES}); just fine, but then you risk of using some other operator new. It's safer to always use ::operator new, unless you really want other operator new to work. So far I never needed that... |