Add pool based memory resource (utxo db and indexes)

https://github.com/bitcoin/bitcoin/pull/25325

Host: larryruane  -  PR author: martinus

The PR branch HEAD was d87cb99bb37637e26a9e00b9f7de4bc6f44cb79d at the time of this review club meeting.

Notes

  • A pool-based resource is a resource that is pre-allocated and managed by a pool.

  • A pool-based resource is typically a limited or expensive resource, such as a database connection, thread, or memory buffer. The pool keeps a number of pre-allocated resources ready for use, so that when a program requests a resource, it can be quickly and efficiently provided from the pool.

  • When a resource user is finished using a pool-based resource, it returns it to the pool rather than releasing it back to the system. This means that the resource can be reused by another request without the overhead of allocating a new resource.

  • (From the PR description:) This PR implements a memory resource similar to std::pmr::unsynchronized_pool_resource, but optimized for node-based containers. The goal is to be able to cache more coins with the same memory usage, and allocate/deallocate faster.

  • The Bitcoin Core client, bitcoind, spends a large fraction of its time validating transactions.

  • We validate a transaction when we receive it over the P2P network, before adding it to our mempool; and also when we receive it in a new block, if we haven’t already validated it (not already in our mempool).

  • An important part of validating a transaction is ensuring that each of its inputs refers (using a COutPoint) to an existing transaction output, and that this output is unspent.

  • The speed of validating transactions critically depends on how quickly we can locate the UTXOs referred to by their inputs.

  • When a transaction is confirmed, the UTXOs its inputs refer to become spent, and must be removed from the UTXO set. The transaction’s outputs create new UTXOs which must be added to the UTXO set. So the UTXO set is modified very often, especially during Initial Block Download (IBD), when almost all we’re doing is validating transactions as we receive each historical block.

  • The entire UTXO set is stored on disk in the chainstate subdirectory of the data directory, in LevelDB format. Since reading from disk is orders of magnitude slower than reading from memory, we would like to have the UTXOs that we’re most likely to need to access cached in memory.

  • This UTXO cache, also called the coins cache, is one of the most memory-intensive data structures in Bitcoin Core. Its CPU performance is also important, both for lookup, and for modification since entries are added to and deleted from this cache at a high rate.

  • The size of this memory cache can be controlled using the -dbcache configuration setting.

  • A memory-resident unspent transaction output, UTXO, is represented in the codebase as a Coin object. The key information in a Coin is a CTxOut, which contains an amount (nValue) and a scriptPubKey, sometimes referred to as the locking script because it prevents the coin from being spent without the authorizing witness as an input in a later transaction.

  • For inclusion in the coins cache, each Coin is wrapped within a CCoinsCacheEntry

  • The coins cache itself, containing the individual CCoinsCacheEntry items, is implemented as a std::unordered_map container with the alias name CCoinsMap. This map is part of the CCoinsViewCache class.

  • To summarize, we’d like this CCoinsMap (unordered map) to have the minimum memory requirement and the highest possible performance. Reducing its memory requirement means that it can store more entries (UTXOs) within a given amount of physical memory, which increases the cache hit rate.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. At a high level, what is the goal of this PR?

  3. Why is CCoinsMap implemented as an std::unordered_map instead of a std::map? What are the advantages of each?

  4. How does the PR change the implementation of CCoinsMap?

  5. Regarding class PoolResource
    • What does the final keyword do? Why is it needed here?
    • What does static_assert do? Why is it used here instead of a regular assert?
    • What does the expression (ALIGN_BYTES & (ALIGN_BYTES - 1)) == 0 do?
    • Why is this class templated by MAX_BLOCK_SIZE_BYTES and ALIGN_BYTES,
    • Why do some methods specify the [[nodiscard]] attribute?
  6. The PoolResource takes one argument, chunk_size_bytes. What does this argument do, and what are the tradeoffs when deciding on this value?

  7. Why is this allocator faster (for the coins cache) than the standard allocator?

  8. Why does this allocator use less memory (for the coins cache) than the standard allocator?

  9. Can you think of a usage pattern (perhaps outside of the coins cache) that might cause this allocator to use more memory than the standard allocator?

  10. Are there other data structures that can take advantage of this new allocator?

Meeting Log

  117:00 <LarryRuane> #startmeeting
  217:00 <pablomartin> hello all!
  317:00 <effexzi> Hello every1
  417:00 <glozow> hi
  517:00 <DaveBeer> hi
  617:00 <martinus> hi!
  717:00 <abubakar> hi
  817:00 <hernanmarino> Helloooo
  917:00 <svav> Hi
 1017:00 <LarryRuane> hi everyone, please say hi so we know who's here!
 1117:00 <glozow> hi martinus, great to have you here!
 1217:00 <codo> hi
 1317:00 <LarryRuane> yes, thank you for being here, @martinus !
 1417:00 <LarryRuane> any first-time attendees here today?
 1517:00 <jonatack1> hi
 1617:00 <martinus> thanks, and thanks LarryRuane for having my PR here!
 1717:01 <DaveBeer> yes, I'm here for first time
 1817:01 <LarryRuane> here's the write-up for today's review club session: https://bitcoincore.reviews/25325
 1917:02 <LarryRuane> welcome, @DaveBeer ! Feel free to ask away, any questions, even if we've moved on, continuing with previous threads is fine!
 2017:02 <pakaro> hi
 2117:03 <LarryRuane> first question, our usual: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
 2217:03 <DaveBeer> y, read notes, briefly looked at code
 2317:03 <abubakar> Yes, Approach ACK, read through the PR commits and have an understanding of the PR code
 2417:03 <LarryRuane> Also, does anyone have any questions on the Notes (intended to be background material)? We can discuss those first if there are any
 2517:03 <hernanmarino> 70% yes
 2617:04 <martinus> I'd ACK it because I wrote it ;)
 2717:04 <pakaro> I was wondering where the data, if any, for cache hit rate is
 2817:04 <LarryRuane> *whew* +1 @martinus
 2917:04 <pakaro> ie. how often we access utxo from cache and how often we reach to disk
 3017:05 <LarryRuane> that's a good question, I don't know if there's any tracking of that, does anyone know?
 3117:05 <pablomartin> yes, read pr, notes, answered some questions, pending running some benchs & testing before commenting on the pr
 3217:06 <jonatack> Reviewed both the original PR (https://github.com/bitcoin/bitcoin/pull/22702) and then this newer one. Have also been running them for a long time on my nodes.
 3317:06 <Steve68> Hello, this is my first time here, I may be a little lost. Is there a Bitcoin core meeting on wednesdays?
 3417:07 <LarryRuane> I was looking over the notes just before the meeting, and even though there are lots of notes, there may be a few concepts that people aren't too familiar with... maybe we can take a couple minutes to cover some basics...
 3517:07 <LarryRuane> were most of you aware that C++ standard library contains allow you to use a custom memory allocator?
 3617:07 <hernanmarino> jonatack: great, this deserves a long testing
 3717:07 <LarryRuane> for example, take a look at the `Allocator` template argument here https://en.cppreference.com/w/cpp/container/unordered_map
 3817:07 <jonatack> Steve68: https://github.com/jonatack/bitcoin-development/blob/master/bitcoin-core-dev-irc-meetings.txt
 3917:08 <LarryRuane> the default memory allocator is the standard system allocator (which basically just does `new` and `delete`), but you can override the default
 4017:09 <Steve68> thank you!
 4117:09 <LarryRuane> (note that in c++ if you override a template argument, you must specify all preceding template arguments, even if you just want the defaults for those)
 4217:09 <LarryRuane> when a container object (like an `std::unordered_map` instance) wants to allocate memory, it uses its configured allocator (standard or custom)
 4317:10 <LarryRuane> (@martinus correct me if I'm wrong on any of this!)
 4417:10 <LarryRuane> this PR changes the allocator for the `unordered_map` used in the coins cache, and _only_ that one (not all unordered maps)
 4517:10 <martinus> all good from my side :)
 4617:11 <LarryRuane> it changes it to one implemented by this PR (so the PR has two main parts, implementing a new allocator, and making this one particular unordered_map use it)
 4717:11 <LarryRuane> Any of that unclear, or want to discuss?
 4817:12 <pakaro> very well described
 4917:13 <jonatack> for info, martinus has a blog at https://martin.ankerl.com and wrote the benchmarking library that we use since a couple of years now in bitcoin core
 5017:13 <LarryRuane> Okay we can get into the questions then ... one question I forgot to ask, why is there no mutex locking in this PR?
 5117:13 <martinus> Maybe let me add the main reasons for this custom allocator: unordered_map does one allocation per entry, and this can be costly: malloc/free can be relatively slow, and it also has a memory overhead. This PR makes it faster and reduces that overhead.
 5217:13 <LarryRuane> jonatack: TIL -- thanks for the link!!
 5317:14 <jonatack> (on the blog you can also see his extensive work on hashmap data structures in C++)
 5417:14 <LarryRuane> is that like standard c++ (not just bitcoin core)?
 5517:15 <martinus> It's per the standard that std::unordered_map has to perform at least one allocation per node (entry), and there's no way around this. Other hashmap implementations do this differently so they can be faster
 5617:16 <martinus> But changing the whole hashmap of such an integral part can be dangerous (hashmaps are hard to get right), so this PR just changes the allocator to get most of the benefits
 5717:17 <LarryRuane> martinus: thanks, and you mentioned that malloc/free (the lowest-level dynamic memory allocation primitives) can be slow, this is question 7 (ok if we go out of order!) ... can you give us reasons why?
 5817:18 <pablomartin> perhaps it's worth mentioning that originally, the focus was on the hashmap (to make std::unordered_map faster...) - https://github.com/bitcoin/bitcoin/pull/16718 - by jamesob
 5917:19 <martinus> sure, I mean malloc is really well optimized, but it has to be generically implemented for whatever size whatever thread currently wants to allocate. But with the unordered_map we know that we need a lot of nodes, and we know that each of them has exactly the same size; so we can use that to optimize for this specific case.
 6017:19 <abubakar> The Allocator implementation avoids memory fragmentation. This ensures no gaps between allocated blocks of memory, also avoids the overhead of calling malloc() and free() for every allocation and deallocation. Instead, the allocator manages its own pool of memory and allocates and deallocates blocks from this pool as needed. reduces the number of system calls required for memory
 6117:19 <abubakar> allocation and deallocation
 6217:21 <LarryRuane> martinus: abubakar: yes, thanks ... I'm not sure about reducing the number of system calls, though... aren't most malloc and free just within the runtime library?
 6317:22 <DaveBeer> yes they manage some kind of tree like structure for memory chunks, as long as large enough chunk is available, system call should not be needed
 6417:22 <DaveBeer> in runtime library that is
 6517:22 <LarryRuane> what I was thinking about the performance improvement was that no mutex locking is required ... i mean, there may be locking at a higher level (around the entire map, if multiple threads can access it), but not for each individual memory alloc and dealloc
 6617:24 <abubakar> LarryRuane: yes thanks.
 6717:24 <LarryRuane> and for question 8, "Why does this allocator use less memory (for the coins cache) than the standard allocator?", yes, I think @abubakar hit on the main reason, all the allocations are tightly packed together ... as opposed to individual allocations, where there's some (hidden to us) overhead
 6817:25 <LarryRuane> so I think martinus answered question 2, the goal of this PR ... shall we go to question 3?
 6917:25 <jonatack> LarryRuane: re locking: that said, my understanding is that locking itself isn't usually expensive, but lock contention is
 7017:26 <LarryRuane> yes, and there would be a lot of contention for memory allocation and deallocation, is that right?
 7117:27 <jonatack> I don't know offhand
 7217:27 <jonatack> just making the distinction
 7317:27 <LarryRuane> i know there's a build that measures lock contention (`--enable-debug` i think), but i don't know if it measures dynamic memory lock contention
 7417:28 <LarryRuane> Here's question 3, might be kind of easy: Why is CCoinsMap implemented as an std::unordered_map used instead of a std::map? What are the advantages of each?
 7517:28 <hernanmarino> it's faster
 7617:28 <abubakar> for question 3.. Because Coins are not ordered, we want fast read time, std::unordered_map uses hash table while std::map uses BST.
 7717:28 <hernanmarino> unordered_map, I mean
 7817:29 <abubakar> Std::unordered is good to access element fast that dont care about order, whereas std::map. Is good for sorted entries.
 7917:29 <DaveBeer> unordered has O(1) access time while regular map has O(log(N)) access time and Coins don't need to be visited in any particular order, so unordered_map is enough
 8017:29 <pakaro> there's no need for order, since each utxo in the utxo is unique. +1 davebeer abubakar
 8117:30 <jonatack> yes, DEBUG_LOCKCONTENTION is now built into the debug build and you can enable the lock logging category to see it
 8217:30 <jonatack> see https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#debug_lockcontention for more
 8317:30 <LarryRuane> hernanmarino: abubakar: DaveBeer: pakaro: +1 - good answers, those are my understandings too
 8417:31 <LarryRuane> i think i've noticed quite a few maps that could be unordered_maps ... but if they're not performance-critical, i guess it doesn't matter
 8517:31 <pablomartin> faster lookup, customizable hashing, faster insertion & removal, less memory usage...
 8617:31 <LarryRuane> i think someone said that when bitcoin core was first written, there was no `std::unordered_map` yet, only added later ... does anyone know if that's true?
 8717:31 <LarryRuane> so (if that's true), some are just leftover from the early days
 8817:32 <martinus> When maps are very small (say 20 elements or so) sometimes std::map can be faster, but in this case the CCoinMap is really huge so std::unordered_map is definitely faster
 8917:32 <DaveBeer> unordered_map was introduced only in C++11 I think
 9017:32 <hernanmarino> LarryRuane, I believe it s true
 9117:32 <hernanmarino> Davebeer: +1
 9217:32 <jonatack> (but our lock contention logging is only related to the locks we call in src/sync.h)
 9317:33 <LarryRuane> jonatack: i see, not standard library locks
 9417:33 <hernanmarino> Also, the custom allocators where used in this PR where introduced in C++17
 9517:33 <hernanmarino> that*
 9617:33 <LarryRuane> continue discussing but let me put question 6 out there... The PoolResource takes one argument, chunk_size_bytes. What does this argument do, and what are the tradeoffs when deciding on this value?
 9717:35 <martinus> hernanmarino: I think the std::allocators were introduced in C++11, in C++17 the pmr allocators were introduced. But they are still not implemented everywhere...
 9817:35 <hernanmarino> martinus: thanks for clarifying
 9917:35 <pakaro> the number of bytes to give to each request for memory-space?
10017:36 <LarryRuane> martinus: oh that's actually something i was wondering, the PR description says "A memory resource similar to std::pmr::unsynchronized_pool_resource, but optimized for node-based containers" ... I'm unsure if `pmr` could have been used, but would not have been as efficient? Or not possible?
10117:36 <pakaro> if it's too large, we are wasting memory, if it is too small, an entry would not fit, and i assume the program would crash...this is why we do the *4 iirc...i may be off here
10217:37 <jonatack> Bitcoin Core began with C++98, so yes, unordered_map didn't arrive until later IIRC
10317:38 <LarryRuane> pakaro: +1 on the first part, there's always some unused space in the most recent chunk ... but if it's too small, the PoolAllocator just does a regular new (and later delete)
10417:38 <LarryRuane> won't crash
10517:38 <martinus> LarryRuane: Actually I first implemented this PR by using the pmr allocators and that way it was quite a bit simpler, but unfortunately that did not compile everywhere because libc++ still doesn't implement it. I think on MacOS that didn't work. It worked locally though with my libstdc++
10617:39 <LarryRuane> oh i see... that must have been a frustrating day, if you had it all working on your machine!
10717:40 <LarryRuane> so on the question of choosing the chunk size, if it's too large there's wasted memory, and if it's too small, then there are more system allocations (which we know are somewhat slow)
10817:42 <LarryRuane> martinus: I was wondering, did you consider making `chunk_size_bytes` a template argument? What were the considerations there?
10917:44 <martinus> LarryRuane: yes I thought about making it either a template or an argument in the constructor, but then I though I should try to keep the PR as simple as possible. It's already complex enough and requires quite a lot of knowledge of internals. It might be a future extension if the allocator is used more widely
11017:44 <LarryRuane> I guess also, the more stuff you template, the more actual code memory usage there can be!
11117:45 <LarryRuane> since each combination of template values creates a whole new instance of all the code (IIUC)
11217:46 <LarryRuane> (but probably code memory usage is not important)
11317:47 <martinus> Also, allocating a big chunk of memory is actually quite cheap, at least in Linux this is done lazily: the malloc of the chunk doesn't really do much, only when the memory is actually used (read / write) the operating system goes and makes that page available.
11417:47 <Murch> Is that why you needed the two different variants of new?
11517:47 <LarryRuane> As long as you mentioned "if the allocator is used more widely" -- question 10 is: Are there other data structures that can take advantage of this new allocator?
11617:48 <Murch> To make sure it actually is made available?
11717:48 <martinus> Murch: no this is completely hidden and done from the kernel as far as I know, this has nothing to do with the different versions of new
11817:48 <Murch> Oh okay
11917:49 <LarryRuane> yes, there's the operator new: https://en.cppreference.com/w/cpp/memory/new/operator_new and the expression new: https://en.cppreference.com/w/cpp/language/new ... which i never knew about before!
12017:49 <Murch> TBH, this all seems like higher arcane mysteries to me ^^
12117:49 <LarryRuane> It's like the expression new can be given _already allocated_ memory, right? (they call it "placement-params")
12217:50 <hernanmarino> LarryRuane: On q. 10 Mempool is the first that comes to mind , but i don't think it can benefit a lot from this... perhaps something more heavily used during IBD? Blockindexes ?? I am not sure if they really use a lot of memory ...
12317:50 <LarryRuane> yes, the uses of `new` in this PR were unfamiliar
12417:50 <jonatack> Murch: once I tweeted about enjoying C++, and a long-time Ruby friend replied "Jon, are you OK? Blink twice if under duress" :)
12517:51 <martinus> ah yes, allocating with operator new and in-place construction with new(...) are completely different things, but they have the same name... It's C++ being as obfuscated as it can be
12617:51 <LarryRuane> hernanmarino: +1 - I was thinking mempool also, but I'm unsure
12717:51 <hernanmarino> I'm just guessing, didn thoroughly reviewd the code to answer this one
12817:51 <Murch> jonatack: Yeah, I’m definitely still not in it because of C++ 😅
12917:51 <hernanmarino> :)
13017:52 <Murch> 😀😆😀😆
13117:52 <LarryRuane> martinus: kudos for figuring out some of this very weird (but needed) c++!
13217:53 <Murch> hernanmarino: I’m not sure. A lot of transactions have pretty similar sizes, and the big ones could just be allocated separately. Might be useful to cover the ~24% one input one output txs, and the 46% one input two output txs
13317:54 <LarryRuane> I think this is a great PR, I really want it to merge ... it was really interesting to read about people's concerns with safety (i think that's why the earlier attempt didn't happen)
13417:55 <LarryRuane> Okay only 5 minutes left, anyone want to take on question 9: Can you think of a usage pattern (perhaps outside of the coins cache) that might cause this allocator to use more memory than the standard allocator?
13517:55 <LarryRuane> Also, if anyone would like to throw out any answers to the sub-questions in question 5... I think those are interesting c++ questions
13617:56 <hernanmarino> Murch: Agree. I was also thinking about something more heavily used during IBD, and perhaps it's not the case with the mempool
13717:56 <pakaro> my understanding of static_assert is that it will throw a compilation error in debug mode, when an 'assert' will not throw an error in debug mode.
13817:57 <pakaro> re. q5
13917:57 <LarryRuane> pakaro: no actually, assert crashes the node even in a non-debug build
14017:58 <pakaro> then im not sure of the difference between static_assert & assert
14117:58 <hernanmarino> re q9 I'm just guessing, but perhaps allocating only a few big chunks of memory, instead of several small ones ...
14217:58 <LarryRuane> it's interesting to look at *where* those `static_assert`s are ... they're not inside functions! (for fun I tried changing one of them to a regular assert, and it wouldn't compile)
14317:58 <abubakar> pre allocated memory more than necessary might result to alot of unused memory
14417:58 <jonatack> pakaro: static assert is checked at compilation time
14517:58 <jonatack> vs runtime for assert
14617:58 <pablomartin> q9: perhaps in scenarios where the data structure contains a mixture of large and small objects
14717:58 <LarryRuane> yes, static_assert triggers for any kind of build
14817:59 <LarryRuane> pablomartin: yes, that's what i was thinking ... like if you allocate a bunch of one size of objects, then free them all ... then allocate a bunch of other sizes but never that first size, then they're sort of stuck in their freelist
14918:00 <LarryRuane> ok we're at time, thanks to all of you!! and especially @martinus ! Please go review the PR!
15018:00 <martinus> static_asserts are great because it immediately gives you a compile error. So if it compiles you can be sure that all static_assert's are correct. assert() are only evaluated when that code is actually run, so ideally use static_assert when possible
15118:00 <abubakar> final ensures that the PoolResource class can be modified outside it's scope
15218:00 <pablomartin> thanks all!
15318:00 <LarryRuane> #endmeeting
15418:00 <effexzi> Thanks every1
15518:00 <martinus> Thanks a lot LarryRuane for moderating this meeting!
15618:00 <abubakar> LarryRuane: thanks for hosting
15718:01 <Murch> Thanks Larry
15818:01 <hernanmarino> thanks Larry for hosting, and martinus for great improvement
15918:01 <LarryRuane> yes, and a regular assert might not be hit ... until post-release, on user machines!
16018:01 <Murch> Also thanks to martinus for joining and giving more details!
16118:02 <DaveBeer> thanks Larry and martinus
16218:02 <pakaro> computer crashed. assuming end has been called. thanks everyone!
16318:02 <LarryRuane> pakaro: yes, thanks for being here!
16418:03 <jonatack> LarryRuane: (ALIGN_BYTES & (ALIGN_BYTES - 1)) == 0 tests for being a multiple of two IIRC, and the reason to use it over modulo could be performance. If I'm not confused, I reviewed it a while ago.
16518:03 <LarryRuane> abubakar: I think of `final` as meaning that you can't create a class that uses this one as its base class ... but I'm not really sure if there's any special reason it's used here
16618:04 <Murch> jonatack: Yeah, that was pretty nifty
16718:04 <jonatack> (IIRC modulo is a fairly slow operation)
16818:04 <Amirreza> Hi everyone
16918:04 <pakaro> +1 murch very nifty
17018:04 <LarryRuane> jonatack: yes that's correct ... only a power of 2, or zero, will make that expression zero
17118:04 <LarryRuane> but you can't even use modulo, right?
17218:05 <jonatack> LarryRuane: was referring to the comment at https://github.com/bitcoin/bitcoin/pull/25325/files#r1124914676
17318:05 <Murch> Amirreza: Welcome, however the Bitcoin Core PR Review Club just finished
17418:05 <LarryRuane> you essentially want to check of just 1 bit is set ... there are some instruction sets that have a "popcount" (number of bits set) instruction, but obviously that's pretty low-level
17518:05 <jonatack> I recall messing around with adding static asserts with modulo as you suggested and checking
17618:06 <martinus> yeah the "final" here is not really necessary, it's just that no one can use it as a base class. Actually I'm not sure why I have it here, but it doesn't hurt
17718:06 <Amirreza> Murch, Oh, I'm too late. I just did a mistake in timing!
17818:06 <LarryRuane> martinus: ok cool ... is there any special reason for the `[[nodiscard]]`? I think it's nice, but any special reason?
17918:07 <jonatack> LarryRuane: nodiscard is good to add for getters to ensure that the result is in fact used
18018:08 <jonatack> LarryRuane: TIL wrt to your popcount comment, thanks!
18118:08 <LarryRuane> ok, makes sense, because it's a pure function i think it's called (no side-effects), so if you're not using the result why are you calling it??
18218:08 <martinus> I usually use [[nodiscard]] everywhere, because when I return something I usually want that to be used; and if one doesn't use the return value it usually means that someone called the method but didn't have to. Especially all const methods should be [[nodiscard]]
18318:09 <LarryRuane> martinus: i see, good to know... almost seems like `const` functions can be nodiscard by default?
18418:09 <martinus> LarryRuane: yes I think it would be better to have that as default, but It's C++ so it won't change :)
18518:09 <jonatack> martinus: good points
18618:10 <LarryRuane> but it is interesting i guess that even `const` methods can have side-effects -- the `mutable` variables ... but callers shouldn't know about those side-effects
18718:10 <LarryRuane> martinus: thanks.. I think your code is really good for all of us to use as a model!
18818:11 <martinus> thanks!
18918:12 <jonatack> Thanks for your long-standing work and patience to improve the performance here martinus, and thank you LarryRuane for choosing to review this pull and host a review club about it.
19018:13 <pakaro> +1 jonatack - reading the comments from 2019 puts in perspective the amount of deliberation that goes into a pr like this
19118:13 <DaveBeer> yeah this one was hard for me to parse '=D    void* storage = ::operator new (m_chunk_size_bytes, std::align_val_t{ELEM_ALIGN_BYTES});
19218:13 <DaveBeer>         m_available_memory_it = new (storage) std::byte[m_chunk_size_bytes];
19318:14 <LarryRuane> jonatack: +1 about martinus, and you're very welcome! I really enjoyed learning about this one!
19418:15 <pakaro> +larryruane i learnt a lot from your notes and comments in the pr
19518:16 <LarryRuane> DaveBeer: yes me too, but i think it's *almost* the same as a normal new, like `void* storage = new std::bytes[m_chunk_size_bytes]` except that we want it aligned (the default new with a one-byte size wouldn't be aligned by 8 bytes)
19618:16 <abubakar> same really good notes Larry :)
19718:16 <LarryRuane> pakaro: abubakar: thanks
19818:18 <jonatack> +1 excellent notes. Oh and whoever PRs the meeting log could maybe also drop the word "used" from Question 3.
19918:19 <martinus> right, I want it to be aligned correctly. Also note that I'm always using ::operator new (the initial ::), because otherwise it could be that we accidentally use an overloaded version of operator new
20018:19 <DaveBeer> LarryRuane: I see, thanks, I wasn't even aware of the new expression until today
20118:20 <LarryRuane> oh good catch @jonatack yes I'll fix that!
20218:21 <LarryRuane> martinus: oh that's really interesting, i thought it was only so you could specify the alignment ... can you specify alignment without the `::operator`?
20318:22 <DaveBeer> * new expression with placement params
20418:23 <LarryRuane> DaveBeer: i wasn't either, until a few days ago!
20518:25 <martinus> You can use void* storage = operator new (m_chunk_size_bytes, std::align_val_t{ELEM_ALIGN_BYTES}); just fine, but then you risk of using some other operator new. It's safer to always use ::operator new, unless you really want other operator new to work. So far I never needed that...