Split hashing/index `GetUTXOStats` codepaths, decouple from `coinstatsindex` (utxo db and indexes)

https://github.com/bitcoin/bitcoin/pull/24410

Host: dongcarl  -  PR author: dongcarl

Notes

  • The gettxoutsetsetinfo RPC uses the GetUTXOStats function which calculates statistics about the UTXO set. These statistics include the the total number of transactions, the total amount (in bitcoins) of all outputs, etc.

  • A Coinstats Index was added in #19521 that dramatically sped up the gettxoutsetinfo RPC as it retains UTXO statistics calculation results for every block. The Coinstats Index also allows for querying UTXO statistics for a particular block instead of just current tip.

  • The libbitcoinkernel project is an effort to decouple Bitcoin Core’s consensus engine from other non-consensus modules (such as the various indices) in the codebase.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. On a conceptual level, which modules in Bitcoin Core likely belong in libbitcoinkernel and which ones don’t?

  3. Where in the Bitcoin Core codebase can you see libbitcoinkernel’s entanglement with index? (There might be multiple places)

  4. What is the CCoinsStats struct, and how is it used in GetUTXOStats?

  5. Please describe how GetUTXOStats is called in ChainstateManager::PopulateAndValidateSnapshot and what codepath(s) it takes when called in this way.

  6. Please describe how GetUTXOStats is called for the gettxoutsetinfo RPC and what codepath(s) it takes when called in this way.

  7. After this PR is merged, what would happen if a contributor re-introduced a dependency from validation to the Coinstats Index?

Meeting Log

  117:00 <dongcarl> #startmeeting
  217:00 <dongcarl> hello all
  317:00 <josibake> hi
  417:00 <larryruane> hi!
  517:00 <lightlike> hi
  617:01 <glozow> hi!
  717:01 <dongcarl> we'll wait a few secs for people to show up, please say "hi" if you're here and at your keyboard!
  817:02 <dongcarl> We'll be discussing [kernel 2a/n] Split hashing/index `GetUTXOStats` codepaths, decouple from `coinstatsindex` today, link: https://bitcoincore.reviews/24410
  917:02 <sipa> "hi"
 1017:02 <michaelfolkson> hi
 1117:02 <otech> 👋
 1217:02 <oliver92> sup
 1317:03 <dongcarl> Just so people know the ground rules: If you have a question, you don't have to ask to ask a question, just go ahead and ask!
 1417:03 <larryruane> (sipa doesn't really mean it)
 1517:03 <dongcarl> larryruane: He's being literal! :-)
 1617:03 <larryruane> :)
 1717:04 <dongcarl> Okay! Let's get started.
 1817:04 <dongcarl> Did everyone get a chance to review the PR? How about a quick y/n from everyone
 1917:04 <josibake> y
 2017:04 <michaelfolkson> y
 2117:04 <oliver92> n
 2217:04 <larryruane> n
 2317:04 <lightlike> y
 2417:05 <otech> n
 2517:05 <dongcarl> Okay, let's get into the questions...
 2617:05 <dongcarl> For those of you who reviewed... Concept ACK, approach ACK, tested ACK, or NACK?
 2717:06 <josibake> Concept ACK
 2817:06 <effexzi> N
 2917:07 <michaelfolkson> Approach ACK. A good choice for 2a :)
 3017:07 <lightlike> approach ACK
 3117:07 <dongcarl> Cool. Next one!
 3217:07 <dongcarl> On a conceptual level, which modules in Bitcoin Core likely belong in libbitcoinkernel and which ones don’t?
 3317:08 <oliver92> I'd say anything consensus-related && stateless
 3417:08 <josibake> what exactly does module mean in this context? is there a specific meaning or just more generally "a logical collection of code"
 3517:08 <larryruane> at a very high level, any code that, if implemented differently, could cause consensus failures (chain split) should be in libbitcoinkernel (else not)
 3617:09 <larryruane> (differently on different nodes on the network)
 3717:09 <dongcarl> josibake: Logical collection of code!
 3817:09 <michaelfolkson> Yeah node validation, not P2P, wallet, GUI etc
 3917:09 <antonleviathan> o/
 4017:09 <larryruane> RBF policies are an example of what would NOT be included
 4117:10 — dongcarl waves hello to the newcomers
 4217:10 <dongcarl> Right, lots of good answers there.
 4317:10 <michaelfolkson> secp256k1?!
 4417:10 <larryruane> definitely!
 4517:10 <michaelfolkson> Surely part of the consensus engine
 4617:10 <dongcarl> To touch on oliver92's answer, statelessness is not a goal of libbitcoinkernel, but was a goal of libbitcoinconsensus (as I understood it)
 4717:11 <dongcarl> As michaelfolkson said, P2P, Wallet, GUI are all things that definitely do not belong in libbitcoinkernel
 4817:12 <larryruane> if some part of the code is "stateless" (just so i understand), does that mean only has pure functions?
 4917:12 <dongcarl> larryruane: In my view it'd just not have persistence, there might be local state
 5017:13 <michaelfolkson> secp256k1 would be a separate library from libbitcoinkernel but still part of the consensus engine
 5117:13 <dongcarl> michaelfolkson: Right now, we embed secp256k1 inside libbitcoinkernel
 5217:13 <michaelfolkson> Ohh ok, cool
 5317:13 <svav> Hi
 5417:14 <dongcarl> The overall philosophy is "outside-in" and "incremental" meaning that we'll slowly whittle things down, and not aim for a minimal library all at once :-)
 5517:14 <dongcarl> Let's move on!
 5617:14 <dongcarl> Where in the Bitcoin Core codebase can you see libbitcoinkernel’s entanglement with index? (There might be multiple places)
 5717:14 <dongcarl> (This is as of master)
 5817:14 <lightlike> will the mempool be taken out of libbitcoinkernel too?
 5917:15 <dongcarl> lightlike: Good question! I think we can ship libbitcoinkernel with mempool (since mempool <-> validation is quite tightly coupled), and most external users will want to use Core's policy, but perhaps there will be a configure flag later to not link it in.
 6017:18 <josibake> at the risk of giving the obvious answer, we see `libitcoinkernel` entangled with index is when calling `PopulateAndValidateSnapshots`
 6117:18 <dongcarl> josibake: That is correct :-)
 6217:18 <josibake> `PopulateAndValidateSnapshots` is only for assumeutxo, correct?
 6317:19 <dongcarl> josibake: Yup!
 6417:19 <fanquake> mempool = boost 🥲
 6517:20 <dongcarl> fanquake: I feel 🥲 too, but we do one thing at a time in this project!
 6617:20 <sipa> boost is headers only this day, no?
 6717:20 <josibake> just curious (and feel free to bunt if this is off topic) but why is assumeutxo in libbitcoinkernel? instead of breaking up `GetUTXOstats` , why not move assumeutxo out of libbitcoinkernel?
 6817:20 <fanquake> sipa: yea
 6917:20 <sipa> @fanquake Not a big concern then, as it doesn't cause a libbitcoinkernel dependency.
 7017:20 <fanquake> multi_index might even be in-tree soon enough anyways
 7117:21 <dongcarl> josibake: I think moving assumeutxo out of validation will be much more work than separating the two codepaths of GetUTXOStats, and the two codepaths needed separation in any case
 7217:21 <dongcarl> Okay back to the question
 7317:21 <fanquake> It's a dependency in the sense that you still need boost. no libs sure
 7417:21 <dongcarl> Can anyone spot another point where you can see the entaglement between libbitcoinkernel and index/ in the build system code?
 7517:21 <lightlike> btw, the function is "PopulateAndValidateSnapshot" (without an s) in case anyone else was looking it up
 7617:22 <dongcarl> (e.g. src/Makefile.am)
 7717:22 <sipa> @fanquake I meant runtime dependency, but indeed.
 7817:25 <dongcarl> For the sake of time, I'll answer my own question, the index/ files are listed under libbitcoinkernel_la_SOURCES, and if you remove them, there will be linker errors: https://github.com/bitcoin/bitcoin/blob/9db941d7737406b8593024ba130c3f9c186af4c6/src/Makefile.am#L848-L875
 7917:26 <dongcarl> Next! Question 4: What is the CCoinsStats struct, and how is it used in GetUTXOStats?
 8017:26 <josibake> so as of now, the only index one remaining (after this pr) is `index/blockfilterindex.cpp`
 8117:27 <dongcarl> josibake: Kinda! That was just leftover from #21726, it's removed in https://github.com/bitcoin/bitcoin/pull/24410/commits/195c96ad88bf97aee3ff5597339659451470864f
 8217:30 <lightlike> CCoinsStats has various kinds of statistics about the utxos at one block, and GetUTXOStats fills it with the correct info
 8317:31 <dongcarl> lightlike: Yup! There's a bit more nuance to that though, hint: what you say is true as of after my PR, but not technically true in master.
 8417:33 <michaelfolkson> dongcarl: Can you elaborate? Looks to me like that that is true in master...
 8517:33 <lightlike> oh yes, right now it's also pre-filled with inputs, such as whether to use an index or not
 8617:34 <lightlike> and the hash type. so it's both in and out
 8717:34 <dongcarl> lightlike: That's exactly right, CCoinsStats (as of master) is a struct that’s serving the role of an in-out param in GetUTXOStats, as in it contains certain members that are logical inputs to GetUTXOStats (hash_type, index_requested), and certain members that are logical outputs of GetUTXOStats (hashSerialized, etc.).
 8817:35 <josibake> ah, i now understand what was meant by in-out param in the PR :)
 8917:35 <dongcarl> Note that in this PR, we remove the logical inputs to make CCoinsStats a pure out-param
 9017:35 <dongcarl> josibake: Oh! Perhaps I should explain the terminology more carefully in the commit messages :-)
 9117:36 <dongcarl> Okay on to the next one
 9217:36 <dongcarl> Question 5: Please describe how GetUTXOStats is called in ChainstateManager::PopulateAndValidateSnapshot and what codepath(s) it takes when called in this way.
 9317:36 <dongcarl> Link for ChainstateManager::PopulateAndValidateSnapshot https://github.com/bitcoin/bitcoin/blob/12455acca2c3adf5c88ae9c1a02a7c192fe0f53b/src/validation.cpp#L4970
 9417:36 <dongcarl> Note: We're looking at master
 9517:38 <dongcarl> Not: We don't particularly care about how PopulateAndValidateSnapshot works, just how it calls GetUTXOStats
 9617:39 <dongcarl> s/Not/Note/
 9717:40 <lightlike> it uses HASH_SERIALIZED as hash type and the default for index_requested (true) - but the latter doesnt matter with HASH_SERIALIZED.
 9817:41 <dongcarl> lightlike: Expand on that a bit, why doesn't the latter matter with HASH_SERIALIZED?
 9917:42 <lightlike> because in GetUTXOStats, we only use the index to lookup the data for hash type MUHASH or NONE.
10017:44 <dongcarl> Yup that's exactly right :-) There's no chance for us to use the coinstatsindex in the way that we call GetUTXOStats from PopulateAndValidateSnapshot, and that's why we can remove it from libbitcoinkernel
10117:44 <lightlike> https://github.com/bitcoin/bitcoin/blob/9db941d7737406b8593024ba130c3f9c186af4c6/src/node/coinstats.cpp#L109
10217:44 <dongcarl> Exactly!
10317:45 <dongcarl> Okay, a similar question, but a little more involved: Please describe how GetUTXOStats is called for the gettxoutsetinfo RPC and what codepath(s) it takes when called in this way.
10417:45 <dongcarl> Link for gettxoutsetinfo: https://github.com/bitcoin/bitcoin/blob/194b414697777b5ac9d9918004b851dbd4f8ce17/src/rpc/blockchain.cpp#L811
10517:46 <dongcarl> Note: this one will require looking at a little bit of surrounding code in the gettxoutsetinfo RPC function
10617:49 <lightlike> not sure if this is what is asked, but on a high level it sets the parameter based on user input. i.e. the user can specify the hash type and whether to use an index, and the parameters are set accordingly.
10717:50 <josibake> yeah, same answer. the user can pass in hash_type, where its one of "hash_serialized_2", "muhash", "none"
10817:51 <dongcarl> All correct, I think the thing worth pointing out is that when the user specifies a particular block, they have to use the coinstatsindex
10917:51 <josibake> also, oddly, it's called once in the first if block and the boolean is checked, and then it's called again later in the same if block but the boolean isn't checked
11017:51 <josibake> which seems odd
11117:52 <dongcarl> josibake: That's an oversight that we fix in my PR actually haha
11217:52 <josibake> dongcarl: nice!
11317:53 <dongcarl> Okay last question!
11417:53 <dongcarl> After this PR is merged, what would happen if a contributor re-introduced a dependency from validation to the Coinstats Index?
11517:54 <dongcarl> (in the C++ code)
11617:54 <larryruane> linker error?
11717:55 <dongcarl> larryruane: That's correct! Can you tell me why?
11817:57 <larryruane> I thought I understood but now I'm not sure :)
11917:58 <lightlike> I'd say CI error (isn't bitcoin-chainstate currently not built by default, but only when some extra flag is specified?)
12017:58 <dongcarl> lightlike: Ah! We are assuming that we build bitcoin-chainstate :-)
12117:59 <lightlike> maybe we all should, I haven't updated my script yet :-)
12217:59 <dongcarl> Here's my answer: Because index/coinstatsindex.cpp is no longer listed in libbitcoinkernel_la_SOURCES, it won't be linked into libbitcoinkernel, therefore anything that links against libbitcoinkernel (e.g. bitcoin-chainstate) will fail to link since the coinstatsindex symbols aren't defined.
12317:59 <dongcarl> lightlike: :-)
12418:00 <dongcarl> Okay! Thanks everyone for attending!
12518:00 <josibake> thanks for hosting! i learned a lot
12618:00 <otech> thanks!
12718:00 <lightlike> thanks dongcarl!
12818:00 <josibake> also kudos on the PR, the commits are broken up very nicely
12918:00 <larryruane> thanks carl!
13018:00 <dongcarl> Was great fun talking with y'all, and feel free to stick around for questions/discussions!
13118:00 <svav> Thanks dongcarl and all
13218:00 <dongcarl> #endmeeting
13318:00 <dongcarl> Poor s t i c k
13418:02 <michaelfolkson> Thanks dongcarl!
13518:05 <effexzi> Thanks every1
13618:05 <michaelfolkson> If anyone is interested in a high level overview of the project Carl did a great short video https://btctranscripts.com/chaincode-labs/2022-04-12-carl-dong-libbitcoinkernel/
13718:05 <michaelfolkson> https://www.youtube.com/watch?v=MdxIkH6GCBs
13818:05 <dongcarl> michaelfolkson: Oh! Thanks for writing up the transcript!