Split hashing/index `GetUTXOStats` codepaths, decouple from `coinstatsindex` (
utxo db and indexes) May 11, 2022
PR author: dongcarl
gettxoutsetsetinfo RPC uses the
GetUTXOStats function which calculates statistics about
the UTXO set. These statistics include the the total number of transactions, the total amount (in
bitcoins) of all outputs, etc.
A Coinstats Index was added in
dramatically sped up the
gettxoutsetinfo RPC as it retains UTXO statistics calculation results
for every block. The Coinstats Index also allows for querying UTXO statistics for a particular block
instead of just current tip.
The libbitcoinkernel project is an effort to
decouple Bitcoin Core’s consensus engine from other non-consensus modules (such as the various
indices) in the codebase. Questions
Did you review the PR?
Concept ACK, approach ACK, tested ACK, or
On a conceptual level, which modules in Bitcoin Core likely belong in
which ones don’t?
Where in the Bitcoin Core codebase can you see
libbitcoinkernel’s entanglement with index?
(There might be multiple places)
What is the
CCoinsStats struct, and how is it used in
Please describe how
GetUTXOStats is called in
and what codepath(s) it takes when called in this way.
Please describe how
GetUTXOStats is called for the
RPC and what codepath(s) it takes when called in this way.
After this PR is merged, what would happen if a contributor re-introduced a dependency from
validation to the Coinstats Index?
1 17:00 <dongcarl> #startmeeting
2 17:00 <dongcarl> hello all
7 17:01 <dongcarl> we'll wait a few secs for people to show up, please say "hi" if you're here and at your keyboard!
8 17:02 <dongcarl> We'll be discussing [kernel 2a/n] Split hashing/index `GetUTXOStats` codepaths, decouple from `coinstatsindex` today, link: https://bitcoincore.reviews/24410
10 17:02 <michaelfolkson> hi
13 17:03 <dongcarl> Just so people know the ground rules: If you have a question, you don't have to ask to ask a question, just go ahead and ask!
14 17:03 <larryruane> (sipa doesn't really mean it)
15 17:03 <dongcarl> larryruane: He's being literal! :-)
17 17:04 <dongcarl> Okay! Let's get started.
18 17:04 <dongcarl> Did everyone get a chance to review the PR? How about a quick y/n from everyone
20 17:04 <michaelfolkson> y
25 17:05 <dongcarl> Okay, let's get into the questions...
26 17:05 <dongcarl> For those of you who reviewed... Concept ACK, approach ACK, tested ACK, or NACK?
27 17:06 <josibake> Concept ACK
29 17:07 <michaelfolkson> Approach ACK. A good choice for 2a :)
30 17:07 <lightlike> approach ACK
31 17:07 <dongcarl> Cool. Next one!
32 17:07 <dongcarl> On a conceptual level, which modules in Bitcoin Core likely belong in libbitcoinkernel and which ones don’t?
33 17:08 <oliver92> I'd say anything consensus-related && stateless
34 17:08 <josibake> what exactly does module mean in this context? is there a specific meaning or just more generally "a logical collection of code"
35 17:08 <larryruane> at a very high level, any code that, if implemented differently, could cause consensus failures (chain split) should be in libbitcoinkernel (else not)
36 17:09 <larryruane> (differently on different nodes on the network)
37 17:09 <dongcarl> josibake: Logical collection of code!
38 17:09 <michaelfolkson> Yeah node validation, not P2P, wallet, GUI etc
39 17:09 <antonleviathan> o/
40 17:09 <larryruane> RBF policies are an example of what would NOT be included
41 17:10 — dongcarl waves hello to the newcomers
42 17:10 <dongcarl> Right, lots of good answers there.
43 17:10 <michaelfolkson> secp256k1?!
44 17:10 <larryruane> definitely!
45 17:10 <michaelfolkson> Surely part of the consensus engine
46 17:10 <dongcarl> To touch on oliver92's answer, statelessness is not a goal of libbitcoinkernel, but was a goal of libbitcoinconsensus (as I understood it)
47 17:11 <dongcarl> As michaelfolkson said, P2P, Wallet, GUI are all things that definitely do not belong in libbitcoinkernel
48 17:12 <larryruane> if some part of the code is "stateless" (just so i understand), does that mean only has pure functions?
49 17:12 <dongcarl> larryruane: In my view it'd just not have persistence, there might be local state
50 17:13 <michaelfolkson> secp256k1 would be a separate library from libbitcoinkernel but still part of the consensus engine
51 17:13 <dongcarl> michaelfolkson: Right now, we embed secp256k1 inside libbitcoinkernel
52 17:13 <michaelfolkson> Ohh ok, cool
54 17:14 <dongcarl> The overall philosophy is "outside-in" and "incremental" meaning that we'll slowly whittle things down, and not aim for a minimal library all at once :-)
55 17:14 <dongcarl> Let's move on!
56 17:14 <dongcarl> Where in the Bitcoin Core codebase can you see libbitcoinkernel’s entanglement with index? (There might be multiple places)
57 17:14 <dongcarl> (This is as of master)
58 17:14 <lightlike> will the mempool be taken out of libbitcoinkernel too?
59 17:15 <dongcarl> lightlike: Good question! I think we can ship libbitcoinkernel with mempool (since mempool <-> validation is quite tightly coupled), and most external users will want to use Core's policy, but perhaps there will be a configure flag later to not link it in.
60 17:18 <josibake> at the risk of giving the obvious answer, we see `libitcoinkernel` entangled with index is when calling `PopulateAndValidateSnapshots`
61 17:18 <dongcarl> josibake: That is correct :-)
62 17:18 <josibake> `PopulateAndValidateSnapshots` is only for assumeutxo, correct?
63 17:19 <dongcarl> josibake: Yup!
64 17:19 <fanquake> mempool = boost 🥲
65 17:20 <dongcarl> fanquake: I feel 🥲 too, but we do one thing at a time in this project!
66 17:20 <sipa> boost is headers only this day, no?
67 17:20 <josibake> just curious (and feel free to bunt if this is off topic) but why is assumeutxo in libbitcoinkernel? instead of breaking up `GetUTXOstats` , why not move assumeutxo out of libbitcoinkernel?
68 17:20 <fanquake> sipa: yea
69 17:20 <sipa> @fanquake Not a big concern then, as it doesn't cause a libbitcoinkernel dependency.
70 17:20 <fanquake> multi_index might even be in-tree soon enough anyways
71 17:21 <dongcarl> josibake: I think moving assumeutxo out of validation will be much more work than separating the two codepaths of GetUTXOStats, and the two codepaths needed separation in any case
72 17:21 <dongcarl> Okay back to the question
73 17:21 <fanquake> It's a dependency in the sense that you still need boost. no libs sure
74 17:21 <dongcarl> Can anyone spot another point where you can see the entaglement between libbitcoinkernel and index/ in the build system code?
75 17:21 <lightlike> btw, the function is "PopulateAndValidateSnapshot" (without an s) in case anyone else was looking it up
76 17:22 <dongcarl> (e.g. src/Makefile.am)
77 17:22 <sipa> @fanquake I meant runtime dependency, but indeed.
79 17:26 <dongcarl> Next! Question 4: What is the CCoinsStats struct, and how is it used in GetUTXOStats?
80 17:26 <josibake> so as of now, the only index one remaining (after this pr) is `index/blockfilterindex.cpp`
82 17:30 <lightlike> CCoinsStats has various kinds of statistics about the utxos at one block, and GetUTXOStats fills it with the correct info
83 17:31 <dongcarl> lightlike: Yup! There's a bit more nuance to that though, hint: what you say is true as of after my PR, but not technically true in master.
84 17:33 <michaelfolkson> dongcarl: Can you elaborate? Looks to me like that that is true in master...
85 17:33 <lightlike> oh yes, right now it's also pre-filled with inputs, such as whether to use an index or not
86 17:34 <lightlike> and the hash type. so it's both in and out
87 17:34 <dongcarl> lightlike: That's exactly right, CCoinsStats (as of master) is a struct that’s serving the role of an in-out param in GetUTXOStats, as in it contains certain members that are logical inputs to GetUTXOStats (hash_type, index_requested), and certain members that are logical outputs of GetUTXOStats (hashSerialized, etc.).
88 17:35 <josibake> ah, i now understand what was meant by in-out param in the PR :)
89 17:35 <dongcarl> Note that in this PR, we remove the logical inputs to make CCoinsStats a pure out-param
90 17:35 <dongcarl> josibake: Oh! Perhaps I should explain the terminology more carefully in the commit messages :-)
91 17:36 <dongcarl> Okay on to the next one
92 17:36 <dongcarl> Question 5: Please describe how GetUTXOStats is called in ChainstateManager::PopulateAndValidateSnapshot and what codepath(s) it takes when called in this way.
94 17:36 <dongcarl> Note: We're looking at master
95 17:38 <dongcarl> Not: We don't particularly care about how PopulateAndValidateSnapshot works, just how it calls GetUTXOStats
96 17:39 <dongcarl> s/Not/Note/
97 17:40 <lightlike> it uses HASH_SERIALIZED as hash type and the default for index_requested (true) - but the latter doesnt matter with HASH_SERIALIZED.
98 17:41 <dongcarl> lightlike: Expand on that a bit, why doesn't the latter matter with HASH_SERIALIZED?
99 17:42 <lightlike> because in GetUTXOStats, we only use the index to lookup the data for hash type MUHASH or NONE.
100 17:44 <dongcarl> Yup that's exactly right :-) There's no chance for us to use the coinstatsindex in the way that we call GetUTXOStats from PopulateAndValidateSnapshot, and that's why we can remove it from libbitcoinkernel
102 17:44 <dongcarl> Exactly!
103 17:45 <dongcarl> Okay, a similar question, but a little more involved: Please describe how GetUTXOStats is called for the gettxoutsetinfo RPC and what codepath(s) it takes when called in this way.
105 17:46 <dongcarl> Note: this one will require looking at a little bit of surrounding code in the gettxoutsetinfo RPC function
106 17:49 <lightlike> not sure if this is what is asked, but on a high level it sets the parameter based on user input. i.e. the user can specify the hash type and whether to use an index, and the parameters are set accordingly.
107 17:50 <josibake> yeah, same answer. the user can pass in hash_type, where its one of "hash_serialized_2", "muhash", "none"
108 17:51 <dongcarl> All correct, I think the thing worth pointing out is that when the user specifies a particular block, they have to use the coinstatsindex
109 17:51 <josibake> also, oddly, it's called once in the first if block and the boolean is checked, and then it's called again later in the same if block but the boolean isn't checked
110 17:51 <josibake> which seems odd
111 17:52 <dongcarl> josibake: That's an oversight that we fix in my PR actually haha
112 17:52 <josibake> dongcarl: nice!
113 17:53 <dongcarl> Okay last question!
114 17:53 <dongcarl> After this PR is merged, what would happen if a contributor re-introduced a dependency from validation to the Coinstats Index?
115 17:54 <dongcarl> (in the C++ code)
116 17:54 <larryruane> linker error?
117 17:55 <dongcarl> larryruane: That's correct! Can you tell me why?
118 17:57 <larryruane> I thought I understood but now I'm not sure :)
119 17:58 <lightlike> I'd say CI error (isn't bitcoin-chainstate currently not built by default, but only when some extra flag is specified?)
120 17:58 <dongcarl> lightlike: Ah! We are assuming that we build bitcoin-chainstate :-)
121 17:59 <lightlike> maybe we all should, I haven't updated my script yet :-)
122 17:59 <dongcarl> Here's my answer: Because index/coinstatsindex.cpp is no longer listed in libbitcoinkernel_la_SOURCES, it won't be linked into libbitcoinkernel, therefore anything that links against libbitcoinkernel (e.g. bitcoin-chainstate) will fail to link since the coinstatsindex symbols aren't defined.
123 17:59 <dongcarl> lightlike: :-)
124 18:00 <dongcarl> Okay! Thanks everyone for attending!
125 18:00 <josibake> thanks for hosting! i learned a lot
127 18:00 <lightlike> thanks dongcarl!
128 18:00 <josibake> also kudos on the PR, the commits are broken up very nicely
129 18:00 <larryruane> thanks carl!
130 18:00 <dongcarl> Was great fun talking with y'all, and feel free to stick around for questions/discussions!
131 18:00 <svav> Thanks dongcarl and all
132 18:00 <dongcarl> #endmeeting
133 18:00 <dongcarl> Poor s t i c k
134 18:02 <michaelfolkson> Thanks dongcarl!
135 18:05 <effexzi> Thanks every1
138 18:05 <dongcarl> michaelfolkson: Oh! Thanks for writing up the transcript!