Mar 4, 2020
The PR branch HEAD was a50f956e at the time of this review club meeting.
NOTE! Earlier versions of this PR had a different scope from the current
branch. The discussion prior to this
by the author is therefore no longer relevant for the PR in its current form. Notes
-reindex argument wipes and rebuilds the two leveldb databases for the
block index and the chainstate, using the block files saved on disk. This
procedure is completely local and requires no interactions with the p2p
network. For help on the argument see
bitcoind -help | grep -A1 reindex.
Reindexing is a lengthy procedure taking several hours on mainnet and
consists of two steps:
Rebuilding the index of blocks based on the
blk*.dat files saved in
Rebuilding the chainstate (UTXO set) by fully validating each block
starting from genesis using the block index created in step 1.
There is a second command,
reindex-chainstate that will only perform step 2.
This PR improves the runtime of step 1 and does not affect step 2 (which
is already highly optimized since it uses the same validation functions that
are used for connecting new blocks received from the network).
Reindex uses the
CBufferedFile stream, introduced in
#1962, which has a buffer to
allow “rewinding” the stream position to an earlier position without additional
disk access. In the merged PR
#16577, the author of this
week’s PR fixed a bug in
CBufferedFile and added comprehensive unit tests.
The reindexing happens in
LoadExternalBlockFile() (in validation.cpp), which is
called for each block file. The block index is rebuilt by calling
AcceptBlock() for each block in the chain in correct order.
However, blocks are usually not saved to disk in the correct order during IBD
(listen to Pieter Wuille’s recent
more background, starting at 3:30). Therefore, the parent hash of blocks that
are encountered without a known parent is saved in the map
mapBlocksUnknownParent. After accepting a block, the reindex algorithm
recursively finds all blocks from
mapBlocksUnknownParent which are ready for
processing now, tries to accept them and removes them from the map.
Before this PR, we would always read entire blocks from disk into our buffer.
If the predecessor of a block was not available, the blocks would have to be
read again at a later point. This PR changes behavior such that initially only
the 80 byte block header is read into the buffer, and if we can’t accept the
block yet, the rest of the block is skipped for now.
Did you review the PR?
Concept ACK, approach ACK, tested ACK, or
(Don’t forget to put your PR review on GitHub.)
In which situations is
reindex useful? When is it better to use
What does the block index database look like? What information is stored
Can you explain why this PR reduces the runtime of reindexing considerably?
How good is the test coverage of the reindexing (functional and unit tests)?
Do you think it can be improved?
Can you think of further optimizations of reindexing beyond this PR that are
1 13:00 <jnewbery> #startmeeting
2 13:00 <jnewbery> Hi folks! Welcome to Bitcoin Core PR Review Club. Feel free to say hi to let everyone know you're at keyboard.
10 13:00 <jnewbery> As always, we'll have a host to guide discussion, but feel free to jump in at any time to ask questions.
12 13:00 <michaelfolkson> hi
17 13:01 <jnewbery> lightlike is hosting for us today. Thanks lightlike! We also have the PR author, LarryRuane here. Thanks for joining us, Larry
18 13:01 <jnewbery> ok, I'll pass over to lightlike
19 13:01 <lightlike> Ok so today's PR is about improving the performance of reindexing.
21 13:01 <lightlike> Did everyone get a chance to review the PR? ( y/n )
28 13:02 <willcl_ark> brief code review, but not tested
30 13:02 <michaelfolkson> ~.1 y
31 13:02 <ajonas> read comments and code but not tested
32 13:02 <lightlike> So for those who did get a chance to look at it, what is your opinion on the PR? Concept ACK, approach ACK, tested ACK, or NACK?
33 13:02 <emilengler> To summarize it up: It does a more low level job when reading the reindexing plus it only reads the block header instead fo the entire block
34 13:02 <emilengler> Correct me if I missed something
35 13:03 <emilengler> lightlike: I did a Concept ACK. Was going to test once my chain is fully synced but I reviewed the code
36 13:03 <willcl_ark> if implemented correctly, seems like a win with no downsides
37 13:03 <jnewbery> Concept ACK. Better is better
38 13:04 <fjahr> Concept ACK but i only skimmed the test so far and did not have time to test
39 13:04 <lightlike> emilengler: I wouldn't really say that the level really changed ( we still use CBufferedFile, which is pretty low-level).
40 13:04 <lightlike> But first to some more general questions:
41 13:04 <lightlike> In which situations is reindex useful? When is it better to use reindex-chainstate?
42 13:05 <instagibbs> reindex is when block index is fubar, right?
43 13:05 <instagibbs> block data is ok, but block index broken
44 13:05 <fjahr> both rebuild the utxo set, full reindex rebuilds the block index as well
46 13:06 <LarryRuane> yes or for example, isn't it needed if you enable `txindex`?
47 13:06 <willcl_ark> I think if you enable txindex (e.g. ElectrumX) then you _require_ a reindex also
48 13:06 <fjahr> so it depends on what you need
49 13:06 <elichai2> about note 2.2, is reindex really fully validating? why? and does it have the same logic as IBD?(ie assumevalid)
50 13:06 <lightlike> instagibbs, fjahr: yes.
51 13:06 <jnewbery> instagibbs: that's how I understand it. reindex-chainstate only rebuilds the UTXO set. reindex also rebuilds the block index
52 13:06 <LarryRuane> reindex is in a way similar to initial block sync except it can be done completely offline
53 13:06 <michaelfolkson> In which situations is reindex useful? Data corruption yes. Anything else?
54 13:06 <lightlike> elichai2: yes, I think it does, didn't think of that until later today.
55 13:07 <jnewbery> willcl_ark: that used to be true. Since jimpo refactored the indexing code, I think that building a txindex doesn't require a -reindex
56 13:07 <instagibbs> right, incremental indexing is a thing now AFAIU :)
57 13:08 <willcl_ark> jnewbery: that's good news! I recall doing it myself a while ago for that
58 13:08 <emilengler> michaelfolkson: I'm not sure but maybe for verification
59 13:08 <emilengler> For example if you transferred the data from a disk to another
61 13:08 <lightlike> I also think reindexing might possibly be needed if the format of the index db would ever be changed in the future between releases
62 13:08 <willcl_ark> although, as IBD is usually saturated at CPU, perhaps --reindex is not actually that useful (unless you are bandwidth/data constrained in some way for a db error)
64 13:09 <michaelfolkson> Yeah makes sense emilengler
65 13:09 <jnewbery> willcl_ark: PR 13033 I think
66 13:09 <fjahr> michaelfolkson: definitely when you were running pruned node
67 13:09 <fjahr> and now you switch to fully validating
69 13:10 <jnewbery> fjahr: not sure I understand. A pruned node _is_ fully validating
70 13:11 <michaelfolkson> Yeah that's good one fjahr. You switch to unpruned I assume you mean
72 13:11 <fjahr> yeah, sorry, i meant non pruned node %)
73 13:11 <lightlike> just noting that in some situations (only utxo db is broken), reindex-chainstate is sufficient.
74 13:11 <lightlike> ok, next q:
75 13:11 <lightlike> What does the block index database look like? What information is stored there?
76 13:12 <lightlike> it's quite a general question, just name some of the most important infos...
77 13:13 <Guest83> Indexed on header?
78 13:13 <willcl_ark> block hash, tx hashes, block file names
79 13:13 <michaelfolkson> Where block is stored on disk
80 13:14 <LarryRuane> i was just reading sipa's reply on the stackexchange (linked above).. "You should use -reindex only when you were running in pruning mode..." should that be non-pruning mode? to reindex requires that you have all 300+ gb of blocks, right?
82 13:14 <emilengler> Yes reindexing is not possible if pruning
83 13:15 <emilengler> I think it throws a runtime error then
84 13:15 <willcl_ark> LarryRuane: does --reindex init a full IBD if it's missing the data?
85 13:15 <LarryRuane> good question, i don't know
86 13:16 <fjahr> LarryRuane: if you were running it pruned and you switch to non pruning I think you have to start the node with -reindex if i remember correctly
87 13:16 <lightlike> so willcl_ark: I think that it would sync up the last block we can connect from disk, and then switch to IBD for the rest of the chain
88 13:17 <willcl_ark> that would seem logical
89 13:17 <jnewbery> I'm looking in init.cpp, and there appears to be logic for when using -reindex and -prune, so I guess it'll just redownload blocks from the network
91 13:18 <molly> once i tested reindex on a laptop a few years ago, it took a week to fully sync, so i wouldn't use reindex if i have a corrupt database, i would resync the node from scratch, it's faster
92 13:19 <willcl_ark> maybe not after this PR :)
93 13:19 <jnewbery> molly: I'd be surprised if that were true in general, even before this PR
94 13:19 <lightlike> willcl_ark: I think that is what happened to me once when I had a corrupt block file (and slow internet connection): I would reindex up until the corrupt block, and then download all blocks beyond that from peers
95 13:19 <molly> jnewbery, i haven't looked at this PR
96 13:20 <LarryRuane> for me it took slightly less time to reindex than IBD last time i checked, several months ago ... but also reindexing puts less traffic on the network on load on peers
97 13:21 <lightlike> ok, next question: Can you explain why this PR reduces the runtime of reindexing considerably?
98 13:21 <emilengler> LarryRuane: If you use the same hardware it jsut takes longer becasue of the growing size
99 13:21 <LarryRuane> (do you want me to answer? :) )
100 13:21 <nehan_> it avoids deserializing blocks that will need to be deserialized later
101 13:22 <emilengler> AFAIK because it reads the header instead of the entire block
102 13:22 <lightlike> LarryRuane: maybe give others a chance first, but feel free to add your view on anything :-)
103 13:22 <theme> reduced read operations?
104 13:23 <jnewbery> it allows us to seek through the block file for headers rather than reading everything
105 13:23 <LarryRuane> nehan_ that's correct, a lot of CPU is spent deserializing blocks unnecessarily (throwing that work away)
106 13:23 <cprkrn> Yup. Doesn't require reading all of the data
107 13:23 <lightlike> emilengler: yes, but just at the first encounter, because based on the header we can decide if wa want to deserialize the block already now (or need it only later)
108 13:23 <raj_> by checking into memory for a chilld block instead of reading it from the disk when the parent is found.
109 13:24 <LarryRuane> so... i made a comment on the PR yesterday that explains in more detail but it doesn't actually reduce reads from disk (either num of reads or length of reads), but it only saves CPU time spent deserializing
111 13:25 <LarryRuane> lightlike yes that is correct
113 13:26 <lightlike> yes, I was not precise there in my notes: The reading from disk, in my understanding, is done in the Fill() method, which happens also if we skip the block
114 13:26 <willcl_ark> jnewbery: so thats how #13033 did it
115 13:27 <lightlike> jnewbery: do you know if a good documentation exists somewhere that is up to date?
116 13:27 <MarcoFalke> Ideally the documentation about Bitcoin Core should be in the source code :)
117 13:28 <jnewbery> lightlike: I do not. Someone should suggest edits to sipa's SE answer
118 13:28 <MarcoFalke> So someone should copy-past the useful parts into our code base
120 13:30 <lightlike> I think it is also helpful to mention the observation by Larry in the PR, that this change is only an improvement because IBD typically saves block out of order. If everything was in order, this change would make things (slightly) slower.
121 13:32 <nehan_> LarryRuane: I added a comment. You are now holding cs_main while doing a disk read (i think) in Skip(). might that affect performance?
122 13:32 <raj_> just one tangent question. While reindexing, is transaction validation process again repeated?
123 13:32 <LarryRuane> yes because when we encounter a block and its parent has already been seen, it backs up (in the memory buffer) by 80 bytes (header) and deserializes the entire block (incuding header, again) .. so header is deserialized twice .. but since it's only 80 bytes probably not much impact
124 13:33 <michaelfolkson> That's just used for release notes jnewbery? achow101 said that got vandalized recently as no merge restrictions
125 13:33 <LarryRuane> raj_ i believe yes, similar to IBD, all checks are performed, only difference is blocks come from disk instead of peers
126 13:34 <lightlike> raj_: although as elichai2 noted earlier, with the same restrictions as in IBD (afaik not every signature of very old blocks is validated, but I don't know the details there)
127 13:36 <lightlike> ok, next q: How about test coverage of the reindexing (functional and unit tests)? Do you think it can be improved?
128 13:36 <LarryRuane> nehan_ thank you, i'll reply there, but today, where of course there is no Skip(), instead the deserialization (`blkdat >> block;`) can trigger the same disk read, so there is no difference
129 13:37 <nehan_> LarryRuane: I think that's outside the cs_main lock though
130 13:37 <jnewbery> raj_: yes, exactly the same as for IBD. You can imagine we're treating the blk files as untrusted and revalidating everything again. assumevalid means that we don't check scripts/signatures before a certain height
131 13:39 <LarryRuane> nehan_ you're right! good catch, i'll investigate how to improve that (or try to see if it may be acceptable)
132 13:40 <willcl_ark> jnewbery: which also might give us another --reindex use-case: recieving an offline copy of the block files from a friend (or torrent download?) which you might want to reindex before trusting
133 13:40 <nehan_> LarryRuane: I'm having a lot of trouble figuring out how nRewind is updated given that it moved around. I'm still trying to get my head around CBufferedFile semantics...
134 13:41 <LarryRuane> _nehan_ this lock can be dropped before calling Skip() .. Is there a way to drop a LOCK() besides it going out of scope? i don't think i've ever seen that
136 13:41 <jnewbery> nehan_: good catch! In practice, I don't think it matters too much. If we're doing reindex then the critical path is single-threaded in this thread (the ThreadImport thread)
137 13:41 <raj_> jnewbery: Thanks. So how much is actuallly the difference between a fresh IBD and reindexing in terms of time?
138 13:41 <jnewbery> definitely worth benching performance though
139 13:43 <LarryRuane> nehan_ yes that rewind stuff is somewhat obtuse... basic idea of rewind is it marks a point in the stream that we can reposition to if deserialization throws
140 13:43 <raj_> nehan_: I cant seem to find where cs_main is held in read(). :(
141 13:43 <jnewbery> raj_: depends on bandwidth/quality of peers/disk access speed/CPU
142 13:43 <lightlike> raj_: that would depend a lot on the speed of your internet connection (and that of your peers).
144 13:44 <michaelfolkson> Re tests. Any additional tests would be separate PR not related to this specific performance improvement?
145 13:45 <raj_> ok makes sense. So in % term what kind of reduction we are talking about here?
146 13:45 <lightlike> michaelfolkson: yes, plus this PR adds unit tests for the new Skip() functionality.
147 13:45 <raj_> Oh thanks nehan_
148 13:45 <jnewbery> in general, you'd expect reading files from disk to be faster than downloading that data from peers, but exact quantitative difference depends on those factors
149 13:45 <nehan_> LarryRuane: oh so that's another thing, read() could throw while holding the lock. does that matter?
150 13:45 <lightlike> but in general I think that while the unit tests of CBufferedFile are really great, the validation code is tested quite poorly in the functional tests:
151 13:46 <LarryRuane> just for those who may not be that familiar with this code ... (this wasn't at all clear to me initially) ... deserialization (`blkdat >> ....`) can throw an exception, and then we magically end up the catch and continue that top-level loop
152 13:46 <LarryRuane> nehan_ nope, the compiler (semantics of LOCK) deal with that correctly, one aspect of its magic
153 13:46 <nehan_> LarryRuane: cool thanks
154 13:47 <lightlike> feature_reindex.py exists but seems quite rudimentary.
155 13:48 <lightlike> Last question: Can you think of further optimizations of reindexing beyond this PR that are possible/worthwhile?
156 13:48 <jnewbery> lightlike: I agree that feature_reindex.py is rudimentary. What kind of additional tests would you like to see?
157 13:49 <LarryRuane> lightlike yes it's tiny, tbh i didn't even really look at it while doing this PR .. yes, my question too, should it do more?
158 13:49 <LarryRuane> i could look into that
159 13:49 <lightlike> LarryRuane: That was in no way meant a criticism - it was like that for years, and you added lots of units tests :-)
160 13:50 <lightlike> jnewbery: I think having multiple block files, and also full blocks with blocks being serialized out of order, would be nice.
161 13:50 <jnewbery> LarryRuane: also, you don't change functionality, so there shouldn't need to be changes to functional tests!
162 13:50 <lightlike> *full block files (MB)
163 13:50 <LarryRuane> the testing for this PR isn't really great, it does have a nice test for the new Skip method, but there really aren't any (new) tests for the changes to validation.cpp .... oh no, not taken that way!
164 13:50 <jnewbery> but improving functional test coverage in a separate PR seems like it'd be worthwhile
165 13:51 <LarryRuane> yes i agree, i could take that on if people like that idea ... since i have learned a little about this area already
166 13:52 <jonatack> +1 on that LarryRuane
167 13:52 <jnewbery> lightlike: to answer your question about further optimizations, it seems like a lot of the performance degredation is due to blocks being out of order in the blk files. Maybe this is too wacky, but could bitcoind sort those blocks in the background, so that on reindex we're less likely to have to skip forwards and backwards in those files?
168 13:53 <LarryRuane> ok i would love to ... what's the convention, make a ticket? or just a PR okay?
169 13:54 <docallag> How often would you expect to reindex?
170 13:54 <lightlike> jnewbery: do you mean on the fly during IBD, or some kind of reorganization script that can be run on demand?
171 13:54 <jnewbery> docallag: basically never
172 13:55 <LarryRuane> that is an interesting idea jnewbery.... i like it .. i suggested a different way to improve this at the end of the last comment i made on the PR (yesterday), maybe take a look at that too .. but my suggestion would create more files in `blocks/`
173 13:55 <jonatack> LarryRuane: maybe ask for input on #bitcoin-core-dev, could add it as a weekly meeting topic
174 13:55 <jnewbery> lightlike: I was thinking in the background after IBD, but could also be on demand
175 13:55 <michaelfolkson> Why do the blocks go out of order in the first place?
176 13:56 <LarryRuane> headers-first IBD
177 13:56 <jnewbery> I don't necessarily think it's worth it, since it would be changing mainline behavior to optimize for something we expect never to do
178 13:56 <LarryRuane> @jnew
180 13:56 <LarryRuane> jnewbery good point
181 13:57 <fjahr> jnewbery: I was just going to ask, such on the fly optimization might be hard to get in if it slows down iBD even slightly?
182 13:57 <nehan_> also crazy on the fly optimizations often increase bug surface area :)
183 13:57 <jnewbery> fjahr: right, if such an idea was considered, it should be in the background after IBD
184 13:57 <lightlike> yes, it seems a bit pessimistic to think too much about reindexing during IBD (plus, It's just an hour)
185 13:57 <fjahr> ah, I see, you wrote after ibd earlier
186 13:57 <jnewbery> nehan_: +1
187 13:58 <lightlike> ok, we are almost at the end of the hour. Any questions left?
188 13:59 <jnewbery> I don't know the serving-blocks-to-other-peers-doing-IBD logic well enough to know whether having blocks serialized on disk in order would be an improvement there too
189 13:59 <docallag> How would you know you needed to reindex?
191 14:00 <LarryRuane> jnewbery i think *probably* not, because i think the indices record the exact start (and length) of each block (to serve to peers during IBD)
192 14:00 <docallag> Sorry I meant would Core fall over and then you'd know to reindex?
193 14:00 <lightlike> let's wrap it up, thanks all for participating!
194 14:00 <michaelfolkson> Use cases covered earlier in the meeting docollag if that's your question. Basically something goes wrong or you want to do something you can't do
195 14:00 <LarryRuane> thank you everyone! been great
196 14:00 <emilengler> thanks for hosting :)
197 14:01 <nehan_> thank you!
198 14:01 <emilengler> and thanks to LarryRuane for the PR
199 14:01 <jnewbery> Great meeting. Thanks for hosting lightlike, and thanks for joining us LarryRuane!
200 14:01 <lightlike> and thanks LarryRuane for answering questions
201 14:01 <willcl_ark> thanks lightlike + LarryRuane
202 14:01 <emilengler> And also thanks to anyone else for his/her feedback
203 14:01 <emzy> thank you!
205 14:01 <michaelfolkson> Thanks all. And congrats jonatack :)
206 14:02 <jnewbery> LarryRuane: I think you're right, but it seems possible that we could be more efficient in serving blocks if they were serialized in order
207 14:02 <LarryRuane> ah i see, perhaps less seeking
208 14:02 <jonatack> Thanks lightlike, LarryRuane, jnewbery and everyone! (Thank you, michaelfolkson)
209 14:02 <jnewbery> like we could have a new P2P message to serve a range of blocks(?)
210 14:03 <jnewbery> sorry - probably way off topic!
211 14:03 <LarryRuane> makes sense
212 14:14 <emilengler> Is this meeting over now? Maybe it is time to do hashtag end meeting
213 14:16 <lightlike> right
214 14:16 <lightlike> #endmeeting
215 14:16 <lightlike> (though I'm not sure if there are any bots here that would listen to it) Meeting Log – Asia time zone
217 05:01 <kallewoof> #startmeeting
219 05:01 <meshcollider> hi :)
221 05:01 <kallewoof> If you're here, say hi so we know
222 05:01 <coinsureNZ> yep thats the one meshcollider , albiet expat now
223 05:01 <jnewbery> it's 11pm here so I'm not going to stick around for too long. I just wanted to be here at the start of the first one :)
224 05:01 <anditto> hi! ^_^
225 05:01 <kallewoof> or to quote our illustrious leader, < jnewbery> Hi folks! Welcome to Bitcoin Core PR Review Club. Feel free to say hi to let everyone know you're at keyboard.
226 05:02 <kallewoof> jnewbery: thanks for sticking around :)
228 05:02 <meshcollider> Night John
229 05:02 <kallewoof> I'm going to guide myself through the other log so forgive some amount of copy-pasting.
231 05:03 <kallewoof> Did everyone get a chance to review the PR? ( y/n )
234 05:03 <jnewbery> kallewoof: I'll post meeting logs sooner in future. I had to run out straight after the meeting today
236 05:03 <coinsureNZ> not the code- just the gist of the nodes
237 05:03 <fanquake> 1/2 y
238 05:03 <coinsureNZ> *notes
239 05:04 <kallewoof> jnewbery: No worries. I am still wondering if it's better to NOT look at logs to reduce bias, but we'll use it this time as I've never done one of these before
240 05:04 <kallewoof> What are people's general opinion on the PR?
241 05:05 <kallewoof> Concept/code/approach/etc.
242 05:06 <fanquake> Concept ACK improving performance. +1 new test. I can't actually bench it atm though.
243 05:06 <fanquake> *test code
244 05:07 <fanquake> I don't think I've actually run a --reindex in quite a while.
245 05:08 <jnewbery> goodnight all. Have fun!
246 05:08 <kallewoof> fanquake: why can't you bench it?
247 05:08 <kallewoof> jnewbery: night night
248 05:08 <RubenSomsen> hey guys, and goodnight John :)
249 05:09 <kallewoof> hi Ruben :)
250 05:09 <fanquake> I nuked my datadir, and need to redownload block data
251 05:09 <aj> i wonder how the performance changes depending on the layout of the blocks. i think you get different layouts depending on the network speed of the peers you do IBD from -- if you've got 1 peer that gives you a block per second (~10Mbps), and another that gives you a block per minute (~133kbps), then I think you'll tend to have sets of 60 out of order blocks (from the fast peer) when you're
252 05:09 <aj> expecting a block from the slow peer
253 05:10 <kallewoof> aj: does this PR impact that?
254 05:11 <aj> i think the PR's performance improvement should rely on the "60 out of order blocks" size matching the choice of memory size constant?
255 05:13 <kallewoof> aj: I don't think I follow. Can you point out file/line no?
256 05:13 <aj> no not really, maybe i'm way off
257 05:13 <kallewoof> i don't know :)
258 05:14 <kallewoof> to summarize what the PR does, IIUC, is to instead of reading every block one at a time, it reads only the block header, then skips over the block if it's not the desired one
259 05:15 <kcalvinalvin> Does it keep that skipped block in memory?
262 05:17 <meshcollider> aj: that's an interesting thought anyway
263 05:17 <kallewoof> aj: ahh, okay
265 05:18 <kcalvinalvin> Little bit of a basic question but what data does the txindex=1 save vs when you do txindex=0?
266 05:19 <kallewoof> aj: I didn't realize there was a 'keep in memory' component. I only looked at this PR yesterday.
267 05:19 <aj> kallewoof: well there's not any more, so you're right!
268 05:19 <kallewoof> kcalvinalvin: I think it stores the block hash for each transaction only. Would have to check
269 05:21 <kallewoof> Any other general opinions on the concept / idea of this code change? Or if someone reviewed it and have questions about the code or such, we can go through that too.
270 05:23 <aj> kcalvinalvin: txindex is run as a separate process, that triggers off new blocks being accepted, and stores the index data in a separate ldb in .bitcoin/indexes/txindex
271 05:23 <aj> separate thread, not process i suppose
272 05:24 <kcalvinalvin> Is reindex also a separate thread?
273 05:25 <aj> reindex is updating the chain state which is the most important thing, so it's kind of the main thread?
274 05:26 <fanquake> and it's also using the loadblk thread
275 05:29 <kallewoof> I'm gonna steal a question from the IRC log, as I was actually unsure about this one myself: In which situations is reindex useful? When is it better to use reindex-chainstate?
277 05:37 <kallewoof> so reindex recreates the index pointing out where in the blk files the blocks are located
278 05:37 <kallewoof> and also does what reindex-chainstate does.
279 05:39 <kallewoof> aj: seems you also do it when you enable txindex
280 05:42 <kallewoof> The notes say "Reindex uses the CBufferedFile stream, introduced in #1962, which has a buffer to allow “rewinding” the stream position to an earlier position without additional disk access." -- does that mean it keeps all the read data from the blk file in memory until it's destroyed?
281 05:45 <aj> i think it's a fixed size buffer, size declared as second arg to constructor, so 8MB currently
282 05:47 <aj> kallewoof: i'm pretty sure the new txindex code lets you enable txindex *without* doing a reindex? PR#13033
283 05:48 <kallewoof> ohh, didn't know that. nice.
284 05:51 <kallewoof> Does anyone have any other questions about the PR? I think these meetings are partially meant to mentor people through the review process, so don't hesitate to speak up even if you feel unsure of yourself.
285 05:51 <kallewoof> Me and AJ can keep going for days unless somebody stops us. :P
286 05:53 <fanquake> I'd be interested if someone on Linux could benchmark this with and without the thread prioritization from SCHED_BATCH, to see how much difference it actually makes
287 05:54 <fanquake> (unrelated to the changeset though)
288 05:54 <fanquake> macOS doesn't support SCHED_BATCH so I can't check easily.
289 05:54 <kallewoof> fanquake: I have a linux full node synced up. How do I toggle SCHED_BATCH, though?
290 05:54 <fanquake> Easiest way is probably just to comment out the call to ScheduleBatchPriority()
292 05:55 <kallewoof> Ahh, gotcha
293 05:56 <kallewoof> So if I get this right, the entire reindex part is in init.cpp's ThreadImport function. That while loop runs until it gets to the end and then it continues on with the rest.
294 05:59 <kallewoof> Yeah that seems to be the case. The author said he put an assert(0) after the "Reindexing finished" and used time to benchmark it. Guess I'll do that, with and without the ScheduleBatchPriority part.
295 06:02 <kallewoof> Anyway, unless people have more questions or opinions on the PR, I think we can end the meeting.
296 06:02 <meshcollider> Yeah it all seems sane to me and I don't have any specific questions, I'd like to test it tomorrow when I get back to uni though
297 06:03 <meshcollider> That seems the most useful thing to do
299 06:03 <anditto> I don't remember last time I used -reindex & never thought about it, so for this particular PR I'm trying to learn as much as I can. It's also nice to have this meeting in an accessible timezone. Thanks everyone.
300 06:03 <kallewoof> meshcollider: cool, yeah i'm going to test it on my linux machine with/without the sched batch tweak
301 06:04 <kallewoof> I'm up for talking about more general bitcoin core related things, but maybe we need to switch to another channel (#bitcoin-core-dev?)
302 06:04 <meshcollider> Thanks for hosting btw kallewoof :)
304 06:05 <fanquake> Thanks kallewoof
305 06:05 <kallewoof> my pleasure! I think I'm deviating a bit from the original concept, but I'll read up and do better in the future, if we do these again.
306 06:05 <anditto> Thanks kallewoof.
307 06:05 <akionak> Thanks kallewoof
308 06:06 <kallewoof> I know people were talkign about doing this & a more general bitcoin core related meeting (like the one that is happening tonight in #bitcoin-core-dev). Since we're a smaller crowd it may make sense to combine the two into one.
309 06:10 <coinsureNZ> thanks guys, didnt have much to add just eager to sit in on these discussions for a while and understand how PRs are evaluated. Its been informative and I appreciate being able to added in a workable time slot
310 06:10 <RubenSomsen> Thanks kalle :)
311 06:10 <kallewoof> coinsureNZ: Thanks for joining!
312 06:11 <kallewoof> #endmeeting