The PR branch HEAD was a50f956e at the time of this review club meeting.
NOTE! Earlier versions of this PR had a different scope from the current
branch. The discussion prior to this
comment
by the author is therefore no longer relevant for the PR in its current form.
Notes
The -reindex argument wipes and rebuilds the two leveldb databases for the
block index and the chainstate, using the block files saved on disk. This
procedure is completely local and requires no interactions with the p2p
network. For help on the argument see bitcoind -help | grep -A1 reindex.
Reindexing is a lengthy procedure taking several hours on mainnet and
consists of two steps:
Rebuilding the index of blocks based on the blk*.dat files saved in
.bitcoin/blocks.
Rebuilding the chainstate (UTXO set) by fully validating each block
starting from genesis using the block index created in step 1.
There is a second command, reindex-chainstate that will only perform step 2.
This PR improves the runtime of step 1 and does not affect step 2 (which
is already highly optimized since it uses the same validation functions that
are used for connecting new blocks received from the network).
Reindex uses the CBufferedFile stream, introduced in
#1962, which has a buffer to
allow “rewinding” the stream position to an earlier position without additional
disk access. In the merged PR
#16577, the author of this
week’s PR fixed a bug in CBufferedFile and added comprehensive unit tests.
The reindexing happens in LoadExternalBlockFile() (in validation.cpp), which is
called for each block file. The block index is rebuilt by calling
AcceptBlock() for each block in the chain in correct order.
However, blocks are usually not saved to disk in the correct order during IBD
(listen to Pieter Wuille’s recent Chaincode
podcast for
more background, starting at 3:30). Therefore, the parent hash of blocks that
are encountered without a known parent is saved in the map
mapBlocksUnknownParent. After accepting a block, the reindex algorithm
recursively finds all blocks from mapBlocksUnknownParent which are ready for
processing now, tries to accept them and removes them from the map.
Before this PR, we would always read entire blocks from disk into our buffer.
If the predecessor of a block was not available, the blocks would have to be
read again at a later point. This PR changes behavior such that initially only
the 80 byte block header is read into the buffer, and if we can’t accept the
block yet, the rest of the block is skipped for now.
<emilengler> To summarize it up: It does a more low level job when reading the reindexing plus it only reads the block header instead fo the entire block
<willcl_ark> although, as IBD is usually saturated at CPU, perhaps --reindex is not actually that useful (unless you are bandwidth/data constrained in some way for a db error)
<LarryRuane> i was just reading sipa's reply on the stackexchange (linked above).. "You should use -reindex only when you were running in pruning mode..." should that be non-pruning mode? to reindex requires that you have all 300+ gb of blocks, right?
<fjahr> LarryRuane: if you were running it pruned and you switch to non pruning I think you have to start the node with -reindex if i remember correctly
<jnewbery> I'm looking in init.cpp, and there appears to be logic for when using -reindex and -prune, so I guess it'll just redownload blocks from the network
<molly> once i tested reindex on a laptop a few years ago, it took a week to fully sync, so i wouldn't use reindex if i have a corrupt database, i would resync the node from scratch, it's faster
<lightlike> willcl_ark: I think that is what happened to me once when I had a corrupt block file (and slow internet connection): I would reindex up until the corrupt block, and then download all blocks beyond that from peers
<LarryRuane> for me it took slightly less time to reindex than IBD last time i checked, several months ago ... but also reindexing puts less traffic on the network on load on peers
<lightlike> emilengler: yes, but just at the first encounter, because based on the header we can decide if wa want to deserialize the block already now (or need it only later)
<LarryRuane> so... i made a comment on the PR yesterday that explains in more detail but it doesn't actually reduce reads from disk (either num of reads or length of reads), but it only saves CPU time spent deserializing
<lightlike> yes, I was not precise there in my notes: The reading from disk, in my understanding, is done in the Fill() method, which happens also if we skip the block
<lightlike> I think it is also helpful to mention the observation by Larry in the PR, that this change is only an improvement because IBD typically saves block out of order. If everything was in order, this change would make things (slightly) slower.
<LarryRuane> yes because when we encounter a block and its parent has already been seen, it backs up (in the memory buffer) by 80 bytes (header) and deserializes the entire block (incuding header, again) .. so header is deserialized twice .. but since it's only 80 bytes probably not much impact
<lightlike> raj_: although as elichai2 noted earlier, with the same restrictions as in IBD (afaik not every signature of very old blocks is validated, but I don't know the details there)
<LarryRuane> nehan_ thank you, i'll reply there, but today, where of course there is no Skip(), instead the deserialization (`blkdat >> block;`) can trigger the same disk read, so there is no difference
<jnewbery> raj_: yes, exactly the same as for IBD. You can imagine we're treating the blk files as untrusted and revalidating everything again. assumevalid means that we don't check scripts/signatures before a certain height
<willcl_ark> jnewbery: which also might give us another --reindex use-case: recieving an offline copy of the block files from a friend (or torrent download?) which you might want to reindex before trusting
<nehan_> LarryRuane: I'm having a lot of trouble figuring out how nRewind is updated given that it moved around. I'm still trying to get my head around CBufferedFile semantics...
<LarryRuane> _nehan_ this lock can be dropped before calling Skip() .. Is there a way to drop a LOCK() besides it going out of scope? i don't think i've ever seen that
<jnewbery> nehan_: good catch! In practice, I don't think it matters too much. If we're doing reindex then the critical path is single-threaded in this thread (the ThreadImport thread)
<LarryRuane> nehan_ yes that rewind stuff is somewhat obtuse... basic idea of rewind is it marks a point in the stream that we can reposition to if deserialization throws
<jnewbery> in general, you'd expect reading files from disk to be faster than downloading that data from peers, but exact quantitative difference depends on those factors
<lightlike> but in general I think that while the unit tests of CBufferedFile are really great, the validation code is tested quite poorly in the functional tests:
<LarryRuane> just for those who may not be that familiar with this code ... (this wasn't at all clear to me initially) ... deserialization (`blkdat >> ....`) can throw an exception, and then we magically end up the catch and continue that top-level loop
<LarryRuane> the testing for this PR isn't really great, it does have a nice test for the new Skip method, but there really aren't any (new) tests for the changes to validation.cpp .... oh no, not taken that way!
<jnewbery> lightlike: to answer your question about further optimizations, it seems like a lot of the performance degredation is due to blocks being out of order in the blk files. Maybe this is too wacky, but could bitcoind sort those blocks in the background, so that on reindex we're less likely to have to skip forwards and backwards in those files?
<LarryRuane> that is an interesting idea jnewbery.... i like it .. i suggested a different way to improve this at the end of the last comment i made on the PR (yesterday), maybe take a look at that too .. but my suggestion would create more files in `blocks/`
<jnewbery> I don't know the serving-blocks-to-other-peers-doing-IBD logic well enough to know whether having blocks serialized on disk in order would be an improvement there too
<LarryRuane> jnewbery i think *probably* not, because i think the indices record the exact start (and length) of each block (to serve to peers during IBD)
<michaelfolkson> Use cases covered earlier in the meeting docollag if that's your question. Basically something goes wrong or you want to do something you can't do
<kallewoof> or to quote our illustrious leader, < jnewbery> Hi folks! Welcome to Bitcoin Core PR Review Club. Feel free to say hi to let everyone know you're at keyboard.
<kallewoof> jnewbery: No worries. I am still wondering if it's better to NOT look at logs to reduce bias, but we'll use it this time as I've never done one of these before
<aj> i wonder how the performance changes depending on the layout of the blocks. i think you get different layouts depending on the network speed of the peers you do IBD from -- if you've got 1 peer that gives you a block per second (~10Mbps), and another that gives you a block per minute (~133kbps), then I think you'll tend to have sets of 60 out of order blocks (from the fast peer) when you're
<kallewoof> to summarize what the PR does, IIUC, is to instead of reading every block one at a time, it reads only the block header, then skips over the block if it's not the desired one
<kallewoof> Any other general opinions on the concept / idea of this code change? Or if someone reviewed it and have questions about the code or such, we can go through that too.
<aj> kcalvinalvin: txindex is run as a separate process, that triggers off new blocks being accepted, and stores the index data in a separate ldb in .bitcoin/indexes/txindex
<kallewoof> I'm gonna steal a question from the IRC log, as I was actually unsure about this one myself: In which situations is reindex useful? When is it better to use reindex-chainstate?
<kallewoof> The notes say "Reindex uses the CBufferedFile stream, introduced in #1962, which has a buffer to allow “rewinding” the stream position to an earlier position without additional disk access." -- does that mean it keeps all the read data from the blk file in memory until it's destroyed?
<kallewoof> Does anyone have any other questions about the PR? I think these meetings are partially meant to mentor people through the review process, so don't hesitate to speak up even if you feel unsure of yourself.
<fanquake> I'd be interested if someone on Linux could benchmark this with and without the thread prioritization from SCHED_BATCH, to see how much difference it actually makes
<kallewoof> So if I get this right, the entire reindex part is in init.cpp's ThreadImport function. That while loop runs until it gets to the end and then it continues on with the rest.
<kallewoof> Yeah that seems to be the case. The author said he put an assert(0) after the "Reindexing finished" and used time to benchmark it. Guess I'll do that, with and without the ScheduleBatchPriority part.
<anditto> I don't remember last time I used -reindex & never thought about it, so for this particular PR I'm trying to learn as much as I can. It's also nice to have this meeting in an accessible timezone. Thanks everyone.
<kallewoof> I know people were talkign about doing this & a more general bitcoin core related meeting (like the one that is happening tonight in #bitcoin-core-dev). Since we're a smaller crowd it may make sense to combine the two into one.
<coinsureNZ> thanks guys, didnt have much to add just eager to sit in on these discussions for a while and understand how PRs are evaluated. Its been informative and I appreciate being able to added in a workable time slot