incorrect blk file size calculation during reindex results in recoverable blk file corruption (block storage)

https://github.com/bitcoin/bitcoin/pull/24858

Host: LarryRuane  -  PR author: mruddy

The PR branch HEAD was bcb0cac at the time of this review club meeting.

Notes

  • During Initial block download (IBD), a new node automatically initializes and populates its data directory by fetching blocks from peers, validating them, and storing them in the blocks directory within the data directory (default $HOME/.bitcoin).

  • The blocks are stored in files named blknnnnn.dat (for example, blk01234.dat). These files are limited to 128 MiB, so each can hold about 60 blocks or more depending on their sizes.

  • The blknnnnn.dat files are in a custom format: a sequence of blocks, each preceded by a 4-byte “marker” or “magic number” and a 4-byte integer indicating the block’s size or length in bytes. The blocks need not be in height-order, either within a block file, or across block files. For example, block 2000 could be stored in blk00010.dat while block 1500 could be stored in blk00011.dat.

  • In order to save disk space, the node operator can enable pruning using -prune, so the node retains only the most recent few hundred blocks on disk, although IBD still downloads and verifies all blocks since the beginning.

  • During IBD, besides storing the raw blocks, several kinds of state are derived from the blocks and also stored the data directory as entries in LevelDB. The two most prominent are the block index and the chainstate (UTXO set). The block index, contains an entry for every block (including pruned ones) and indexes the locations (file and offset within the file) of unpruned raw blocks.

  • If corruption is suspected in the derived indices (the block index or chainstate), the user has the option of starting over with an empty data directory and performing IBD again, but that’s slow and uses a lot of network bandwidth. If the node isn’t pruned, an alternative is to start the node with the -reindex option. This will use the existing blocks files to rebuild all the derived state. This is simliar to IBD but obtains blocks from local files instead of network peers.

  • PR #24858 fixes a long-standing bug that can cause a form of mild corruption in the way blocks are stored within the blocks files following a reindex.

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK?

  2. Which parts of the bitcoind data directory are not derived from other parts of the data directory? What are some examples of parts that are?

  3. Why are blocks in the block files disordered?

  4. What is CBlockFileInfo used for?

  5. What is reindexing, and how does it differ from IBD (initial block download)?

  6. How does pruning interact with reindexing?

  7. What is the format of the blknnnnn.dat files? Are these files portable across CPU architectures (big-endian, little-endian)?

  8. What happens if a blocks file becomes corrupted? Partial hint: See this call to FindByte

  9. Each network type (mainnet, testnet, signet, regtest) has its own “magic” bytes, for example, mainnet). Where else are these bytes used? Why do they differ across network types?

  10. What’s the purpose of the loop in FindBlockPos()?

  11. What bug does this PR purport to fix? Do you think it fixes it?

  12. Did you reproduce the problem that this PR fixes?

  13. A review comment suggested a slightly different way to fix the bug. Explain the alternate approach. How does it compare?

  14. (Bonus question) The definition of BLOCK_SERIALIZATION_HEADER_SIZE must be the same across platforms (so that the blocks files are portable). Why is it okay to assume an int is 4 bytes? Couldn’t it be different on some platforms?

Meeting Log

  117:00 <larryruane> #startmeeting
  217:00 <josie[m]> hi
  317:00 <michaelfolkson> hi
  417:00 <larryruane> Hi everyone! today we'll review PR 24858, notes are here: https://bitcoincore.reviews/24858
  517:01 <Amirreza> Hi
  617:01 <BlueMoon> Hello!
  717:02 <hernanmarino_> Hello everyone
  817:02 <pablo_martin> Hello!
  917:02 <Lov3r_Of_Bitcoin> hello
 1017:02 <rewe> Hi I'm new here
 1117:03 <brunoerg> hi!!
 1217:03 <brunoerg> rewe: welcome!
 1317:03 <larryruane> Is anyone here for the first time? Please feel free to say hi! Welcome, rewe !
 1417:03 <pablo_martin> yes, first time here... hi all!
 1517:04 <hernanmarino_> Welcome Pablo !
 1617:04 <glozow> hi
 1717:04 <rebroad> first time here also
 1817:04 <adam2k> đź‘‹ hello
 1917:04 <larryruane> Welcome, Pablo! And rebroad!
 2017:05 <rebroad> there's a PR I wanted to ask about - to try to find out why it's not being given any attention
 2117:05 <larryruane> I'm having network trouble, Gloria, could you be host for a minute or 2?
 2217:06 <glozow> sure
 2317:06 <glozow> Did you all get a chance to review the PR or look at the notes? how about a y/n
 2417:06 <BlueMoon> I read a bit, it's interesting.
 2517:06 <brunoerg> y for the notes
 2617:07 <pablo_martin> yes glozow
 2717:07 <schmidty_> hola
 2817:07 <larryruane> glozow: (i'm back, thank you!)
 2917:07 <josie[m]> y - went through the PR and also read the notes
 3017:07 <glozow> Great! could any of you summarize what this PR is doing?
 3117:07 <hernanmarino_> notes n , tested the fix y
 3217:08 <adam2k> y
 3317:09 <michaelfolkson> Did you reproduce the issue hernanmarino_?
 3417:09 <hernanmarino_> michaelfolkson: yes
 3517:09 <michaelfolkson> Cool
 3617:09 <josie[m]> this PR is fixes a bug where extra bytes are added to a serialized during reindexing. the extra bytes cause an error to be printed to the log when it is later de-serialized
 3717:09 <BlueMoon> It deals with the files where the block information is stored, how the size calculation is incorrect.
 3817:10 <adam2k> There is a deserialization error in the log file that appears when doing a reindexing of the blknnnn.dat files.  It's not fatal, but it's confusing and possibly alarming for people that see the issue in their logs.
 3917:10 <josie[m]> s/serialized/serialized block/
 4017:10 <larryruane> josie[m]: adam2k: BlueMoon: yes!
 4117:11 <BlueMoon> :)
 4217:11 <larryruane> Before we get into the questions, does anyone have any questions about the Notes? Anything unclear (or wrong)?
 4317:12 <michaelfolkson> Maybe this should be asked later but you can explain "A reindex, if it occurs, always happens immediately after the node starts up, before any blocks have been added from peers."?
 4417:12 <josie[m]> one thing that was a little unclear to me: the first few questions in the notes didn't seem directly related to the PR (tho helpful background knowledge to have)
 4517:12 <michaelfolkson> Why?
 4617:14 <larryruane> michaelfolkson: Reindex is requested by the user (node operator) as a configuration option (command line or in the config file, tho you probably would never put it in the file, or else it would reindex on every startup!),
 4717:15 <larryruane> and if specified (`-reindex` or `-reindex=1`), it will happen when the node first starts up ... after that process completes (which takes hours usually), then the node syncs with its peers, and you'll add more blocks as usual
 4817:16 <larryruane> josie[m]: yes, that's my fault, I thought it would be helpful to make sure people had the background needed before jumping into this particular PR
 4917:16 <Amirreza> A question. Notes say that blk???.dat files are 128 MiG but store 60 or more blocks. Why 60? I imagined the number is much higher while block size are at most 1 MiG.
 5017:17 <larryruane> Amirreza: well actually, since the segwit upgrade, blocks are almost always larger than 1 MiB .. they're often close to 2
 5117:17 <hernanmarino_> amirezza : that 's not the case really
 5217:18 <BlueMoon> I find this information on IBD very interesting. https://btctranscripts.com/andreas-antonopoulos/2018-10-23-andreas-antonopoulos-initial-blockchain-download/
 5317:18 <michaelfolkson> Yeah the SegWit block size -> block weight https://jimmysong.medium.com/understanding-segwit-block-size-fd901b87c9d4
 5417:18 <Amirreza> larryruane: Thanks, I should read more about segwit.
 5517:20 <larryruane> Thanks, BlueMoon: and michaelfolkson: - very useful links! Okay, here's question 2 (but feel free to bring up what we've already covered): Which parts of the bitcoind data directory are not derived from other parts of the data directory? What are some examples of parts that are?
 5617:20 <larryruane> (I should have said, "... that are part of the data directory")
 5717:22 <michaelfolkson> There are things not related to this PR. Like banlist.dat and peers.dat
 5817:23 <michaelfolkson> peers.dat must be updated from banlist.dat (is that what you mean by "derived")?
 5917:24 <hernanmarino_> Regarding this PR , the index is derived from the blocks
 6017:25 <larryruane> michaelfolkson: that's really good, I actually hadn't thought of P2P! I was thinking more that the `blocks` directory is not derived from other stuff in the datadir, but like ... yes, as hernanmarino_: said, exactly
 6117:25 <larryruane> the block index is derived from the blocks files ... also the chainstate, which is the UTXO set
 6217:28 <larryruane> Things in the data directory that are derived are, in a way, to make performance reasonable ... if the node is looking for information about a block (and has its hash), it would be impractical to linearly search the blocks files (blknnnn.dat)!
 6317:28 <michaelfolkson> For the new people the data directory: https://en.bitcoin.it/wiki/Data_directory
 6417:28 <BlueMoon> Thanks!!
 6517:29 <larryruane> michaelfolkson: +1 thanks! Okay how about question 3: "Why are blocks in the block files disordered?"
 6617:30 <adam2k> The blocks are added as they come in through the network, so there is no guarantee on order.
 6717:30 <BlueMoon> It's not necessary from what I've read, and each of these block files has a marker indicating the size of the block file, I imagine they are accessed via an index.
 6817:32 <josie[m]> the blk*.dat files have orphan blocks as well, which means we wouldn't expect a strict ordering
 6917:32 <michaelfolkson> "It is perfectly possible that *.blk files contains gaps of zeroes, or even partially written blocks" https://bitcoin.stackexchange.com/questions/49615/how-can-you-tell-if-youre-at-the-end-of-an-incomplete-blk-dat-file
 7017:32 <larryruane> adam2k: yes exactly, a long time ago, the blocks were ordered, but initial block download (IBD) got a major performance improvement by a feature called headers-first download
 7117:33 <blocknum256> The order of the blocks in the block files is set by the order you get them from peers during the IBD
 7217:34 <larryruane> So the node first downloads only headers (which are only 80 bytes each), figures out the best chain (assuming the blocks turn out to be valid), knows order of the blocks, so can request many blocks simultaneously from different peers ... and their reply times are kind of random .. so the blocks end up out of order
 7317:34 <larryruane> blocknum256: yes
 7417:35 <larryruane> (again, feel free to keep discussing, even if I go on...) Question 4: "What is CBlockFileInfo used for?"
 7517:36 <nassersaazi> larryruane: just for clarity....does that mean the headers are ordered?
 7617:37 <BlueMoon> Used to get information about the last block of files; blocks, size, heights, time....
 7717:37 <adam2k> CBlockFileInfo is a class that manages information about what is contained within the block files.  Things like the number of blocks.
 7817:37 <josie[m]> CBlockFileInfo contains metadata about a block file, min and max block height in the file, timestamps, size, etc
 7917:38 <michaelfolkson> That's what it contains but what is it used for? :) (I'm still looking)
 8017:39 <larryruane> nassersaazi: Yes the headers come in ordered by height, and there are many in a single message (getheaders P2P message) .. so the node basically says "I know about block hash X, give me up to N headers that build on block X"
 8117:40 <larryruane> josie[m]: yes, there's one of those for each `blknnnn.dat` file ... are they persisted to disk?
 8217:41 <adam2k> it looks like that class is used in several places, like within the blockstorage.cpp there are methods like LoadBlockIndexDB and WriteBlockIndexDB.
 8317:41 <BlueMoon> CBlockFileInfo is used to obtain the block file information entry for a block file.
 8417:42 <larryruane> adam2k: yes exactly, they're written out to LevelDB, part of the "block index" ... BlueMoon: yes!
 8517:42 <BlueMoon> :)
 8617:42 <josie[m]> larryruane: hm, not sure about on disk.. tho it does have a serialize method that looks like it outputs json
 8717:44 <josie[m]> does levelDB write to disk? i thought it was an in memory db
 8817:44 <larryruane> josie[m]: I don't think it's json .. but yes, anytime you see those serialization methods, you know this data structure is getting saved to (and read from) disk or sent over (and received from) the network
 8917:44 <larryruane> josie[m]: it's both, it caches in memory but also persists to disk
 9017:44 <josie[m]> larryruane: cool, TIL!
 9117:45 <larryruane> so if you look in datadir/blocks/index you'll see those *.ldb files, those are leveldb
 9217:46 <larryruane> ok let's see.. question 5 "What is reindexing, and how does it differ from IBD (initial block download)?"
 9317:48 <adam2k> According to the help cmd "Rebuild chain state and block index from the blk*.dat files on disk. This will also rebuild active optional indexes."
 9417:49 <michaelfolkson> It is almost like restarting IBD but with blocks you already have stored
 9517:49 <adam2k> Does that mean it does not re-download the blockchain from the network and use your local file storage instead?
 9617:49 <larryruane> yes, good, reindexing sources blocks from local files, IBD sources blocks from peers
 9717:49 <blocknum256> It rebuilds the index of blocks based on the blk*. dat files saved in . bitcoin/blocks
 9817:50 <larryruane> adam2k: yes, so depending on your network speed, reindex could be quite a bit faster than IBD
 9917:50 <hernanmarino_> adam2k: yes
10017:50 <larryruane> question 6: How does pruning interact with reindexing?
10117:51 <josie[m]> +1 to other answers, but in how it's different from IBD, reindexing does not verify the blocks, right? its assumed the blk files on disk have already been verified?
10217:51 <larryruane> (sorry I'm rushing a little, trying to have time to discuss the PR)
10317:51 <michaelfolkson> If the reindex fails (multiple times) you'd have to instruct it to start IBD afresh. It wouldn't do it automatically?
10417:51 <adam2k> I was looking at this earlier.  I have a pruned client and it appears that I don't have blknnnn.dat files that go beyond the memory limit that I set.
10517:51 <larryruane> josie[m]: I'm pretty sure it does verify the blocks
10617:52 <josie[m]> larryruane: ah, that makes sense, because in the case of a pruned client it is requesting those blocks from peers?
10717:52 <adam2k> So I'm assuming when I reindex with a pruned node I would need to do a partial IBD up until the point where I have the blknnnn.dat files?
10817:52 <larryruane> michaelfolkson: I think reindexing does as much as it can (using the blk files), and then the node goes into its usual "sync with my peers" .. which may amount to IBD
10917:52 <blocknum256> Q6: Since pruning raws block data for blocks older than a givven height, I think indexes are broke at that point
11017:53 <josie[m]> adam2k: i think that's correct, but im guessing you wouldnt store blk files on disk in the case of a pruned node, you would just use the blocks to build the index?
11117:53 <larryruane> adam2k: The reindex request (`-reindex` on the command line) would actually fail if the node is pruned
11217:54 <michaelfolkson> larryruane: I'd guess the optimal behavior would be to after multiple corruptions or whatever just drop all blocks and start afresh without input from user
11317:54 <michaelfolkson> Don't think it does that though
11417:54 <michaelfolkson> I think the user would have to decide that
11517:55 <adam2k> hmm...ok, maybe I don't have something configured correctly because I can see the blknnnn.dat files on disc and it seems like I can reindex without any failures.
11617:55 <larryruane> michaelfolkson: Yes it does do that, or very close.. like say reindex is able to process the first 100,000 blocks and then ran into some kind of corruption ... it will automatically IBD starting at 100,001
11717:56 <larryruane> adam2k: is your node pruned?
11817:57 <adam2k> I know this is off topic, but in my bitcoin.conf I set `prune=10204` isn't that all that I need to do in order to prune?
11917:57 <michaelfolkson> larryruane: So it detects when the corruption happened and drops blocks from the corruption onwards and requests them from peers? That's neat if so
12017:57 <larryruane> yes, and you're able to `-reindex`? I'll have to try that!
12117:58 <josie[m]> larryruane, adam2k: im pretty sure you can run reindex on a pruned node. i just tried by adding prune=1000 in bitcoin.conf and then restarted bitcoind with -reindex
12217:58 <adam2k> yeah, I'm trying now and it appears to work.  I have to run a pruned node because I only have 512GB of disc space total.
12317:58 <larryruane> ok that's cool, I didn't know that! I wonder how that works?
12417:59 <larryruane> Okay we're about out of time, any comments on the bug that the PR fixes, or how the PR fixes it?
12517:59 <josie[m]> i think it behaves the same as IBD (again, TIL), but it deletes older blocks as normal once you hit the pruned size. but yeah, i also need to play with this a bit more
12617:59 <larryruane> (feel free to stay on to continue to discuss, I'll stay for a while)
12717:59 <BlueMoon> No comment, thank you very much for so much valuable information. :)
12818:00 <michaelfolkson> Erm bug should definitely be fixed :)
12918:00 <blocknum256> larryruane: Thank you for your time!
13018:00 <josie[m]> thanks for hosting larryruane! this seemed like a very small change at first, but the discussion on the PR was really interesting
13118:00 <Lov3r_Of_Bitcoin> thank you
13218:00 <hernanmarino_> Thanks Larry
13318:01 <svav> Thanks larryruane
13418:01 <larryruane> #endmeeting