Don't download witnesses for assumed-valid blocks when running in prune mode (p2p, validation)

https://github.com/bitcoin/bitcoin/pull/27050

Host: dergoegge  -  PR author: dergoegge

Notes

  • BIP 141 (SegWit) introduced a new structure called a witness to each transaction. Transactions’ witnesses are commited to blocks separately from the transaction merkle tree. Witnesses contain data required to check transaction validity, but not required to determine transaction effects (output consumption/creation). In other words, witnesses are used to validate the blockchain state, not to determine what that state is.

    • Witnesses are commited to by placing the root of the witness merkle tree in the block’s coinbase transaction. By doing so, the witnesses are also commited to in the transaction merkle tree through the coinbase transaction. Nesting the witness commitment in the coinbase transaction was done to make SegWit soft-fork compatible.
  • Assume-valid is a node setting that makes the node skip some transaction validity checks (signature and script checks) prior to a pre-determined “known to be good” block (assume-valid point). The default assume-valid point is set by the developers and is updated every release, but users have the ability to set their own assume-valid point through the -assumevalid setting.

    • The assume-valid feature does not significantly change Bitcoin’s security assumptions. If developers (and everyone reviewing the code changes) were to conspire with miners to build a more-work chain with invalid signatures (and go undetected for weeks), and then include it as the default assume-valid point, they could get the network to accept an invalid chain. However, those same people already have that power by just changing the code - which would be much less obvious. (quoted) Additionally, as long as the full chain history remains available for auditing it would be hard for such an attack to go unnoticed.

    • It is also important to note that the configured assume-valid point does not dictate which chain a node follows. The node still does Proof of Work checks, meaning that a large reorg would be able to orphan (parts of) the assumed-valid chain.

  • Nodes in prune mode (enabled by the -prune setting) fully download and validate the chain history to build a UTXO set enabling them to fully validate any new transaction, but only store a (configurable) portion of the recent history.

  • PR #27050 proposes to skip downloading the witnesses for blocks prior to the configured assume-valid point, for nodes running in prune mode. The rationale for this change is that pruned nodes currently download witnesses but then (prior to the assume-valid point) don’t validate them and delete them shortly after. So why not skip downloading those witnesses and save some bandwidth?

Questions

  1. Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

  2. How much bandwidth is saved, i.e., what is the cumulative size of all witness data up to block 0000000000000000000013a20dcc8577282e1eabd430592bb8afdd5fe544c05a? (Hint: the getblock RPC returns the size and strippedsize (size excluding witnesses) for each block).

  3. The end goal of the PR can be achieved with very few changes to the code (ignoring edge case scenarios). It essentially only requires two changes, one to the block request logic and one to block validation. Can you (in your own words) describe these two changes in more detail?

  4. Without this PR, script validation is skipped under assume-valid, but other checks that involve witness data are not skipped. What other witness related checks exist as part of validation on master?

  5. With this PR, all additional witness related checks (Q4) will be skipped for assumed-valid blocks. Is it ok to skip these additional checks? Why or why not?

  6. The PR does not include an explicit code change for skipping all the extra checks from Q4. Why does that work out?

  7. Peter Todd left a comment concerning a reduction in security with the changes made in the PR. Can you in your own words summarize his concerns? Do you agree/disagree with them?

Meeting Log

  117:00 <dergoegge> #startmeeting
  217:00 <glozow> hi
  317:00 <DaveBeer> hi
  417:00 <emzy> hi
  517:00 <effexzi> Hi every1
  617:00 <abubakar> Hello
  717:00 <brunoerg> hi
  817:00 <lightlike> hello
  917:00 <dergoegge> Hi everyone, welcome to this week's review club!
 1017:01 <dergoegge> Feel free to say to let people know you are here :)
 1117:01 <dergoegge> Anyone here for the first time?
 1217:01 <fanquake> hi
 1317:01 <LarryRuane> hi
 1417:02 <glozow> it's fanquake's first time
 1517:02 <schmidty> hi
 1617:02 <dergoegge> welcome fanquake!
 1717:02 <pakaro> hi
 1817:02 <vicodark> first timer here
 1917:02 <glozow> welcome vicodark!
 2017:02 <fanquake> yea thanks very new here
 2117:02 <vicodark> thanks!
 2217:03 <dergoegge> This week we are looking at "Don't download witnesses for assumed-valid blocks when running in prune mode" #27050
 2317:03 <dergoegge> Notes and questions are in the usual place: https://bitcoincore.reviews/27050
 2417:05 <dergoegge> Ok lets get started: Who had a chance to look at the notes this week? (y/n)
 2517:05 <previewer> y
 2617:05 <DaveBeer> y
 2717:05 <abubakar> Yes
 2817:05 <pakaro> y
 2917:05 <emzy> n
 3017:06 <brunoerg> y
 3117:06 <pakaro> concept clarification - if prune=0 & and av=1 we still need the witness data because eventually the witness-validity will be checked, perhaps once the node has caught up entirely?
 3217:06 <hernanmarino> Hi !
 3317:07 <Amirreza> Hi
 3417:07 <glozow> pakaro: no, the witnesses are never checked. they may be useful if another node requests the block from us, though
 3517:07 <LarryRuane> pakaro: I think we need to download the witness data in that case so we're able to serve blocks to peers
 3617:08 <dergoegge> pakaro: if prune=0 and av=1, then witness validation is skipped under assume-valid for each block and not validated later on
 3717:08 <dergoegge> glozow: +1
 3817:08 <pakaro> ahh thanks glozow larryruane dergoegge
 3917:08 <_aj_> pakaro: or the node operator might run -reindex with -noassumevalid, or they might lookup a post-segwit tx via getrawtransaction and want to see the witness data
 4017:09 <dergoegge> Next question: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
 4117:09 <LarryRuane> concept ACK
 4217:10 <Amirreza> Is assumed-valid by default 256?
 4317:10 <abubakar> Concept Ack
 4417:10 <pakaro> good point _aj_
 4517:11 <DaveBeer> yes, read through notes and concerns raised in PR's conversation, haven't made decision regarding ACK
 4617:11 <brunoerg> Concept ACK
 4717:11 <abubakar> I’ve read the PR and tested it on signet network
 4817:12 <dergoegge> amirreza: the default value for assume valid can be found here: https://github.com/bitcoin/bitcoin/blob/fc7c21f664fd24ac17f518d07f04e0a3d9f8681c/src/kernel/chainparams.cpp#L107
 4917:12 <dergoegge> and it is updated before every major release
 5017:14 <Amirreza> So, when we run the bitcoind, it does not necessarily validate blocks from the genesis?
 5117:14 <dergoegge> The next question involved a little bit of homework/prep-work so i am curious to see who did it :)
 5217:14 <dergoegge> How much bandwidth is saved, i.e., what is the cumulative size of all witness data up to block 0000000000000000000013a20dcc8577282e1eabd430592bb8afdd5fe544c05a?
 5317:15 <previewer> I'm gonna cheat, 46% of the whole is saved?
 5417:15 <dergoegge> amirreza: we only skip script validation, all other checks are still done for all blocks
 5517:16 <dergoegge> previewer: how did you get to that number?
 5617:16 <previewer> i read peter todds comment
 5717:16 <hernanmarino> :)
 5817:17 <glozow> you mean from https://github.com/bitcoin/bitcoin/pull/27050#issuecomment-1419611954? I don't see a 46
 5917:17 <dergoegge> so peter got that number from pieters comment (https://github.com/bitcoin/bitcoin/pull/27050#issuecomment-1419611954)
 6017:17 <abubakar1> I ran test on signet network up the bandwidth saved by not downloading
 6117:17 <abubakar1> witness data to a certain block.
 6217:17 <abubakar1> There will be 47% bandwidth to be saved equivalent to 294232577 kb of data
 6317:18 <abubakar1> to the tip of the chain
 6417:18 <dergoegge> but pieter was talking about the last 100k blocks not all blocks
 6517:18 <previewer> oops, he said  "43% savings."
 6617:18 <LarryRuane> "what is the cumulative size of all witness data" -- this is slow but should work, but it goes to the chain tip rather than that specific block: https://gist.github.com/LarryRuane/2c52467afe0090432a2a0ed592054d72
 6717:19 <Amirreza> dergoegge: What checks exist beside the validation? (The checks I can guess are, checking for dust-tx and timestamp)
 6817:19 <sipa> I don't remember how I computed it.
 6917:19 <pakaro> abubakar1 I reached that ballpark number -> 250kB savings per block multiplied by ~750,000 blocks
 7017:19 <dergoegge> LarryRuane: cool! did you run it? I think the number at the tip should be close to the block hash in the question :)
 7117:20 <LarryRuane> it's running right now but probably will take longer than the rest of this meeting.. but i'll post the result (even after the meeting)
 7217:20 <abubakar1> :)
 7317:20 <_aj_> segwit only activated at block 481824, so multiply by 300k blocks at most?
 7417:20 <lightlike> pakaro: there are no savings for pre-segwit blocks, segwit was only activate at height ~480k
 7517:21 <dergoegge> amirreza: i can't list them all but to name a few: making sure inputs exist in the utxo set, checking the proof of work, inflation checks, ...
 7617:21 <ottosch> Amirreza: format, outputs <= inputs, coin maturity etc
 7717:22 <dergoegge> Ok does anybody have a guess in GB? otherwise i can post the result now or at the end
 7817:22 <_aj_> dust is a relay/standardness rule, not a consensus one, so isn't checked for blocks
 7917:22 <Amirreza> dergoegge, ottosch : thanks, where can I find them in the codebase? Are they all in a single place?
 8017:23 <dergoegge> amirreza: a lot of them can be found in `src/validation.cpp`
 8117:24 <abubakar1> larryRuane: took 18 minutes to complete for me, so yeah can take a while.
 8217:24 <dergoegge> So my answer to the question is 110.6 GB by the way :)
 8317:25 <dergoegge> The end goal of the PR can be achieved with very few changes to the code (ignoring edge case scenarios). It essentially only requires two changes, one to the block request logic and one to block validation. Can you (in your own words) describe these two changes in more detail?
 8417:26 <ottosch> Amirreza: src/consensus/tx_verify.cpp too
 8517:26 <LarryRuane> that's very significant! that's about 10% of my monthly cable (comcast/xfinity) limit (which i think is 1.2TB)
 8617:27 <dergoegge> Yea I think shaving off a ~100GB of IBD is quite a nice win
 8717:27 <dergoegge> (and with the recent jpeg hype that number will probably grow)
 8817:28 <lightlike> If pruning and block is assumed valid: 1) In SendMessages, remove MSG_WITNESS_FLAG from fetch flags so our peers don't send us the witness data. 2)In validation, skip witness merkle tree checks because we don't have the witness.
 8917:30 <dergoegge> lightlike: yes that is correct, currently those changes are in these two commits: https://github.com/bitcoin/bitcoin/pull/27050/commits/7aafe8ab5118205ce783ff232535562aa26afae4, https://github.com/bitcoin/bitcoin/pull/27050/commits/282b58ab331cf94ae4d92ef27f6e69bfc40548a9
 9017:31 <dergoegge> i was a bit surprised that it's this easy (ignoring edge cases)
 9117:32 <dergoegge> Without this PR, script validation is skipped under assume-valid, but other checks that involve witness data are not skipped. What other witness related checks exist as part of validation on master?
 9217:34 <LarryRuane> lightlike's point 2 above?
 9317:34 <LarryRuane> (Coinbase merkle root check)
 9417:35 <dergoegge> yes that's one
 9517:35 <ottosch> witnesses size and amount?
 9617:37 <dergoegge> ottosch: yes on witness sizes, what do you mean by amount?
 9717:37 <ottosch> max stack items
 9817:38 <dergoegge> ottosch: +1
 9917:38 <dergoegge> I would also count the block weight check as a witness size check: https://github.com/bitcoin/bitcoin/blob/fc7c21f664fd24ac17f518d07f04e0a3d9f8681c/src/validation.cpp#L3744-L3752
10017:40 <dergoegge> What about blocks that don't commit to witness data, should they be allowed to have witnesses?
10117:42 <pakaro> dergoegge i.e. a block that only have pre-segwit transactions in them?
10217:43 <dergoegge> yea what if someone attaches witnesses to a pre-segwit block?
10317:44 <dergoegge> (that is attach them to transactions in the block, not the block itself)
10417:45 <dergoegge> I am referring to this rule: https://github.com/bitcoin/bitcoin/blob/fc7c21f664fd24ac17f518d07f04e0a3d9f8681c/src/validation.cpp#L3735-L3742
10517:46 <dergoegge> "No witness data is allowed in blocks that don't commit to witness data, as this would otherwise leave room for spam"
10617:47 <pakaro> is there a separate check to ensure that there is no witness data in 1'ordinary' transactions and 2'coinbase' transactions, or does one check suffice?
10717:48 <pakaro> in my understanding a coinbase tx is very similar to a normal tx, really just with nblocktime spending rules , nvalue, etc, so one check should suffice?
10817:48 <dergoegge> pakaro: the check i linked checks all transactions at once i.e. no extra check for coinbases
10917:48 <DaveBeer> do witness data have any size limit at all?
11017:49 <abubakar1> DaveBeer: yes there are policy rules on witness data size
11117:49 <pakaro> thx dergoegge [i'm still not great at c++]
11217:50 <dergoegge> DaveBeer there are also consensus rules on the size
11317:50 <glozow> policy rules don't apply here
11417:51 <dergoegge> The maximum block weight being one of them: https://github.com/bitcoin/bitcoin/blob/fc7c21f664fd24ac17f518d07f04e0a3d9f8681c/src/validation.cpp#L3750
11517:51 <dergoegge> iirc there are also limits on individual witnesses but I am not sure on the specifics
11617:52 <DaveBeer> right, you have already linked that check before, thanks dergoegge
11717:53 <dergoegge> The PR does not include an explicit code change for skipping all witness related checks. It only explicitly skips the witness merkle root check. Why does that work out?
11817:55 <pakaro> dergoegge I dont think there are individual limits because there was that jpg-wizard spend and that file was 4MB, therefore unless the limit was the same was block weight, which would render the rule meaningless anyway
11917:55 <abubakar1> I think without the witness data, there will be no checks to do
12017:56 <dergoegge> pakaro: indeed but there are different segwit version which have slightly different rules *i think*
12117:56 <dergoegge> iirc taproot (segwit v1) removed the individual limits
12217:57 <dergoegge> It turns out that all the extra checks *just* pass when you don't have any witnesses. Which makes sense considering that segwit was a soft-fork. With the PR, we are essentially just pretending like we are a pre-segwit node (up to the assume-valid point).
12317:59 <abubakar1> +1 makes sense legacy outputs don't have witness data and it passes
12417:59 <dergoegge> I don't think we have time for the last question unfortunately. Feel free to stick around and ask more questions!
12517:59 <dergoegge> Thank you all for coming!
12617:59 <dergoegge> #endmeeting