Don't download witnesses for assumed-valid blocks when running in prune mode (p2p, validation)

Mar 22, 2023

https://github.com/bitcoin/bitcoin/pull/27050

Host: dergoegge - PR author: dergoegge

Notes

BIP 141 (SegWit) introduced a new structure called a witness to each transaction. Transactions’ witnesses are commited to blocks separately from the transaction merkle tree. Witnesses contain data required to check transaction validity, but not required to determine transaction effects (output consumption/creation). In other words, witnesses are used to validate the blockchain state, not to determine what that state is.
- Witnesses are commited to by placing the root of the witness merkle tree in the block’s coinbase transaction. By doing so, the witnesses are also commited to in the transaction merkle tree through the coinbase transaction. Nesting the witness commitment in the coinbase transaction was done to make SegWit soft-fork compatible.
Assume-valid is a node setting that makes the node skip some transaction validity checks (signature and script checks) prior to a pre-determined “known to be good” block (assume-valid point). The default assume-valid point is set by the developers and is updated every release, but users have the ability to set their own assume-valid point through the -assumevalid setting.
- The assume-valid feature does not significantly change Bitcoin’s security assumptions. If developers (and everyone reviewing the code changes) were to conspire with miners to build a more-work chain with invalid signatures (and go undetected for weeks), and then include it as the default assume-valid point, they could get the network to accept an invalid chain. However, those same people already have that power by just changing the code - which would be much less obvious. (quoted) Additionally, as long as the full chain history remains available for auditing it would be hard for such an attack to go unnoticed.
- It is also important to note that the configured assume-valid point does not dictate which chain a node follows. The node still does Proof of Work checks, meaning that a large reorg would be able to orphan (parts of) the assumed-valid chain.
Nodes in prune mode (enabled by the -prune setting) fully download and validate the chain history to build a UTXO set enabling them to fully validate any new transaction, but only store a (configurable) portion of the recent history.
PR #27050 proposes to skip downloading the witnesses for blocks prior to the configured assume-valid point, for nodes running in prune mode. The rationale for this change is that pruned nodes currently download witnesses but then (prior to the assume-valid point) don’t validate them and delete them shortly after. So why not skip downloading those witnesses and save some bandwidth?

Questions

Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?
How much bandwidth is saved, i.e., what is the cumulative size of all witness data up to block 0000000000000000000013a20dcc8577282e1eabd430592bb8afdd5fe544c05a? (Hint: the getblock RPC returns the size and strippedsize (size excluding witnesses) for each block).
The end goal of the PR can be achieved with very few changes to the code (ignoring edge case scenarios). It essentially only requires two changes, one to the block request logic and one to block validation. Can you (in your own words) describe these two changes in more detail?
Without this PR, script validation is skipped under assume-valid, but other checks that involve witness data are not skipped. What other witness related checks exist as part of validation on master?
With this PR, all additional witness related checks (Q4) will be skipped for assumed-valid blocks. Is it ok to skip these additional checks? Why or why not?
The PR does not include an explicit code change for skipping all the extra checks from Q4. Why does that work out?
Peter Todd left a comment concerning a reduction in security with the changes made in the PR. Can you in your own words summarize his concerns? Do you agree/disagree with them?

Meeting Log

17:00

<dergoegge> #startmeeting

17:00

<glozow> hi

17:00

<DaveBeer> hi

17:00

<emzy> hi

17:00

<effexzi> Hi every1

17:00

<abubakar> Hello

17:00

<brunoerg> hi

17:00

<lightlike> hello

17:00

<dergoegge> Hi everyone, welcome to this week's review club!

17:01

<dergoegge> Feel free to say to let people know you are here :)

17:01

<dergoegge> Anyone here for the first time?

17:01

<fanquake> hi

17:01

<LarryRuane> hi

17:02

<glozow> it's fanquake's first time

17:02

<schmidty> hi

17:02

<dergoegge> welcome fanquake!

17:02

<pakaro> hi

17:02

<vicodark> first timer here

17:02

<glozow> welcome vicodark!

17:02

<fanquake> yea thanks very new here

17:02

<vicodark> thanks!

17:03

<dergoegge> This week we are looking at "Don't download witnesses for assumed-valid blocks when running in prune mode" #27050

17:03

<dergoegge> Notes and questions are in the usual place: https://bitcoincore.reviews/27050

17:05

<dergoegge> Ok lets get started: Who had a chance to look at the notes this week? (y/n)

17:05

<previewer> y

17:05

<DaveBeer> y

17:05

<abubakar> Yes

17:05

<pakaro> y

17:05

<emzy> n

17:06

<brunoerg> y

17:06

<pakaro> concept clarification - if prune=0 & and av=1 we still need the witness data because eventually the witness-validity will be checked, perhaps once the node has caught up entirely?

17:06

<hernanmarino> Hi !

17:07

<Amirreza> Hi

17:07

<glozow> pakaro: no, the witnesses are never checked. they may be useful if another node requests the block from us, though

17:07

<LarryRuane> pakaro: I think we need to download the witness data in that case so we're able to serve blocks to peers

17:08

<dergoegge> pakaro: if prune=0 and av=1, then witness validation is skipped under assume-valid for each block and not validated later on

17:08

<dergoegge> glozow: +1

17:08

<pakaro> ahh thanks glozow larryruane dergoegge

17:08

<_aj_> pakaro: or the node operator might run -reindex with -noassumevalid, or they might lookup a post-segwit tx via getrawtransaction and want to see the witness data

17:09

<dergoegge> Next question: Did you review the PR? Concept ACK, approach ACK, tested ACK, or NACK? What was your review approach?

17:09

<LarryRuane> concept ACK

17:10

<Amirreza> Is assumed-valid by default 256?

17:10

<abubakar> Concept Ack

17:10

<pakaro> good point _aj_

17:11

<DaveBeer> yes, read through notes and concerns raised in PR's conversation, haven't made decision regarding ACK

17:11

<brunoerg> Concept ACK

17:11

<abubakar> I’ve read the PR and tested it on signet network

17:12

<dergoegge> amirreza: the default value for assume valid can be found here: https://github.com/bitcoin/bitcoin/blob/fc7c21f664fd24ac17f518d07f04e0a3d9f8681c/src/kernel/chainparams.cpp#L107

17:12

<dergoegge> and it is updated before every major release

17:14

<Amirreza> So, when we run the bitcoind, it does not necessarily validate blocks from the genesis?

17:14

<dergoegge> The next question involved a little bit of homework/prep-work so i am curious to see who did it :)

17:14

<dergoegge> How much bandwidth is saved, i.e., what is the cumulative size of all witness data up to block 0000000000000000000013a20dcc8577282e1eabd430592bb8afdd5fe544c05a?

17:15

<previewer> I'm gonna cheat, 46% of the whole is saved?

17:15

<dergoegge> amirreza: we only skip script validation, all other checks are still done for all blocks

17:16

<dergoegge> previewer: how did you get to that number?

17:16

<previewer> i read peter todds comment

17:16

<hernanmarino> :)

17:17

<glozow> you mean from https://github.com/bitcoin/bitcoin/pull/27050#issuecomment-1419611954? I don't see a 46

17:17

<dergoegge> so peter got that number from pieters comment (https://github.com/bitcoin/bitcoin/pull/27050#issuecomment-1419611954)

17:17

<abubakar1> I ran test on signet network up the bandwidth saved by not downloading

17:17

<abubakar1> witness data to a certain block.

17:17

<abubakar1> There will be 47% bandwidth to be saved equivalent to 294232577 kb of data

17:18

<abubakar1> to the tip of the chain

17:18

<dergoegge> but pieter was talking about the last 100k blocks not all blocks

17:18

<previewer> oops, he said "43% savings."

17:18

<LarryRuane> "what is the cumulative size of all witness data" -- this is slow but should work, but it goes to the chain tip rather than that specific block: https://gist.github.com/LarryRuane/2c52467afe0090432a2a0ed592054d72

17:19

<Amirreza> dergoegge: What checks exist beside the validation? (The checks I can guess are, checking for dust-tx and timestamp)

17:19

<sipa> I don't remember how I computed it.

17:19

<pakaro> abubakar1 I reached that ballpark number -> 250kB savings per block multiplied by ~750,000 blocks

17:19

<dergoegge> LarryRuane: cool! did you run it? I think the number at the tip should be close to the block hash in the question :)

17:20

<LarryRuane> it's running right now but probably will take longer than the rest of this meeting.. but i'll post the result (even after the meeting)

17:20

<abubakar1> :)

17:20

<_aj_> segwit only activated at block 481824, so multiply by 300k blocks at most?

17:20

<lightlike> pakaro: there are no savings for pre-segwit blocks, segwit was only activate at height ~480k

17:21

<dergoegge> amirreza: i can't list them all but to name a few: making sure inputs exist in the utxo set, checking the proof of work, inflation checks, ...

17:21

<ottosch> Amirreza: format, outputs <= inputs, coin maturity etc

17:22

<dergoegge> Ok does anybody have a guess in GB? otherwise i can post the result now or at the end

17:22

<_aj_> dust is a relay/standardness rule, not a consensus one, so isn't checked for blocks

17:22

<Amirreza> dergoegge, ottosch : thanks, where can I find them in the codebase? Are they all in a single place?

17:23

<dergoegge> amirreza: a lot of them can be found in `src/validation.cpp`

17:24

<abubakar1> larryRuane: took 18 minutes to complete for me, so yeah can take a while.

17:24

<dergoegge> So my answer to the question is 110.6 GB by the way :)

17:25

<dergoegge> The end goal of the PR can be achieved with very few changes to the code (ignoring edge case scenarios). It essentially only requires two changes, one to the block request logic and one to block validation. Can you (in your own words) describe these two changes in more detail?

17:26

<ottosch> Amirreza: src/consensus/tx_verify.cpp too

17:26

<LarryRuane> that's very significant! that's about 10% of my monthly cable (comcast/xfinity) limit (which i think is 1.2TB)

17:27

<dergoegge> Yea I think shaving off a ~100GB of IBD is quite a nice win

17:27

<dergoegge> (and with the recent jpeg hype that number will probably grow)

17:28

<lightlike> If pruning and block is assumed valid: 1) In SendMessages, remove MSG_WITNESS_FLAG from fetch flags so our peers don't send us the witness data. 2)In validation, skip witness merkle tree checks because we don't have the witness.

17:30

<dergoegge> lightlike: yes that is correct, currently those changes are in these two commits: https://github.com/bitcoin/bitcoin/pull/27050/commits/7aafe8ab5118205ce783ff232535562aa26afae4, https://github.com/bitcoin/bitcoin/pull/27050/commits/282b58ab331cf94ae4d92ef27f6e69bfc40548a9

17:31

<dergoegge> i was a bit surprised that it's this easy (ignoring edge cases)

17:32

<dergoegge> Without this PR, script validation is skipped under assume-valid, but other checks that involve witness data are not skipped. What other witness related checks exist as part of validation on master?

17:34

<LarryRuane> lightlike's point 2 above?

17:34

<LarryRuane> (Coinbase merkle root check)

17:35

<dergoegge> yes that's one

17:35

<ottosch> witnesses size and amount?

17:37

<dergoegge> ottosch: yes on witness sizes, what do you mean by amount?

17:37

<ottosch> max stack items

17:38

<dergoegge> ottosch: +1

17:38

<dergoegge> I would also count the block weight check as a witness size check: https://github.com/bitcoin/bitcoin/blob/fc7c21f664fd24ac17f518d07f04e0a3d9f8681c/src/validation.cpp#L3744-L3752

100

17:40

<dergoegge> What about blocks that don't commit to witness data, should they be allowed to have witnesses?

101

17:42

<pakaro> dergoegge i.e. a block that only have pre-segwit transactions in them?

102

17:43

<dergoegge> yea what if someone attaches witnesses to a pre-segwit block?

103

17:44

<dergoegge> (that is attach them to transactions in the block, not the block itself)

104

17:45

<dergoegge> I am referring to this rule: https://github.com/bitcoin/bitcoin/blob/fc7c21f664fd24ac17f518d07f04e0a3d9f8681c/src/validation.cpp#L3735-L3742

105

17:46

<dergoegge> "No witness data is allowed in blocks that don't commit to witness data, as this would otherwise leave room for spam"

106

17:47

<pakaro> is there a separate check to ensure that there is no witness data in 1'ordinary' transactions and 2'coinbase' transactions, or does one check suffice?

107

17:48

<pakaro> in my understanding a coinbase tx is very similar to a normal tx, really just with nblocktime spending rules , nvalue, etc, so one check should suffice?

108

17:48

<dergoegge> pakaro: the check i linked checks all transactions at once i.e. no extra check for coinbases

109

17:48

<DaveBeer> do witness data have any size limit at all?

110

17:49

<abubakar1> DaveBeer: yes there are policy rules on witness data size

111

17:49

<pakaro> thx dergoegge [i'm still not great at c++]

112

17:50

<dergoegge> DaveBeer there are also consensus rules on the size

113

17:50

<glozow> policy rules don't apply here

114

17:51

<dergoegge> The maximum block weight being one of them: https://github.com/bitcoin/bitcoin/blob/fc7c21f664fd24ac17f518d07f04e0a3d9f8681c/src/validation.cpp#L3750

115

17:51

<dergoegge> iirc there are also limits on individual witnesses but I am not sure on the specifics

116

17:52

<DaveBeer> right, you have already linked that check before, thanks dergoegge

117

17:53

<dergoegge> The PR does not include an explicit code change for skipping all witness related checks. It only explicitly skips the witness merkle root check. Why does that work out?

118

17:55

<pakaro> dergoegge I dont think there are individual limits because there was that jpg-wizard spend and that file was 4MB, therefore unless the limit was the same was block weight, which would render the rule meaningless anyway

119

17:55

<abubakar1> I think without the witness data, there will be no checks to do

120

17:56

<dergoegge> pakaro: indeed but there are different segwit version which have slightly different rules *i think*

121

17:56

<dergoegge> iirc taproot (segwit v1) removed the individual limits

122

17:57

<dergoegge> It turns out that all the extra checks *just* pass when you don't have any witnesses. Which makes sense considering that segwit was a soft-fork. With the PR, we are essentially just pretending like we are a pre-segwit node (up to the assume-valid point).

123

17:59

<abubakar1> +1 makes sense legacy outputs don't have witness data and it passes

124

17:59

<dergoegge> I don't think we have time for the last question unfortunately. Feel free to stick around and ask more questions!

125

17:59

<dergoegge> Thank you all for coming!

126

17:59

<dergoegge> #endmeeting