The PR branch HEAD was 4438aed09 at the time of this review club meeting.
Notes
PR history
The idea to use MuHash in Bitcoin Core was initially introduced by Pieter
Wuille in this mailing list post from 2017.
Pieter proposed an implementation in PR
#10434, which still comprises
most of the code in this week’s PR.
Fabian Jahr then picked up the proposal and did further research in 2019. A snapshot
of that work can be seen in this gist.
Based on further insights and feedback, the idea evolved into implementing
an index for all coin statistics, not only the hash of the UTXO
set. This was implemented in
PR #18000.
This week’s PR is the first in a series of PRs to incrementally implement
PR #18000.
PR 19055
This PR modifies how the hash of the UTXO set is calculated. That hash, as well
as other coin statistics, can be accessed through the gettxoutsetinfo
RPC. The PR uses the MuHash
algorithm, which allows for
incremental hashing.
The truncated hash SHA-512/256 is used
to prepare data for use in MuHash.
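A minimal sketch of how MuHash-style incremental set hashing works. This is a toy illustration only, not the PR's implementation: real MuHash3072 maps each element to a 3072-bit number by expanding a 256-bit hash with ChaCha20, which the sketch below simplifies to a direct SHA-512-based mapping.

```python
import hashlib

# MuHash3072 works in the multiplicative group modulo this 3072-bit prime.
PRIME = 2**3072 - 1103717

def to_group_element(data: bytes) -> int:
    # Toy mapping: hash the data and reduce mod the prime. The real
    # implementation expands a 256-bit hash to 3072 bits with ChaCha20.
    return int.from_bytes(hashlib.sha512(data).digest(), "big") % PRIME

class ToyMuHash:
    """Order-independent incremental set hash (simplified sketch)."""

    def __init__(self):
        self.state = 1  # multiplicative identity = empty set

    def insert(self, element: bytes):
        self.state = (self.state * to_group_element(element)) % PRIME

    def remove(self, element: bytes):
        # Removal multiplies by the modular inverse (pow with -1
        # exponent requires Python 3.8+).
        inv = pow(to_group_element(element), -1, PRIME)
        self.state = (self.state * inv) % PRIME
```

Because the state is a product, insertion order does not matter and an element can be removed without re-hashing the rest of the set, which is what makes per-block incremental maintenance of the UTXO set hash possible.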
Questions
What does “incremental hashing” mean? Which of its properties are interesting
for our use case?
What were the use cases for RPC gettxoutsetinfo described in the resource
links? Can you think of others? Which use cases are the most relevant to you?
What do you think could be the biggest downside to introducing MuHash for the
UTXO set hash?
This PR not only adds MuHash but also TruncatedSHA512Writer. Why?
Why is the MuHash class implemented in Bitcoin Core and not in
libsecp256k1?
Did you look into the other proposed hashing algorithms in some of the
resources? What were their drawbacks? Do you think MuHash is the right
choice?
What are your thoughts on the MuHash benchmarks, e.g. (a) the benchmarking
code, and (b) your observations if you tested the code locally?
Considering the trade-offs, should the old hashing algorithm be kept around
and accessible (using a flag, for example)?
<fjahr> Let's start with the questions I came up with; they are focused on conceptual understanding, but definitely throw in more technical questions! What does “incremental hashing” mean? Which of its properties are interesting for our use case?
<raj_149> fjahr: adding new items into the hash input set without redoing the full hash. Being able to add and remove items from the set easily seems desirable for this purpose.
<fjahr> I think these answers were all good, let's move on: What were the use cases for RPC gettxoutsetinfo described in the resource links? Can you think of others? Which use cases are the most relevant to you?
<willcl_ark> I think it might be nice for an SPV wallet, along with checking valid headers (with most work) were provided, to also query multiple nodes' UTXO set hashes
<raj_149> sipa: does that then also imply collisions are much more common in the case of incremental hashing? or is collision resistance not related to the incremental property at all?
<sipa> uproar: muhash/ecmh/... don't permit compact proofs of inclusion or exclusion, so no - all you can do is check easily whether two sets you already have are identical
<uproar> fjahr fast amount/supply auditing is something I'd like, but on the other hand I question to what end it serves nodes (rather than users of those nodes); does MuHash decrease node-operation costs when the command is run?
<sipa> uproar: muhash itself is slower than the current hash, but the advantage is that it can be incrementally computed and maintained in the background (updating the hash at every block, rather than at RPC time)... making the RPC instantaneous
<uproar> the UX benefit is: look, no surprising inflation! but from the node operation perspective it is: there shouldn't have been, unless some pretty fundamental aspects of validation are broken
<uproar> sipa is it that the first gettxoutsetinfo call is going to get substantially slower, but the incremental calls on gettxoutsetinfo will get faster relative to subsequent calls with the old gettxoutsetinfo?
<adiabat> I guess for future PRs there are several options: increment at each block, or update a cache every time gettxoutsetinfo is called and increment then
<willcl_ark> Would it not be wise to add a commit for the rolling hash as part of this PR? Why would you split this way (with a performance degradation)?
<michaelfolkson> There's another downside in terms of time taken to brute force the pre-image right? Once the pre-image is known from the original hash an attacker can brute force the pre-image for the new hash in a shorter time than normal?
<adiabat> I don't see why updating when requested needs to look at the whole utxo set? Couldn't it replay blocks since last call and perform the additions / deletions that way?
<sipa> michaelfolkson: assuming the discrete logarithm in GF(2^3072 - 1103717) is 128-bit secure, that ChaCha20 and truncated SHA512 have their intended security, ...
<uproar> maybe I'm missing something but if currently SHA256 hashing in gettxoutsetinfo makes calls take ~5 minutes, and MuHash increases the time, what's the benefit? being able to do it in the background or that the total sum of updates costs less because of the "homomorphic set" properties?
<uproar> sipa ok, thanks, I think i mistook "Muhash is slower" to mean "the entire update process post-PR will be slower", which I take to not be the case
<fjahr> raj_149: so the truncated hash does not decrease security, but i don't think it increases it either. It is an efficient way to turn the raw data of UTXOs into a 256-bit number.
<sipa> nehan: blocks are "patches" to the UTXO set; if you have the hash state as of block N, you roll that state forward by applying the blocks after it to it
<sipa> willcl_ark: SHA256 transforms an internal 32-byte state by consuming 64-byte blocks at a time; the runtime is proportional to how many of those 64-byte blocks you need to consume
<sipa> SHA512 transforms an internal 64-byte state by consuming 128 byte blocks at a time; the runtime per consumption is longer than SHA256's, but typically faster per byte
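sipa's block-size figures can be checked directly with Python's hashlib. Note this is a rough illustration: the PR's truncated hash is SHA-512/256, which uses different initial values than plain SHA-512, so simply truncating a SHA-512 digest as below does not produce the same digest bytes.

```python
import hashlib

# SHA-256 consumes 64-byte blocks; SHA-512 consumes 128-byte blocks.
print(hashlib.sha256().block_size)  # 64
print(hashlib.sha512().block_size)  # 128

# Truncating a SHA-512 digest to 32 bytes yields a 256-bit result.
# (Real SHA-512/256 additionally uses distinct initial values.)
digest = hashlib.sha512(b"utxo data").digest()[:32]
print(len(digest))  # 32
```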
<jonatack> fjahr: it is good that you added the legacy_hash bool to the RPC, but i would suggest making MuHash opt-in rather than the default until the index is merged
<sipa> fjahr: if you're going to cache the hash for every block... that's actually an argument in favor of ECMH, as the minimal "state" to keep for ECMH is 33 bytes only, while for MuHash it's 384 bytes
<thomasb06> jonatack: as expected, the archwiki admins are not interested in making a new page on how to compile and test Bitcoin Core. Your page will remain the reference for Linux builds for a while longer...
<willcl_ark> I didn't try the PR on my server, but current performance is exactly 60 seconds for gettxoutsetinfo, so at ~5x slower that will be an issue (one that a flag can solve, but still...)
<sipa> nehan: i suspect that once the index code exists, everyone who more than very exceptionally uses gettxoutsetinfo will enable the index (as it's pretty much free)
<jnewbery> if you're someone who wants to query the utxo set hash frequently, then build the index. If you're not, a one-off hit on the rare occasion you do run it is acceptable
<fjahr> For those that tested the code already: What are your thoughts on the Muhash benchmarks, e.g. (a) the benchmarking code, and (b) your observations if you tested the code locally?
<jnewbery> fjahr: my high-level thought about the PR is that it's good that you've split this from #18000, but there's still a lot of code there to review. It's a mix of python cryptography, C++ crypto, ASM, RPC code,... If you could split off smaller parts to review, it'd be easier to make progress
<fjahr> The two use cases I encountered most were: a) "for fun", to check that the node was running and see the stats, and b) checking circulating supply regularly (BitMEX publishes the numbers, for example)
<sipa> fjahr: one thing that could be done if you have the index is make the startup consistency check roll back the utxo hash a few blocks, and compare it with the index
<jonatack> jnewbery: fjahr: i agree, it's a long and hard review because it's so wide ... multi-day if you get into the crypto algo implementation and the assembly optimising
<jnewbery> If there's a split PR on the python implementation, it'd be great to have a review club on that where we can really get into the weeds on the crypto :)