The PR branch HEAD was e19e500134 at the time of this review club meeting.
There was a previous review club meeting on PR 19055 when the PR
included calculating the Muhash of the UTXO set. That review club session
focused on the high-level concepts of using Muhash as a rolling hash for the
UTXO set.
The scope of the PR has since been reduced to only include the implementation
of the Muhash algorithm in C++. In this review club meeting, we’ll dig into the
cryptographic code in detail.
Notes
This PR is an implementation of the Muhash algorithm, which was first described
in the paper A New Paradigm for Collision-free Hashing: Incrementality at
Reduced Cost by Bellare
and Micciancio. Pieter Wuille wrote a mailing list post in 2017 on Rolling
UTXO set
hashes,
which compared Muhash with Elliptic Curve Multiset Hash (another possible way
of implementing rolling hashes). You should read Wuille’s mailing list post
before starting to review the code. You can also look at the Bellare-Micciancio
paper, but it’s more detail than you’ll need in order to review the
implementation.
A Python implementation of the Muhash algorithm was merged in PR
19105 last month. We
discussed that in a previous PR review club meeting. As you review
the new code, you may find it helpful to compare it with the Python
implementation. Python’s built-in support for bignums and modular inverses are
much easier to follow than the optimized C++ code.
The new code is in the
src/crypto
directory, which also includes implementations of other frequently used
cryptographic functions. Take a look at the SHA256, SHA3 and SipHash
implementations, and you’ll notice some similarities in the way the interfaces
are designed.
Questions
Specification
How much state is stored inside the MuHash3072 rolling hash object? How much
data is returned to a user requesting the set hash?
Why was 3072 bits chosen as the size of the group?
Can the Muhash of a single object (eg a transaction) be calculated and cached?
Would we do this in practice?
What is the most expensive operation to carry out in the rolling hash? What can
we do to reduce the number of times we need to carry out this operation?
How can we test for membership in the Muhash set?
Implementation
What public methods does the MuHash3072 object expose to clients? What is the
Span<> class template that’s used in some of those public methods?
<@jnewbery> A reminder of the format: I've prepared some questions to guide the discussion a bit, but feel free to jump in at any point if you have a question
<@jnewbery> that's great. Any first impressions? I really enjoyed getting stuck into the low level cryptographic code. It's not something I look at very much
<@jnewbery> First question: How much state is stored inside the MuHash3072 rolling hash object? How much data is returned to a user requesting the set hash?
<glozow> Just a Num3072 which is an array of “limbs” (represented as uints size depending on what system supports) for a total = 3072 bits. Output is a 384b hash
<stacie> The rolling hash object stores 2 pieces of information - (1) the product of hashes for all the elements that have entered the set, and (2) the product of hashes for all elements that have been removed from the set. (at least that's what I gathered from the mailing list post :) ) When a user requests the hash, only a 256 bit hash is returned (but I see in the code that a 384 byte hash is returned).
<@jnewbery> very good everyone! Yes, the question was a bit vague. The MuHash class returns a 3072 bit object to the client code. When that gets integrated into Bitcoin Core, it'd be hashed down to 256 bits before returning it to the end user
<@jnewbery> stacie: Storing a numerator and denominator would be an optimization. In the actual implementation we just store one Num3072 (the numerator/denominator)
<glozow> I try answer: security = we’re trying to prevent collisions. Usually we’d like something like 128 bits of security, i.e. O(2^128) time to break. Muhash incremental hashing also makes it possible to attack slightly more efficiently than brute force, i.e. in O(2^(2*sqrt(n)-1) time. So 2*sqrt(3072) - 1 ~= 128 bits of security
<stacie> I don’t have a better answer to why 3072 bits other than it’s best practice. Wagner’s Birthday Problem/k-sum paper shows that when MuHash uses a sufficiently large prime for the modulo step, it is provably secure under the discrete log assumption. Common guidelines state that 3072 bits is enough security.
<jesseposner> You could calculate a single object but if you don't need the homomorphic properties then there is no reason to use it because it's slow and complex when compared with sha256
<@jnewbery> again, the question was a bit vague. The idea here is that it doesn't make sense to precompute the muhash of individual transactions in the mempool, since the cached muhash of the transaction would be large
<@jnewbery> sipa: if you were pre-caching, perhaps you'd want to store 384 bytes and then have half the multiplications and no inverse on the critical path?
<@jnewbery> ok, next question has already been answered. What is the most expensive operation to carry out in the rolling hash? What can we do to reduce the number of times we need to carry out this operation?
<stacie> The most expensive operation to carry out in the rolling hash is computing the modular inverse. The rest of my answer to the rest is based on what I learned from the mailing list post, but as of 5 min ago I learned the actual implementation stores just one Num3072 for the numerator/denominator. To minimize the amount of times the inverse computation has to happen, the running hash is stored as a fraction. Newly
<stacie> created UTXOs (aka new elements in the set) are multiplied into the numerator, and spent UTXOs (aka elements needing removal from the set) are multiplied into the denominator. The inverse operation is only performed when a final hash is needed.
<lightlike> In the python implementation, the numerator and denominator are part of the internal state, while here, it seems that the user is responsible for keeping track of them. Why the difference?
<lightlike> glozow: i thought the user would have a local var for both, and multiply the right one for adding/removing, and only doing division at the end - like it is done in crypto_test.cpp unit test.
<@jnewbery> jesseposner: The limb arithmetic is to implement wide (3072 bit) integers. Python allows you to do arithmetic on arbitrarily large ints, so it's not needed in the Python implementation
<@jnewbery> lightlike: That's just for the unit test. In normal usage, the client code would just pass objects to the MuHash object to add and subtract from the set
<lightlike> jnewbery: I thought the user would use the "/" operator as rarely as possible because it involves a costly "Inverse" call. instead, they would keep a local running variable for the denominator, and only divide numerator/denominator at the end.
<sipa> the MuHash3072 class in C++ is *just* a specialized 3072-bit bignum implementation; it's not really the full set hash scheme (as it excludes the ChaCha20 and SHA256 too)
<sipa> though in some cases you don't actually need the num/denom trick (if you're computing the muhash3072 from the utxo set in full, you're never deleting)
<@jnewbery> we have quite a lot of questions to get through, so let's keep moving. What public methods does the MuHash3072 object expose to clients? What is the Span<> class template that’s used in some of those public methods?
<sipa> it's a way of passing a (pointer to array, length of array) conveniently, so it works with any container that stores sequential elements of that type
<sipa> but it's a concrete data type, not a way of just making your function templated in the type of the passed object (which is an alternative that also works0
<willcl_ark> If you have BigInt support, you can use MuHash with fewer, larger limbs to make up your 3072 bits. Does this require fewer inverse/multiplications when computing the hash?
<sipa> willcl_ark: not fewer inverses, but each inverse (and 3072-bit multiplication) consists of 4x fewer limb multiplications if they're twice as big
<willcl_ark> Looks like it must be initialised with a Span of type `unsigned char` containing one 32B key otherwise `assert(key32.size() == INPUT_SIZE)` will trigger
<sipa> lightlike: there is a name for this technique, but if you want to compute (x mod (2^N - C)), where x is up to 2N bits, you can observe that it's equal to (x_low) + (x_high * C), where _low and _high are the bottom and upper half of x
<sipa> exactly, it's the same here: by having a modulus close to a power of two, we can reduce by multiplying the top half by C, and adding to the lower half
<@jnewbery> ok, we're almost out of time, so I'm going to skip to the last question. How is this new code tested? Can you think of other ways that it could be tested?
<@jnewbery> Where the python implementation and c++ implementation are tested against the same random input and checked that they arrive at the same result
<thomasb06> (order of a group G, say o(G), implies that for all g in G, g * g * ... * g done o(G) times always gives the neutral element of G, say 1 with multiplicative notation. But by definition, the order of a group is the number of element of the group. The g * g * ... * g = 1 is a consequence. For example the group of symmetries that keep a cube invariant is of order 48, that is there are 48 symmetries in the group. Bu
<thomasb06> it takes g * g * ... * g done 48 times to get the neutral element, here the identity transformation. If you consider only the rotational symemetries, the group is of order 24 only.)
<jesseposner> So if I'm getting this right, the modulus, 2^3072 - 1103717 (the largest 3072-bit safe prime number), is used to define a finite field. The order of that finite field is the modulus. However, the non-zero elements of the finite field form a multiplicative group, and thus the order of the group is (2^3072-1103717)-1 because it excludes the 0 element.
<jesseposner> It is equal to the number of elements in the finite field, but is not equal to the number of elements in the group (because the group excludes the zero element of the field).
<sipa> jesseposner: and the field we have here has an additive group (which we don't use!) of order MODULUS, and a multiplicative group of order MODULUS-1
<sipa> if MODULUS was not a prime, then (integers mod MODULUS) would not be a field (it would be a ring instead), and its multiplicative group would have an order less than MODULUS-1
<sipa> this is used in RSA, for example, where the ring (integers modulo p*q) is used; that ring has additive group of order p*q, but multiplicative group of order (p-1)*(q-1)
<sipa> the fact that the multiplicative group for us has order MODULUS-1 is used to compute inverses, btw: if x^(MODULUS-1)=1 for all x != 0, then x^(MODULUS-2) must be x^-1