Add MuHash3072 implementation (math and cryptography)

https://github.com/bitcoin/bitcoin/pull/19055

Host: jnewbery  -  PR author: fjahr

The PR branch HEAD was e19e500134 at the time of this review club meeting.

There was a previous review club meeting on PR 19055 when the PR included calculating the Muhash of the UTXO set. That review club session focused on the high-level concepts of using Muhash as a rolling hash for the UTXO set.

The scope of the PR has since been reduced to only include the implementation of the Muhash algorithm in C++. In this review club meeting, we’ll dig into the cryptographic code in detail.

Notes

  • This PR is an implementation of the Muhash algorithm, which was first described in the paper A New Paradigm for Collision-free Hashing: Incrementality at Reduced Cost by Bellare and Micciancio. Pieter Wuille wrote a mailing list post in 2017 on Rolling UTXO set hashes, which compared Muhash with Elliptic Curve Multiset Hash (another possible way of implementing rolling hashes). You should read Wuille’s mailing list post before starting to review the code. You can also look at the Bellare-Micciancio paper, but it’s more detail than you’ll need in order to review the implementation.

  • A Python implementation of the Muhash algorithm was merged in PR 19105 last month. We discussed that in a previous PR review club meeting. As you review the new code, you may find it helpful to compare it with the Python implementation. Python’s built-in support for bignums and modular inverses are much easier to follow than the optimized C++ code.

  • The new code is in the src/crypto directory, which also includes implementations of other frequently used cryptographic functions. Take a look at the SHA256, SHA3 and SipHash implementations, and you’ll notice some similarities in the way the interfaces are designed.

Questions

Specification

  1. How much state is stored inside the MuHash3072 rolling hash object? How much data is returned to a user requesting the set hash?

  2. Why was 3072 bits chosen as the size of the group?

  3. Can the Muhash of a single object (eg a transaction) be calculated and cached? Would we do this in practice?

  4. What is the most expensive operation to carry out in the rolling hash? What can we do to reduce the number of times we need to carry out this operation?

  5. How can we test for membership in the Muhash set?

Implementation

  1. What public methods does the MuHash3072 object expose to clients? What is the Span<> class template that’s used in some of those public methods?

  2. What does the #ifdef HAVE___INT128 code in muhash.h do?

  3. How is a MuHash3072 object constructed and initialized? What happens if the ChaCha20 output is larger than the modulus?

  4. What happens if the result of a multiplication or division is larger than the modulus?

  5. The Finalize() method has a comment “Does not change this object’s value.” Why is the function not marked const?

  6. In some of the multiplication helper functions, we see lines like:

    c1 = t >> LIMB_SIZE;

    e.g. here.

    What are those lines doing?

  7. In some of the helper functions, we see some ternary operators like:

    th += (c0 < tl) ? 1 : 0;

    e.g. here.

    What is th here? Why does it need to be incremented by 1 if c0 < tl?

  8. Both the Multiply() and Square() functions have the following code at the end of the function:

        /* Perform a potential third reduction. */
        if (c0) FullReduce(in_out);
    

    Why is that necessary? What is it doing?

  9. Did you review the Inverse() function? Did AJ’s comment help?

  10. How is this new code tested? Can you think of other ways that it could be tested?

Meeting Log

  117:00 <@jnewbery> #startmeeting
  217:00 <@jnewbery> Hi folks! Welcome to Bitcoin Core PR Review Club.
  317:00 <emzy> hi
  417:00 <felixweis> hi!
  517:00 <kanzure> hi
  617:00 <glozow> hi jnewbery!
  717:00 <stacie> hi
  817:00 <willcl_ark> hi
  917:00 <jesseposner> hi
 1017:00 <@jnewbery> Feel free to say hi to let us all know you're here.
 1117:00 <fjahr> hi
 1217:00 <blueskies> hi
 1317:00 <elle> hi!
 1417:00 <sipa> hi
 1517:00 <lightlike> hi
 1617:00 <michaelfolkson> hi
 1717:00 <jcg> hi
 1817:00 <@jnewbery> Anyone here for the first time?
 1917:00 <buzz08> hi
 2017:01 <blueskies> first time for me
 2117:01 <jcg> me
 2217:01 <@jnewbery> welcome blueskies and jcg :)
 2317:01 <jcg> thanks
 2417:01 <blueskies> thank you :)
 2517:01 <@jnewbery> notes and questions are in the normal place: https://bitcoincore.reviews/19055-2
 2617:02 <@jnewbery> A reminder of the format: I've prepared some questions to guide the discussion a bit, but feel free to jump in at any point if you have a question
 2717:02 <@jnewbery> All questions are welcome. This is a place for us to all learn from each other
 2817:02 <gentile> hi
 2917:03 <@jnewbery> ok, who had a chance to review the PR this week? (y/n)
 3017:03 <jesseposner> y
 3117:03 <felixweis> y
 3217:03 <willcl_ark> y
 3317:03 <jcg> y
 3417:03 <blueskies> n
 3517:03 <michaelfolkson> y
 3617:03 <glozow> 0.5y
 3717:03 <stacie> 50% - got through the concept/spec, but not so much the code
 3817:03 <emzy> n
 3917:03 <lightlike> y
 4017:03 <buzz08> n
 4117:04 <elle> y-ish
 4217:04 <jonatack> hi
 4317:04 <@jnewbery> that's great. Any first impressions? I really enjoyed getting stuck into the low level cryptographic code. It's not something I look at very much
 4417:04 <willcl_ark> some of the operations def appear quite inaccessible at first look
 4517:05 <glozow> i finally got to use math
 4617:05 <@jnewbery> willcl_ark: I agree. Definitely took a lot of thinking to work out what's going on
 4717:05 <@jnewbery> First question: How much state is stored inside the MuHash3072 rolling hash object? How much data is returned to a user requesting the set hash?
 4817:05 <felixweis> yes i spend quite some more time on the python implementation because its a lot easier to understand
 4917:05 <willcl_ark> but they get a _bit_ better once you try to match them up with the crypto scheme itself
 5017:05 <@jnewbery> felixweiss: yeah, it's nice that python just does bignums and modular inverses for you
 5117:06 <glozow> Just a Num3072 which is an array of “limbs” (represented as uints size depending on what system supports) for a total = 3072 bits. Output is a 384b hash
 5217:06 <stacie> The rolling hash object stores 2 pieces of information - (1) the product of hashes for all the elements that have entered the set, and (2) the product of hashes for all elements that have been removed from the set. (at least that's what I gathered from the mailing list post :) ) When a user requests the hash, only a 256 bit hash is returned (but I see in the code that a 384 byte hash is returned).
 5317:06 <willcl_ark> looks like MuHash3072 stores 3072 bits and returns a 256 bit hash to the user
 5417:06 <jesseposner> Finalize returns a 384-byte hash, however, the spec calls for a compression step that reduces it to 256 bits
 5517:06 <willcl_ark> TBH I was confused by the user bit -- the C++ code looked like 384 bit hash, but the python code looked like a 256 bit hash
 5617:07 <sipa> 384 *bytes*
 5717:07 <sipa> aka 3072 bits
 5817:07 <sipa> which is the observable state
 5917:07 <sipa> the hash of that is the output hash
 6017:07 <felixweis> wasn't there both a numerator and denumerator before finalization?
 6117:08 <sipa> yes, but those aren't observable
 6217:08 <@jnewbery> very good everyone! Yes, the question was a bit vague. The MuHash class returns a 3072 bit object to the client code. When that gets integrated into Bitcoin Core, it'd be hashed down to 256 bits before returning it to the end user
 6317:08 <sipa> it's just an optimization to delay divisions/inverses as long as possible
 6417:08 <felixweis> which is quite smart considering how much slower div is to mul
 6517:09 <sipa> so instead of computing a1/d1*a2/d2*a3/d3 (for 3 adds and 3 deletes), it's computing (a1*a2*a3)/(d1*d2*d3)
 6617:09 <willcl_ark> ^ which is the answer to a later question IIRC
 6717:09 <sipa> sorry.
 6817:09 <@jnewbery> stacie: Storing a numerator and denominator would be an optimization. In the actual implementation we just store one Num3072 (the numerator/denominator)
 6917:09 <@jnewbery> Why was 3072 bits chosen as the size of the group?
 7017:09 <jesseposner> 3072-bit DL has about 128 bits of
 7117:10 <jesseposner> security
 7217:10 <jesseposner> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337.html
 7317:10 <glozow> I try answer: security = we’re trying to prevent collisions. Usually we’d like something like 128 bits of security, i.e. O(2^128) time to break. Muhash incremental hashing also makes it possible to attack slightly more efficiently than brute force, i.e. in O(2^(2*sqrt(n)-1) time. So 2*sqrt(3072) - 1 ~= 128 bits of security
 7417:10 <@jnewbery> (that's question 2, not a question for sipa :) )
 7517:10 <CD36> I am new to this code base. I am just curious if you guys thought of using ISO CPP guideline and Google Cpp Style Guide.
 7617:10 <willcl_ark> 3072-bit DL has about 128 bits of security when used modulo a sufficiently large prime or an elliptic curve group
 7717:10 <stacie> I don’t have a better answer to why 3072 bits other than it’s best practice. Wagner’s Birthday Problem/k-sum paper shows that when MuHash uses a sufficiently large prime for the modulo step, it is provably secure under the discrete log assumption. Common guidelines state that 3072 bits is enough security.
 7817:10 <sipa> in ECC 256-bit is enough for 128 bits of security
 7917:10 <jesseposner> and 128 bits of security is sufficient until the next revolutionary breakthrough in either mathematics or technology
 8017:11 <@jnewbery> jesseposner glozow willcl_ark stacie: yes!
 8117:11 <sipa> in modulo prime groups, 3072 bits is needed
 8217:11 <sipa> jesseposner: also, bitcoin has 128-bit security for lots of things already - it doesn't make sense to target higher
 8317:11 <@jnewbery> 3. Can the Muhash of a single object (eg a transaction) be calculated and cached? Would we do this in practice?
 8417:11 <stacie> jnewbery ah! noted. I thought the numerator/denominator thing was so cool
 8517:11 <willcl_ark> sipa: is that because of the RIPMD160?
 8617:11 <michaelfolkson> Yes. No
 8717:12 <sipa> willcl_ark: no... secp256k1 has 128 bit security, and 256-bit hashes for transactions have 128 bit collision resistance
 8817:12 <willcl_ark> jnewbery: because we can add or delete in any order, there's no need to cache individual elements
 8917:12 <@jnewbery> CD36: That's a bit off topic for now, but you can find our style guide at https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#coding-style-general
 9017:12 <jesseposner> You could calculate a single object but if you don't need the homomorphic properties then there is no reason to use it because it's slow and complex when compared with sha256
 9117:12 <jonatack> CD36: see https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md ... the guides you refer to are indeed sometimes referenced by reviewers
 9217:12 <michaelfolkson> Caches are large
 9317:13 <michaelfolkson> (768 bytes)
 9417:13 <sipa> willcl_ark: P2SH has only 80 bits of collision resistance which matters for some uses, which is why P2WSH only has a 256-bit script hash mode
 9517:13 <willcl_ark> sipa: thanks, that's very helpful!
 9617:14 <@jnewbery> michaelfolkson: right (althought with this implementation, it'd be 384 bytes since we're not storing numerator and denominator)
 9717:14 <sipa> it depends on whether you'd be willing to do the inverse for every transactions
 9817:15 <@jnewbery> again, the question was a bit vague. The idea here is that it doesn't make sense to precompute the muhash of individual transactions in the mempool, since the cached muhash of the transaction would be large
 9917:15 <sipa> yeah, 384 or 768, it's both pretty big compared to average transaction-in-memory sizes
10017:16 <@jnewbery> under a different scheme like ECMH, the cache would only be 64 bytes per object
10117:16 <michaelfolkson> "slow and complex when compared with sha256" This isn't the case I don't think. Or at least not specific jesseposner
10217:16 <sipa> michaelfolkson: it is many times slower than just SHA256'ing the whole UTXO set
10317:17 <@jnewbery> sipa: if you were pre-caching, perhaps you'd want to store 384 bytes and then have half the multiplications and no inverse on the critical path?
10417:17 <sipa> jnewbery: or you'd use ECMH instead :)
10517:17 <@jnewbery> indeed
10617:17 <@jnewbery> ok, next question has already been answered. What is the most expensive operation to carry out in the rolling hash? What can we do to reduce the number of times we need to carry out this operation?
10717:18 <@jnewbery> (partly answered)
10817:18 <jesseposner> the modular inverse is the most expensive operation so we wait to compute an inverse until the final hash is needed
10917:18 <willcl_ark> calculating the modular inverse
11017:18 <stacie> The most expensive operation to carry out in the rolling hash is computing the modular inverse. The rest of my answer to the rest is based on what I learned from the mailing list post, but as of 5 min ago I learned the actual implementation stores just one Num3072 for the numerator/denominator. To minimize the amount of times the inverse computation has to happen, the running hash is stored as a fraction. Newly
11117:18 <stacie> created UTXOs (aka new elements in the set) are multiplied into the numerator, and spent UTXOs (aka elements needing removal from the set) are multiplied into the denominator. The inverse operation is only performed when a final hash is needed.
11217:18 <michaelfolkson> That is very quick typing stacie
11317:18 <@jnewbery> jesseposner willcl_ark stacie: exactly right!
11417:18 * michaelfolkson needs time to read
11517:19 <lightlike> In the python implementation, the numerator and denominator are part of the internal state, while here, it seems that the user is responsible for keeping track of them. Why the difference?
11617:19 <jesseposner> I was wondering about that as well
11717:19 <glozow> lightlike what do you mean user is responsible for keeping track?
11817:19 <@jnewbery> lightlike: what do you mean by the user keeping track of them? The user just passes objects to add to and remove from the set
11917:19 <stacie> michaelfolkson haha I did (some) of my homework this time ;) but I'm reaching the end of my answer sheet :P
12017:20 <jesseposner> the numerator/denominator logic seems like it maps to the limb arithmetic
12117:21 <lightlike> glozow: i thought the user would have a local var for both, and multiply the right one for adding/removing, and only doing division at the end - like it is done in crypto_test.cpp unit test.
12217:21 <sipa> i think the reason is just that in python the bignum arithmetic is trivial, and doesn't deserve a class on its own - it's just a number
12317:21 <willcl_ark> I would also say I was surprised to see the optimisation left on the table for follow up; was it much harder to implement?
12417:21 <@jnewbery> jesseposner: The limb arithmetic is to implement wide (3072 bit) integers. Python allows you to do arithmetic on arbitrarily large ints, so it's not needed in the Python implementation
12517:21 <felixweis> my guess is limb arithmetic is used because unlike python it doesn't do infinitly large integers
12617:22 <sipa> so the data structure there is a higher-level thing doing something more high-level
12717:22 <jesseposner> jnewbery: ah that makes sense, thanks!
12817:22 <sipa> in the C++ code, MuHash3072 is really just a specialized 3072-bit bignum implementation
12917:22 <@jnewbery> lightlike: That's just for the unit test. In normal usage, the client code would just pass objects to the MuHash object to add and subtract from the set
13017:22 <sipa> if we'd want you could write a higher-level wrapper around that for the actual multiset hash scheme
13117:23 <@jnewbery> Next question: How can we test for membership in the Muhash set?
13217:23 <glozow> nope
13317:23 <willcl_ark> You can't?
13417:23 <michaelfolkson> Trick question
13517:23 <jesseposner> We can't. In fact, it's not a set, but rather a hash representing a set: https://github.com/bitcoin/bitcoin/pull/19105/files#r438847029
13617:23 <@jnewbery> very good. None of you fell for the trick question
13717:23 <sipa> it's trivial, you recompute it from the entire set ;)
13817:23 <jesseposner> :-)
13917:23 <lightlike> jnewbery: I thought the user would use the "/" operator as rarely as possible because it involves a costly "Inverse" call. instead, they would keep a local running variable for the denominator, and only divide numerator/denominator at the end.
14017:24 <michaelfolkson> Though sipa did say in the mailing list post that it doesn't support compact proofs of existence
14117:24 <sipa> Here's why to do if you want to compactly test for membership in MuHash: lie down, cry, cry a lot
14217:24 <@jnewbery> yes, this is a hashed representation of a set, not an accumulator
14317:24 <felixweis> so for c++ we do the combination of numerator and denominator later outside of 2 MuHash3072 instances
14417:25 <@jnewbery> lightlike: that'd be quite an imposition to place on the client code I think
14517:25 <lightlike> jnewbery: but otherwise, if we'd call "/" immediately, it seem really inefficient
14617:25 <sipa> that's what the C++ code does, though?
14717:26 <sipa> let the user maintain numerator and denominator explicitly
14817:26 <@jnewbery> It'd be trivial to enhance the implementation to store both the numerator and denominator
14917:26 <fjahr> I am doing it in the implementation of the index right now, yes
15017:26 <@jnewbery> sipa: oh! Is that how it's supposed to be used?
15117:26 <sipa> jnewbery: yes
15217:26 <willcl_ark> lightbulb.gif
15317:27 <sipa> it would be totally reasonable to provide a higher-level class that does that for you
15417:27 <sipa> matching the Python code
15517:27 <@jnewbery> lightlike: I apologise. You were right. I haven't looked at how this code is integrated into the UTXO index
15617:27 <sipa> the MuHash3072 class in C++ is *just* a specialized 3072-bit bignum implementation; it's not really the full set hash scheme (as it excludes the ChaCha20 and SHA256 too)
15717:28 <sipa> ah no, it does the ChaCha20'ing, but not the hashing
15817:29 <willcl_ark> a little confusingly-named if it's not the full implementation, but makes a lot of sense now
15917:29 <sipa> it may make sense to refactor that
16017:29 <sipa> into a Num3072 class that's just the numbers, and MuHash3072 that does num/denom/chacha20/sha256
16117:29 <willcl_ark> what would you call the higher-level class otherwise
16217:29 <willcl_ark> I think that would be nice
16317:29 <gentile> agreed
16417:30 <@jnewbery> sipa: Don't we already have that? Num3072 is the bignum class
16517:30 <jesseposner> +1
16617:30 <sipa> jnewbery: eh, it's just the storage without operations
16717:30 <sipa> but good point
16817:31 <@jnewbery> so you're saying move all the functions into class methods in Num3072?
16917:31 <PatrickLemke> sipa Is there a specific reason why you implemented your own bignum support for C++?
17017:31 <sipa> though in some cases you don't actually need the num/denom trick (if you're computing the muhash3072 from the utxo set in full, you're never deleting)
17117:31 <sipa> PatrickLemke: avoiding a dependency for something that's possibly consensus critical at some point
17217:32 <@jnewbery> we have quite a lot of questions to get through, so let's keep moving. What public methods does the MuHash3072 object expose to clients? What is the Span<> class template that’s used in some of those public methods?
17317:32 <willcl_ark> The *= and /= operators
17417:32 <jesseposner> *=, /=, Finalize
17517:32 <nehan> Span is sort of like a slice in Go. It points to some range in another datastructure
17617:33 <nehan> hi
17717:33 <felixweis> i wasn't quite sure about the span because the INPUT_SIZE is supposedly fixed
17817:33 <@jnewbery> hi nehan!
17917:33 <felixweis> makes sense if we hash in the instanciation tho
18017:33 <nehan> i wasn't clear on why Span either
18117:33 <@jnewbery> yes, we have multiplication and division operators, a constructor and a Finalize() function
18217:34 <willcl_ark> A span is just a template for a sequence of value of the same type I think
18317:34 <@jnewbery> We talked about Spans in a previous review club meeting: https://bitcoincore.reviews/18468
18417:34 <sipa> willcl_ark: it's not a template
18517:34 <sipa> it's a way of passing a (pointer to array, length of array) conveniently, so it works with any container that stores sequential elements of that type
18617:35 <@jnewbery> they're a lightwight, non-owning reference to some contiguous sequence of objects (bytes in this case)
18717:35 <sipa> but it's a concrete data type, not a way of just making your function templated in the type of the passed object (which is an alternative that also works0
18817:35 <jonatack> A Span represents a vector-like view to a range of contiguous elements in memory analogous to std::span in C++20
18917:35 <jonatack> (one more definition ;)
19017:36 <@jnewbery> Next question. What does the #ifdef HAVE___INT128 code in muhash.h do?
19117:36 <jonatack> they're becoming quite ubiquitous in the codebase
19217:36 <glozow> HAVE___INT128 = whether your system supports 128b integers?
19317:36 <willcl_ark> If you have BigInt support, you can use MuHash with fewer, larger limbs to make up your 3072 bits. Does this require fewer inverse/multiplications when computing the hash?
19417:37 <@jnewbery> willcl_ark: not fewer inverse/multiplications, but each inverse/multiplication requires fewer operations
19517:37 <glozow> I imagine bigger integers = fewer operations = more optimized ?
19617:37 <@jnewbery> glozow: right
19717:37 <jesseposner> the HAVE___INT128 branch is optimized for 64-bit multiplication hardware (https://github.com/bitcoin/bitcoin/pull/19055/files#r507238291)
19817:37 <willcl_ark> jnewbery: ah right, thanks
19917:37 <glozow> How would one test consistency between HAVE___INT128 and !HAVE___INT128 platforms?
20017:38 <sipa> willcl_ark: not fewer inverses, but each inverse (and 3072-bit multiplication) consists of 4x fewer limb multiplications if they're twice as big
20117:38 <felixweis> https://www.youtube.com/watch?v=FRkJCvHWdwQ from 45:25 has a good short overview of <span> in c++20. Span behaves similar
20217:39 <@jnewbery> fjahr helpfully did some benchmarking of the performance difference between using int128 and not: https://github.com/bitcoin/bitcoin/pull/19055#discussion_r508093977
20317:39 <syrex> Which C++ standard is used?
20417:39 <@jnewbery> glozow: very good question. What do people think?
20517:39 <willcl_ark> That's a decent speedup
20617:40 <nehan> well you need the hardware, but presumably you could #define it off and compare the results in a test?
20717:40 <sipa> syrex: bitcoin core master is C++11 right now, but we'll transition to C++17 probably in the next few weeks (after 0.21 branch off)
20817:40 <jonatack> syrex: c++11, migration to 17 is planned during the next release or so.
20917:41 <@jnewbery> nehan: yeah, that sounds sensible. It'd be quite nice to do it automatically though
21017:41 <@jnewbery> 4. How is a MuHash3072 object constructed and initialized? What happens if the ChaCha20 output is larger than the group order?
21117:41 <willcl_ark> glozow: perhaps you exercise each mathmatical operation with and without it using some edge-sized values?
21217:42 <michaelfolkson> You take the output modulo the group order..?
21317:43 <willcl_ark> Looks like it must be initialised with a Span of type `unsigned char` containing one 32B key otherwise `assert(key32.size() == INPUT_SIZE)` will trigger
21417:43 <sipa> michaelfolkson: you're confused; there is no group here (except the multiplicative group we're working in)
21517:43 <@jnewbery> michaelfolkson: that's a good guess but it's not the right answer
21617:44 <sipa> it's just taken modulo the modulus
21717:44 <sipa> that modulus is not the order of some other group
21817:44 <@jnewbery> sipa: I don't understand
21917:45 <sipa> MuHash3072 is doing arithmetic in the field of integers modulo 2^3072 - 1103717
22017:45 <sipa> in particular, the multiplication subgroup of that field is what we use
22117:46 <sipa> but 2^3072 - 1103717 isn't the order of some other group (like scalar arithmetic is arithmetic modulo the order of an EC group)
22217:46 <sipa> it's just a constant we've chosen
22317:46 <jesseposner> This comment refers to the "order of the group" (perhaps it should be revised?): https://github.com/bitcoin/bitcoin/pull/19055/files#diff-ccad840af4d1bda6dda986297fdd142a8cf433cd4ab4222eea20fe1fd229a158R16
22417:47 <sipa> no, that's correct
22517:47 <sipa> actually, it's not
22617:47 * Murch is waiting for sipa's computing process to halt
22717:47 <sipa> it is the order of the additive group of integers modulus that number, trivially
22817:48 <sipa> but that additive group isn't relevant to us here; we only multiply and divide
22917:48 <sipa> it should just say "is chosen as modulus"
23017:48 <fjahr> noted :)
23117:48 <@jnewbery> ah! I think that's what's confused me
23217:48 <sipa> jesseposner: thanks for pointing that out, i guess past-me made the same mistake
23317:49 <lightlike> hmm, where exactly is the modulus taken? No "%" exists in all of muhash.cpp
23417:49 <sipa> lightlike: thankfully! modulus operations are slow
23517:49 <@jnewbery> ok, so let me revise my question. What happens if the output of the chacha20 is greater than the modulus?
23617:49 <sipa> lightlike: like 100x slower than a multiplication
23717:50 <sipa> lightlike: and the modulus is computed inside the Multiply and Square functions, simultaneously with the multiplication
23817:50 <lightlike> sipa: ah, thanks!
23917:51 <gentile> 9 minutes left
24017:51 <@jnewbery> right, so if the output of the chacha20 is greater than the modulus, we don't do anything, because we reduce in the Finalize() function
24117:51 <sipa> lightlike: there is a name for this technique, but if you want to compute (x mod (2^N - C)), where x is up to 2N bits, you can observe that it's equal to (x_low) + (x_high * C), where _low and _high are the bottom and upper half of x
24217:52 <@jnewbery> Next question: The Finalize() method has a comment “Does not change this object’s value.” Why is the function not marked const?
24317:52 <felixweis> I probably have to read this 2^N times to understand it...
24417:53 <glozow> Wale, the effective value is never changed, but Finalize might call FullReduce() in case IsOverflow(), which mutates the Num3072 object?
24517:53 <@jnewbery> glozow: exactly right!
24617:53 <sipa> felixweis: what is 1437 mod 99?
24717:53 <sipa> quick
24817:54 <Murch> 51?
24917:54 <sipa> indeed, why?
25017:54 <Murch> 37+14
25117:54 <sipa> bingo; 1400 mod 99 = 14
25217:54 <glozow> 🤯
25317:54 <sipa> what is 1437 mod 97?
25417:54 <@jnewbery> FullReduce() doesn't change the value modulo 2^3072 - 1103717, but it may change the internal Num3072 representation
25517:55 <Murch> 79
25617:55 <sipa> yup, why?
25717:55 <glozow> i needed some time to multiply 14*3 tho
25817:55 <sipa> :D
25917:55 <Murch> 3*14+37 < 97
26017:55 <sipa> exactly, it's the same here: by having a modulus close to a power of two, we can reduce by multiplying the top half by C, and adding to the lower half
26117:56 <@jnewbery> ok, we're almost out of time, so I'm going to skip to the last question. How is this new code tested? Can you think of other ways that it could be tested?
26217:56 <felixweis> interesting
26317:56 <felixweis> TIL
26417:57 <glozow> we can test by having sipa do it in his head and compare with the code's results
26517:57 <buzz08> :-D
26617:57 <sipa> or use the python code
26717:57 <sipa> it's faster
26817:58 <@jnewbery> sipa: good answer!
26917:58 <michaelfolkson> There are some unit test cases and some fuzzing.
27017:58 <buzz08> sipa: good comeback
27117:58 <@jnewbery> I think it'd be really cool if we integrated something like this: https://github.com/bitcoin/bitcoin/pull/19841#issuecomment-687667841
27217:59 <@jnewbery> Where the python implementation and c++ implementation are tested against the same random input and checked that they arrive at the same result
27317:59 <buzz08> a functional test case with sample I/O values ?
27417:59 <Murch> jnewbery: Of course that only proves that they are coming to the same result, not that the result is correct ;)
27518:00 <willcl_ark> Murch: we only need consensus though, right :P
27618:00 <@jnewbery> buzz08: yeah, we do have test vectors, but it's also nice to have more random coverage
27718:00 <Murch> willcl_ark: touche
27818:00 <@jnewbery> ok, that's time. Thanks everyone! Hope you enjoyed looking at some lower level code this week
27918:00 <sipa> if there is a bug in this code, it'll likely result in different results on 32-bit and 64-bit code
28018:00 <willcl_ark> thanks jnewbery, sipa!
28118:01 <stacie> thanks for hosting jnewbery!
28218:01 <felixweis> great thanks jnewbery for hosting :-)
28318:01 <glozow> thanks jnewbery!
28418:01 <buzz08> great info guys, thanks to all
28518:01 <nehan> thanks!
28618:01 <blueskies> this was fascinating. thank you all!
28718:01 <fjahr> thanks jnewbery :)
28818:01 <thomasb06> (order of a group G, say o(G), implies that for all g in G, g * g * ... * g done o(G) times always gives the neutral element of G, say 1 with multiplicative notation. But by definition, the order of a group is the number of element of the group. The g * g * ... * g = 1 is a consequence. For example the group of symmetries that keep a cube invariant is of order 48, that is there are 48 symmetries in the group. Bu
28918:01 <thomasb06> it takes g * g * ... * g done 48 times to get the neutral element, here the identity transformation. If you consider only the rotational symemetries, the group is of order 24 only.)
29018:01 <lightlike> thanks!
29118:01 <willcl_ark> Who is compiling the review-club compendium of factoids anyway
29218:02 <thomasb06> thanks jnewberry
29318:02 <michaelfolkson> "do you plan to add tests using the just-added Python MuHash3072 implementation?" jonatack
29418:02 <gentile> thanks jnewbery
29518:02 <elle> thanks!
29618:02 <sipa> thomasb06: groups can be larger than their order
29718:03 <michaelfolkson> Thanks jnewbery
29818:03 <jonatack> michaelfolkson: is that from my review last May?
29918:03 <fjahr> michaelfolkson: they are there in the follow-up pr
30018:03 <michaelfolkson> Yup
30118:03 <michaelfolkson> Ah nice fjahr
30218:03 <sipa> thomasb06: the addition group of GF(2^2) has order 2, but size 4
30318:03 <emzy> Tnx anyone
30418:03 <jonatack> thanks jnewbery and fjahr, I'll re-review completely after branch-off
30518:04 <jonatack> I have a fun favor to ask while you are here: if you haven't seen it yet, check out this tweet by Murch https://twitter.com/murchandamus/status/1318898781618917378
30618:04 <jonatack> and cast your vote in these two twitter polls about bitcoin feerate units https://twitter.com/jonatack/status/1318890833131823104
30718:05 <thomasb06> sipa well, as far as I remember, the order was the number of element when talking of a group
30818:05 <michaelfolkson> Surely for users it has to be sat/VB... (don't want to swing poll)
30918:05 <Murch> jonatack: So far people seem to be fairly unambiguous
31018:05 <thomasb06> (my computer is swaping)
31118:06 <jonatack> Murch: the first poll, definitely
31218:08 <jonatack> michaelfolkson: seems so but i'm unsure how much of that preference is habit or unfamiliarity. for now proceeding with sat/vB
31318:09 <michaelfolkson> Right thomasb06
31418:10 <michaelfolkson> https://en.wikipedia.org/wiki/Order_(group_theory)
31518:12 <thomasb06> hehe, the memories were not bad. It starts to be far though...
31618:16 <michaelfolkson> I've been flailing around in EC world too recently. Have to remember which world you are operating in
31718:18 <glozow> wait so group order != size?
31818:19 <thomasb06> glozow: when talking of the set, no
31918:19 <thomasb06> but in the case of sepc256k1, the group is cyclic so it's the same
32018:20 <thomasb06> G = <g>
32118:20 <thomasb06> so o(G) = o(g)
32218:22 <thomasb06> michaelfolkson: yes, the ECs are defined over finite fields, it's always confusing
32318:24 <glozow> the group = the underlying set + the binary operation, in this case mod multiplication?
32418:26 <thomasb06> glozow: leave the mod multiplication, takes the symmetries that keep a cube invariant. It's a simple group
32518:26 <thomasb06> *take
32618:30 <sipa> glozow: the group we're working over here is the multiplication modulo 2^3072-1103717
32718:30 <sipa> the _order_ of that group is (2^3072-1103717)-1, as it excludes the 0 element
32818:30 <sipa> (which is not considered part of the multiplicative group)
32918:31 <sipa> so the integers modulo P=2^3072-1103717 has P elements, but only P-1 of those participate in the multiplicative group
33018:32 <sipa> that also means that if you take any number (except 0), and raise it to the power P-1, modulo P, you get 1
33118:32 <sipa> which is the identity in that group
33218:32 <sipa> as it's a cyclic group, its size is equal to its order, but that order is _not_ the modulus
33318:39 <glozow> sipa that makes sense, thank you
33419:04 <jesseposner> So if I'm getting this right, the modulus, 2^3072 - 1103717 (the largest 3072-bit safe prime number), is used to define a finite field. The order of that finite field is the modulus. However, the non-zero elements of the finite field form a multiplicative group, and thus the order of the group is (2^3072-1103717)-1 because it excludes the 0 element.
33519:06 <thomasb06> jesseposner: it sounds good but I don't know what a modulus is... The number of elements?
33619:07 <jesseposner> It is equal to the number of elements in the finite field, but is not equal to the number of elements in the group (because the group excludes the zero element of the field).
33719:07 <thomasb06> jesseposner: order is for groups rather, so a finite field has a modulus but no order
33819:08 <jesseposner> I believe both a group and a finite field have an order.
33919:08 <thomasb06> then, the field is of modulus 2^3072 - 1103717
34019:08 <jesseposner> yes
34119:09 <thomasb06> and the invertibles are a group of order 2^3072 - 1103717 - 1
34219:09 <thomasb06> (not used to the order term for fields though)
34319:10 <jesseposner> I believe so
34419:11 <sipa> jesseposner: yes, "order of a field" is really "order of its additive group"
34519:11 <sipa> jesseposner: and the field we have here has an additive group (which we don't use!) of order MODULUS, and a multiplicative group of order MODULUS-1
34619:11 <sipa> that is the case only because MODULUS is a prime
34719:12 <sipa> if MODULUS was not a prime, then (integers mod MODULUS) would not be a field (it would be a ring instead), and its multiplicative group would have an order less than MODULUS-1
34819:12 <jesseposner> ah, interesting!
34919:13 <sipa> this is used in RSA, for example, where the ring (integers modulo p*q) is used; that ring has additive group of order p*q, but multiplicative group of order (p-1)*(q-1)
35019:13 <sipa> and security relies on attackers not being able to figure out the order (p-1)*(q-1) given just p*q
35119:13 <sipa> (obviously, if you have p and q individually, that's trivial, but they don't)
35219:15 <sipa> the fact that the multiplicative group for us has order MODULUS-1 is used to compute inverses, btw: if x^(MODULUS-1)=1 for all x != 0, then x^(MODULUS-2) must be x^-1
35319:15 <sipa> just divide both sides by x