Add MuHash3072 implementation (
cryptography) Oct 21, 2020
The PR branch HEAD was e19e500134 at the time of this review club meeting.
There was a
previous review club meeting on PR 19055 when the PR
included calculating the Muhash of the UTXO set. That review club session
focused on the high-level concepts of using Muhash as a rolling hash for the
The scope of the PR has since been reduced to only include the implementation
of the Muhash algorithm in C++. In this review club meeting, we’ll dig into the
cryptographic code in detail.
This PR is an implementation of the Muhash algorithm, which was first described
in the paper
A New Paradigm for Collision-free Hashing: Incrementality at
Reduced Cost by Bellare
and Micciancio. Pieter Wuille wrote a mailing list post in 2017 on Rolling
which compared Muhash with Elliptic Curve Multiset Hash (another possible way
of implementing rolling hashes). You should read Wuille’s mailing list post
before starting to review the code. You can also look at the Bellare-Micciancio
paper, but it’s more detail than you’ll need in order to review the
A Python implementation of the Muhash algorithm was merged in
19105 last month. We
discussed that in a previous PR review club meeting. As you review
the new code, you may find it helpful to compare it with the Python
implementation. Python’s built-in support for bignums and modular inverses are
much easier to follow than the optimized C++ code.
The new code is in the
directory, which also includes implementations of other frequently used
cryptographic functions. Take a look at the SHA256, SHA3 and SipHash
implementations, and you’ll notice some similarities in the way the interfaces
are designed. Questions
How much state is stored inside the MuHash3072 rolling hash object? How much
data is returned to a user requesting the set hash?
Why was 3072 bits chosen as the size of the group?
Can the Muhash of a single object (eg a transaction) be calculated and cached?
Would we do this in practice?
What is the most expensive operation to carry out in the rolling hash? What can
we do to reduce the number of times we need to carry out this operation?
How can we test for membership in the Muhash set?
What public methods does the
MuHash3072 object expose to clients? What is the
Span<> class template that’s used in some of those public methods?
What does the
How is a
MuHash3072 object constructed and initialized? What happens if the
ChaCha20 output is larger than the modulus?
What happens if the result of a multiplication or division is larger than
Finalize() method has a comment “Does not change this object’s value.”
Why is the function not marked
In some of the multiplication helper functions, we see lines like:
c1 = t >> LIMB_SIZE;
What are those lines doing?
In some of the helper functions, we see some ternary operators like:
th += (c0 < tl) ? 1 : 0;
th here? Why does it need to be incremented by 1 if
Square() functions have the following code at
the end of the function:
/* Perform a potential third reduction. */
if (c0) FullReduce(in_out);
Why is that necessary? What is it doing?
Did you review the
Inverse() function? Did
How is this new code tested? Can you think of other ways that it could be
1 17:00 <@jnewbery> #startmeeting
2 17:00 <@jnewbery> Hi folks! Welcome to Bitcoin Core PR Review Club.
6 17:00 <glozow> hi jnewbery!
10 17:00 <@jnewbery> Feel free to say hi to let us all know you're here.
16 17:00 <michaelfolkson> hi
18 17:00 <@jnewbery> Anyone here for the first time?
20 17:01 <blueskies> first time for me
22 17:01 <@jnewbery> welcome blueskies and jcg :)
24 17:01 <blueskies> thank you :)
26 17:02 <@jnewbery> A reminder of the format: I've prepared some questions to guide the discussion a bit, but feel free to jump in at any point if you have a question
27 17:02 <@jnewbery> All questions are welcome. This is a place for us to all learn from each other
29 17:03 <@jnewbery> ok, who had a chance to review the PR this week? (y/n)
35 17:03 <michaelfolkson> y
37 17:03 <stacie> 50% - got through the concept/spec, but not so much the code
43 17:04 <@jnewbery> that's great. Any first impressions? I really enjoyed getting stuck into the low level cryptographic code. It's not something I look at very much
44 17:04 <willcl_ark> some of the operations def appear quite inaccessible at first look
45 17:05 <glozow> i finally got to use math
46 17:05 <@jnewbery> willcl_ark: I agree. Definitely took a lot of thinking to work out what's going on
47 17:05 <@jnewbery> First question: How much state is stored inside the MuHash3072 rolling hash object? How much data is returned to a user requesting the set hash?
48 17:05 <felixweis> yes i spend quite some more time on the python implementation because its a lot easier to understand
49 17:05 <willcl_ark> but they get a _bit_ better once you try to match them up with the crypto scheme itself
50 17:05 <@jnewbery> felixweiss: yeah, it's nice that python just does bignums and modular inverses for you
51 17:06 <glozow> Just a Num3072 which is an array of “limbs” (represented as uints size depending on what system supports) for a total = 3072 bits. Output is a 384b hash
52 17:06 <stacie> The rolling hash object stores 2 pieces of information - (1) the product of hashes for all the elements that have entered the set, and (2) the product of hashes for all elements that have been removed from the set. (at least that's what I gathered from the mailing list post :) ) When a user requests the hash, only a 256 bit hash is returned (but I see in the code that a 384 byte hash is returned).
53 17:06 <willcl_ark> looks like MuHash3072 stores 3072 bits and returns a 256 bit hash to the user
54 17:06 <jesseposner> Finalize returns a 384-byte hash, however, the spec calls for a compression step that reduces it to 256 bits
55 17:06 <willcl_ark> TBH I was confused by the user bit -- the C++ code looked like 384 bit hash, but the python code looked like a 256 bit hash
56 17:07 <sipa> 384 *bytes*
57 17:07 <sipa> aka 3072 bits
58 17:07 <sipa> which is the observable state
59 17:07 <sipa> the hash of that is the output hash
60 17:07 <felixweis> wasn't there both a numerator and denumerator before finalization?
61 17:08 <sipa> yes, but those aren't observable
62 17:08 <@jnewbery> very good everyone! Yes, the question was a bit vague. The MuHash class returns a 3072 bit object to the client code. When that gets integrated into Bitcoin Core, it'd be hashed down to 256 bits before returning it to the end user
63 17:08 <sipa> it's just an optimization to delay divisions/inverses as long as possible
64 17:08 <felixweis> which is quite smart considering how much slower div is to mul
65 17:09 <sipa> so instead of computing a1/d1*a2/d2*a3/d3 (for 3 adds and 3 deletes), it's computing (a1*a2*a3)/(d1*d2*d3)
66 17:09 <willcl_ark> ^ which is the answer to a later question IIRC
68 17:09 <@jnewbery> stacie: Storing a numerator and denominator would be an optimization. In the actual implementation we just store one Num3072 (the numerator/denominator)
69 17:09 <@jnewbery> Why was 3072 bits chosen as the size of the group?
70 17:09 <jesseposner> 3072-bit DL has about 128 bits of
71 17:10 <jesseposner> security
73 17:10 <glozow> I try answer: security = we’re trying to prevent collisions. Usually we’d like something like 128 bits of security, i.e. O(2^128) time to break. Muhash incremental hashing also makes it possible to attack slightly more efficiently than brute force, i.e. in O(2^(2*sqrt(n)-1) time. So 2*sqrt(3072) - 1 ~= 128 bits of security
74 17:10 <@jnewbery> (that's question 2, not a question for sipa :) )
75 17:10 <CD36> I am new to this code base. I am just curious if you guys thought of using ISO CPP guideline and Google Cpp Style Guide.
76 17:10 <willcl_ark> 3072-bit DL has about 128 bits of security when used modulo a sufficiently large prime or an elliptic curve group
77 17:10 <stacie> I don’t have a better answer to why 3072 bits other than it’s best practice. Wagner’s Birthday Problem/k-sum paper shows that when MuHash uses a sufficiently large prime for the modulo step, it is provably secure under the discrete log assumption. Common guidelines state that 3072 bits is enough security.
78 17:10 <sipa> in ECC 256-bit is enough for 128 bits of security
79 17:10 <jesseposner> and 128 bits of security is sufficient until the next revolutionary breakthrough in either mathematics or technology
80 17:11 <@jnewbery> jesseposner glozow willcl_ark stacie: yes!
81 17:11 <sipa> in modulo prime groups, 3072 bits is needed
82 17:11 <sipa> jesseposner: also, bitcoin has 128-bit security for lots of things already - it doesn't make sense to target higher
83 17:11 <@jnewbery> 3. Can the Muhash of a single object (eg a transaction) be calculated and cached? Would we do this in practice?
84 17:11 <stacie> jnewbery ah! noted. I thought the numerator/denominator thing was so cool
85 17:11 <willcl_ark> sipa: is that because of the RIPMD160?
86 17:11 <michaelfolkson> Yes. No
87 17:12 <sipa> willcl_ark: no... secp256k1 has 128 bit security, and 256-bit hashes for transactions have 128 bit collision resistance
88 17:12 <willcl_ark> jnewbery: because we can add or delete in any order, there's no need to cache individual elements
90 17:12 <jesseposner> You could calculate a single object but if you don't need the homomorphic properties then there is no reason to use it because it's slow and complex when compared with sha256
92 17:12 <michaelfolkson> Caches are large
93 17:13 <michaelfolkson> (768 bytes)
94 17:13 <sipa> willcl_ark: P2SH has only 80 bits of collision resistance which matters for some uses, which is why P2WSH only has a 256-bit script hash mode
95 17:13 <willcl_ark> sipa: thanks, that's very helpful!
96 17:14 <@jnewbery> michaelfolkson: right (althought with this implementation, it'd be 384 bytes since we're not storing numerator and denominator)
97 17:14 <sipa> it depends on whether you'd be willing to do the inverse for every transactions
98 17:15 <@jnewbery> again, the question was a bit vague. The idea here is that it doesn't make sense to precompute the muhash of individual transactions in the mempool, since the cached muhash of the transaction would be large
99 17:15 <sipa> yeah, 384 or 768, it's both pretty big compared to average transaction-in-memory sizes
100 17:16 <@jnewbery> under a different scheme like ECMH, the cache would only be 64 bytes per object
101 17:16 <michaelfolkson> "slow and complex when compared with sha256" This isn't the case I don't think. Or at least not specific jesseposner
102 17:16 <sipa> michaelfolkson: it is many times slower than just SHA256'ing the whole UTXO set
103 17:17 <@jnewbery> sipa: if you were pre-caching, perhaps you'd want to store 384 bytes and then have half the multiplications and no inverse on the critical path?
104 17:17 <sipa> jnewbery: or you'd use ECMH instead :)
105 17:17 <@jnewbery> indeed
106 17:17 <@jnewbery> ok, next question has already been answered. What is the most expensive operation to carry out in the rolling hash? What can we do to reduce the number of times we need to carry out this operation?
107 17:18 <@jnewbery> (partly answered)
108 17:18 <jesseposner> the modular inverse is the most expensive operation so we wait to compute an inverse until the final hash is needed
109 17:18 <willcl_ark> calculating the modular inverse
110 17:18 <stacie> The most expensive operation to carry out in the rolling hash is computing the modular inverse. The rest of my answer to the rest is based on what I learned from the mailing list post, but as of 5 min ago I learned the actual implementation stores just one Num3072 for the numerator/denominator. To minimize the amount of times the inverse computation has to happen, the running hash is stored as a fraction. Newly
111 17:18 <stacie> created UTXOs (aka new elements in the set) are multiplied into the numerator, and spent UTXOs (aka elements needing removal from the set) are multiplied into the denominator. The inverse operation is only performed when a final hash is needed.
112 17:18 <michaelfolkson> That is very quick typing stacie
113 17:18 <@jnewbery> jesseposner willcl_ark stacie: exactly right!
114 17:18 * michaelfolkson needs time to read
115 17:19 <lightlike> In the python implementation, the numerator and denominator are part of the internal state, while here, it seems that the user is responsible for keeping track of them. Why the difference?
116 17:19 <jesseposner> I was wondering about that as well
117 17:19 <glozow> lightlike what do you mean user is responsible for keeping track?
118 17:19 <@jnewbery> lightlike: what do you mean by the user keeping track of them? The user just passes objects to add to and remove from the set
119 17:19 <stacie> michaelfolkson haha I did (some) of my homework this time ;) but I'm reaching the end of my answer sheet :P
120 17:20 <jesseposner> the numerator/denominator logic seems like it maps to the limb arithmetic
121 17:21 <lightlike> glozow: i thought the user would have a local var for both, and multiply the right one for adding/removing, and only doing division at the end - like it is done in crypto_test.cpp unit test.
122 17:21 <sipa> i think the reason is just that in python the bignum arithmetic is trivial, and doesn't deserve a class on its own - it's just a number
123 17:21 <willcl_ark> I would also say I was surprised to see the optimisation left on the table for follow up; was it much harder to implement?
124 17:21 <@jnewbery> jesseposner: The limb arithmetic is to implement wide (3072 bit) integers. Python allows you to do arithmetic on arbitrarily large ints, so it's not needed in the Python implementation
125 17:21 <felixweis> my guess is limb arithmetic is used because unlike python it doesn't do infinitly large integers
126 17:22 <sipa> so the data structure there is a higher-level thing doing something more high-level
127 17:22 <jesseposner> jnewbery: ah that makes sense, thanks!
128 17:22 <sipa> in the C++ code, MuHash3072 is really just a specialized 3072-bit bignum implementation
129 17:22 <@jnewbery> lightlike: That's just for the unit test. In normal usage, the client code would just pass objects to the MuHash object to add and subtract from the set
130 17:22 <sipa> if we'd want you could write a higher-level wrapper around that for the actual multiset hash scheme
131 17:23 <@jnewbery> Next question: How can we test for membership in the Muhash set?
133 17:23 <willcl_ark> You can't?
134 17:23 <michaelfolkson> Trick question
136 17:23 <@jnewbery> very good. None of you fell for the trick question
137 17:23 <sipa> it's trivial, you recompute it from the entire set ;)
138 17:23 <jesseposner> :-)
139 17:23 <lightlike> jnewbery: I thought the user would use the "/" operator as rarely as possible because it involves a costly "Inverse" call. instead, they would keep a local running variable for the denominator, and only divide numerator/denominator at the end.
140 17:24 <michaelfolkson> Though sipa did say in the mailing list post that it doesn't support compact proofs of existence
141 17:24 <sipa> Here's why to do if you want to compactly test for membership in MuHash: lie down, cry, cry a lot
142 17:24 <@jnewbery> yes, this is a hashed representation of a set, not an accumulator
143 17:24 <felixweis> so for c++ we do the combination of numerator and denominator later outside of 2 MuHash3072 instances
144 17:25 <@jnewbery> lightlike: that'd be quite an imposition to place on the client code I think
145 17:25 <lightlike> jnewbery: but otherwise, if we'd call "/" immediately, it seem really inefficient
146 17:25 <sipa> that's what the C++ code does, though?
147 17:26 <sipa> let the user maintain numerator and denominator explicitly
148 17:26 <@jnewbery> It'd be trivial to enhance the implementation to store both the numerator and denominator
149 17:26 <fjahr> I am doing it in the implementation of the index right now, yes
150 17:26 <@jnewbery> sipa: oh! Is that how it's supposed to be used?
151 17:26 <sipa> jnewbery: yes
152 17:26 <willcl_ark> lightbulb.gif
153 17:27 <sipa> it would be totally reasonable to provide a higher-level class that does that for you
154 17:27 <sipa> matching the Python code
155 17:27 <@jnewbery> lightlike: I apologise. You were right. I haven't looked at how this code is integrated into the UTXO index
156 17:27 <sipa> the MuHash3072 class in C++ is *just* a specialized 3072-bit bignum implementation; it's not really the full set hash scheme (as it excludes the ChaCha20 and SHA256 too)
157 17:28 <sipa> ah no, it does the ChaCha20'ing, but not the hashing
158 17:29 <willcl_ark> a little confusingly-named if it's not the full implementation, but makes a lot of sense now
159 17:29 <sipa> it may make sense to refactor that
160 17:29 <sipa> into a Num3072 class that's just the numbers, and MuHash3072 that does num/denom/chacha20/sha256
161 17:29 <willcl_ark> what would you call the higher-level class otherwise
162 17:29 <willcl_ark> I think that would be nice
163 17:29 <gentile> agreed
164 17:30 <@jnewbery> sipa: Don't we already have that? Num3072 is the bignum class
165 17:30 <jesseposner> +1
166 17:30 <sipa> jnewbery: eh, it's just the storage without operations
167 17:30 <sipa> but good point
168 17:31 <@jnewbery> so you're saying move all the functions into class methods in Num3072?
169 17:31 <PatrickLemke> sipa Is there a specific reason why you implemented your own bignum support for C++?
170 17:31 <sipa> though in some cases you don't actually need the num/denom trick (if you're computing the muhash3072 from the utxo set in full, you're never deleting)
171 17:31 <sipa> PatrickLemke: avoiding a dependency for something that's possibly consensus critical at some point
172 17:32 <@jnewbery> we have quite a lot of questions to get through, so let's keep moving. What public methods does the MuHash3072 object expose to clients? What is the Span<> class template that’s used in some of those public methods?
173 17:32 <willcl_ark> The *= and /= operators
174 17:32 <jesseposner> *=, /=, Finalize
175 17:32 <nehan> Span is sort of like a slice in Go. It points to some range in another datastructure
177 17:33 <felixweis> i wasn't quite sure about the span because the INPUT_SIZE is supposedly fixed
178 17:33 <@jnewbery> hi nehan!
179 17:33 <felixweis> makes sense if we hash in the instanciation tho
180 17:33 <nehan> i wasn't clear on why Span either
181 17:33 <@jnewbery> yes, we have multiplication and division operators, a constructor and a Finalize() function
182 17:34 <willcl_ark> A span is just a template for a sequence of value of the same type I think
184 17:34 <sipa> willcl_ark: it's not a template
185 17:34 <sipa> it's a way of passing a (pointer to array, length of array) conveniently, so it works with any container that stores sequential elements of that type
186 17:35 <@jnewbery> they're a lightwight, non-owning reference to some contiguous sequence of objects (bytes in this case)
187 17:35 <sipa> but it's a concrete data type, not a way of just making your function templated in the type of the passed object (which is an alternative that also works0
188 17:35 <jonatack> A Span represents a vector-like view to a range of contiguous elements in memory analogous to std::span in C++20
189 17:35 <jonatack> (one more definition ;)
190 17:36 <@jnewbery> Next question. What does the #ifdef HAVE___INT128 code in muhash.h do?
191 17:36 <jonatack> they're becoming quite ubiquitous in the codebase
192 17:36 <glozow> HAVE___INT128 = whether your system supports 128b integers?
193 17:36 <willcl_ark> If you have BigInt support, you can use MuHash with fewer, larger limbs to make up your 3072 bits. Does this require fewer inverse/multiplications when computing the hash?
194 17:37 <@jnewbery> willcl_ark: not fewer inverse/multiplications, but each inverse/multiplication requires fewer operations
195 17:37 <glozow> I imagine bigger integers = fewer operations = more optimized ?
196 17:37 <@jnewbery> glozow: right
198 17:37 <willcl_ark> jnewbery: ah right, thanks
199 17:37 <glozow> How would one test consistency between HAVE___INT128 and !HAVE___INT128 platforms?
200 17:38 <sipa> willcl_ark: not fewer inverses, but each inverse (and 3072-bit multiplication) consists of 4x fewer limb multiplications if they're twice as big
203 17:39 <syrex> Which C++ standard is used?
204 17:39 <@jnewbery> glozow: very good question. What do people think?
205 17:39 <willcl_ark> That's a decent speedup
206 17:40 <nehan> well you need the hardware, but presumably you could #define it off and compare the results in a test?
207 17:40 <sipa> syrex: bitcoin core master is C++11 right now, but we'll transition to C++17 probably in the next few weeks (after 0.21 branch off)
208 17:40 <jonatack> syrex: c++11, migration to 17 is planned during the next release or so.
209 17:41 <@jnewbery> nehan: yeah, that sounds sensible. It'd be quite nice to do it automatically though
210 17:41 <@jnewbery> 4. How is a MuHash3072 object constructed and initialized? What happens if the ChaCha20 output is larger than the group order?
211 17:41 <willcl_ark> glozow: perhaps you exercise each mathmatical operation with and without it using some edge-sized values?
212 17:42 <michaelfolkson> You take the output modulo the group order..?
213 17:43 <willcl_ark> Looks like it must be initialised with a Span of type `unsigned char` containing one 32B key otherwise `assert(key32.size() == INPUT_SIZE)` will trigger
214 17:43 <sipa> michaelfolkson: you're confused; there is no group here (except the multiplicative group we're working in)
215 17:43 <@jnewbery> michaelfolkson: that's a good guess but it's not the right answer
216 17:44 <sipa> it's just taken modulo the modulus
217 17:44 <sipa> that modulus is not the order of some other group
218 17:44 <@jnewbery> sipa: I don't understand
219 17:45 <sipa> MuHash3072 is doing arithmetic in the field of integers modulo 2^3072 - 1103717
220 17:45 <sipa> in particular, the multiplication subgroup of that field is what we use
221 17:46 <sipa> but 2^3072 - 1103717 isn't the order of some other group (like scalar arithmetic is arithmetic modulo the order of an EC group)
222 17:46 <sipa> it's just a constant we've chosen
224 17:47 <sipa> no, that's correct
225 17:47 <sipa> actually, it's not
226 17:47 * Murch is waiting for sipa's computing process to halt
227 17:47 <sipa> it is the order of the additive group of integers modulus that number, trivially
228 17:48 <sipa> but that additive group isn't relevant to us here; we only multiply and divide
229 17:48 <sipa> it should just say "is chosen as modulus"
230 17:48 <fjahr> noted :)
231 17:48 <@jnewbery> ah! I think that's what's confused me
232 17:48 <sipa> jesseposner: thanks for pointing that out, i guess past-me made the same mistake
233 17:49 <lightlike> hmm, where exactly is the modulus taken? No "%" exists in all of muhash.cpp
234 17:49 <sipa> lightlike: thankfully! modulus operations are slow
235 17:49 <@jnewbery> ok, so let me revise my question. What happens if the output of the chacha20 is greater than the modulus?
236 17:49 <sipa> lightlike: like 100x slower than a multiplication
237 17:50 <sipa> lightlike: and the modulus is computed inside the Multiply and Square functions, simultaneously with the multiplication
238 17:50 <lightlike> sipa: ah, thanks!
239 17:51 <gentile> 9 minutes left
240 17:51 <@jnewbery> right, so if the output of the chacha20 is greater than the modulus, we don't do anything, because we reduce in the Finalize() function
241 17:51 <sipa> lightlike: there is a name for this technique, but if you want to compute (x mod (2^N - C)), where x is up to 2N bits, you can observe that it's equal to (x_low) + (x_high * C), where _low and _high are the bottom and upper half of x
242 17:52 <@jnewbery> Next question: The Finalize() method has a comment “Does not change this object’s value.” Why is the function not marked const?
243 17:52 <felixweis> I probably have to read this 2^N times to understand it...
244 17:53 <glozow> Wale, the effective value is never changed, but Finalize might call FullReduce() in case IsOverflow(), which mutates the Num3072 object?
245 17:53 <@jnewbery> glozow: exactly right!
246 17:53 <sipa> felixweis: what is 1437 mod 99?
249 17:54 <sipa> indeed, why?
251 17:54 <sipa> bingo; 1400 mod 99 = 14
253 17:54 <sipa> what is 1437 mod 97?
254 17:54 <@jnewbery> FullReduce() doesn't change the value modulo 2^3072 - 1103717, but it may change the internal Num3072 representation
256 17:55 <sipa> yup, why?
257 17:55 <glozow> i needed some time to multiply 14*3 tho
259 17:55 <Murch> 3*14+37 < 97
260 17:55 <sipa> exactly, it's the same here: by having a modulus close to a power of two, we can reduce by multiplying the top half by C, and adding to the lower half
261 17:56 <@jnewbery> ok, we're almost out of time, so I'm going to skip to the last question. How is this new code tested? Can you think of other ways that it could be tested?
262 17:56 <felixweis> interesting
264 17:57 <glozow> we can test by having sipa do it in his head and compare with the code's results
266 17:57 <sipa> or use the python code
267 17:57 <sipa> it's faster
268 17:58 <@jnewbery> sipa: good answer!
269 17:58 <michaelfolkson> There are some unit test cases and some fuzzing.
270 17:58 <buzz08> sipa: good comeback
272 17:59 <@jnewbery> Where the python implementation and c++ implementation are tested against the same random input and checked that they arrive at the same result
273 17:59 <buzz08> a functional test case with sample I/O values ?
274 17:59 <Murch> jnewbery: Of course that only proves that they are coming to the same result, not that the result is correct ;)
275 18:00 <willcl_ark> Murch: we only need consensus though, right :P
276 18:00 <@jnewbery> buzz08: yeah, we do have test vectors, but it's also nice to have more random coverage
277 18:00 <Murch> willcl_ark: touche
278 18:00 <@jnewbery> ok, that's time. Thanks everyone! Hope you enjoyed looking at some lower level code this week
279 18:00 <sipa> if there is a bug in this code, it'll likely result in different results on 32-bit and 64-bit code
280 18:00 <willcl_ark> thanks jnewbery, sipa!
281 18:01 <stacie> thanks for hosting jnewbery!
282 18:01 <felixweis> great thanks jnewbery for hosting :-)
283 18:01 <glozow> thanks jnewbery!
284 18:01 <buzz08> great info guys, thanks to all
286 18:01 <blueskies> this was fascinating. thank you all!
287 18:01 <fjahr> thanks jnewbery :)
288 18:01 <thomasb06> (order of a group G, say o(G), implies that for all g in G, g * g * ... * g done o(G) times always gives the neutral element of G, say 1 with multiplicative notation. But by definition, the order of a group is the number of element of the group. The g * g * ... * g = 1 is a consequence. For example the group of symmetries that keep a cube invariant is of order 48, that is there are 48 symmetries in the group. Bu
289 18:01 <thomasb06> it takes g * g * ... * g done 48 times to get the neutral element, here the identity transformation. If you consider only the rotational symemetries, the group is of order 24 only.)
290 18:01 <lightlike> thanks!
291 18:01 <willcl_ark> Who is compiling the review-club compendium of factoids anyway
292 18:02 <thomasb06> thanks jnewberry
293 18:02 <michaelfolkson> "do you plan to add tests using the just-added Python MuHash3072 implementation?" jonatack
294 18:02 <gentile> thanks jnewbery
296 18:02 <sipa> thomasb06: groups can be larger than their order
297 18:03 <michaelfolkson> Thanks jnewbery
298 18:03 <jonatack> michaelfolkson: is that from my review last May?
299 18:03 <fjahr> michaelfolkson: they are there in the follow-up pr
300 18:03 <michaelfolkson> Yup
301 18:03 <michaelfolkson> Ah nice fjahr
302 18:03 <sipa> thomasb06: the addition group of GF(2^2) has order 2, but size 4
303 18:03 <emzy> Tnx anyone
304 18:03 <jonatack> thanks jnewbery and fjahr, I'll re-review completely after branch-off
307 18:05 <thomasb06> sipa well, as far as I remember, the order was the number of element when talking of a group
308 18:05 <michaelfolkson> Surely for users it has to be sat/VB... (don't want to swing poll)
309 18:05 <Murch> jonatack: So far people seem to be fairly unambiguous
310 18:05 <thomasb06> (my computer is swaping)
311 18:06 <jonatack> Murch: the first poll, definitely
312 18:08 <jonatack> michaelfolkson: seems so but i'm unsure how much of that preference is habit or unfamiliarity. for now proceeding with sat/vB
313 18:09 <michaelfolkson> Right thomasb06
315 18:12 <thomasb06> hehe, the memories were not bad. It starts to be far though...
316 18:16 <michaelfolkson> I've been flailing around in EC world too recently. Have to remember which world you are operating in
317 18:18 <glozow> wait so group order != size?
318 18:19 <thomasb06> glozow: when talking of the set, no
319 18:19 <thomasb06> but in the case of sepc256k1, the group is cyclic so it's the same
320 18:20 <thomasb06> G = <g>
321 18:20 <thomasb06> so o(G) = o(g)
322 18:22 <thomasb06> michaelfolkson: yes, the ECs are defined over finite fields, it's always confusing
323 18:24 <glozow> the group = the underlying set + the binary operation, in this case mod multiplication?
324 18:26 <thomasb06> glozow: leave the mod multiplication, takes the symmetries that keep a cube invariant. It's a simple group
325 18:26 <thomasb06> *take
326 18:30 <sipa> glozow: the group we're working over here is the multiplication modulo 2^3072-1103717
327 18:30 <sipa> the _order_ of that group is (2^3072-1103717)-1, as it excludes the 0 element
328 18:30 <sipa> (which is not considered part of the multiplicative group)
329 18:31 <sipa> so the integers modulo P=2^3072-1103717 has P elements, but only P-1 of those participate in the multiplicative group
330 18:32 <sipa> that also means that if you take any number (except 0), and raise it to the power P-1, modulo P, you get 1
331 18:32 <sipa> which is the identity in that group
332 18:32 <sipa> as it's a cyclic group, its size is equal to its order, but that order is _not_ the modulus
333 18:39 <glozow> sipa that makes sense, thank you
334 19:04 <jesseposner> So if I'm getting this right, the modulus, 2^3072 - 1103717 (the largest 3072-bit safe prime number), is used to define a finite field. The order of that finite field is the modulus. However, the non-zero elements of the finite field form a multiplicative group, and thus the order of the group is (2^3072-1103717)-1 because it excludes the 0 element.
335 19:06 <thomasb06> jesseposner: it sounds good but I don't know what a modulus is... The number of elements?
336 19:07 <jesseposner> It is equal to the number of elements in the finite field, but is not equal to the number of elements in the group (because the group excludes the zero element of the field).
337 19:07 <thomasb06> jesseposner: order is for groups rather, so a finite field has a modulus but no order
338 19:08 <jesseposner> I believe both a group and a finite field have an order.
339 19:08 <thomasb06> then, the field is of modulus 2^3072 - 1103717
340 19:08 <jesseposner> yes
341 19:09 <thomasb06> and the invertibles are a group of order 2^3072 - 1103717 - 1
342 19:09 <thomasb06> (not used to the order term for fields though)
343 19:10 <jesseposner> I believe so
344 19:11 <sipa> jesseposner: yes, "order of a field" is really "order of its additive group"
345 19:11 <sipa> jesseposner: and the field we have here has an additive group (which we don't use!) of order MODULUS, and a multiplicative group of order MODULUS-1
346 19:11 <sipa> that is the case only because MODULUS is a prime
347 19:12 <sipa> if MODULUS was not a prime, then (integers mod MODULUS) would not be a field (it would be a ring instead), and its multiplicative group would have an order less than MODULUS-1
348 19:12 <jesseposner> ah, interesting!
349 19:13 <sipa> this is used in RSA, for example, where the ring (integers modulo p*q) is used; that ring has additive group of order p*q, but multiplicative group of order (p-1)*(q-1)
350 19:13 <sipa> and security relies on attackers not being able to figure out the order (p-1)*(q-1) given just p*q
351 19:13 <sipa> (obviously, if you have p and q individually, that's trivial, but they don't)
352 19:15 <sipa> the fact that the multiplicative group for us has order MODULUS-1 is used to compute inverses, btw: if x^(MODULUS-1)=1 for all x != 0, then x^(MODULUS-2) must be x^-1
353 19:15 <sipa> just divide both sides by x