Add Python implementation of Minisketch, part 1 (math and cryptography)

https://github.com/sipa/minisketch/pull/26

Host: sipa  -  PR author: sipa

Notes

  • libminisketch is a library implementing the PinSketch set reconciliation algorithm. It is the basis for the efficient relay protocol in Erlay (covered in a previous review club), but is generally usable for various applications that use set reconciliation. PinSketch is effectively a collection of algorithms doing the following:

    • Computing a “sketch” of a set of data elements (which are treated as numbers)

    • Combining two sketches to compute a sketch of the symmetric difference between the two original sets (i.e., the set of elements occurring in one of the original sets, but not both).

    • Recovering the set of elements from its sketch, under the assumption the sketch is large enough.

  • In this review club, we will discuss some of the math behind the algorithm, by reviewing the Python reimplementation, rather than the C++ code itself. It is a full implementation of all of libminisketch’s functions natively in Python3. At a high level it uses the same algorithms as the C++ implementation, but lacking several optimizations, and using Python data types to represent everything. This makes it several orders of magnitude slower, but hopefully also makes it easier to review the algorithms.

  • The algorithms rely heavily on properties of finite fields. Fields are sets with an associated addition and multiplication operation, with all the properties you want for solving sets of linear equations (commutativity, associativity, distributivity, a “0” that is the neutral element for addition, a “1” that is the neutral element for multiplication, and every element except 0 having a multiplicative inverse). An example of a field is the rational numbers or the real numbers. In order to be able to represent these objects on a computer we however want fields with a finite number of elements. This 9-minute video explains how to construct finite fields with any size that is a power of a prime number, using polynomials. In PinSketch, only finite fields whose size is a power of 2 is used.

  • The PinSketch algorithm goes a step further, and actually operates on polynomials whose coefficients are finite field elements. This may be confusing, as the construction of those finite fields themselves also uses polynomials, so it’s important to realize there are two layers.

  • A description of PinSketch’s algorithms at a high level is explained in this document.

Questions

  1. How are finite field elements represented in Python? How do we perform additions, subtractions, multiplications, and divisions on them? How are polynomials with finite field coefficients represented in Python?

  2. Imagine Alice and Bob have sets {a,b,c,d,f} and {b,d,e,f} respectively, where the variables represent distinct non-zero 8-bit field elements. Assume that Alice knows ahead of time that the (symmetric) difference between those two sets is not more than 3. Alice is going to send a sketch of her elements to Bob, so that Bob can learn the differences. How many bits will her sketch be in size? Looking at add, what field elements will that sketch consist of?

  3. Following the same example, Bob will compute a sketch over his own elements, and combine it with the sketch received from Alice. What field elements will that combined sketch contain? Noting that it only contains odd powers, how will Bob restore the even ones (see decode)? Would this work if we weren’t using power-of-two fields?

  4. The next step is converting the power sums to a polynomial using the Berlekamp-Massey. What does this algorithm do for us?

  5. The final, and most complicated step to recovering the set of differences is finding the roots of the obtained polynomial. To do so, we first verify that the obtained nth degree polynomial actually has n distinct roots. It relies on the property that (x^(fieldsize) - x) has every field element exactly once as root. How does that let us test if the polynomial is fully factorizable into n roots? Why is this test necessary?

  6. To actually find the roots, the Berlekamp Trace Algorithm is used. It uses the trace function t(x) = x + x^2 + x^4 + ... + x^(fieldsize/2) which maps every element of a field of size 2^n to 0 or 1. In our 8-bit field that means t(x) = x + x^2 + x^4 + x^8 + x^16 + x^32 + x^64 + x^128. This means that for any non-zero field element p, tr(p*x) also has this property, and every choice of p will map a different subset of field elements to 0 (and the others to 1). How is this property used to recursively split the polynomial into smaller and smaller ones?

Appendix

(notes added after the meeting)

Notes on the math used in Minisketch:

Meeting Log

  118:00 <jnewbery> #startmeeting
  218:00 <jonatack> hi
  318:00 <sipa> hi
  418:00 <emzy> hi
  518:00 <elle> hi
  618:00 <jnewbery> hi folks! Welcome to Bitcoin Core PR Review Club
  718:00 <glozow> hi!
  818:00 <OliP> Hi!
  918:00 <dergoegge> hi
 1018:00 <iodize> hi
 1118:00 <aferreira44> Hi
 1218:00 <jnewbery> Feel free to say hi to let everyone know you're here. And let us know if this is your first time.
 1318:00 <aferreira44> First time here
 1418:00 <prayank> hi
 1518:00 <michaelfolkson> hi
 1618:00 <dylanm> Hi there, second time
 1718:00 <Caralie> hi
 1818:00 <iodize> welcome aferreira44
 1918:00 <jnewbery> welcome aferreira44!
 2018:01 <sdaftuar> hi
 2118:01 <schmidty> hi
 2218:01 <andozw> yo
 2318:01 <ecola> hi
 2418:01 <sipa> welcome all!
 2518:01 <jnewbery> This week is a little bit different. It's the first time we're talking about a PR that isn't in the Bitcoin Core repo.
 2618:01 <lightlike> Hi
 2718:01 <Shing> hi, first time here
 2818:01 <jnewbery> We'll be covering some of the math in Minisketch. Notes and questions kindly provided by sipa are here: https://bitcoincore.reviews/minisketch-26
 2918:02 <jnewbery> ok, take it away, sipa!
 3018:02 <sipa> cool, thanks jnewbery!
 3118:02 <sipa> this will also be a bit different as it's primarily reviewing the algorithm/math itself, and not so much the implementation
 3218:03 <sipa> i have no idea how this is going to go, but i'm sure it'll be interesting
 3318:03 <amiti> hi!
 3418:03 <fodediop> hi
 3518:03 <sipa> who has had a chance to look over the notes on https://bitcoincore.reviews/minisketch-26 ?
 3618:03 <jnewbery> y
 3718:03 <michaelfolkson> I have but not for long enough :(
 3818:03 <jonatack> a bit
 3918:03 <dergoegge> ~y
 4018:03 <Murch> hi
 4118:03 <glozow> y
 4218:03 <am74> I took a look but not a mathematician so don't fully understand everything
 4318:04 <shafiunmiraz0> a bit
 4418:04 <jnewbery> Shing: welcome!
 4518:04 <eoin> Hi, first time here
 4618:04 <sipa> that's perfectly fine... i don't expect that we'll go over all of it
 4718:05 <emzy> y
 4818:05 <sipa> so let's dive in with the first question, and we'll see where we go from there
 4918:05 <willcl_ark> hi
 5018:05 <sipa> How are finite field elements represented in the Python code?
 5118:05 <ecola> just lurking here, totally out of my depth
 5218:05 <jnewbery> eoin: welcome!
 5318:06 <glozow> In `GF2Ops`, elements are represented as integers. In the generation code I found it helpful to note that `GF(2^field_size)` is isomorphic to `Z_2[x] / (f)` for polynomial f of degree `field_size`, so when generating the field, the integers are like polynomials where the nth bit (from the right) is the coefficient for `x^{n-1}`.
 5418:06 <am74> There are some libraries like pyfinite
 5518:06 <jnewbery> ecola: lurking is also fine. We're all learning
 5618:06 <eoin> Thank you jnewbury
 5718:06 <lightlike> as numbers from 1 to 2^b-1 where b is field size
 5818:06 <sipa> glozow: yes, exactly :)
 5918:06 <sipa> lightlike: 0 as well
 6018:06 <glozow> Oh, and polynomials in the poly_* functions with finite field coefficients are arrays of integers with the ith element of the array corresponding to the coefficient for `x^i` (with no trailing zeroes).
 6118:07 <sipa> someone prepared :)
 6218:07 <sipa> glozow: exactly
 6318:07 <michaelfolkson> For those who feel completely out their depth this might be helpful: https://bitcoin.stackexchange.com/questions/102582/what-problem-set-is-the-minisketch-library-designed-for-what-is-it-currently-us
 6418:07 <michaelfolkson> "In theory it could be used for any problem where you have two sets of data and want to work out how those data sets differ (which specific elements are present in one but missing in another) as efficiently as possible."
 6518:08 <sipa> so, our field elements are represented as the integers from 0... 2^bits - 1, inclusive
 6618:08 <emzy> Can you think of it as a more general bloomfilter?
 6718:08 <sipa> but they are not integers
 6818:09 <sipa> emzy: there is some similarity, but it's closer to IBLT
 6918:09 <sipa> the big difference with normal bloom filters is that you can recover the actual elements from a sketch
 7018:09 <sipa> how do we perform addition between our field elements?
 7118:10 <glozow> simply xor!
 7218:10 <dergoegge> addition is a simple XOR (i think subtraction as well, not to confident about that tho)
 7318:10 <sipa> dergoegge: very good, you answered my next question too
 7418:11 <sipa> can you see why?
 7518:11 <jnewbery> dergoegge: if you have two field elements a and b, and a + b is a ^ b, what is a + b + b?
 7618:12 <dergoegge> jnewbery: a :)
 7718:12 <sipa> so if a + b + b = a, then a + b = a - b :)
 7818:12 <jonatack> right, it cancels itself out (set semantics)
 7918:12 <dergoegge> so subtraction is also a simple xor
 8018:12 <sipa> this self-cancelling aspect will become important later
 8118:12 <sipa> what about multiplication?
 8218:13 <glozow> more generally, when characteristic 2, additive inverse of any element is equal to itself?
 8318:14 <glozow> Multiplication (x, y): For each bit in y (from LSB to MSB), add x if the bit = 1, then multiply by 2. Equivalent to adding y copies of x.
 8418:14 <jnewbery> glozow: exactly! True for any GF(2^n)
 8518:14 <sipa> glozow: indeed, that's almost the definition of characteristic (the characteristic of a field is the maximum number of times you need to add an element to itself to be guaranteed to reach 0)
 8618:14 <sipa> every 2^n sized field has characteristic 2
 8718:15 <sipa> so if we're working in the 8-bit field, what would {3} * {3} be (i'm going to write field elements in {} to not give the impression they're actually integers)
 8818:15 <dergoegge> i understand multiplication except the part with the irreducible polynomials.
 8918:16 <dergoegge> is that comparable with modulo but for polynomials?
 9018:16 <sipa> dergoegge: let's go into that then
 9118:16 <sipa> yes, it is exactly that
 9218:16 <sipa> the irreducible polynomials correspond to the prime numbers in Z
 9318:16 <sipa> if you work modulo a non-prime number, you get zero divisors (two non-zero numbers that multiplied together give 0)
 9418:16 <sipa> e.g. mod 9, 3*3 = 0
 9518:17 <sipa> if you work modulo a non-irreducible polynomial, you also get zero divisors
 9618:17 <sipa> say we were trying to construct GF(4), but were working modulo x^2+1 ({5}), what would (x+1)*(x+1) be? ({3} * {3}) ?
 9718:18 <jnewbery> dergoegge: it's the generalization of that. Integers are polynomials of degree 0, so p is an irreducible polynomial of degree 0
 9818:18 <sipa> what's (x+1) * (x+1) in general?
 9918:19 <sdaftuar> x^2+1, so the product you asked about would be 0, when working modulo x^2+1
10018:19 <glozow> x^2 + 1?
10118:19 <lightlike> x^2 + 1?
10218:19 <sipa> right
10318:19 <sdaftuar> and hence that would not be a field
10418:19 <sipa> in general it is (x^2 + 2x + 1), but that's the integer 2=1+1, not {2}
10518:19 <sipa> 1+1 = 0 in our field
10618:20 <prayank> sipa: (x+1)^2 = x^2 + 1 + x
10718:20 <sipa> prayank: no
10818:20 <sipa> x^2 + 1 + x + x
10918:20 <prayank> Sorry 2x
11018:20 <prayank> Yeah
11118:20 <glozow> prayank: 2x = x + x = 0 here
11218:20 <sipa> so if we were working modulo (x^2 + 1) {5}, the field element x+1 {3} would not have an inverse
11318:21 <sipa> and it's a requirement for a field that every non-zero element has an inverse
11418:21 <sipa> otherwise we can't solve linear equations over it
11518:22 <sipa> ok, so, multiplication over the integers-representing-field-elements is really: taking them apart, interpreting the bits as coefficients of a polynomial over Z2 (integers mod 2), multiplying those polynomials, and then doing the result modulo an irreducible polynomial of degree (bits)
11618:22 <sipa> what irreducible polynomial is used for the 256-sized field?
11718:23 <jnewbery> GF2_MODULI[8]
11818:23 <glozow> x^9 + x + 1?
11918:23 <glozow> WAIT OOPS I MISCOUNTED
12018:23 <sipa> glozow: that's for a field with 512 elements
12118:24 <glozow> x^8 + x^4 + x^3 + x + 1
12218:24 <sipa> indeed
12318:24 <jnewbery> so I was wrong as well, should be GF2_MODULI[7] ?
12418:24 <glozow> no, the first 2 are None
12518:24 <sipa> jnewbery: no, GF2_MODULI[8]
12618:24 <jnewbery> oh sorry. None, None
12718:24 <jnewbery> got it
12818:24 <sipa> in minisketch the lowest-degree irreducible polynomial is used for every field size, and it appears no trinomial exists for size 2^8
12918:25 <sipa> so a pentanomial is used
13018:25 <jnewbery> The moduli are here in the code: https://github.com/sipa/minisketch/pull/26/files#diff-c1cbd3b5834cdb8c5408a4527171bfa260f4a104e89e051700a9da3dc69f56a4R18-R83
13118:25 <sipa> so: integers represent field elements
13218:25 <sipa> but in what follows we're going to work with polynomials _over_ those field elements
13318:25 <sipa> and it's important not to confuse the two
13418:26 <sipa> as polynomials are also used in the definition of the field elements themselves
13518:26 <sipa> the field elements are represented as integers in the python code; polynomials are represented as lists of integers (if p is a list, p[0] is the 0th degree coefficvients, p[17] is the 17th degree coefficient, etc)
13618:27 <sipa> let's move on to the next question
13718:27 <sipa> Imagine Alice and Bob have sets {a,b,c,d,f} and {b,d,e,f} respectively, where the variables represent distinct non-zero 8-bit field elements. Assume that Alice knows ahead of time that the (symmetric) difference between those two sets is not more than 3. Alice is going to send a sketch of her elements to Bob, so that Bob can learn the differences. How many bits will her sketch be in size?
13818:27 <jonatack> A sketch of b-bit elements with capacity c can be stored in bc bits, so Alice's sketch will be max 5 elements x 8 bits = 40 bits in size?
13918:27 <dergoegge> 8*3 so 24
14018:27 <lightlike> 3*8=24 bits?
14118:27 <glozow> I got 24, `capacity` = 3, 8-bit elements (`field_size` = 8), we get bc = 24 bits.
14218:27 <sipa> feel free to ask me any time if something isn't clear in the question
14318:27 <jonatack> or capacity 3 x 8 bits = 24
14418:28 <sipa> yes, exactly 24 bits
14518:28 <jonatack> was unsure here
14618:28 <dergoegge> in serialized_size whats the +7 for?
14718:28 <sipa> and what field elements would the sketch Alice compute for her set, with capacity 3?
14818:28 <sipa> dergoegge: rounding up to a multiple of bytes
14918:28 <glozow> it's ceil division for bytes
15018:28 <sipa> look at the add function
15118:29 <jonatack> field elements {a,c}
15218:29 <glozow> oo, typo in add comment? `1 <= element < 2**field_size`
15318:29 <sipa> jonatack: that's not right, we're looking for something with 3 elements
15418:30 <sipa> glozow: i believe that comment is correct
15518:30 <sipa> add requires "numbers" as input in range [1,255]
15618:30 <glozow> er i mean, it currently says 1**field_size
15718:30 <sipa> oh, oops, yes!
15818:30 <sipa> absolutely
15918:30 <sipa> it should be 2**field_size
16018:31 <sipa> so, Alice is creating a Minisketch object, with field_size=8 capacity=3
16118:31 <sipa> what are its _odd_syndromes initialized to?
16218:32 <glozow> [0, 0, 0]?
16318:32 <sipa> self._odd_syndromes = [0] * capacity
16418:32 <sipa> indeed
16518:32 <sipa> now Alice calls sketch.add(a)
16618:32 <lightlike> should alice have a+b+c+d+f, a**3+b**3+c**3+d**3+f**3, a**5+b**5+c**5+d**5+f**5?
16718:32 <sipa> what will the _odd_syndromes become?
16818:32 <sipa> lightlike: exactly
16918:33 <sipa> (where + and ** are field operations, not integer operations)
17018:33 <sipa> add(a) will change _odd_syndromes to [a, a**3, a**5]
17118:33 <sipa> add(b) will change it to [a+b, a**3 + b**3, a**5 + b**5]
17218:33 <sipa> etc
17318:34 <glozow> noice
17418:34 <sipa> and serialize will output 3 bytes that pack those 3 field elements
17518:35 <sipa> if Bob wants to learn what elements Alice has he doesn't, and the other way around, he will receive this sketch [a+b+c+d+f, a**3+b**3+c**3+d**3+f**3, a**5+b**5+c**5+d**5+f**5] from Alice
17618:35 <sipa> and compute his own sketch
17718:35 <sipa> and then combine the two to find the sketch of the symmetric difference of their sets
17818:35 <sipa> how does he perform this combining?
17918:35 <dergoegge> i though the // 8 in serialize size was a comment, lol oh man
18018:35 <glozow> xor them!
18118:35 <sipa> haha
18218:35 <sipa> glozow: exactly
18318:36 <sipa> and this is why the a+a = 0 property matters: simply adding field elements gives you naturally a sketch of the symmetric difference
18418:36 <amiti> wow, thats so cool.
18518:36 <sipa> so what's the combined sketch?
18618:36 <sipa> for elements {a,b,c,d,f} and {b,d,e,f}
18718:36 <lightlike> a+c+e, a**3+c**3+e**3, a**5+c**5+e**5
18818:36 <emzy> nice math. :)
18918:37 <sipa> yeah
19018:37 <sipa> after doing that, Bob needs to decode the sketch to find the elements it corresponds to
19118:37 <jnewbery> dergoegge: to answer your question from earlier, it's to get the smallest number of bytes that will fit the right number of bits. // in python is floor division
19218:38 <sipa> there are 3 steps in that: first computing the even power-sums from just the odd-power ones, applying Berlekamp-Massey to convert it to a polynomial, and then finding the roots of that polynomial
19318:38 <sipa> how is it possible to compute the even power-sums from just the odd ones?
19418:38 <dergoegge> jnewbery: yea thanks i was just confused because i am looking at c code all day and // normally indicates comments :D
19518:39 <sipa> so we're given [a+c+e, a**3 + c**3 + e**3, a**5 + c**5 + e**5]
19618:39 <glozow> Since the polynomials are over a finite field with a characteristic 2, `a + a = 0` for every `a` in the field and thus `(x + y)^2 = x^2 + y^2`. It's also known that `s_0 = n` This makes it possible to compute all even syndromes of the sketch from the odd syndromes since every even is some `s_{2i} = x^{2i} + y^{2i} + ... = (x + y + ...)^{2i} = s_i^2`
19718:39 <sipa> and we want [a+c+e, a**2 + c**2 + e**2, a**3 + c**3 + e**3, a**4 + c**4 + e**4, a**5 + c**5 + e**5, a**6 + c**6 + e**6]
19818:40 <jonatack> looking at pyminisketch.py::L430-434
19918:40 <sipa> yeah, indeed, the sum of squares is just the square of the sums
20018:40 <lightlike> so square the last odd one for the next even one
20118:40 <sipa> and the sum of 4th powers is the square of the sums of squares
20218:40 <sipa> and the sum of 6th powers is the square of the sums of cubes
20318:41 <sipa> not necessarily the last odd one
20418:41 <sipa> e.g. the 4th power sum is the square of sum of 2nd powers
20518:41 <lightlike> oh, ok
20618:42 <sipa> and this is indeed special about characteristic 2
20718:42 <sipa> if we were working in a different characteristic field, we'd need to send the odd powers too
20818:42 <sipa> giving an instant factor 2 size increase
20918:42 <glozow> :math-fireworks:
21018:42 <prayank> lol
21118:43 <sipa> the next step in the recovery process is the Berlekamp-Massey algorithm
21218:43 <jnewbery> *need to send the even powers too ?
21318:43 <sipa> jnewbery: yes!
21418:43 <sipa> unfortunately, i have no clue *why* the BM algorithm works
21518:43 <sipa> but we can talk about *what* it does for us
21618:43 <sipa> anyone have an idea?
21718:44 <michaelfolkson> I've read the wiki entry :)
21818:44 <glozow> BM finds the polynomnial (determined by coefficients) that generates the the syndromes. i.e. minimal polynomial L (its coefficients `l_0 ... l_n`) s.t. `S(M)*L` = the power sums.
21918:44 <sipa> if our sketch just contained the single element a
22018:45 <sipa> so [a, a**2, a**3, a**4, a**5, a**6] is our expanded syndromes
22118:45 <sipa> what polynomial would BM output?
22218:46 <sipa> note that every element in the list is the previous one times a
22318:46 <sipa> that's a geometric series, remember?
22418:46 <jnewbery> I'm going to guess (x-a)
22518:47 <sipa> wrong ;)
22618:47 <sipa> but very close
22718:47 <jnewbery> is anyone else brave enough to guess or do I have to be wrong again?
22818:48 <jnewbery> (x-a)^6 ?
22918:48 <sipa> i'll help: it's (1-a*x)
23018:48 <sipa> or [1, a] for us in list representation
23118:49 <sdaftuar> the roots of the polynomial berlekamp massey spits out are the inverses of what we want?
23218:49 <sipa> yeah
23318:49 <sdaftuar> crazy
23418:49 <sipa> so what BM does
23518:49 <sipa> in a way
23618:49 <sipa> is find a list of coefficients, which when convoluted with that many subsequent elements in the input, gives 0
23718:49 <sipa> so for our 1st degree example here
23818:50 <sipa> the input lists are [a, a**2], [a**2, a**3], [a**3, a**4], ...
23918:50 <glozow> berlekamp massey doesn't give us the roots right? just coefficients - we get roots afterward?
24018:50 <sipa> and for each of them, multiplying the first with -a, and adding the second times 1, gives 0
24118:50 <sipa> right?
24218:51 <sipa> (a * -a) + (a * 1) = 0
24318:51 <sipa> it gives a way to compute the next term of the list in function of the previous one
24418:51 <sipa> like the fibonacci series
24518:52 <sipa> has rule x_{n+2} = x_{n} + x_{n+1}
24618:52 <sipa> this one has rule x_{n+1} = a*x_{n}
24718:52 <sipa> the coefficients of that rule are [-a, 1]
24818:52 <sipa> and that's what BM finds
24918:53 <sipa> given a list generated by a fibonacci-like rule, find the rule
25018:53 <sipa> better: find the simplest rule
25118:54 <sdaftuar> so BM finds a polynomial in GF(2^n)[x] of degree at most c, in our application?
25218:54 <sdaftuar> i guess it can fail sometimes too?
25318:54 <sipa> and it turns out that if you have a list [a, a**2, a**3, a**4, ...], that means the rule (-a)*x_{n} * 1*x_{n+1}
25418:55 <sipa> if your list is [a+b, a**2 + b**2, a**3 + b**3, ...], it finds the rule ((-a)*x_{n} * 1*x_{n+1}) * ((-b)*x_{n} * 1*x_{n+1})
25518:55 <jonatack> beautifully elegant
25618:55 <sipa> sdaftuar: BM will always find something, but if it finds a polynomial with degree higher than the capacity, we know something went wrong (as that would mean it's generated from more than capacity elements)
25718:57 <sipa> so applying BM to the sketch in our problem, will find [a*b*c, a*b + b*c + a*c, a + b + c, 1] (i.e. the polynomial (a+b+c) + (a*b + b*c + a*c)*x + (a+b+c)*x^2 + x^3)
25818:57 <sipa> and the inverses of its roots are a, b, c
25918:57 <sdaftuar> sipa: from a quick look at the python code, the runtime won't be crazy in those scenarios right? looks like it's just got runtime based on the number of syndromes?
26018:58 <sipa> BM does a quadratic number of field operations i believe
26118:58 <sipa> or rather O(sketch_size * differences)
26218:59 <sipa> of course, the field operations do become slower in function of the field size
26318:59 <sipa> i'm afraid that's our time
26418:59 <lightlike> so the most expensive step is the calculation of the roots, not BM?
26518:59 <sipa> the roots
26618:59 <sipa> which, i'm afraid time doesn't allow us to go into
26718:59 <sipa> but i'm happy to stay longer
26818:59 <glozow> 😢
26918:59 <jnewbery> we could also do a follow-up another week if you're happy to do that, sipa?
27019:00 <jnewbery> might give people a bit more time to go over notes again and read up on BM
27119:00 <michaelfolkson> +1
27219:00 <jonatack> sipa: pyminisketch.py runs in about a minute for me...was anyone able to build the c++ libminisketch? make did not run for me
27319:00 <sipa> the next step is about root finding :)
27419:01 <jnewbery> I think glozow wants you to ask q5
27519:01 <sipa> jnewbery: sure, find be me... though it may be less reviewy
27619:01 <michaelfolkson> Did you run it on test vectors jonatack?
27719:01 <sipa> jonatack: it works for me, though the unit tests run forever, which is kind of weird and may make it look like failure
27819:01 <jonatack> michaelfolkson: i ran "time pyminisketch.py"
27919:01 <sipa> 5. The final, and most complicated step to recovering the set of differences is finding the roots of the obtained polynomial. To do so, we first verify that the obtained nth degree polynomial actually has n distinct roots. It relies on the property that (x^(fieldsize) - x) has every field element exactly once as root. How does that let us test if the polynomial is fully factorizable into n
28019:02 <sipa> roots? Why is this test necessary?
28119:02 <michaelfolkson> Presumably you can feed two sets into it and it finds the elements that aren't in both sets?
28219:02 <sipa> michaelfolkson: that's what the tests do
28319:02 <glozow> roots are where our actual stuff is!
28419:03 <glozow> A n-degree polynomial can have at most n roots. It _may_ have fewer roots that are repeated (note n-degree polynomial with unique roots is synonymous with having n roots).
28519:03 <sipa> glozow: could it also just have fewer roots?
28619:03 <sipa> (even when counting repetition)
28719:03 <glozow> :think
28819:03 <sipa> hint: we answered this in question 1 ;)
28919:04 <michaelfolkson> The tests of the C++ code
29019:04 <sipa> well, touched on it
29119:04 <michaelfolkson> Did anyone feed two sets into the Python code?
29219:04 <sipa> michaelfolkson: that's what the tests do
29319:04 <glozow> is the answer just no o.O
29419:05 <sdaftuar> depends on what field you're looking for roots in!
29519:05 <sipa> glozow: the answer is yes, you can definitely have fewer roots
29619:05 <glozow> aw man, how have fewer roots?
29719:05 <sipa> in non-algebraically-closed fields, but all finite fields are not algebraically closed ;)
29819:05 <michaelfolkson> sipa: Oh within this Python file right, I was looking elsewhere
29919:05 <sipa> glozow: what are the roots of x^2 + x + 1?
30019:06 <glozow> oh, das not reducible
30119:06 <michaelfolkson> das correct
30219:06 <sipa> yes, so what are its roots?
30319:06 <jnewbery> there are none?
30419:06 <sipa> yes, exactly
30519:06 <sdaftuar> in GF(4), it has roots
30619:06 <sipa> sdaftuar: ha, right
30719:06 <sdaftuar> i mean, it really depends on the question :)
30819:07 <sipa> but every finite has polynomials (with coefficients in that field) which are irreducible
30919:07 <sdaftuar> right
31019:07 <sipa> and irreduclbe polynomials have no roots by definition
31119:07 <sipa> you can also have reducible polynomials which have no roots though
31219:07 <sipa> e.g. a 4th degree polynomial may be factorizable into 2 2nd degree polynomials, which are individually irreducible
31319:08 <jnewbery> ok, shall we wrap it up there and schedule a follow-up? It'd be a shame for people who had a hard stop to miss out on too much of the fun
31419:09 <sipa> sgtm
31519:09 <jnewbery> glozow prepared some excellent additional notes here for anyone who wants a bit more explanatory material: https://github.com/glozow/bitcoin-notes/tree/master/minisketch
31619:09 <jnewbery> thanks so much sipa. This has been 🤯🤯🤯
31719:09 <fodediop> This was a heavy one, but I beleive lol
31819:10 <fodediop> thank you sipa
31919:10 <glozow> thanks professor sipa!!!!
32019:10 <dergoegge> thanks sipa!
32119:10 <dergoegge> 🤯
32219:10 <OliP> Thanks sipa!
32319:10 <amiti> yeah this was awesome, I learned a lot from lurking :)
32419:10 <prayank> Thanks everyone
32519:10 <jonatack> great stuff, thank you sipa
32619:10 <emzy> Thanks sipa!
32719:10 <shafiunmiraz0> Thank you sipa. Thank you jnewbery. Thank you everyone
32819:10 <lightlike> thanks sipa!
32919:10 <michaelfolkson> Cool, thanks sipa. Will check out the glozow notes too
33019:10 <jnewbery> #endmeeting
33119:11 <sipa> glozow: This is important because our set {m_1 ... m_n} is derived from the roots of the polynomial. The Berklekamp Trace Algorithm also requires this I believe.
33219:12 <sipa> that's exactly the answer; our root finding algorithm (next club!) requires that the polynomial is fully factorizable into distinct 1st degree factors
33319:12 <sipa> (it'll go into an infinite loop otherwise)