Add Python implementation of Minisketch, part 1 (math and cryptography)

Feb 17, 2021

https://github.com/sipa/minisketch/pull/26

Host: sipa - PR author: sipa

Notes

libminisketch is a library implementing the PinSketch set reconciliation algorithm. It is the basis for the efficient relay protocol in Erlay (covered in a previous review club), but is generally usable for various applications that use set reconciliation. PinSketch is effectively a collection of algorithms doing the following:
- Computing a “sketch” of a set of data elements (which are treated as numbers)
- Combining two sketches to compute a sketch of the symmetric difference between the two original sets (i.e., the set of elements occurring in one of the original sets, but not both).
- Recovering the set of elements from its sketch, under the assumption the sketch is large enough.
In this review club, we will discuss some of the math behind the algorithm, by reviewing the Python reimplementation, rather than the C++ code itself. It is a full implementation of all of libminisketch’s functions natively in Python3. At a high level it uses the same algorithms as the C++ implementation, but lacking several optimizations, and using Python data types to represent everything. This makes it several orders of magnitude slower, but hopefully also makes it easier to review the algorithms.
The algorithms rely heavily on properties of finite fields. Fields are sets with an associated addition and multiplication operation, with all the properties you want for solving sets of linear equations (commutativity, associativity, distributivity, a “0” that is the neutral element for addition, a “1” that is the neutral element for multiplication, and every element except 0 having a multiplicative inverse). An example of a field is the rational numbers or the real numbers. In order to be able to represent these objects on a computer we however want fields with a finite number of elements. This 9-minute video explains how to construct finite fields with any size that is a power of a prime number, using polynomials. In PinSketch, only finite fields whose size is a power of 2 is used.
The PinSketch algorithm goes a step further, and actually operates on polynomials whose coefficients are finite field elements. This may be confusing, as the construction of those finite fields themselves also uses polynomials, so it’s important to realize there are two layers.
A description of PinSketch’s algorithms at a high level is explained in this document.

Questions

How are finite field elements represented in Python? How do we perform additions, subtractions, multiplications, and divisions on them? How are polynomials with finite field coefficients represented in Python?
Imagine Alice and Bob have sets {a,b,c,d,f} and {b,d,e,f} respectively, where the variables represent distinct non-zero 8-bit field elements. Assume that Alice knows ahead of time that the (symmetric) difference between those two sets is not more than 3. Alice is going to send a sketch of her elements to Bob, so that Bob can learn the differences. How many bits will her sketch be in size? Looking at add, what field elements will that sketch consist of?
Following the same example, Bob will compute a sketch over his own elements, and combine it with the sketch received from Alice. What field elements will that combined sketch contain? Noting that it only contains odd powers, how will Bob restore the even ones (see decode)? Would this work if we weren’t using power-of-two fields?
The next step is converting the power sums to a polynomial using the Berlekamp-Massey. What does this algorithm do for us?
The final, and most complicated step to recovering the set of differences is finding the roots of the obtained polynomial. To do so, we first verify that the obtained nth degree polynomial actually has n distinct roots. It relies on the property that (x^(fieldsize) - x) has every field element exactly once as root. How does that let us test if the polynomial is fully factorizable into n roots? Why is this test necessary?
To actually find the roots, the Berlekamp Trace Algorithm is used. It uses the trace function t(x) = x + x^2 + x^4 + ... + x^(fieldsize/2) which maps every element of a field of size 2^n to 0 or 1. In our 8-bit field that means t(x) = x + x^2 + x^4 + x^8 + x^16 + x^32 + x^64 + x^128. This means that for any non-zero field element p, tr(p*x) also has this property, and every choice of p will map a different subset of field elements to 0 (and the others to 1). How is this property used to recursively split the polynomial into smaller and smaller ones?

Appendix

(notes added after the meeting)

Notes on the math used in Minisketch:

Meeting Log

18:00

<jnewbery> #startmeeting

18:00

<jonatack> hi

18:00

<sipa> hi

18:00

<emzy> hi

18:00

<elle> hi

18:00

<jnewbery> hi folks! Welcome to Bitcoin Core PR Review Club

18:00

<glozow> hi!

18:00

<OliP> Hi!

18:00

<dergoegge> hi

18:00

<iodize> hi

18:00

<aferreira44> Hi

18:00

<jnewbery> Feel free to say hi to let everyone know you're here. And let us know if this is your first time.

18:00

<aferreira44> First time here

18:00

<prayank> hi

18:00

<michaelfolkson> hi

18:00

<dylanm> Hi there, second time

18:00

<Caralie> hi

18:00

<iodize> welcome aferreira44

18:00

<jnewbery> welcome aferreira44!

18:01

<sdaftuar> hi

18:01

<schmidty> hi

18:01

<andozw> yo

18:01

<ecola> hi

18:01

<sipa> welcome all!

18:01

<jnewbery> This week is a little bit different. It's the first time we're talking about a PR that isn't in the Bitcoin Core repo.

18:01

<lightlike> Hi

18:01

<Shing> hi, first time here

18:01

<jnewbery> We'll be covering some of the math in Minisketch. Notes and questions kindly provided by sipa are here: https://bitcoincore.reviews/minisketch-26

18:02

<jnewbery> ok, take it away, sipa!

18:02

<sipa> cool, thanks jnewbery!

18:02

<sipa> this will also be a bit different as it's primarily reviewing the algorithm/math itself, and not so much the implementation

18:03

<sipa> i have no idea how this is going to go, but i'm sure it'll be interesting

18:03

<amiti> hi!

18:03

<fodediop> hi

18:03

<sipa> who has had a chance to look over the notes on https://bitcoincore.reviews/minisketch-26 ?

18:03

<jnewbery> y

18:03

<michaelfolkson> I have but not for long enough :(

18:03

<jonatack> a bit

18:03

<dergoegge> ~y

18:03

<Murch> hi

18:03

<glozow> y

18:03

<am74> I took a look but not a mathematician so don't fully understand everything

18:04

<shafiunmiraz0> a bit

18:04

<jnewbery> Shing: welcome!

18:04

<eoin> Hi, first time here

18:04

<sipa> that's perfectly fine... i don't expect that we'll go over all of it

18:05

<emzy> y

18:05

<sipa> so let's dive in with the first question, and we'll see where we go from there

18:05

<willcl_ark> hi

18:05

<sipa> How are finite field elements represented in the Python code?

18:05

<ecola> just lurking here, totally out of my depth

18:05

<jnewbery> eoin: welcome!

18:06

<glozow> In `GF2Ops`, elements are represented as integers. In the generation code I found it helpful to note that `GF(2^field_size)` is isomorphic to `Z_2[x] / (f)` for polynomial f of degree `field_size`, so when generating the field, the integers are like polynomials where the nth bit (from the right) is the coefficient for `x^{n-1}`.

18:06

<am74> There are some libraries like pyfinite

18:06

<jnewbery> ecola: lurking is also fine. We're all learning

18:06

<eoin> Thank you jnewbury

18:06

<lightlike> as numbers from 1 to 2^b-1 where b is field size

18:06

<sipa> glozow: yes, exactly :)

18:06

<sipa> lightlike: 0 as well

18:06

<glozow> Oh, and polynomials in the poly_* functions with finite field coefficients are arrays of integers with the ith element of the array corresponding to the coefficient for `x^i` (with no trailing zeroes).

18:07

<sipa> someone prepared :)

18:07

<sipa> glozow: exactly

18:07

<michaelfolkson> For those who feel completely out their depth this might be helpful: https://bitcoin.stackexchange.com/questions/102582/what-problem-set-is-the-minisketch-library-designed-for-what-is-it-currently-us

18:07

<michaelfolkson> "In theory it could be used for any problem where you have two sets of data and want to work out how those data sets differ (which specific elements are present in one but missing in another) as efficiently as possible."

18:08

<sipa> so, our field elements are represented as the integers from 0... 2^bits - 1, inclusive

18:08

<emzy> Can you think of it as a more general bloomfilter?

18:08

<sipa> but they are not integers

18:09

<sipa> emzy: there is some similarity, but it's closer to IBLT

18:09

<sipa> the big difference with normal bloom filters is that you can recover the actual elements from a sketch

18:09

<sipa> how do we perform addition between our field elements?

18:10

<glozow> simply xor!

18:10

<dergoegge> addition is a simple XOR (i think subtraction as well, not to confident about that tho)

18:10

<sipa> dergoegge: very good, you answered my next question too

18:11

<sipa> can you see why?

18:11

<jnewbery> dergoegge: if you have two field elements a and b, and a + b is a ^ b, what is a + b + b?

18:12

<dergoegge> jnewbery: a :)

18:12

<sipa> so if a + b + b = a, then a + b = a - b :)

18:12

<jonatack> right, it cancels itself out (set semantics)

18:12

<dergoegge> so subtraction is also a simple xor

18:12

<sipa> this self-cancelling aspect will become important later

18:12

<sipa> what about multiplication?

18:13

<glozow> more generally, when characteristic 2, additive inverse of any element is equal to itself?

18:14

<glozow> Multiplication (x, y): For each bit in y (from LSB to MSB), add x if the bit = 1, then multiply by 2. Equivalent to adding y copies of x.

18:14

<jnewbery> glozow: exactly! True for any GF(2^n)

18:14

<sipa> glozow: indeed, that's almost the definition of characteristic (the characteristic of a field is the maximum number of times you need to add an element to itself to be guaranteed to reach 0)

18:14

<sipa> every 2^n sized field has characteristic 2

18:15

<sipa> so if we're working in the 8-bit field, what would {3} * {3} be (i'm going to write field elements in {} to not give the impression they're actually integers)

18:15

<dergoegge> i understand multiplication except the part with the irreducible polynomials.

18:16

<dergoegge> is that comparable with modulo but for polynomials?

18:16

<sipa> dergoegge: let's go into that then

18:16

<sipa> yes, it is exactly that

18:16

<sipa> the irreducible polynomials correspond to the prime numbers in Z

18:16

<sipa> if you work modulo a non-prime number, you get zero divisors (two non-zero numbers that multiplied together give 0)

18:16

<sipa> e.g. mod 9, 3*3 = 0

18:17

<sipa> if you work modulo a non-irreducible polynomial, you also get zero divisors

18:17

<sipa> say we were trying to construct GF(4), but were working modulo x^2+1 ({5}), what would (x+1)*(x+1) be? ({3} * {3}) ?

18:18

<jnewbery> dergoegge: it's the generalization of that. Integers are polynomials of degree 0, so p is an irreducible polynomial of degree 0

18:18

<sipa> what's (x+1) * (x+1) in general?

18:19

<sdaftuar> x^2+1, so the product you asked about would be 0, when working modulo x^2+1

100

18:19

<glozow> x^2 + 1?

101

18:19

<lightlike> x^2 + 1?

102

18:19

<sipa> right

103

18:19

<sdaftuar> and hence that would not be a field

104

18:19

<sipa> in general it is (x^2 + 2x + 1), but that's the integer 2=1+1, not {2}

105

18:19

<sipa> 1+1 = 0 in our field

106

18:20

<prayank> sipa: (x+1)^2 = x^2 + 1 + x

107

18:20

<sipa> prayank: no

108

18:20

<sipa> x^2 + 1 + x + x

109

18:20

<prayank> Sorry 2x

110

18:20

<prayank> Yeah

111

18:20

<glozow> prayank: 2x = x + x = 0 here

112

18:20

<sipa> so if we were working modulo (x^2 + 1) {5}, the field element x+1 {3} would not have an inverse

113

18:21

<sipa> and it's a requirement for a field that every non-zero element has an inverse

114

18:21

<sipa> otherwise we can't solve linear equations over it

115

18:22

<sipa> ok, so, multiplication over the integers-representing-field-elements is really: taking them apart, interpreting the bits as coefficients of a polynomial over Z2 (integers mod 2), multiplying those polynomials, and then doing the result modulo an irreducible polynomial of degree (bits)

116

18:22

<sipa> what irreducible polynomial is used for the 256-sized field?

117

18:23

<jnewbery> GF2_MODULI[8]

118

18:23

<glozow> x^9 + x + 1?

119

18:23

<glozow> WAIT OOPS I MISCOUNTED

120

18:23

<sipa> glozow: that's for a field with 512 elements

121

18:24

<glozow> x^8 + x^4 + x^3 + x + 1

122

18:24

<sipa> indeed

123

18:24

<jnewbery> so I was wrong as well, should be GF2_MODULI[7] ?

124

18:24

<glozow> no, the first 2 are None

125

18:24

<sipa> jnewbery: no, GF2_MODULI[8]

126

18:24

<jnewbery> oh sorry. None, None

127

18:24

<jnewbery> got it

128

18:24

<sipa> in minisketch the lowest-degree irreducible polynomial is used for every field size, and it appears no trinomial exists for size 2^8

129

18:25

<sipa> so a pentanomial is used

130

18:25

<jnewbery> The moduli are here in the code: https://github.com/sipa/minisketch/pull/26/files#diff-c1cbd3b5834cdb8c5408a4527171bfa260f4a104e89e051700a9da3dc69f56a4R18-R83

131

18:25

<sipa> so: integers represent field elements

132

18:25

<sipa> but in what follows we're going to work with polynomials _over_ those field elements

133

18:25

<sipa> and it's important not to confuse the two

134

18:26

<sipa> as polynomials are also used in the definition of the field elements themselves

135

18:26

<sipa> the field elements are represented as integers in the python code; polynomials are represented as lists of integers (if p is a list, p[0] is the 0th degree coefficvients, p[17] is the 17th degree coefficient, etc)

136

18:27

<sipa> let's move on to the next question

137

18:27

<sipa> Imagine Alice and Bob have sets {a,b,c,d,f} and {b,d,e,f} respectively, where the variables represent distinct non-zero 8-bit field elements. Assume that Alice knows ahead of time that the (symmetric) difference between those two sets is not more than 3. Alice is going to send a sketch of her elements to Bob, so that Bob can learn the differences. How many bits will her sketch be in size?

138

18:27

<jonatack> A sketch of b-bit elements with capacity c can be stored in bc bits, so Alice's sketch will be max 5 elements x 8 bits = 40 bits in size?

139

18:27

<dergoegge> 8*3 so 24

140

18:27

<lightlike> 3*8=24 bits?

141

18:27

<glozow> I got 24, `capacity` = 3, 8-bit elements (`field_size` = 8), we get bc = 24 bits.

142

18:27

<sipa> feel free to ask me any time if something isn't clear in the question

143

18:27

<jonatack> or capacity 3 x 8 bits = 24

144

18:28

<sipa> yes, exactly 24 bits

145

18:28

<jonatack> was unsure here

146

18:28

<dergoegge> in serialized_size whats the +7 for?

147

18:28

<sipa> and what field elements would the sketch Alice compute for her set, with capacity 3?

148

18:28

<sipa> dergoegge: rounding up to a multiple of bytes

149

18:28

<glozow> it's ceil division for bytes

150

18:28

<sipa> look at the add function

151

18:29

<jonatack> field elements {a,c}

152

18:29

<glozow> oo, typo in add comment? `1 <= element < 2**field_size`

153

18:29

<sipa> jonatack: that's not right, we're looking for something with 3 elements

154

18:30

<sipa> glozow: i believe that comment is correct

155

18:30

<sipa> add requires "numbers" as input in range [1,255]

156

18:30

<glozow> er i mean, it currently says 1**field_size

157

18:30

<sipa> oh, oops, yes!

158

18:30

<sipa> absolutely

159

18:30

<sipa> it should be 2**field_size

160

18:31

<sipa> so, Alice is creating a Minisketch object, with field_size=8 capacity=3

161

18:31

<sipa> what are its _odd_syndromes initialized to?

162

18:32

<glozow> [0, 0, 0]?

163

18:32

<sipa> self._odd_syndromes = [0] * capacity

164

18:32

<sipa> indeed

165

18:32

<sipa> now Alice calls sketch.add(a)

166

18:32

<lightlike> should alice have a+b+c+d+f, a**3+b**3+c**3+d**3+f**3, a**5+b**5+c**5+d**5+f**5?

167

18:32

<sipa> what will the _odd_syndromes become?

168

18:32

<sipa> lightlike: exactly

169

18:33

<sipa> (where + and ** are field operations, not integer operations)

170

18:33

<sipa> add(a) will change _odd_syndromes to [a, a**3, a**5]

171

18:33

<sipa> add(b) will change it to [a+b, a**3 + b**3, a**5 + b**5]

172

18:33

<sipa> etc

173

18:34

<glozow> noice

174

18:34

<sipa> and serialize will output 3 bytes that pack those 3 field elements

175

18:35

<sipa> if Bob wants to learn what elements Alice has he doesn't, and the other way around, he will receive this sketch [a+b+c+d+f, a**3+b**3+c**3+d**3+f**3, a**5+b**5+c**5+d**5+f**5] from Alice

176

18:35

<sipa> and compute his own sketch

177

18:35

<sipa> and then combine the two to find the sketch of the symmetric difference of their sets

178

18:35

<sipa> how does he perform this combining?

179

18:35

<dergoegge> i though the // 8 in serialize size was a comment, lol oh man

180

18:35

<glozow> xor them!

181

18:35

<sipa> haha

182

18:35

<sipa> glozow: exactly

183

18:36

<sipa> and this is why the a+a = 0 property matters: simply adding field elements gives you naturally a sketch of the symmetric difference

184

18:36

<amiti> wow, thats so cool.

185

18:36

<sipa> so what's the combined sketch?

186

18:36

<sipa> for elements {a,b,c,d,f} and {b,d,e,f}

187

18:36

<lightlike> a+c+e, a**3+c**3+e**3, a**5+c**5+e**5

188

18:36

<emzy> nice math. :)

189

18:37

<sipa> yeah

190

18:37

<sipa> after doing that, Bob needs to decode the sketch to find the elements it corresponds to

191

18:37

<jnewbery> dergoegge: to answer your question from earlier, it's to get the smallest number of bytes that will fit the right number of bits. // in python is floor division

192

18:38

<sipa> there are 3 steps in that: first computing the even power-sums from just the odd-power ones, applying Berlekamp-Massey to convert it to a polynomial, and then finding the roots of that polynomial

193

18:38

<sipa> how is it possible to compute the even power-sums from just the odd ones?

194

18:38

<dergoegge> jnewbery: yea thanks i was just confused because i am looking at c code all day and // normally indicates comments :D

195

18:39

<sipa> so we're given [a+c+e, a**3 + c**3 + e**3, a**5 + c**5 + e**5]

196

18:39

<glozow> Since the polynomials are over a finite field with a characteristic 2, `a + a = 0` for every `a` in the field and thus `(x + y)^2 = x^2 + y^2`. It's also known that `s_0 = n` This makes it possible to compute all even syndromes of the sketch from the odd syndromes since every even is some `s_{2i} = x^{2i} + y^{2i} + ... = (x + y + ...)^{2i} = s_i^2`

197

18:39

<sipa> and we want [a+c+e, a**2 + c**2 + e**2, a**3 + c**3 + e**3, a**4 + c**4 + e**4, a**5 + c**5 + e**5, a**6 + c**6 + e**6]

198

18:40

<jonatack> looking at pyminisketch.py::L430-434

199

18:40

<sipa> yeah, indeed, the sum of squares is just the square of the sums

200

18:40

<lightlike> so square the last odd one for the next even one

201

18:40

<sipa> and the sum of 4th powers is the square of the sums of squares

202

18:40

<sipa> and the sum of 6th powers is the square of the sums of cubes

203

18:41

<sipa> not necessarily the last odd one

204

18:41

<sipa> e.g. the 4th power sum is the square of sum of 2nd powers

205

18:41

<lightlike> oh, ok

206

18:42

<sipa> and this is indeed special about characteristic 2

207

18:42

<sipa> if we were working in a different characteristic field, we'd need to send the odd powers too

208

18:42

<sipa> giving an instant factor 2 size increase

209

18:42

<glozow> :math-fireworks:

210

18:42

<prayank> lol

211

18:43

<sipa> the next step in the recovery process is the Berlekamp-Massey algorithm

212

18:43

<jnewbery> *need to send the even powers too ?

213

18:43

<sipa> jnewbery: yes!

214

18:43

<sipa> unfortunately, i have no clue *why* the BM algorithm works

215

18:43

<sipa> but we can talk about *what* it does for us

216

18:43

<sipa> anyone have an idea?

217

18:44

<michaelfolkson> I've read the wiki entry :)

218

18:44

<glozow> BM finds the polynomnial (determined by coefficients) that generates the the syndromes. i.e. minimal polynomial L (its coefficients `l_0 ... l_n`) s.t. `S(M)*L` = the power sums.

219

18:44

<sipa> if our sketch just contained the single element a

220

18:45

<sipa> so [a, a**2, a**3, a**4, a**5, a**6] is our expanded syndromes

221

18:45

<sipa> what polynomial would BM output?

222

18:46

<sipa> note that every element in the list is the previous one times a

223

18:46

<sipa> that's a geometric series, remember?

224

18:46

<jnewbery> I'm going to guess (x-a)

225

18:47

<sipa> wrong ;)

226

18:47

<sipa> but very close

227

18:47

<jnewbery> is anyone else brave enough to guess or do I have to be wrong again?

228

18:48

<jnewbery> (x-a)^6 ?

229

18:48

<sipa> i'll help: it's (1-a*x)

230

18:48

<sipa> or [1, a] for us in list representation

231

18:49

<sdaftuar> the roots of the polynomial berlekamp massey spits out are the inverses of what we want?

232

18:49

<sipa> yeah

233

18:49

<sdaftuar> crazy

234

18:49

<sipa> so what BM does

235

18:49

<sipa> in a way

236

18:49

<sipa> is find a list of coefficients, which when convoluted with that many subsequent elements in the input, gives 0

237

18:49

<sipa> so for our 1st degree example here

238

18:50

<sipa> the input lists are [a, a**2], [a**2, a**3], [a**3, a**4], ...

239

18:50

<glozow> berlekamp massey doesn't give us the roots right? just coefficients - we get roots afterward?

240

18:50

<sipa> and for each of them, multiplying the first with -a, and adding the second times 1, gives 0

241

18:50

<sipa> right?

242

18:51

<sipa> (a * -a) + (a * 1) = 0

243

18:51

<sipa> it gives a way to compute the next term of the list in function of the previous one

244

18:51

<sipa> like the fibonacci series

245

18:52

<sipa> has rule x_{n+2} = x_{n} + x_{n+1}

246

18:52

<sipa> this one has rule x_{n+1} = a*x_{n}

247

18:52

<sipa> the coefficients of that rule are [-a, 1]

248

18:52

<sipa> and that's what BM finds

249

18:53

<sipa> given a list generated by a fibonacci-like rule, find the rule

250

18:53

<sipa> better: find the simplest rule

251

18:54

<sdaftuar> so BM finds a polynomial in GF(2^n)[x] of degree at most c, in our application?

252

18:54

<sdaftuar> i guess it can fail sometimes too?

253

18:54

<sipa> and it turns out that if you have a list [a, a**2, a**3, a**4, ...], that means the rule (-a)*x_{n} * 1*x_{n+1}

254

18:55

<sipa> if your list is [a+b, a**2 + b**2, a**3 + b**3, ...], it finds the rule ((-a)*x_{n} * 1*x_{n+1}) * ((-b)*x_{n} * 1*x_{n+1})

255

18:55

<jonatack> beautifully elegant

256

18:55

<sipa> sdaftuar: BM will always find something, but if it finds a polynomial with degree higher than the capacity, we know something went wrong (as that would mean it's generated from more than capacity elements)

257

18:57

<sipa> so applying BM to the sketch in our problem, will find [a*b*c, a*b + b*c + a*c, a + b + c, 1] (i.e. the polynomial (a+b+c) + (a*b + b*c + a*c)*x + (a+b+c)*x^2 + x^3)

258

18:57

<sipa> and the inverses of its roots are a, b, c

259

18:57

<sdaftuar> sipa: from a quick look at the python code, the runtime won't be crazy in those scenarios right? looks like it's just got runtime based on the number of syndromes?

260

18:58

<sipa> BM does a quadratic number of field operations i believe

261

18:58

<sipa> or rather O(sketch_size * differences)

262

18:59

<sipa> of course, the field operations do become slower in function of the field size

263

18:59

<sipa> i'm afraid that's our time

264

18:59

<lightlike> so the most expensive step is the calculation of the roots, not BM?

265

18:59

<sipa> the roots

266

18:59

<sipa> which, i'm afraid time doesn't allow us to go into

267

18:59

<sipa> but i'm happy to stay longer

268

18:59

<glozow> 😢

269

18:59

<jnewbery> we could also do a follow-up another week if you're happy to do that, sipa?

270

19:00

<jnewbery> might give people a bit more time to go over notes again and read up on BM

271

19:00

<michaelfolkson> +1

272

19:00

<jonatack> sipa: pyminisketch.py runs in about a minute for me...was anyone able to build the c++ libminisketch? make did not run for me

273

19:00

<sipa> the next step is about root finding :)

274

19:01

<jnewbery> I think glozow wants you to ask q5

275

19:01

<sipa> jnewbery: sure, find be me... though it may be less reviewy

276

19:01

<michaelfolkson> Did you run it on test vectors jonatack?

277

19:01

<sipa> jonatack: it works for me, though the unit tests run forever, which is kind of weird and may make it look like failure

278

19:01

<jonatack> michaelfolkson: i ran "time pyminisketch.py"

279

19:01

<sipa> 5. The final, and most complicated step to recovering the set of differences is finding the roots of the obtained polynomial. To do so, we first verify that the obtained nth degree polynomial actually has n distinct roots. It relies on the property that (x^(fieldsize) - x) has every field element exactly once as root. How does that let us test if the polynomial is fully factorizable into n

280

19:02

<sipa> roots? Why is this test necessary?

281

19:02

<michaelfolkson> Presumably you can feed two sets into it and it finds the elements that aren't in both sets?

282

19:02

<sipa> michaelfolkson: that's what the tests do

283

19:02

<glozow> roots are where our actual stuff is!

284

19:03

<glozow> A n-degree polynomial can have at most n roots. It _may_ have fewer roots that are repeated (note n-degree polynomial with unique roots is synonymous with having n roots).

285

19:03

<sipa> glozow: could it also just have fewer roots?

286

19:03

<sipa> (even when counting repetition)

287

19:03

<glozow> :think

288

19:03

<sipa> hint: we answered this in question 1 ;)

289

19:04

<michaelfolkson> The tests of the C++ code

290

19:04

<sipa> well, touched on it

291

19:04

<michaelfolkson> Did anyone feed two sets into the Python code?

292

19:04

<sipa> michaelfolkson: that's what the tests do

293

19:04

<glozow> is the answer just no o.O

294

19:05

<sdaftuar> depends on what field you're looking for roots in!

295

19:05

<sipa> glozow: the answer is yes, you can definitely have fewer roots

296

19:05

<glozow> aw man, how have fewer roots?

297

19:05

<sipa> in non-algebraically-closed fields, but all finite fields are not algebraically closed ;)

298

19:05

<michaelfolkson> sipa: Oh within this Python file right, I was looking elsewhere

299

19:05

<sipa> glozow: what are the roots of x^2 + x + 1?

300

19:06

<glozow> oh, das not reducible

301

19:06

<michaelfolkson> das correct

302

19:06

<sipa> yes, so what are its roots?

303

19:06

<jnewbery> there are none?

304

19:06

<sipa> yes, exactly

305

19:06

<sdaftuar> in GF(4), it has roots

306

19:06

<sipa> sdaftuar: ha, right

307

19:06

<sdaftuar> i mean, it really depends on the question :)

308

19:07

<sipa> but every finite has polynomials (with coefficients in that field) which are irreducible

309

19:07

<sdaftuar> right

310

19:07

<sipa> and irreduclbe polynomials have no roots by definition

311

19:07

<sipa> you can also have reducible polynomials which have no roots though

312

19:07

<sipa> e.g. a 4th degree polynomial may be factorizable into 2 2nd degree polynomials, which are individually irreducible

313

19:08

<jnewbery> ok, shall we wrap it up there and schedule a follow-up? It'd be a shame for people who had a hard stop to miss out on too much of the fun

314

19:09

<sipa> sgtm

315

19:09

<jnewbery> glozow prepared some excellent additional notes here for anyone who wants a bit more explanatory material: https://github.com/glozow/bitcoin-notes/tree/master/minisketch

316

19:09

<jnewbery> thanks so much sipa. This has been 🤯🤯🤯

317

19:09

<fodediop> This was a heavy one, but I beleive lol

318

19:10

<fodediop> thank you sipa

319

19:10

<glozow> thanks professor sipa!!!!

320

19:10

<dergoegge> thanks sipa!

321

19:10

<dergoegge> 🤯

322

19:10

<OliP> Thanks sipa!

323

19:10

<amiti> yeah this was awesome, I learned a lot from lurking :)

324

19:10

<prayank> Thanks everyone

325

19:10

<jonatack> great stuff, thank you sipa

326

19:10

<emzy> Thanks sipa!

327

19:10

<shafiunmiraz0> Thank you sipa. Thank you jnewbery. Thank you everyone

328

19:10

<lightlike> thanks sipa!

329

19:10

<michaelfolkson> Cool, thanks sipa. Will check out the glozow notes too

330

19:10

<jnewbery> #endmeeting

331

19:11

<sipa> glozow: This is important because our set {m_1 ... m_n} is derived from the roots of the polynomial. The Berklekamp Trace Algorithm also requires this I believe.

332

19:12

<sipa> that's exactly the answer; our root finding algorithm (next club!) requires that the polynomial is fully factorizable into distinct 1st degree factors

333

19:12

<sipa> (it'll go into an infinite loop otherwise)