Minisketch C++ code (math and cryptography)

https://github.com/sipa/minisketch/tree/master

Host: sipa  -  PR author: sipa

Notes

  • libminisketch is a library implementing the PinSketch set reconciliation algorithm. It is the basis for the efficient relay protocol in Erlay (covered in a previous review club), but is generally usable for various applications that use set reconciliation.

  • In yet another previous review club we covered some of the algorithms involved by looking at the Python reimplementation. In this review club we will instead be looking at the C++ code that implements them efficiently.

  • The codebase is roughly organized as follows:

    • The main entry point into the library is minisketch.cpp. It is really just a collection of functions that construct Sketch objects (defined in sketch.h) for various fields, and then exposes C functions that invoke the corresponding Sketch member functions.

    • The actual meat of the PinSketch algorithm (including Berlekamp-Massey and the root finding algorithm) is in sketch_impl.h. The implementation code is instantiated separately using templates for every field implementation, but they all implement the virtual Sketch class, presenting a uniform interface that minisketch.cpp can use.

    • There are up to 3 distinct field implementations for every field size (libminisketch currently supports field sizes from 2 to 64 bits, inclusive):

      • The generic one that works on every platform, with common code in fields/generic_common_impl.h and specific definitions in fields/generic_*.cpp. It relies on a number of shared integer functions that help represent GF(2^bits) field elements as integers, found in int_utils.h.

      • Two “clmul”-based implementations that use intrinsics to access special instructions, and only run on certain x86/64 CPUs. These instructions are specifically designed to help with GF(2^n) multiplications, and they greatly improve performance when available (which is the case for pretty much all x86/64 CPUs since 2013 or so). One implementation is optimized for fields that have a modulus of the form x^bits + x^a + 1, while another works for any modulus. The common code for these can be found in fields/clmul_common_impl.h, and specific field definitions in fields/clmul_*.cpp.

    • Finally there are tests in test-exhaust.cpp (extended and renamed to test.cpp in PR #33) and benchmarks in bench.cpp.

Questions

  1. Why is there a separate instantiation of the PinSketch algorithm for every field?

  2. If you look at the fields/*.cpp files, you will notice there are large amounts of hardcoded linear transformation tables (with code for defining them in lintrans.h). Which tables do you see, and why are they useful? What do they expand to at compile time?

  3. Why are there instances of Sketch for different fields in separate .cpp files? Could all the generic ones be merged into a single file? Could the generic ones and the clmul ones be merged?

  4. Do you notice any optimizations in sketch_impl.h that weren’t present in the Python code?

  5. What is tested in test-exhaust.cpp? This may be clearer if you look at PR #33.

  6. For every field implementation Field there is a member type Field::Elem (which for all current fields is just an integer type), and invoking field operations goes through Field (e.g. multiplying two field elements a and b is done through field.Mul(a,b)). A more intuitive design would be to make Field::Elem a separate class for each field, with its own member functions for finite field arithmetic (so that the above could be done simply using a*b for example). Any idea why this approach is not used?

  7. Bonus question: what is special about field size 58 (look at fields/clmul_8bytes.cpp)?

Meeting Log

  19:00 <@jnewbery> #startmeeting
  19:00 <jeremyrubin> hi
  19:00 <@jnewbery> hi folks! Welcome to Bitcoin Core PR Review Club. A club about reviewing Bitcoin Core PRs.
  19:00 <schmidty> hi
  19:00 <gleb> hi
  19:00 <glozow> hi
  19:00 <@jnewbery> feel free to say hi to let everyone know you're here
  19:00 <lightlike> hi
  19:00 <pglazman> hi
  19:00 <svav> hi
  19:00 <michaelfolkson> hi
  19:00 <emzy> hi
  19:01 <@jnewbery> anyone here for the first time?
  19:01 <wiscojabroni> yes me!
  19:01 <@jnewbery> welcome wiscojabroni!
  19:02 <wiscojabroni> thank you!
  19:02 <dictation> hi
  19:02 <larryruane_> hi everyone
  19:02 <sipa> hi everyone and welcome
  19:02 <ssd> hi
  19:02 <@jnewbery> just a couple of reminders: the host is here to guide the discussion with some prepared questions (here https://bitcoincore.reviews/minisketch), but feel free to ask questions at any time
  19:03 <@jnewbery> no need to ask if you can ask. Just ask away! We're all here to learn
  19:03 <@jnewbery> ok, over to sipa
  19:03 <andrewtoth_> hi
  19:03 <sipa> hi everyone
  19:03 <murchin> hey
  19:03 <hernanmarino> hi ! First timer here
  19:03 <murchin> Hi Hernan :)
  19:03 <sipa> this is a bit of an unusual review club too, as we're not reviewing a PR
  19:04 <@jnewbery> welcome hernanmarino!
  19:04 <sipa> but an entire project/repository
  19:04 <dulcedu> hola!
  19:04 <hernanmarino> thanks
  19:04 <sipa> that means we're obviously not going as deep
  19:05 <sipa> and i've tried to make the questions/summary primarily about code organization
  19:05 <svav> Is the main purpose of libminisketch being used in Bitcoin Core to provide efficiency gains to the relay protocol Erlay?
  19:05 <sipa> svav: indeed
  19:05 <sipa> that's the only reason (for now) why we'd want it in bitcoin core
  19:06 <jeremyrubin> sipa: not sure if covered before or if you want the focus to be on the algo, but maybe you could set up the general problem we're trying to solve for
  19:06 <@jnewbery> jeremyrubin: https://bitcoincore.reviews/minisketch-26
  19:06 <sipa> jeremyrubin: we already did two review clubs on the algo
  19:06 <svav> When you refer to "fields", how many fields is this, and are these just the common fields of a Bitcoin transaction?
  19:06 <sipa> svav: they refer to mathematical fields
  19:07 <@jnewbery> We've gone over the high level concepts in a couple of previous review club sessions. I think it makes sense to focus on the c++ implementation here
  19:07 <jeremyrubin> gotcha; will look at earlier notes!
  19:07 <sipa> search for galois field on wikipedia, or read the transcripts of the previous review clubs
  19:07 <jonatack> hi
  19:07 <Gambo> hello!
  19:08 <sipa> also i apologize for my slow and slightly erratic typing; i'm currently unable to do this from anywhere but my phone
  19:08 <sipa> so let's dive in with the first question
  19:08 <sipa> Why is there a separate instantiation of the PinSketch algorithm for every field?
  19:08 <dictation> Why is there a separate instantiation of the PinSketch algorithm for every field?
  19:09 <sipa> and again, this refers to the specific galois fields used in the algorithm
  19:09 <larryruane_> is this a classic space-time tradeoff? Separate instantiations means the compiler can optimize better?
  19:09 <glozow> I thought (1) composability and (2) performance
  19:09 <sipa> erlay specifically only uses the 32-bit field
  19:09 <glozow> The fields have been chosen so that some sketch algorithms work for all of our fields. However, some operations are optimized per field so that you can just say like `field.`multiply these elements or `field.`solve quadratic root.
  19:09 <lightlike> Because various precomputed values are used, which are different for different-sized fields
  19:09 <jeremyrubin> Also i'd imagine type safety is nice
  19:09 <sipa> but even for 32 bits, we have 2 or 3 different implementations of that field
  19:10 <glozow> e.g. I think we have a table `QRT_TABLE_N` for each field so that during poly root finding, we can quickly look up the solution for x^2 + x = a for each element in the field? (is that right?)
  19:10 <sipa> yeah, all good answers
  19:10 <dkf> This is out of my domain but a thought: because due to linearity we need to be able to accumulate all the fields for certain checks?
  19:10 <sipa> glozow: that's correct, but it doesn't really require fully instantiating the full algorithm for every field
  19:11 <sipa> it could just have a dispatch table too that says "if field_size == 32 use this QRT table"
  19:11 <glozow> ah mhm
  19:11 <sipa> but yes, in general the answer is just performance
  19:11 <@jnewbery> What do SQR and QRT stand for in the precomputed values?
  19:11 <sipa> we get an enormous speedup from being able to inline everything
  19:11 <jonatack> istm it was for optimizing for some platforms
  19:12 <svav> I've been told we are talking about mathematical fields, so if anyone needs a definition https://en.wikipedia.org/wiki/Field_(mathematics)
  19:12 <glozow> jonatack: I think that's the templating by implementation
  19:12 <larryruane_> jnewbery: I think square root and quadratic root
  19:12 <jonatack> (and chip architectures)
  19:12 <sipa> larryruane_: *square* and quadratic root
  19:12 <sipa> not square root
  19:13 <sipa> svav: indeed
  19:13 <sipa> svav: but look at the previous two meetups
  19:13 <@jnewbery> How long do those precomputed values take to calculate? Could it be done at compile time?
  19:13 <sipa> jnewbery: they are primarily computed at compile time, actually :)
  19:14 <sipa> only a few constants are included in the source code
  19:14 <sipa> they're generated using a sage script that takes a few minutes afaik
  19:14 <glozow> is this the linear transformations of the tables in fields/*.cpp?
  19:14 <sipa> with c++17 we could in theory do everything at compile time, but i don't know how slow it'd be
  19:14 <glozow> oh the sage script
  19:15 <sipa> i guess we're on question 2 now
  19:15 <larryruane_> very productive for me learning about fields was chapter 1 of Jimmy Song's book, Programming Bitcoin
  19:15 <jeremyrubin> sipa: with incremental compilation should be a one-time cost if you put it in a depends-light header
  19:15 <@jnewbery> and maybe less readable/reviewable to do it using c++ metaprogramming than using a sage script?
  19:15 <sipa> so: If you look at the fields/*.cpp files, you will notice there are large amounts of hardcoded linear transformation tables (with code for defining them in lintrans.h). Which tables do you see, and why are they useful?
  19:15 <jeremyrubin> might be worth doing so so that it's "trust minimized"
  19:16 <sipa> jeremyrubin: wut
  19:16 <jeremyrubin> (carry on)
  19:17 <glozow> I was confused what the `RecLinTrans` part does
  19:17 <sipa> ok
  19:17 <glozow> but I see: A table SQR_TABLE_N gives us a map from elem a -> square of a for the field GF(2^N).
  19:17 <glozow> and A table QRT_TABLE_N gives us a map from a -> x such that x^2 + x = a for the field GF(2^N).
  19:17 <sipa> correct
  19:17 <sipa> also correct
  19:17 <sipa> there are more
  19:18 <glozow> There's also SQR2_TABLE_N, SQR4_TABLE_N,
  19:18 <glozow> are those ^4 and ^8 or?
  19:18 <sipa> close, but no
  19:18 <sipa> SQR4 is a table going x -> x^(2^4)
  19:19 <sipa> i.e. squaring 4 times
  19:19 <sipa> why is it possible to have a table for that?
  19:20 <glozow> why it's possible, like why we can calculate them ahead of time?
  19:20 <sipa> maybe a better first question: what do these tables expand to at compile time?
  19:21 <sipa> i'll give the answer, it's quite abstracted away
  19:21 <glozow> compiler makes a `RecLinTrans<>` of the table -> makes a 2^N size array?
  19:21 <glozow> 1 slot for each element in the field?
  19:22 <sipa> not a 2^N size array, that'd be a bit big if N=64
  19:22 <sipa> it creates a series of tables of size 64
  19:22 <@jnewbery> I only see SQR2_TABLE, SQR4_TABLE, etc in the clmul files. Is that right?
  19:22 <sipa> jnewbery: correct
  19:23 <sipa> jnewbery: they're used for computing inverses
  19:23 <sipa> for clmul fields, multiplication is very fast, so fermat's little theorem is used
  19:23 <sipa> well, i guess FLT doesn't actually apply here because it's not modulo a prime
  19:23 <jeremyrubin> Is it so that we can factor an operation into a polynomial for inverse eqt and then do simpler operations?
  19:23 <jeremyrubin> is that what you're asking?
  19:24 <sipa> but for every field a constant a exists such that x^a is the inverse of x
  19:24 <sipa> and for clmul fields, that is used to invert elements
  19:24 <sipa> and the SQRn tables are used for helping with that
  19:25 <sipa> they let us "skip" a whole bunch of squarings at once
  19:25 <sipa> for non-clmul fields, extgcd based inverses are used
  19:25 <sipa> because it appears faster to do so
  19:25 <glozow> just to clarify, are the fields not the same for clmul and non-clmul?
  19:26 <sipa> the fields are mostly the same
  19:26 <sipa> but the implementations differ
  19:26 <sipa> so back to my earlier questions about the tables
  19:27 <sipa> RecLinTrans expands to a compile-time *list* of tables, each with 64 entries
  19:27 <jeremyrubin> because x^a = (y*z)^a, or x^a = x^b*x^c where a = b+c? so if we can get to known factored form we already have the ops done?
  19:27 <sipa> and then actual evaluation looks at groups of 6 bits of the input field element, and looks up each in a different table
  19:27 <sipa> and xors them (= adding in the field)
  19:28 <sipa> jeremyrubin: not quite; the answer is simply that all these operations are 2-linear operations
  19:28 <jonatack> groups of 8 bits?
  19:28 <sipa> jonatack: no, 6
  19:29 <sipa> jeremyrubin: because in GF(2^n) it is the case that (a+b)^2 = a^2 + b^2
  19:29 <jeremyrubin> ah ok; I could probably answer this by looking at what the actual inverse algorithm factors to. Can you define "2 linear"
  19:29 <sipa> jeremyrubin: can be written as a multiplication by a matrix over GF(2)
  19:29 <glozow> GF(2)-linear
  19:29 <jeremyrubin> sipa: that seems like an important/cool property, makes sense. sorry if this was answered in prev session
  19:29 <sipa> jeremyrubin: yes
  19:30 <sipa> interpret the input element as a vector of bits, apply a GF(2) square matrix, and reinterpret the result as a field element
  19:30 <sipa> this can be done for anything that raises to a power that is itself a power of 2
  19:30 <jonatack> sipa: 6 as is? e.g. typedef RecLinTrans<uint64_t, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5> StatTable57
  19:30 <jonatack> ah no nvm
  19:31 <sipa> jonatack: so that specifically means the 57-bit transformation is decomposed into 7 6-bit tables and 3 5-bit tables
  19:31 <sipa> and that decomposition only works because the transformation is linear
  19:32 <sipa> otherwise we'd need a table entry for every field element
  19:32 <sipa> which would be way too large
  19:32 <sipa> does that make sense?
  19:32 <@jnewbery> it sounds like it makes sense :)
  19:32 <sipa> haha
  19:33 <jonatack> sipa: yes. i'm still looking for the 6 bit input though
  19:33 <jonatack> :)
  19:33 <sipa> jonatack: it's all generated at compile time
  19:33 <sipa> in lintrans.h
  19:33 <glozow> so when we're evaluating the square of an element in the 57-bit field,
  19:33 <glozow> we split it into 7 groups of 6 bits + 3 groups of 5 bits, and we look it up in the table
  19:33 <sipa> indeed
  19:34 <jonatack> lintrans.h : "over 3 to 8 bits"
  19:34 <glozow> and then xor the answers we get from these 10 tables
  19:34 <sipa> jonatack: and here specifically 6
  19:34 <glozow> and that gives us the square of the element?
  19:34 <sipa> see all the 6s in the type definition
  19:34 <sipa> glozow: indeed
  19:34 <glozow> ahhhhhhh 🧠
  19:35 <sipa> the Rec stands for recursive fwiw
  19:35 <jeremyrubin> oh, thought it meant rectangular XD
  19:35 <sipa> because it chops off M bits (5 or 6), evaluates them, and xors with a sub table which is again RecLinTrans, but for fewer bits
  19:36 <jeremyrubin> Did I miss the justification for say 6 vs. 7 bits?
  19:36 <sipa> in c++14 i think there are cleaner ways of doing so
  19:36 <jeremyrubin> Experimentally picked?
  19:36 <sipa> jeremyrubin: good question
  19:36 <sipa> indeed, experimentally decided that more wasn't worth it
  19:36 <sipa> on a few platforms
  19:37 <sipa> ok
  19:37 <sipa> Why are there instances of Sketch for different fields in separate .cpp files? Could all the generic ones be merged into a single file? Could the generic ones and the clmul ones be merged?
  19:37 <jeremyrubin> gotcha. and it's a pure function so if that changes it can be adapted per platform
  19:39 <sipa> hint: have you tried compiling the code?
  19:39 <svav> Are there different instances of Sketch to account for different processors?
  19:39 <@jnewbery> Can you build a version of this that only contains the field size that you want to use?
  19:39 <jeremyrubin> https://github.com/sipa/minisketch/blob/93519923665787de63310e32d1188d7cd15cb4e9/src/minisketch.cpp
  19:39 <jeremyrubin> this looks non conditional?
  19:39 <sipa> jnewbery: i think i had a PR for that at some point
  19:40 <sipa> jeremyrubin: what do you mean?
  19:40 <@jnewbery> so that's not the reason for splitting them out into different files
  19:40 <glozow> so i don't think it would make sense to merge clmul and non-clmul given that you'd only use 1 based on your architecture?
  19:40 <jonatack> (a) maybe, (b) no? in minisketch.cpp, the Construct functions depend on #ifdef HAVE_CLMUL, so they need to be separate
  19:40 <sipa> glozow: exactly
  19:40 <jeremyrubin> as in if you use minisketch.cpp it pulls in all of the definitions
  19:40 <sipa> you need separate compilation flags for building clmul code
  19:41 <sipa> and the resulting code *cannot* be invoked unless you know you're on a clmul-supporting platform (at runtime)
  19:41 <jeremyrubin> It's still not clear to me that the single file can't also ifdef?
  19:41 <lightlike> when compiling, it seems that the different types go into different libraries e.g. (libminisketch_field_clmul.la, libminisketch_field_generic.la)
  19:42 <@jnewbery> Graph 4 in https://github.com/sipa/minisketch/blob/master/README.md seems to show that clmul is faster than generic for certain field sizes and slower for others
  19:42 <sipa> so it would be possible to merge all the clmul_1byte clmul_2bytes ... etc into one
  19:42 <sipa> it's just split out so building is faster and needs less ram
  19:42 <sipa> it's pretty heavy already
  19:42 <sipa> but the clmul_* and generic_* ones cannot be merged
  19:43 <sipa> because they're built with strictly different compilation flags
  19:43 <@jnewbery> because all this template instantiation uses a lot of memory to compile?
  19:43 <sipa> jnewbery: yeah, end user code should benchmark what is fastest
  19:43 <sipa> jnewbery: indeed
  19:44 <glozow> if you can use CLMUL implementation, why do you need both the CLMUL and CLMULTRI implementation for a field that has a `x^b + x^a + 1` modulus?
  19:44 <sipa> glozow: either may be faster
  19:44 <sipa> depending on the hardware
  19:45 <sipa> they're different algorithms and it's hard to say which one is faster when
  19:45 <glozow> ah, okay
  19:45 <sipa> i think for some fields it is clear, and those lack a CLMUL one
  19:45 <@jnewbery> what would be the advantage of compiling just for a single field size? Smaller binary because of the precomputed tables and different template instantiations? Faster build? Anything else?
  19:46 <sipa> jnewbery: API pain
  19:46 <sipa> my goal is adding support for that
  19:46 <sipa> but first adding a generic slow field implementation that works for all field sizes
  19:46 <sipa> so that you don't get a library which doesn't support part of the functionality
  19:47 <sipa> e.g. if used as a shared library
  19:47 <sipa> so then it becomes a compile-time choice which fields to include optimized implementations for
  19:47 <sipa> rather than which ones to support at all
  19:48 <sipa> Do you notice any optimizations in sketch_impl.h that weren’t present in the Python code?
  19:49 <glozow> question: Why isn't this `const T* QRT` instead of `const F* QRT`? https://github.com/sipa/minisketch/blob/f9540772fac3c1e840208db8c3fe6894526ec1da/src/fields/generic_common_impl.h#L19
  19:49 <glozow> I only saw the obvious one, L140 in the root-finding: for deg2 polynomials, direct quadratic solver -> calls `field.Qrt(input)`
  19:50 <jonatack> from the last session: https://bitcoincore.reviews/minisketch-26-2#l-225
  19:50 <glozow> yeah, i used sipa's hint from last review club heh
  19:50 <sipa> glozow: great question
  19:51 <sipa> the size of runtime tables is different from compile-time tables
  19:51 <sipa> these lookup tables are also created at runtime
  19:51 <sipa> e.g. when there are multiple multiplications with the same value
  19:51 <sipa> then we preprocess that value into a lookup table
  19:51 <glozow> oh is that why they're called StatTable vs DynTable?
  19:52 <sipa> because multiplication with a constant is also linear
  19:52 <sipa> indeed
  19:52 <glozow> oooooooh
  19:52 <sipa> and the DynTable uses smaller lookup tables
  19:52 <sipa> 4 bit iirc
  19:52 <sipa> instead of 6 bits
  19:52 <sipa> also experimentally determined
  19:53 <sipa> glozow: and yes, the direct quadratic solver is what i was going for in this question
  19:53 <sipa> another possible answer is of course all the lookup tables
  19:54 <sipa> What is tested in test-exhaust.cpp? This may be clearer if you look at PR #33.
  19:55 <jonatack> agree, the new test file is much clearer afaict
  19:55 <sipa> yeah the old one was really unfinished
  19:55 <sipa> (and clearly missed some bugs...)
  19:56 <jonatack> iterating on things really works
  19:56 <@jnewbery> If the answer to your question isn't "everything", then the file is misnamed
  19:56 <sipa> it is also being renamed in #33
  19:56 <svav> Forgive the basic question, but is the reason for all this Minisketch stuff just to add efficiency to Erlay? Is it involved in more fundamental cryptographic calculations for Bitcoin?
  19:57 <sipa> svav: it is the implementation of the sketch algorithm used by erlay
  19:57 <sipa> that's it
  19:57 <@jnewbery> svav: https://bitcoincore.reviews/minisketch-26 and https://bitcoinops.org/en/topics/erlay/
  19:57 <lightlike> svav: It is an integral part of Erlay, it doesn't add efficiency to it. Erlay adds efficiency to transaction relay.
  19:57 <sipa> it's not "adding" efficiency; it is literally just *the* implementation used for the sketching
  19:58 <sipa> it is of course a highly optimized implementation, so that it is efficient enough to be practically usable
  19:58 <jonatack> sipa: before time is over, i'm keen to hear the answers to the last two questions
  19:58 <svav> Thanks for the clarification
  19:58 <sipa> For every field implementation Field there is a member type Field::Elem (which for all current fields is just an integer type), and invoking field operations goes through Field (e.g. multiplying two field elements a and b is done through field.Mul(a,b)). A more intuitive design would be to make Field::Elem a separate class for each field, with its own member functions for finite field
  19:58 <sipa> arithmetic (so that the above could be done...
  19:59 <sipa> simply using a*b for example). Any idea why this approach is not used?
  19:59 <glozow> any thoughts of f u z z ing the minisketch code? heh
  19:59 <sipa> i'll just give the answer
  19:59 <sipa> if we'd do that, you'd get vectors with different types for every field
  19:59 <michaelfolkson> Running a bit low on time so random question. Are there any specific code segments (C++) that demonstrate the speed and memory efficiency of the C++ code over the Python code segment and would be worth analyzing?
  20:00 <glozow> i figured you'd need the tables to be static since they're field-wide
  20:00 <@jnewbery> sipa: I've got to run now, but I'm curious if there's anything people here can do to help progress this? Would it help if someone opened the PR to add this to the Bitcoin Core repo?
  20:00 <sipa> you'd have vector<FieldCLMul32> and vector<FieldGeneric22> etc
  20:00 <jonatack> haaaah
  20:00 <sipa> and all that vector code, along with all supporting functions, would be ~1 MB of executable code
  20:01 <sipa> with the current approach they're all just std::vector<uint64_t>
  20:02 <sipa> Bonus question: what is special about field size 58 (look at fields/clmul_8bytes.cpp)?
  20:02 <sipa> jnewbery: i'm planning to PR it soon (after #33 and maybe a few follow-ups)
  20:02 <glozow> sipa: is it specific to 58 only or are there other field sizes with the special property?
  20:03 <@jnewbery> LOAD_TABLE/SAVE_TABLE ?
  20:03 <sipa> glozow: for larger fields than 64 bit there would be more with this property
  20:03 <@jnewbery> https://github.com/sipa/minisketch/blob/53757f736d4e75faf6c4127e8fb452b6a69c4626/src/fields/clmul_8bytes.cpp#L33-L34
  20:03 <sipa> but so far, only 58 has it
  20:04 <jonatack> jnewbery: indeed
  20:04 <sipa> jnewbery: yes, LOAD/SAVE
  20:04 <sipa> yet another answer to question 2?
  20:04 <sipa> jnewbery: what do those do?
  20:05 <jonatack> conversion tables?
  20:05 <sipa> indeed
  20:05 <sipa> convert from/to what?
  20:06 <jonatack> something about bitstream modulus for the impl
  20:06 <sipa> yeah
  20:06 <svav> Something to do with StatTableTRI58 ???
  20:06 <jonatack> gen_params.sage#L252
  20:06 <glozow> oh different modulus but isomorphic? so you're permuting the elements?
  20:07 <sipa> so what is happening here is that for field size 58 there exists a trinomial (x^58 + x^a + 1) irreducible modulus
  20:07 <sipa> which is used for clmultri
  20:07 <sipa> but there is also another modulus, which is shorter
  20:07 <sipa> and the other field implementation uses that one
  20:07 <sipa> yet, we want a stable API
  20:08 <sipa> which consistently interprets bits on the wire the same way, regardless of implementation
  20:08 <glozow> and we want both because it's not clear which one would be faster?
  20:08 <sipa> indeed
  20:08 <sipa> but even if not, the API specifies the interpretation used publicly
  20:09 <jonatack> thanks!
  20:09 <sipa> if we'd for whatever reason decide to use a different representation internally, we need to convert
  20:09 <sipa> and why is this worth it? we do quadratically many operations internally
  20:09 <sipa> but only linear work for conversion
  20:10 <sipa> this was a surprising discovery for me that this conversion was so cheap (just another linear transformation)
  20:10 <glozow> i gotta run but thank you sipa!!! peak nerd snipe
  20:11 <sipa> i'm done as well
  20:11 <jonatack> 👏👏👏
  20:11 <sipa> thank you all for coming!
  20:11 <svav> Thanks
  20:11 <jesseposner> Thanks!
  20:11 <larryruane_> thank you for presenting, sipa! really interesting!
  20:11 <lightlike> Thanks!
  20:11 <wiscojabroni> thanks!
  20:11 <jeremyrubin> thanks!
  20:12 <sipa> and apologies for the many typos, i'm a bit restricted right now
  20:12 <jonatack> fantastic session, thank you sipa
  20:12 <sipa> yw