The PR branch HEAD was 7aa91cd1 at the time of this review club meeting.
Note: in this PR Review Club we will review the first three commits only
(Expose MAC length in chacha_poly_aead.h, Add BIP324 short-IDs to
protocol.h/.cpp, and Add BIP324 v2 transport serializer and deserializer).
Currently, P2P messages are transported in plaintext, which
makes them vulnerable to eavesdropping by infrastructure entities such as
ISPs. Such entities can tamper with, drop, or delay messages between anonymous
peers. A malicious infrastructure entity could use this to influence the
network topology, determine the origin of transactions, or perform attacks
against off-chain protocols.
BIP 324 is the second proposal for an encryption standard for the Bitcoin P2P
network. The previous proposal (BIP 151) was also authored by Jonas Schnelli
and has been withdrawn.
A BIP 324 session starts with an Elliptic Curve Diffie-Hellman key exchange
to establish a shared session secret between peers (this key exchange is not yet
implemented). From this shared session secret, two keys, K_1 and K_2, are
derived. K_1 is used to encrypt the 3-byte packet length. K_2 is used to
encrypt and authenticate the rest of the packet. Using the symmetric keys K_1
and K_2, the receiver first decrypts the length, uses it to locate the end of
the packet, and authenticates the packet. If authentication succeeds, the
receiver decrypts the message payload and hands the content to the processing
layer.
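The receive flow described above can be sketched in a few lines of Python. This is only an illustration of the packet layout and the order of operations, not the PR's code: SHAKE-256 stands in for the ChaCha20 keystream and a truncated HMAC-SHA256 stands in for Poly1305, and the names `seal`, `open_packet`, the per-packet `seq` counter, and the little-endian length encoding are all assumptions made for this example. Following the description above, the stand-in MAC covers the encrypted payload only.

```python
import hashlib
import hmac
from typing import Optional

LENGTH_LEN = 3   # encrypted length field, as described in the notes
MAC_LEN = 16     # Poly1305 tags are 16 bytes; the stand-in MAC is truncated to match


def _stream(key: bytes, label: bytes, seq: int, n: int) -> bytes:
    """Stand-in keystream (SHAKE-256). NOT ChaCha20; illustration only."""
    return hashlib.shake_256(key + label + seq.to_bytes(8, "little")).digest(n)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _tag(k2: bytes, seq: int, enc_payload: bytes) -> bytes:
    """Stand-in MAC (truncated HMAC-SHA256). NOT Poly1305; illustration only."""
    mac_key = _stream(k2, b"mac", seq, 32)
    return hmac.new(mac_key, enc_payload, hashlib.sha256).digest()[:MAC_LEN]


def seal(k1: bytes, k2: bytes, seq: int, payload: bytes) -> bytes:
    """Build a packet: encrypted length (K_1) || encrypted payload (K_2) || MAC."""
    if len(payload) >= 1 << (8 * LENGTH_LEN):
        raise ValueError("payload does not fit in a 3-byte length field")
    plain_len = len(payload).to_bytes(LENGTH_LEN, "little")  # endianness is an assumption
    enc_len = _xor(plain_len, _stream(k1, b"len", seq, LENGTH_LEN))
    enc_payload = _xor(payload, _stream(k2, b"enc", seq, len(payload)))
    return enc_len + enc_payload + _tag(k2, seq, enc_payload)


def open_packet(k1: bytes, k2: bytes, seq: int, packet: bytes) -> Optional[bytes]:
    """Decrypt the length, authenticate, and only then decrypt the payload."""
    # Step 1: decrypt the length with K_1 to learn where the payload ends.
    plain_len = _xor(packet[:LENGTH_LEN], _stream(k1, b"len", seq, LENGTH_LEN))
    end = LENGTH_LEN + int.from_bytes(plain_len, "little")
    enc_payload, tag = packet[LENGTH_LEN:end], packet[end:end + MAC_LEN]
    # Step 2: authenticate before touching the payload.
    if not hmac.compare_digest(tag, _tag(k2, seq, enc_payload)):
        return None
    # Step 3: decrypt and hand the content to the processing layer.
    return _xor(enc_payload, _stream(k2, b"enc", seq, len(enc_payload)))


k1, k2 = bytes(32), bytes([1]) * 32
packet = seal(k1, k2, 0, b"hello peers")
assert open_packet(k1, k2, 0, packet) == b"hello peers"
```

In this sketch, flipping a bit in the encrypted length shifts the offset at which the receiver looks for the MAC, so the packet is rejected even though the length itself is not covered by the stand-in MAC.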
Other PRs that are part of BIP 324 implementation:
A further step to prevent eavesdropping on the P2P network would be to add
peer authentication, which is outside the scope of BIP 324. How to
authenticate peers in an anonymous network is an area of active
research.
What’s your process to review a PR implementing a new BIP? Did you read the
BIP or the code first? How can you ensure that the BIP is correct? How can you
ensure that the code implements the BIP correctly?
BIP 324 introduces a new message structure, notably with short command IDs.
What do you think about those new short command IDs?
Why was the ChaCha20/Poly1305 construction chosen? Have you read the Bitcoin
Core implementations of these primitives?
Beyond code review, how can testing be improved? What failure cases should
be tested?
Could the serialization/deserialization code be better documented or
simplified? Consider the choices of data structures and algorithms.
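To make the short command ID question more concrete, here is a minimal sketch of how a 1-byte ID could replace the 12-byte, NUL-padded ASCII command field of the current message header. The ID values in `SHORT_IDS` are hypothetical and chosen only for illustration, and the length-prefixed fallback for commands without an ID is an assumption of this sketch rather than a statement of what the BIP or the PR specifies.

```python
from typing import Tuple

V1_COMMAND_FIELD_LEN = 12  # current headers carry the command as 12 bytes of NUL-padded ASCII

# Hypothetical assignments for illustration only; the real table is defined by the BIP.
# IDs start above 12 so they cannot be confused with the length prefix used below.
SHORT_IDS = {"inv": 13, "tx": 14, "ping": 15}
COMMANDS = {short_id: name for name, short_id in SHORT_IDS.items()}


def encode_command(name: str) -> bytes:
    """Encode a command as a 1-byte short ID, or as length-prefixed ASCII (assumed fallback)."""
    if name in SHORT_IDS:
        return bytes([SHORT_IDS[name]])
    raw = name.encode("ascii")
    if not 1 <= len(raw) <= V1_COMMAND_FIELD_LEN:
        raise ValueError("command must be 1 to 12 ASCII characters")
    return bytes([len(raw)]) + raw


def decode_command(data: bytes) -> Tuple[str, int]:
    """Return the command name and the number of bytes consumed."""
    first = data[0]
    if first in COMMANDS:
        return COMMANDS[first], 1
    if 1 <= first <= V1_COMMAND_FIELD_LEN:
        return data[1:1 + first].decode("ascii"), 1 + first
    raise ValueError("unknown short ID")


assert encode_command("tx") == bytes([14])           # 1 byte instead of 12
assert decode_command(encode_command("cfilter")) == ("cfilter", 8)
```

A command with an entry in the table costs one byte on the wire; a command without one still round-trips, at the cost of its length plus one byte.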
<raj_149> ariard: for start it stops traffic analysis and IP to publickey linkage. In general all sorts of privacy leaks that can occur by monitoring network traffic of a node.
<pinheadmz> ariard I understand but you dont have to use port 8333 and if my ISP sees a bunch of encrypted blobs coming into port 9000, that is at least plausible denyability
<ariard> pinheadmz: with tx propagation for sure it obfuscates tx origin for on-path attacks, now spy peers can still observe origin by connecting to you
<pinheadmz> I'm not sure if there's an attack vector around this, but you might be able to tell where a node is in the sync process by reading their traffic
<ariard> pinheadmz: if you assume attacker knows about 1) block announcement 2) listen for transactions flooding 3) can map encrypted blob size to transactions received?
<raj_149> ariard: i am not exactly sure on the kind of possible leaks that can happen over clear text data. I read somewhere by monitoring network traffic and some kind of triangulation observer can link origin transactions with IP addresses, without connecting to me as a node. Would like to know about other possible leaks that i dont know about. clearly there should be some over encrypted data.
<jnewbery> willcl_ark: if you can tamper with traffic you can get a peer disconnected, but not banned. The terminology is a bit confusing currently, but the disconnected peer is allowed to reconnect
<jnewbery> of course, if you're able to tamper with messages, you could just continue to do that to get them disconnected again, or just block all traffic
<ariard> emzy: right, but in practice if you assume ISP-capabilities like attackers they would be in place for intercepting at any moment of the session
<willcl_ark> ariard: so am I right after reading the comments in thinking we only encrypt the packet length, but still don't MAC it (not re-read the code changes)?
<ariard> willcl_ark: yes we don't MAC it, BOLT8 does it, I spent few hours yesterday trying to find even theoretical attacks on modifying a length field MAC
<raj_149> ariard: honestly i couldn't come up with any reason. maybe for some they just need a subset of those so dont wanna assign all the numbers? not sure if its worth the effort though.
<ariard> raj_149: well we may have experimentation with new P2P messages, like LN custom messages, I can see people trying to communicate through P2P to their wallets
<lightlike> it could lead to confusion when there are several bips that introduce new p2p messages (that need new numbers). e.g. BIP 157 messages that are merged are not assigned a number yet.
<raj_149> ariard: yes they can use negotiations to add extra *special* messages between them. But these messages still needs to be defined in the protocol right? So they can as well be assigned with short_id there?
<raj_149> ariard: on poly1305, why are we using goto in the code? isn't there any better way to do it? for example the donna implementation doesn't seem use it.
<ariard> raj_149: let's say I have my own range of custom messages which make 80% of my traffic, there is no room left in the default short command ID table, by negotiating I can overrules with my own maps and thus save traffic?
<ariard> sipa: it's claimed by Bernstein in the ChaCha paper, "Salsa20/20 is a more conservative design than AES, and the community seems to have rapidly gained confidence in the security of the cipher."
<raj_149> ariard: we definitely need functional tests once its fully getched. So far it only implements en/decryption, which seems to be adequately tested in unit test.
<ariard> last question: could the serialization/deserialization code be better documented or simplified ? have you look on data struct and algorithm choice ?
<thomasb0`> sipa: what I was thinking about was to pick a random and compute either Euclide, theStack's pow, or the exponential ladder. What do you think?
<furunodo> what's a good place to start for getting into the python test code, beyond bitcoin/test/functional/README.md? say if I wanted to start (or work on) a parallel test implementation in python?