For this PR, it might make sense to overview the full PR again
(note that this is a different PR number from whatās above because that one was closed), and
the updated BIP.
Donāt spend too much time on it though, the overall understanding of the protocol is sufficient.
PR #23443 contains the first batch of commits
enabling Erlay:
A node becomes able to negotiate the support for Erlay with another node by sending/receiving
a new p2p message.
Once the handshake is done, the node also initializes the āreconciliation stateā, a variable to
keep track of ongoing reconciliations with a particular peer.
If the peer is disconnected, the corresponding reconciliation state should be cleared.
What are the benefits of splitting PRs into smaller parts? Are there any drawbacks to this
approach? How might you review sub-PRs differently?
When are the nodes supposed to announce Erlay support? What should they consider before doing
so and in which cases would it be clearly meaningless?
What is the overall handshake and āregistration for reconciliationā protocol flow?
Why is the version field of
sendrecon
useful? Under which conditions it should be bumped?
What is the reason for generating reconciliation salt (m_local_salts)? How is it generated
and why?
This PR adds 4 new conditions under which nodes will disconnect a peer. What are they?
When does it make sense to disconnect a peer rather than just ignore the message?
What does TxReconciliationTracker::Impl::m_mutex guard and when does the lock need to be held?
Why might we use a std::unordered_map instead of a std::map to track salts and states?
<gleb7> yeah, looking at the overall PR is a good idea for someone interested in this stuff. Original PR also links to 2 issues i made about understanding/measuring performance of this, which should help
<gleb7> For me making/reviewing prs splitted in parts is something new, and I don't think we have much experience with that in general (i recall a handfull of projects going through this)
<stickies-v> It makes reviewing much easier. Good and atomic commits already help a lot with this, but it's nice if you can fit the entire PR in your mind model, so ideally it's not too large
<dergoegge> i think github is pretty bad for code review on large change sets with multiple people reviewing different parts, splitting it up makes it easier to have focused review
<stickies-v> It also reduces code going stale since the non-controversial code can be merged more quickly while the more controversial code is discussed in another PR
<gleb7> My first version was 7 commits I think and oh god it was hard to navigate even by myself, we already went through a lot of structural changes since then. Now we're here
<jnewbery> I think one downside is that if you're not reviewing the full change set, then you're slightly trusting the author that they're taking you in the right direction
<glozow> it might be weird if the sub-PRs don't do much by themselves. lightlike had a good comment about the fact that, if a release was branched off from this PR, nodes would be sending each other meaningless SENDRECONs
<jnewbery> eg Carl split https://github.com/bitcoin/bitcoin/pull/20158 into many smaller PRs. Either you do a concept review of the full PR, or you're trusting that Carl is taking us in the right direction and just review the mechanical code changes in the sub-PRs
<lightlike> I think it's important to not only look at the current PR chunk, but also at the whole thing (simulations/mainnet testing) to get to a concept ACK.
<gleb7> Yeah, and making the whole thing accessible for reviewers is the whole other challenge. I hope, participants of this PR will have time to look at the big thing too :)
<glozow> not to shill my own bags but i've been trying to split the package mempool accept PRs into chunks where each one might be useful, e.g. package testmempoolaccept, rbf improvements, etc. i'm not sure if people like that or if it just seems disconnected but š¤·
<gleb7> When are the nodes supposed to announce Erlay support? What should they consider before doing so and in which cases would it be clearly meaningless
<gleb7> exactly, this is something covered in a BIP. This will be used later on in Erlay commits, but for now it was sufficient to compare the code to BIP.
<larryruane> quick side-note, in case anyone wants to run the unit test, you'll need to add `test/txreconciliation_tests.cpp` to `src/Makefile.test.include`, and also change the unit test to `#include <node/txreconciliation.h>` .... was this intentional, that the unit test isn't being built yet?
<larryruane> after (existing) version handshake, then the peers have a defined order for trading recon salts (?), depending on who is inbound and who is outbound
<stickies-v> In the VERSION message, each peer can include some fields to indicate recon support. If such a VERSION message from a peer is received, we PreRegister the peer and send out a SENDRECON message. If we receive a SENDRECON message back from the peer and everything is valid, we initialize the reconciliation state for the peer.
<gleb7> No, this is perfect I think. If you're confused, someone else will also be. And this code should be understandable even without my presence so yeah, I probably should check how good are those comments.
<stickies-v> that also explains the confusion I had about one of your comments regarding peers getting disconnected about sending a SENDRECON message unexpectedly
<stickies-v> I'm not quite familiar with this, but could it be that an increased p2p protocol version would mean Erlay nodes won't communicate with pre-Erlay nodes anymore, and there's no real need to do that?
<lightlike> I think they would still be ale communicate, just via the old flooding mechanism (because the SENDRECON messages would be tied to the new protocol version).
<jnewbery> Generally, I don't think we need a new protocol version if we're just adding new optional message types. Nodes should ignore messages that they don't know the meaning of (although this hasn't always been the case for alternative implementations)
<gleb77> Yeah, this now goes out of scope of this PR. Probably deserves a blogpost, about protocol version and also service bits and messages like SENDRECON/SENDCMCPT
<stickies-v> Since we use short IDs instead of the full ID, multiple tx's could have the same short ID. An attacker could calculate such colliding transactions, but by having each peer calculate their own salt (and used in calculating the short IDs), these transactions would only collide on that specific link and not with the entire network
<stickies-v> gleb77 as to your 'why?' part, I seem to remember we have multiple Rand functions, some faster than others. Is GetRand the fastest one we can use? I'd think it doesn't need to be very secure?
<gleb77> stickies-v: honestly, I didn't think much about the exact random function... i think we use GetRand in a bunch of other places, and this is the number of bits i needed so i used it
<stickies-v> Hmm but `m_local_salts` from your question doesn't combine anything yet right? That only happens in RegisterPeer when we're updating m_states?
<jnewbery> stickies-v: right, the local salt is generated when sending the `sendrecon`, and then combined with the remoate salt when receiving the `sendrecon`
<gleb77> So we ended up sending 1 extra full WTXID message. If we had per-node salt, they would be able to de-duplicate and send/request TX right away after short id
<gleb77> Probably not that a big deal for couple hundred of records we will have here, but still, there's literally no disadvantage I think in this case, right?
<larryruane> my understanding is, many of our existing maps could be unordered_maps, but unordered maps weren't added (to std at least) until after a bunch of the core code had been written
<stickies-v> gleb77 if you still have a bit of time, could you just give some intuition about what happens when we have 2 tx's with colliding short IDs? to make it simple, assuming this happens accidentally and not through an attack
<gleb77> stickies-v: In the worst case (when they arrive at both nodes right before reconciliation or something like that), they will be "cancelled out" in reconciliation. Those particular colliding transactions won't be exchanged. No other txs or state affected.
<gleb77> lightlike: I probably have to restart them, I will do that tomorrow. Also, feel free to comment in the corresponding testing issue with your IPs, if you feel so