Erlay is a proposal for a new method of transaction relay based on a
combination of flooding and reconciliation (the current transaction relay is
flooding-only). The idea was presented in a 2019 paper, Bandwidth-Efficient
Transaction Relay for Bitcoin, and is
specified in
BIP330.
Reconciliation works asymmetrically in Erlay, depending on the direction of
the connection. When we interact with an outbound peer (i.e. we initiated the
connection), we are the requester of a reconciliation, and our peer is the
responder (vice versa when interacting with an inbound peer).
In the simplest scenario, being a requestor means that we send a
reconciliation (message REQRECONCIL), receive an answer back (SKETCH), and
reply with the reconciliation differences (RECONCILDIFF) that we can extract
by comparing our sketch with the received one. At this point, both parties
know the set of transactions their peers might require that are known to both.
A sketch is a representation of a set of transactions (or their short IDs)
that is optimized for a specified space. Erlay implements an existing set
reconciliation algorithm called PinSketch.
Erlay does not completely abandon flooding but uses it much more sparingly.
Nodes will still flood certain transactions to a limited number of peers; the
goal is that only well-connected publicly reachable nodes flood
transactions to other publicly reachable nodes via outbound connections.
What advantages does a reconciliation-based approach to transaction relay
have over the current flooding method. What are the trade-offs?
Which of the existing message types participating in transaction relay will
be sent less often with Erlay? Which ones will be unaffected?
Can you explain why a reconciliation-based approach scales better than flooding
(after all, the reconciliation messages also consume regular traffic)?
How do two peers reach agreement about whether to use reconciliation?
If two peers have agreed to use reconciliation, does that mean there will
be no flooding on this connection?
In the Limit transaction
flooding
commit, MAX_OUTBOUND_FLOOD_TO is set to 8. Considering that 8 is also the
maximum number of outbound connections participating in transaction relay,
why do you think this value was chosen?
Can you think of possible new attack vectors that would specifically apply
to Erlay?
<jnewbery> And one more reminder before I hand over: I'm always looking for hosts for review clubs. If you think that's something you'd like to have a go at, please message me
<lightlike> First, since this a really large PR and there is also a BIP and a paper with lots of information, we won't get deep into too much of the code today.
<dhruvm> a reconciliation-based approach minimizes redundant INV messages sent to nodes which already have the transaction. 224/524 ~ 42.7% of all bytes exchanged for tx relay are redundant if flooding is used in contrast.
<sipa> tuition_: yeah, that's the big one; the savings are modest for current average connection counts, but with increased connections the benefits grow
<lightlike> Not sure about this, but at the current connectivity (8 OB), probably a similar bandwidht saving could be gained by only using short TX Ids with Salting (as is part of Erlay) but no reconciliation
<lightlike> ok, moving on: Which of the existing message types participating in transaction relay will be sent less often with Erlay? Which ones will be unaffected?
<sipa> dhruvm: both recon-based announcement and inv announcement leak which txids the sender has; the point is that the recon-based ones are less frequent (on a per peer basis)
<felixweis> i wonder if sending a sketch of transactions that are dependent and required together to meet minrelayfee requirements of a nodes mempool can help with package relay?
<dhruvm> pinheadmz: I think that's right and am wondering why we'd need as many GETDATA as well. At set difference, the other node could just send the TX messages.
<lightlike> I think it's important thet the current flow with INV->GETDATA->TX will be unchanged - it's just that the initial INV is sent only for those transaction we are reasonably sure our peer actually needs.
<ariard> felixweis: thought about it either adding another snapshot for package ids only or eating the bullet of wasted bandwidth in case of redundancy
<ma33> Is their an analysis of when Erlay becomes too computationally expensive? I remember seeing a comparison with another set-recon scheme in the paper with up to 50 differences. What if the number of differences increases to, say, 500?
<lightlike> Can you explain why a reconciliation-based approach scales better than flooding (after all, the reconciliation messages also consume regular traffic)?
<lightlike> ma33: I think in the paper there is a suggestion to make larger set differences computationally viable via bisection - which is not needed for bitcoin though becuse typical differences are smaller.
<gleb> dhruvm: that's one of the design choices we made while working on the implementation, I think you get it naturally when you review the protocol closely. Not sure all of them should be documented. But we can discuss this as well
<lightlike> sipa, gleb: how would Erlay deal with it if someone dumped thousands of transactions at the same time into the network (e.g. for utxo consolidation)? would sketches be able to deal with that?
<gleb> rich: my simulator generated a topology similar to bitcoin today: 10k reachable nodes + 50k non-reachable nodes. And then random distribution. That's it
<dhruvm> lightlike: while that might make the sketch diffs larger it should still be atleast as good as flooding right (and then some due to short txids)?
<sipa> lightlike: if you dump 1000s of transactions on any given 0.21 node, the ones above 5000 will just be ignored; the rest would be requested with some delay... then that peer would start propagating them, but at a limited rate
<dergoegge> why is the reconil. frequency fixed at 1 every 2 seconds? could it make sense to have the frequency change based on how many new txs the node has seen in the last x hours/minutes? (i.e. reconcil more frequently if there are more txs coming through)
<gleb> dergoegge: that's possibly, I kind of assumed a "normal" operation of the network in the window of 3..14 txs per second appearing randomly in the network.
<dhruvm> When a node receives a WTXIDRELAY message from a peer that supports tx-relay, it announces a SENDRECON message and requests to act as a reconciliation-requestor for outbound connections and a reconciliation-responder for inbound connections.
<glozow> both send `SENDRECON`, both must not have txrelay off, both must use wtxid relay, peer who initiates outbound must have requestor=on/responder=off, peer who receives inbound connection must have requestor=off/responder=on
<sipa> dergoegge: more generally, i think it's a bit scary to make the frequency of propagation depend on seen transactions (e.g. i could imagine some scary fix point where all nodes end up deciding there are no transactions, and stop reconciling entirely...)
<gleb> two salts may be redundant actually, what if the salt was always picked by the "connection initiator" side randomly per peer, and they use that?
<ma33> If I understand it correctly, a node can request a sketch from only a maximum of 8 of its (outbound) peers, correct? Or do peers in blocks-only-mode also participate in recon?
<sipa> jnewbery: the biggest reason for the salt is that not all connections use the same salt; if there were, an attacker could easily grind transactions that propagate very badly over all connections
<sipa> so if both contribute entropy, you can just say "assume not both parties are the grinding attacker", which is obviously true - the attacker doesn't need to relay to themselves
<gleb> ma33: not blocks-only, no, but all tx-relaying peers can be requested for a sketch (if they indicated support). Yes, this is *currently* limited to 8.
<dhruvm> ma33: recon seems like a tx-relay thing, not sure blocks-only nodes would get the network characteristics they desire with recon. but erlay perhaps reduces the need to be blocks-only?
<gleb> dhruvm: what you're referring to is an *intuition* behind how erlay achieves it's properties. In practice, the relay policy is a bit more specific. This is because we have no notion of reachable node in the codebase (a node can't for sure know if it's reachable or not )
<dergoegge> There will be at least 8 flooding connections even if they also support reconciliation. If the reconciliation set is full we also fall back to flooding i think.
<pinheadmz> and what is the worst case scenario for a succesful shortid collision? just one peer doesnt get a tx message? theyll see it in a block at least
<lightlike> rich: yes, blocks-only is strictly lower bandwidth - but obviously tx relay is necessary for the network, so you dont contribute that by being blocks-only.
<sipa> sdaftuar: generally in cryptographic analysis you always assume that at least one of the involved parties is honest; if everyone is an attacker, who cares?
<sdaftuar> sipa: not sure i follow -- the idea is that one of the peers can break relay on their own link. that doesn't generalize to any other connections on the network that don't involve them
<lightlike> dergoegge: yes, but those connections won't be flooding only. We would flood TO up to 8 peers if we are a reachable note, but still get txes FROM these peers via reconciliation
<jnewbery> sdaftuar: yes, this is what I was trying to ask earlier. The worst I could do would be have the same salt for all my peers, which isn't attacking anyone else
<lightlike> In the Limit transaction flooding commit, MAX_OUTBOUND_FLOOD_TO is set to 8. Considering that 8 is also the maximum number of outbound connections participating in transaction relay, why do you think this value was chosen?
<sipa> sdaftuar: i believe siphash does assume that keys are random; if the exact key can be chosen exactly, there may be attacks beyond the ability to grind colliding transactions (e.g. i could imagine that some pathological keys exist that result in strictly more collisions that average keys); the actual key is computed with SHA256, but with too little entropy, that might be grindable too
<gleb> I think this q is a hard one. MAX_OUTBOUND_FLOOD_TO=8 is based on my simultions: it provides low enough latency while not flooding too much (assuming we have more max conns)
<lightlike> gleb: Ok - I thought part of the motivation was to not flood more than right now when, as a next step, the number ob MAX outbound connections is adjusted upwards.
<rich> perhaps new attack vectors related to latency if enough nodes reduce/exchange flood connections and you can somehow synchronize when reconciliations happen
<lightlike> gleb: in your simulations, did you simulate the mixed scenario of Erlay and legacy nodes? I am a bit concerned that, since flooding is faster, TX relay might be centralised towards the legacy nodes, and the new Erlay nodes only get the "leftovers" and don't really have anything to reconcile.
<gleb> willcl_ark: every node gets every transaction anyway. Censorship can happen the same way it can happen today (if maaany reachable nodes censor a tx)
<tuition_> Do we expect miners will run with erlay and non erlay nodes? it seems they'll want to be maximally aware of all txns (hence non erlay) but will also want to guard against eclipse attacks (and maybe run block only hence no erlay)