Currently, a node only rebroadcasts a transaction if it is the originating
wallet, which is very bad for privacy. These changes propose
extracting the rebroadcast logic from the wallet into the mempool and having each node
rebroadcast a set of transactions it thinks should have been mined by now.
Should have been mined is a tricky concept to define. At a high
level, this implementation determines what should have been mined as follows:
Every ~20 mins, run a fee rate cache that computes the top of the mempool
and stores the min package fee rate for a transaction to be included.
Rebroadcast events are triggered by a random poisson timer.
When the timer pops, check if there has been a new block since the cache last
ran. If there hasn’t been a new block, bump the timers to try and
hit those conditions in the next run. If the conditions have been met this
round, continue with the rebroadcast logic
Calculate the set of transactions that are at the top of the mempool and >30 mins old.
Filter out any transactions with fee rate < cached fee rate.
Rebroadcast the remaining set.
To minimize rebroadcast noise, we want the order of events to be ->
cache run | block | rebroadcast. If we run the cache and then rebroadcast
(without a block in between), the cache is not as meaningful. Thus we store
the block height & compare to chain tip before rebroadcasting.
With these changes, the division of responsibility between the wallet & the
node (mempool) are as follows:
The wallet ensures the (local) mempool knows about the transaction by periodically
attempting to resubmit that transaction.
The mempool applies rebroadcast logic to ensure that the network knows
about any transactions it believes should have already been mined.
This new logic could result in a problem: if a transaction is submitted locally
and not immediately relayed, it could take a long time to be included in the
rebroadcast set (for example, if it’s a low fee rate transaction), and thus a long
time until it’s first seen on the network. To mitigate this, the mempool keeps
track of transactions submitted locally via m_unbroadcast_txids. When it’s
time to rebroadcast, any unbroadcast transactions will be added to the set to
relay. A transaction is removed from this set when a peer requests is via a
GETDATA message.
Reviewer notes: A good place to get started is by reading the commit messages.
A great way to add value would be to think through or observe the
interactions between the different mechanisms introduced.
<amiti> one thing before we get started is a quick reminder about etiquette- all questions are welcome and feel free to just jump in anytime to ask your question. we'll get to answering it :)
<emilengler> Previously, a node would only rebroadcast a transaction if it is the originating wallet, which is very bad for privacy. These changes propose extracting the rebroadcast logic from the wallet into the mempool and having all nodes rebroadcast a set of transactions it thinks should have been mined by now.
<nehan> reworded: nodes will rebroadcast all transactions which match a certain set of conditions and have not been mined yet. the wallet will no longer rebroadcast "my" txns, instead they will be passed to the node to handle.
<amiti> sebastianvstaa: the majority of this implementation is trying to answer the question of what "should have been mined" - can anyone give an overview of the mechanisms implemented?
<michaelfolkson> If there was a league table of all the transactions in your mempool from highest fee per vbyte to lowest fee per vbyte you would expect the highest transactions to be mined first until the block is full
<andrewtoth_> the PR caching a fee rate, and looks at the top 3/4 of the mempool that are above that feerate, and if a block has been mined since it's been seen. I believe that's how the PR answers the question
<amiti> taking a step back, as michaelfolkson is saying, you conceptually expect the txns at the top of your mempool (highest fee rates) to be mined into blocks
<jnewbery> emilengler: I see you've just left a code-style comment on the PR. All review is welcome, but I think at this stage higher-level conceptual review is more useful (the code is likely to move around during the review process so style nits add a lot of noise)
<amiti> the rebroadcast mechanism is meant to help reconcile if thats not happening. maybe there was a network failure or the transaction was at one point dropped from other mempools
<emilengler> Also is it a good idea to rely on fees? It is in the complete control of the miners and like michaelfolkson also noted, what is with empty blocks?
<nehan> anyone: i'm trying to learn a bit more about user-rebroadcast expectations to get a feel for if the timing makes sense. i'm not sure what users expect from the wallet in terms of rebroadcast
<andrewtoth_> if your node is configured to accept longer ancestor chains, for instance, then this will ensure that longer chains are also rebroadcast after descendents are mined. That way peers with different configurations will get the other txs after some time.
<amiti> nehan: I'm currently working on running a node with my patch to observe the current network conditions, but there are definite cases where this could happen. Eg. in 2017 competitive fee rate market, transactions would get evicted from normal sized mempools, and if one of those transactions were yours, you would want to rebroadcast it so it could later be mined.
<andrewtoth_> emilengler: this PR actually constructs a block template, like miners do if they run core, so it uses the same logic to figure out what should be included
<emzy> What if you would like to have the transaction time out from the mempool? You can't do that if everyone is rebroadcasting. Not sure if this is relevant.
<jnewbery> nehan: the default config is to only allow transaction packages up to size 25 transactions / 101Kb, but that is configuarble. If I change that to eg 50 transactions, I could have a long chain of transaction in my mempool that other nodes don't have (since they'll reject anything beyond the 25th in the chain).
<lightlike> one nice side effect of this PR might be that newly connected nodes get a good representation of the mempool faster (because other nodes will INV older high-fee transactions to them if they think that they should have been mined)
<amiti> emzy: once you broadcast a transaction, you should never expect it to "time out". Users could be running mempools of different sizes and still have that transaction. Or a random user could be rebroadcasting all transactions (running a patch). if you want to reliably "cancel" the transaction, you would have to spend the inputs in a different transaction.
<jnewbery> with this PR, once [some of] those first 25 transactions are mined and my node thinks the later transactions should be mined, my node will rebroadcast them
<andrewtoth_> amiti: i don't believe that would work if the tx is not signaling RBF. I recall a PR to introduce a removemempoolentry RPC. Perhaps that would be useful if this change is merged.
<andrewtoth_> perhaps there could be a limit to how many txs get rebroadcast, to limit worst case bandwidth use. Not sure what implications that would have though
<jnewbery> andrewtoth_: the scenario we're talking about is where we want to replace transactions in other node's mempools. removemempoolentry wouldn't have any effect here.
<andrewtoth_> jnewbery: yes, i understand. Then this seems like an issue if a user does want to have their tx canceled, say if they broadcast at 1 sat/byte but fee rates remain high for several weeks
<lightlike> michaelfolkson: I don't think so. Afaik, if the tx are older and most of the network have them already in their mempools, they currently would not periodically INV them, so the new node would only learn of them once they have been included in a block.
<jnewbery> (I'm not suggesting there's any dependency between this PR and that one - just that in the scenario you're describing, they're both relevant)
<amiti> andrewtoth_: trying to "cancel" your transaction is different than trying to bump the fee to get it to confirm. since many mempool policies are configurable, even if your transaction would have been evicted / dropped / expired from the standard mempool, there may be other mempools that still have it (for example, that of a miners)
<amiti> This PR claims to improve privacy. Do you agree? What are privacy leaks that would remain after these changes? Can you describe the attack vectors?
<ariard> because if you rebroadcast set <A,B,C> and let's say B hasn't been previously announced you may deduce rebroadcaster is near or is the origin node
<amiti> ariard: hm, but if B wasn't previously announced, a peer wouldn't be able to identify that its a rebroadcast, so wouldn't it look like an initial relay?
<jnewbery> michaelfolkson: I haven't reviewed it yet. privacy is a slippery thing. This PR potentially could be a nice improvement, but the devil is always in the details.
<amiti> nehan: so your point is true, if you are broadcasting a txn that you believe is valid but it gets rejected by other mempools, you would expect it to be mined and keep rebroadcasting it
<jnewbery> this PR doesn't claim to improve initial tx relay. I think it might be useful to just restate what this PR does improve. Currently, if you have a connection to a peer and you see the same transaction INVed from that peer more than once, you can have a very high confidence that the wallet attached to that peer owns the transaction. It's like a big flashing beacon saying "THIS TRANSACTION
<amiti> nehan: hm, if my mempool was larger I'd be more likely to rebroadcast some of those txns (bc they would be evicted from smaller ones), but I don't think that addresses the policy mismatch that you were talking about
<nehan> amiti: but once i get this txn that i think this mineable but others don't to another node that thinks it's mineable, it will also rebroadcast it.
<ariard> jnewbery: agree but if your transaction didn't propagate on the network at initial relay, if it appears among a rebroadcast set from a node A I think you can assume that's a wallet tx issued by A
<jnewbery> mattcruz: when we learn about a transaction, we tell our peers about it using an INV message with the txid. If the peer wants the tx they'll reply with a GETDATA, and we'll then send them the TX message
<amiti> nehan: yes. so I think the clear example of this is with a software upgrade. say nodes A, B, C are on an old policy, and nodes X, Y, Z have upgraded. node A forms a txn that gets rejected by the upgraded nodes, but B and C will accept them. When blocks come in and the fee rate indicates this txn should have been mined, all of the old nodes would rebroadcast it.
<lightlike> I don't completely understand why the rebroadcast per se is so important, also in its current form. Would it lead to a significant error rate if we just trusted that the tx has flooded to the network after getting enough GETDATAs to our initial send, and forget about it (i.e. just remove the rebroadcasting)?
<amiti> ariard: you wouldn't know that it was specifically relayed because of being selected as a "rebroadcast". so it would just look like the initial relay.
<amiti> ariard: I'm unclear if you are trying to highlight an issue with the rebroadcast logic or the initial broadcast logic. Are you claiming that the rebroadcast policy introduces an additional leak? Or that the guarantees are no better than the initial relay policy?
<ariard> amiti: would lean to second option and even further, this PR for sure improve rebroacast logic privacy, but you may still have leak due of rebroadcast being in fact an initial relay
<andrewtoth_> I think the main cost is rebroadcasting a lot of transactions that aren't rebroadcast with current behaviour. This could potentially be a large bandwidth increase.
<amiti> lightlike: GETDATAs can indicate that your transaction has initially been propagated out to the network, but there are other reasons that the txn might no longer be in other mempools. Eg. evicted based on fee rate. Or expired after 2 weeks. etc.
<jnewbery> It does seem problematic that if there's a softfork or tightening of policy rules, unupgraded nodes will continue to ping transactions that are no longer consensus- or policy- valid between themselves.
<amiti> yup! the biggest downside of this implementation is the potential bandwidth impact, which is something that needs to be evaluated very carefully before these changes could be accepted
<andrewtoth_> question - with erlay does this pr not become redundant? If we just use our entire mempool as the sketch, then we no longer have to send INVs for unconfirmed.
<andrewtoth_> jnewbery: ahh i see, so then the bandwidth problem will go away though, if we just include the should have been confirmed txs in the sketch every so often. no invs needed
<andrewtoth_> could there be a limit on how many txs get rebroadcasted each time, so the bandwidth would be limited? and not include the same txs for subsequent rebroadcasts?
<jnewbery> ariard: potentially. With the current logic, unless the originating wallet is rebroadcasting, then the tx will be expired from the mempool after two weeks. If all unupgraded nodes are rebroadcasting, they'll keep refreshing the tx in each other's mempools
<gleb> I think asking unupgraded nodes to do something is impractical. We should assume that those nodes might not have an active maintainer to accomplish such tasks
<amiti> and actually, the way the fee rate filter is implemented means that it will skip over recent txns (less than 30 min old), but likely still end up with a set thats total 3/4 block weight .. assuming there are enough "old" txns in the mempool
<amiti> ariard: from review notes- "This new logic could result in a problem: if a transaction is submitted locally and not immediately relayed, it could take a long time to be included in the rebroadcast set (for example, if its a low fee rate transaction), and thus a long time until it’s first seen on the network. To mitigate this, the mempool keeps track of transactions submitted locally via
<amiti> m_unbroadcast_txids. When it’s time to rebroadcast, any unbroadcast transactions will be added to the set to relay. A transaction is removed from this set when a peer requests is via a GETDATA message."
<lightlike> there would also be a possibility to actively query for the tx (like asking a new peer for the tx after some time, to which we haven't INVed it before). Althought that would possibly also be less than perfect for privacy.
<amiti> ariard: not quite. here's the scenario: you disable p2p. you form a low fee rate txn and submit to your mempool. you re-enable p2p. currently (before this PR) the rebroadcast mechanism would pick it up and send it out for initial broadcast. with these changes, it might be a long while before it hits the conditions to be rebroadcast to the network
<amiti> ariard: the 24h wallet timer is how frequently the wallet tries to submit it to your local mempool. it will be a no-op if the txn is already in the mempool.
<amiti> lightlike: yeah totally :) along those lines, something I was considering is if you could use getting an INV for the txn to confirm that it has successfully propagated
<lightlike> ariard: if a GETDATA never comes (and we never sent the TX), there is 0% that the tx has made it, so that is a necessary condition. Actively querying an unrelated peer would give additional confidence that the tx has made it beyond our peers.
<michaelfolkson> I have a couple of quick questions too. 1) How do you prevent all the unnecessary rebroadcasting in the case of empty blocks being mined? How commonly do empty blocks (or half empty blocks for no apparent reason) occur these days?
<amiti> michaelfolkson: the case of empty blocks is a good point. the cache is much less useful if fee rates are only increasing (aka no blocks come in, or empty block), but its not totally useless.
<ariard> amiti: yes was reading it rn, btw you may collect all concerns raised there and on the PR and propose the subject at a meeting to go forward faster :)
<andrewtoth_> if a block is less than 50% full but not close to empty, that just means that there isn't enough in miners' mempools. In those situations, everything in the mempool should be rebroadcasted
<andrewtoth_> it's just based on past behaviour of miners. Either there is only coinbase plus a few other txs, which usually happens when a new block comes in before a new template is formed, or the blocks have the entire mempool of the miner
<jonatack> amiti: further to ariard's suggestion, you could possibly propose the PR at the weekly irc for inclusion into "high-priority chasing concept acks"
<michaelfolkson> Ok we should let you go amiti. Thanks for staying longer. One last question. You said in a presentation you'd written lots and lots of tests. Did a lot of them not make it into the PR?
<amiti> jonatack: thanks for the feedback around flakiness. I've been working really hard to have them reliable (& they are for me). I'll look into it, but can I recruit you for some debugging?
<instagibbs> <lightlike as-is the wallet can fail to get it into your *own* mempool, the rebroadcast is necessary to make sure it actually gets broadcast once
<jonatack> amiti: hmm mempool_rebroadcast.py passed twice, then failed at line 171 again on the 3rd try. please ping me anytime about your tests. happy to look and i'll learn things
<jonatack> michaelfolkson: oh, those aren't by suhas afaik, and i don't recall offhand which ones but if you look at the functional tests you'll come across them